Data Management: a gentle introduction - Bas van Gils
Synopsis - In this chapter, I introduce the topic of data governance. Data governance is the capability that deals with accountability for data. I will first position data governance in relation to (other) data management (activities). Then I will provide an overview of key data governance themes based on the Data Management Body of Knowledge (DMBOK) [Hen17]. Last but not least, I will give an overview of a modern approach to data governance based on three key roles: data owners, data users, and data stewards.
■ 9.1 INTRODUCTION
The word governance, or its associated verb to govern, has many definitions and interpretations, depending on the context in which it is used. Many people seem to associate this word with (the use of) power; with laying down and enforcing the law. This view is indeed close to the Merriam-Webster Dictionary definition which uses phrases such as “to exercise continuous sovereign authority over” and “to control, direct, or strongly influence the actions and conduct of”. The DMBOK defines data governance as follows [Hen17]:
Data governance is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.
This definition screams a command-and-control, top-down approach to governance: make plans, define rules, implement, enforce, and punish when the rules are not followed. This isn’t the only way to implement data governance, though. In this chapter, I will first show how to position data governance in relation to data management. I will then follow up with a discussion of the data governance activities as listed in the DMBOK and a discussion of a modern approach to data governance through data stewards, data owners, and data users. I will end the chapter with a brief discussion of the relationship between data governance and other governance processes that may be followed in the organization.
■ 9.2 DATA GOVERNANCE AND DATA MANAGEMENT
Looking closely at the definition of data governance from the DMBOK, it becomes clear that there is a relationship between data governance (DG) and data management (DM). This relationship, also pointed out by John Ladley in [Lad12], is highlighted in figure 9.1, which is taken from the DMBOK. The idea is straightforward and not unlike the separation of powers in modern-day (Western) politics¹: separate decision-making and oversight (DG) from the actual execution of DM activities. In my view, this has several implications.
Figure 9.1 Data Governance & Data Management (Taken from [Hen17])
First of all, DG is not so much about governing data (which are inanimate) but more about governing the people who handle data. In other words, it is about deciding what people can and can’t do with data, as well as ensuring that there are guard rails in place to make that happen. Whether this happens in a top-down fashion (define the policy, analyze implications, implement the policy) or in a bottom-up fashion (capture good practices from across the organization in a policy and arrange for sign-off) is a whole different matter.
A second implication deals with the type of decisions to be made: strategic, tactical, and operational. Example 20 illustrates different types of DG decisions that organizations deal with.
Example 20. Data governance decisions
Strategic decisions - Setting up a data strategy is a prime example of a strategic decision. This entails questions such as: how and where do we want to create value with data? How does our business model evolve when we leverage data as a key asset? Are we going to let business units control their own data, or are we trying to achieve synergies between business units? Another example is the development of a data management strategy to complement the data strategy. Relevant questions here are: how good should our data management capability be? Are we going to centralize or decentralize certain data management functions?
Tactical decisions - Setting up governance structures, appointing people in DM/DG roles, and approving policies are good examples of tactical decisions. These types of decisions bridge the gap between the strategic and operational levels.
Operational decisions - Approval of definitions of business concepts, dealing with conflicting definitions or data quality requirements, and sign-off on data quality improvement initiatives are good examples of operational decisions. The focus here is on decision-making about the operational data management activities.
Let’s examine these examples from the perspective of the DMBOK wheel as shown in figure 7.1. There is a reason that DG is in the center of the wheel: decision-making is something that is required for all capabilities in the wheel.
■ 9.3 DATA GOVERNANCE ACTIVITIES IN DMBOK
If DG is all about decision-making then the question is: what do we make decisions about? The previous example gave some suggestions. To give a more formal answer I will briefly discuss several governance topics that are listed by the DMBOK. This is by no means a complete summary of the DMBOK, nor is it intended to be. Instead, I am trying to give a broad enough overview to provide you with an understanding of what DG is all about.
One of the key topics is to define the organizational structure for DG in the form of steering committees, boards, and different roles in the organization. This is closely related to the operating model type, which helps to decide which activities are carried out and where. The main models that are listed are: a centralized approach to DG, a replicated approach in which the DG structure is copied across business units with little central coordination, and a federated approach in which decision-making is distributed between the business units on the one hand and a central body on the other.
The DMBOK also advocates an approach to governance that uses data stewardship as a cornerstone. Data stewardship is defined as “a label to describe accountability and responsibility for data and processes that ensure effective control of data assets”. This definition is abstract. A more informal definition would be: data stewards are those people who (hands-on) take care of data assets across the enterprise and therefore are assigned accountability and responsibility for those data assets.
The last topic that is mentioned is policy-making. Policies codify general principles and rules with respect to the use of data assets. Typically, this includes such things as formal roles and responsibilities², procedures for handling data quality issues, and rules for data classification.
Data governance is a big topic that requires many roles across the organization to collaborate. The DMBOK lists several roles that contribute to effective DG, including business executives, data owners/stewards, architects, compliance teams, other governance bodies, and data professionals. How to set this up properly is discussed in several chapters in part II of this book.
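To make the idea of formal roles and responsibilities a little more tangible, here is a minimal sketch in Python of a RACI matrix (Responsible, Accountable, Consulted, Informed) together with a simple consistency check. The activities and the role assignments are hypothetical and chosen only for illustration; they are not taken from the DMBOK. The check encodes a common RACI convention, which is an assumption here: every activity has exactly one accountable role and at least one responsible role.

```python
from collections import Counter

# Hypothetical policy activities and role assignments (illustration only).
RACI = {
    "Approve business definitions": {
        "data owner": "A",
        "data steward": "R",
        "data user": "C",
        "architect": "I",
    },
    "Handle data quality issues": {
        "data owner": "A",
        "data steward": "R",
        "data user": "I",
    },
    "Classify data": {
        "data owner": "A",
        "data steward": "R",
        "compliance team": "C",
    },
}


def find_raci_problems(matrix):
    """Return a list of human-readable problems found in a RACI matrix."""
    problems = []
    for activity, assignments in matrix.items():
        counts = Counter(assignments.values())
        # Assumed convention: exactly one accountable role per activity.
        if counts["A"] != 1:
            problems.append(
                f"{activity}: expected exactly one 'A', found {counts['A']}"
            )
        # Assumed convention: at least one role does the actual work.
        if counts["R"] < 1:
            problems.append(f"{activity}: nobody is responsible ('R')")
    return problems


if __name__ == "__main__":
    issues = find_raci_problems(RACI)
    print(issues if issues else "RACI matrix is consistent")
```

A check like this is only a small aid: the real work of policy-making lies in agreeing on who takes which role, not in recording the outcome.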
■ 9.4 A MODERN APPROACH TO DATA GOVERNANCE
The modern approach to data governance is based on three roles and is illustrated in figure 9.2. These roles are as follows:
• Data owner - The data owner is the person who is ultimately accountable for a data set. The data owner ensures that data is fit for the purpose of the people who want to use it. As a rule of thumb, data ownership lies where data is created, as this is the only place where its correctness can be verified. This is illustrated in example 21.
• Data user - The data user is the person who wants to use, or already uses, data. Typically, the data user negotiates with the data owner about data access. Common topics are: what (types of) data does the user wish to use? What are data definitions? What are data quality requirements?
• Data steward - The data steward is the person with hands-on responsibility for managing the data. Data stewards tend to have a mixed business/IT background³. Both the data owner and the data user tend to have management positions. Therefore, people in these roles tend to be supported by data stewards, as shown in figure 9.2.
Example 21. Assigning data ownership
Suppose we are looking for the data owner for the “product” business concept in a company that produces electronics. New products tend to be defined by the Product Development Department. The opinions of other departments (e.g. Marketing) are of course considered in the decision to actually launch a new product, but ultimately the accountability for new products lies with the Product Development Department. Therefore, someone in this department should also be assigned the data owner role for the “product” business concept.
Note that this approach to data governance addresses only one piece of the puzzle: it deals with the accountability for data assets but does not address over-arching issues such as policy-making and alignment. As such, this approach should always be complemented by other approaches to achieve a sufficient level of data governance maturity.
Figure 9.2 illustrates this way of thinking. The top of the diagram is all about coordination between different organizational roles. This is where the actual governance activities happen: data owners and data users, supported by data stewards, negotiate the use of data. The bottom part of the diagram signifies the storage and flow of data in such a way that the agreement is met.
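The two layers of this model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `DataAgreement` structure, the job titles, the field definitions, and the single quality requirement (completeness of the agreed fields) are all hypothetical. The agreement stands for the top layer, the outcome of the negotiation between owner and user. The check stands for the bottom layer, where the data that actually flows has to meet that agreement.

```python
from dataclasses import dataclass, field


@dataclass
class DataAgreement:
    """Outcome of the negotiation between a data owner and a data user."""
    data_set: str
    owner: str
    user: str
    stewards: list
    # Agreed definitions per field: field name -> business definition.
    definitions: dict = field(default_factory=dict)
    # Hypothetical quality requirement: minimum share of complete records.
    min_completeness: float = 1.0


def completeness(records, required_fields):
    """Share of records in which every required field is filled in."""
    required = list(required_fields)
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(name) not in (None, "") for name in required)
    )
    return complete / len(records)


def meets_agreement(agreement, records):
    """Check the delivered records against the agreed quality requirement."""
    score = completeness(records, agreement.definitions)
    return score >= agreement.min_completeness


# Hypothetical agreement for the "product" business concept.
agreement = DataAgreement(
    data_set="product",
    owner="Head of Product Development",
    user="Head of Marketing",
    stewards=["business data steward", "IT data steward"],
    definitions={
        "name": "Commercial name of the product",
        "price": "List price in euros, excluding VAT",
    },
    min_completeness=0.75,
)

records = [
    {"name": "Radio", "price": 49.0},
    {"name": "Speaker", "price": 99.0},
    {"name": "Headphones", "price": None},  # incomplete record
    {"name": "Amplifier", "price": 199.0},
]

if __name__ == "__main__":
    print(meets_agreement(agreement, records))
```

In practice an agreement would cover more than one quality dimension, but the split is the same: people decide on the requirements, and the data flow is then measured against them.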
Figure 9.2 Data governance model
Several questions remain, such as: how do you find good data owners and data stewards? How do they perform their role effectively? The theme here is non-invasive data governance, which will be explained in more detail in chapter 25 (part II) of this book.
■ 9.5 POSITION OF DATA GOVERNANCE
I will close this chapter with a brief discussion of the position of data governance in the organization, especially in relation to other governance processes in the organization. I often hear arguments along the lines of “Data resides in our systems, so data must be an IT thing. As a consequence, data governance should fall under the jurisdiction of IT governance”. There is some merit to this position but only if you believe that data, and with it data management, is an IT topic. I tend to disagree.
As explained in this book, I believe data to be a topic of its own and one that should not be positioned as yet another IT topic. Consider once more figure 9.2. The top layer of this diagram shows governance activities from a data perspective, using the ownership/stewardship model. This is intended to govern the data in the systems, not the systems themselves. Governing the systems is the realm of IT governance, which has its own models and frameworks, most notably COBIT (see e.g. [ISA12]). As processes, data, and systems are all important, so are their governance activities, and I believe that they should co-exist. Governance activities should complement each other and should therefore be coordinated. The way this works best really depends on the local setting and culture of the organization: there is no single best answer to that problem.
■ 9.6 VISUAL SUMMARY
1 https://en.wikipedia.org/wiki/Separation_of_powers. Last checked: 12 June 2019.
2 Typically in the form of a RACI matrix. See e.g. https://en.wikipedia.org/wiki/Responsibility_assignment_matrix. Last checked: 12 June 2019.
3 It is hard to find people with this dual background. As an alternative, many organizations work with stewardship duos: a business data steward paired with an IT data steward.