Data Management: a gentle introduction - Bas van Gils
■ 7.3 DECONSTRUCTING DM
Balancing the two goals of DM is a big task, and many things have to be in place to make that happen. One of the strong points of the DMBOK that I have mentioned several times so far is that it breaks down the field of DM into smaller pieces called functional areas. For my purposes, it makes more sense to call them (sub)capabilities, signifying that together they contribute to the overall DM capability. Figure 7.1 shows what this partitioning looks like. This visual is often called “the DMBOK wheel.”
For each of these areas, the DMBOK attempts to give a broad overview of its objectives, the activities that are part of it, the inputs/outputs that can be expected, and the types of tooling required for support. It also describes good practices. The book is written by many authors, each taking care of a particular area. Unfortunately, this means that not all chapters are equally well aligned and that there are several small inconsistencies in the book. All in all, it is an impressive work that offers a great introduction to, and guidance for, the field of DM.
Looking at the wheel, note how some areas appear to consist of two topics. For example, at the bottom there is an area that covers both reference data management and master data management. In this book, I take a slightly different perspective that is mostly in line with the wheel, and I make sure that each of the chapters to come covers a single topic. I have deliberately left out certain topics, such as database operations management (which is, in my view, mostly an IT capability dealing with how data technology should be run and operated) and document & content management (which deals with unstructured data: whilst this is important, it is not the focus of this book). I will cover the topics listed in table 7.1. Example 15 illustrates that in practical settings, many of the DM capabilities are required together to achieve success.
Figure 7.1 The DMBOK wheel
Example 15. Data management example
This example is based on a real-world case at a Dutch governmental agency in the mid-1990s. One of the challenges this organization faced was a large backlog of reports that had to be completed from a regulatory perspective (business intelligence, reporting). Creating these was far from easy because data was dispersed over many systems across the organization, and there was no standard environment (e.g. a data warehouse) to bring it together (integration). To make matters worse, different departments and professionals were in disagreement about key aspects such as data definitions, ownership of data, and quality of the data (governance, quality).
Ultimately this was, of course, resolved. It took years of debate and several reorganizations to solve these problems. One of the key success factors in the end was that the organization leveraged processes, systems, policies, and procedures that were already in place and extended them one step at a time.
Table 7.1 DM topics
Chapter | Topic | Short introduction |
9 | Data governance | Data governance is the enterprise discipline concerned with starting, managing, and sustaining the DM program. Key topics are accountability, decision-making, and supporting the program. |
10 | Metadata | Metadata is, loosely defined, data about data. Anything you know about your data is metadata. It is foundational for all the other capabilities: it is crucial to know the definition, location, etc. of your data. |
11 | Modeling | Modeling is all about “making sense of data through boxes and arrows”. I have already shown some examples in chapter 6. This area is closely related to Architecture, and focuses on (data) modeling techniques. |
12 | Architecture | Architecture is about “fundamental properties of a system, and the principles guiding design and evolution” [ISO11]. The key challenge relates to getting to grips with the data landscape, in light of the overall architecture of the enterprise. |
13 | Integration | Integration deals with the movement of data from process to process, from system to system. The main contribution is a set of techniques and approaches to ensure that data flows through the organization so that it can be used where needed. |
14 | Reference data | Reference data is about “understanding data through data”. This is the realm of code lists and hierarchies of codes. An example would be codes for geographical areas where the company does business, or codes that define the types of products the company offers. |
15 | Master data | Master data is concerned with creating a “golden copy” of data about key business concepts for the organization, by creating a single version of the truth. There are many ways to achieve this. This area ties in closely with Integration. |
16 | Quality | Data quality is about data that is fit for purpose. It is about setting requirements (a norm) and taking corrective action when data doesn’t meet them. This may entail different quality attributes, such as correctness and completeness. |
17 | Security | Security is about a risk-based approach to protecting data assets. It is concerned with defining a data security policy, data classification (confidentiality, integrity, availability) and implementing measures to keep data safe according to this policy. |
18 | Business intelligence | Business intelligence (BI) is concerned with reporting what happened in the past and with data-driven predictions about the future (analytics). |
19 | Big data | Big data is still a major trend. It is about data sets with many data points (volume) that change rapidly (velocity) and are often of various types (variety) [ZEDR+11]. Dealing with this type of data is completely different from dealing with traditional (little) data from a technical perspective, and opens the door to a whole range of new insights from a business perspective. |
20 | Technology | This area is not listed in the DMBOK wheel. I’ve included a chapter on this topic to give a focused, high-level overview of relevant developments in the area of data/DM technology. |
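
To make the reference data and quality rows of table 7.1 a little more concrete, here is a minimal sketch in Python. It is an illustration only, not something taken from the DMBOK: the `Customer` record, the `REGION_CODES` code list, the helper functions, and the 0.9 norm are all hypothetical names and values that I made up for this example. The sketch measures two quality attributes, completeness and correctness, and compares the result to a norm; correctness is checked against a small reference code list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference data: codes for regions where the company does business.
REGION_CODES = {"NL", "BE", "DE"}


@dataclass
class Customer:
    # Hypothetical record layout, for illustration only.
    customer_id: int
    name: Optional[str]
    region: Optional[str]


def is_filled(value):
    """A value counts as filled in when it is neither None nor an empty string."""
    return value not in (None, "")


def completeness(records, field):
    """Fraction of records in which `field` is filled in."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if is_filled(getattr(r, field)))
    return filled / len(records)


def correctness(records, field, allowed):
    """Fraction of filled-in values of `field` that occur in the code list `allowed`."""
    values = [getattr(r, field) for r in records if is_filled(getattr(r, field))]
    if not values:
        return 1.0
    return sum(1 for v in values if v in allowed) / len(values)


def meets_norm(score, norm):
    """True when the measured score satisfies the requirement (the norm)."""
    return score >= norm


customers = [
    Customer(1, "Alice", "NL"),
    Customer(2, "Bob", "XX"),    # region filled in, but not a valid code
    Customer(3, None, "BE"),     # name missing
    Customer(4, "Dana", None),   # region missing
]

NORM = 0.9  # assumed requirement: at least 90% for each attribute

name_completeness = completeness(customers, "name")                  # 3 of 4 records
region_correctness = correctness(customers, "region", REGION_CODES)  # 2 of 3 filled values

for label, score in [("name completeness", name_completeness),
                     ("region correctness", region_correctness)]:
    status = "ok" if meets_norm(score, NORM) else "corrective action needed"
    print(f"{label}: {score:.2f} ({status})")
```

In this toy data set both scores fall below the assumed norm, which is the point at which corrective action would be taken. In practice the norm itself, and who decides on it, is a governance question rather than a technical one.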