Data Management: a gentle introduction - Bas van Gils
It is often said that “data is the new oil”. It is hard to establish with any certainty who used this metaphor first. A cursory search on Google suggests that it originated in an article by The Economist [Par17], with many authors following suit by describing why, for all practical purposes, data is not the new oil (e.g. [Mar18]). Whatever the practical implications, the metaphor at least illustrates that data is an important business asset that deserves to be managed as such. This is the field of data management (or DM for short). See also sidebar 1.
Sidebar 1. Interview with Marco van der Winden (Summer 2019)
My experience is that the importance of data is underestimated, in the sense that there was/is no primary focus on it. Living in the low countries where there is an abundance of water, data is mostly seen as something that can easily be obtained, just like water. To continue the comparison, the Dutch are very good at containing the water streams and keeping the seawater outside with dikes. But with data we are less experienced. We sometimes let data flow uncontrollably through our fields without knowing where it goes or even why we are doing it.
We are not in the Middle Ages (when we became increasingly proficient at water management), and it should be clear that data must be governed in such a way that we are more in control and can profit more from it. By the way, I think that a comparison with oil is not a smart one. Sooner or later there will be a shortage of oil. On top of that, oil also has some environmental disadvantages. Data is more like water. It’s the source of all living things. You can’t live without it and there will always be water.
Marco van der Winden is manager of the corporate data management office at PGGM, a Dutch pension provider.
A key question that needs answering is: what does that entail? In other words: what is data management (DM) and how do you make it work? These are hard questions. Data is often seen as an abstract “thing” that sits in the realm of the IT department. This isn’t helped by the fact that a lot of technology is so closely related to data that it is easy to confuse one for the other. Worse, data management professionals are prone to using complicated terminology such as metadata, master data, lineage and so on, which makes it hard for outsiders to truly understand what is going on. This is not a good thing: DM is an important capability that organizations must master¹.
To illustrate this point, I will borrow a slightly altered example from [Soa11] in example 1.
Example 1. Data management benefits
Assume you are working for a large global company with approximately 10 million customers. On average each customer purchases 1.2 products every year. Your strategy is to attempt to get more revenue from the existing customer base, rather than try to capture a bigger market share. To that end, a global customer 360 initiative is considered. The data management team and marketing have worked together to compile a business case.
First, it is expected that a better overview of each customer will increase the number of purchases per customer per year from 1.2 to 1.4, which is expected to raise an extra 8 million dollars in revenue over three years. Furthermore, it is estimated that the direct cost of customer service representatives wading through duplicated/inconsistent customer data adds up to about half a million dollars over three years, a cost that the initiative is expected to avoid. The direct cost to the IT department of data integration issues is expected to be reduced by another half a million dollars over three years. This adds up to nine million dollars in benefits. Would that justify a significant investment in data management?
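The arithmetic behind this business case can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original example: the three benefit amounts come from example 1, whereas the dictionary keys, the function names and the investment of 3 million dollars are hypothetical and chosen purely for illustration.

```python
# Three-year benefit estimates from example 1, in US dollars.
# The key names are hypothetical labels, not terms from the source.
benefits = {
    "extra_revenue_from_more_purchases": 8_000_000,
    "less_time_spent_on_duplicate_customer_data": 500_000,
    "fewer_it_data_integration_issues": 500_000,
}


def total_benefit(items):
    """Sum all estimated benefits over the three-year period."""
    return sum(items.values())


def net_benefit(items, investment):
    """Benefits minus the investment needed to realize them."""
    return total_benefit(items) - investment


if __name__ == "__main__":
    print(total_benefit(benefits))  # 9000000

    # Hypothetical investment figure, assumed only for illustration.
    assumed_investment = 3_000_000
    print(net_benefit(benefits, assumed_investment))  # 6000000
```

Varying the assumed investment shows at which point the business case stops being attractive, which is exactly the question the example poses.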
■ 1.1 GOALS FOR THIS BOOK
One of the best ways to make progress in our field is to put knowledge in the public domain such that everyone can benefit from it. There are many ways to do this: scientific studies provide academic rigor but tend to be low on practical relevance. Handbooks such as the DMBOK² are the inverse: they offer a lot of practical value but tend to be low on academic rigor [Hen17]. Balancing rigor and relevance is tricky, to say the least. This book leans towards practical relevance and provides academic rigor whenever possible. The unique selling point of this book is that it (1) offers an up-to-date overview of the field, (2) provides practical guidance in the form of a capability-based framework, and (3) is supported by real-world evidence through mini case studies.
The overall objective is to show that data management (DM) is an exciting and valuable capability that is worth time and effort. More specifically, I hope to achieve the following goals. First, I hope to give a “gentle” introduction to the field of DM by explaining and illustrating its core concepts. In doing so, I will demystify terminology as much as possible. To this end, I will use a mix of theory, practical frameworks such as TOGAF, ArchiMate, and DMBOK, as well as results from real-world assignments [The11, The16a, Hen17].
Second, I will offer guidance on how to build an effective DM capability for your organization. I will do so by considering various use cases, linked to the previously mentioned theoretical exploration as well as the stories of practitioners in the field.
■ 1.2 INTENDED AUDIENCE
The book aims at a broad audience: busy professionals who “are actively involved with managing data”. This might be a bit too broad because it is hard to imagine a book that would successfully address the needs of strategic decision makers all the way down to analysts and database administrators. The book is also aimed at (Bachelor’s/Master’s) students with an interest in data management. A more specific characterization of the (professional) audience is:
■ In the strategic/tactical/operational continuum, I will go for the middle ground. This means: stay away from executives and top management. It also means: stay away from true day-to-day business operations.
■ In the business/technology continuum, again, I will aim for the middle ground. It is increasingly true that there is no real difference between business and IT, but for the sake of the argument: I am aiming at business people with a sense of IT, IT people with a sense of business and those who straddle both worlds.
■ Industry-wise, the book should be agnostic and applicable in different industries such as government, finance, telecommunications, etc.
Typical roles that come to mind are: data governance office/council, data owners, data stewards, people involved with data governance (data governance board), enterprise architects, data architects, process managers, business analysts and IT analysts.
■ 1.3 APPROACH
In this book, I will combine elements from theory and from practice. The former comes in the shape of citations to books, articles and web resources. I will attempt to link to original sources whenever possible, but also make an attempt to give the book a look-and-feel that is not too academic. The same goes for the practical part: I will combine my own experience of 15+ years as a consultant and teacher with stories from other professionals. I will provide the names of organizations and people whenever possible. In some places, stories have been anonymized to ensure privacy, or to comply with non-disclosure agreements. The theory part of the book will give a broad overview of the field of data management. The practical part will cover specific topics and use cases in more depth. More detailed coverage of specific topics can be found by following the citations or reaching out to listed practitioners.
The book is mainly aimed at busy professionals, although I also expect that students and perhaps even scholars will find it useful. Because of this, I have made two decisions with respect to the book structure. First of all, I have chosen to split the book into three main parts: theory, practice, and closing remarks. Furthermore, I have chosen to keep the chapters as short and to the point as possible and to make a clear distinction between the main text and the examples. Because of this choice, the book will have many short chapters. If you are already familiar with the topic of a chapter, you can easily skip it and move on to the next.
¹ Throughout this book, I will use the term capability to signify an ability/discipline that an organization may have. The simple formula capability = capacity × ability further signifies that the organization not only has to master the ability, but must also have sufficient resources with the right abilities available in order to be successful.
² The DMBOK is the Data Management Body of Knowledge. It is a reference book by DAMA, the Data Management Association. The DMBOK compiles data management principles and best practices.