Managing Data Quality - Tim King
The data asset
What is data quality?
The fundamental effect of data quality is the right data being available at the right time to the right users, to make the right decision and achieve the right outcome.
This can be extended by considering that good quality data are safe, legal and processed fairly, correctly and securely.
Whilst ‘perfect’ data quality appears desirable, the reality is that organisations are unlikely ever to have the time, resources, budget or need for ‘perfect’ data. You therefore need to accept that your data are not perfect, and probably never will be. Having accepted this, you need to be able to understand and describe the nature of your data quality.
If someone states that ‘the weather is bad’, for example, this has little meaning without stating whether it is too hot or too cold, too wet or too dry, too windy or too still and so on. For some people, the weather could be good (the sailor who wants a fast journey), whilst the same weather is bad for other people (the construction company trying to erect a new offshore wind farm). Similarly, if someone states that they have poor quality data, this can be difficult to interpret without a better way of describing the nature of the data; as such, it is useful to use appropriate characteristics to measure data quality.
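As a rough illustration of describing data quality through measurable characteristics rather than a general label such as ‘poor’, the sketch below scores one field on two characteristics. The characteristic names (completeness, validity), the sample records and the expected date format are all assumptions chosen for the example; they are not taken from the text.

```python
import re

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"asset_id": "A-001", "install_date": "2019-04-12"},
    {"asset_id": "A-002", "install_date": ""},
    {"asset_id": "A-003", "install_date": "12/04/2019"},
    {"asset_id": "A-004", "install_date": "2021-11-30"},
]

# Assumed requirement: dates are recorded as YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(1 for row in rows if row.get(field)) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that match the expected format."""
    populated = [row[field] for row in rows if row.get(field)]
    if not populated:
        return 0.0
    return sum(1 for value in populated if pattern.fullmatch(value)) / len(populated)


print(f"completeness: {completeness(records, 'install_date'):.0%}")
print(f"validity: {validity(records, 'install_date', ISO_DATE):.0%}")
```

Stating that install dates are 75% complete, and that two thirds of the populated values are in the expected format, tells a reader far more than saying the data are ‘poor’.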
These considerations lead to the need for more detail on the ‘fitness for purpose’ of data and data characteristics.
Fitness for purpose
In quality management, the term ‘quality’ is an assessment of whether an item or activity conforms with the requirements for it.
For example, a metal shaft used in the assembly of a machine is specified to have a diameter of 12.2 mm +/- 0.015 mm, along with many other requirements (e.g. length, material, surface finish, etc.). If one of these shafts was measured with a diameter of 12.196 mm, it would pass the quality test for diameter, because the measurement lies within the permitted range of 12.185 mm to 12.215 mm. The shaft is a physical item that cannot easily serve another purpose.
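The conformance test in the shaft example can be sketched as a simple tolerance check. The function name and the second, failing measurement are assumptions added for illustration; decimal arithmetic is used only so that values close to the limits are not affected by binary floating-point rounding.

```python
from decimal import Decimal


def within_tolerance(measured: str, nominal: str, tolerance: str) -> bool:
    """Return True if the measured value lies within nominal +/- tolerance (mm)."""
    return abs(Decimal(measured) - Decimal(nominal)) <= Decimal(tolerance)


# The shaft from the text: 12.196 mm deviates by 0.004 mm from the 12.2 mm nominal.
print(within_tolerance("12.196", "12.2", "0.015"))  # True

# A hypothetical shaft at 12.220 mm deviates by 0.020 mm and would fail.
print(within_tolerance("12.220", "12.2", "0.015"))  # False
```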
for example, contains formatting information to ensure that the information is correctly displayed. There will, however, be little consistency between different documents (or messages), nor will it be easy to identify issues within the body of a document from a data quality perspective.
Sentiment analysis tools can be used to infer the general mood of a collection of messages based on identified key words and phrases. This, though, is not the same as assessing the quality of the data. From a data quality management perspective, the approaches defined in this book can easily be applied to the metadata of semi-structured data, but understanding the quality of the ‘body’ of documents and messages will be more challenging.
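To illustrate the difference between checking metadata and assessing the body, the sketch below applies two simple checks to the headers of an email message using Python's standard `email` package. Using an email as the semi-structured item, the list of required headers and the sample message are all assumptions made for the example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumed requirement: these headers must be present on every message.
REQUIRED_HEADERS = ("From", "To", "Date", "Subject")


def metadata_issues(raw_message: str) -> list:
    """List data quality issues found in the message headers only."""
    msg = message_from_string(raw_message)
    issues = [f"missing {name}" for name in REQUIRED_HEADERS if not msg[name]]
    if msg["Date"]:
        try:
            parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            # Older Python versions raise TypeError, newer ones ValueError.
            issues.append("unparseable Date")
    return issues


# Hypothetical message with no Subject header and a malformed Date header.
raw = (
    "From: a.sender@example.com\n"
    "To: a.reader@example.com\n"
    "Date: sometime last week\n"
    "\n"
    "Free-text body that these checks do not assess.\n"
)
print(metadata_issues(raw))
```

The checks say nothing about whether the body of the message is accurate or complete, which is the harder problem described above.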