Data Science in Theory and Practice - Maria Cristina Mariani
1.4.1 Characteristics of Big Data
Big data has one or more of the following characteristics: high volume, high velocity, high variety, and high veracity. That is, the data sets are characterized by huge amounts (volume) of frequently updated data (velocity) of various types, such as numeric, textual, audio, image, and video data (variety), with high quality (veracity). We briefly discuss each below.

Volume: Volume describes the quantity of generated and stored data. The size of the data determines its value and potential insight, and whether it can be considered big data at all.

Velocity: Velocity describes the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available both as stored data and in real time. Compared to small data, big data is produced more continuously, at intervals that may range from nanoseconds to seconds, minutes, or hours. Two types of velocity are relevant to big data: the frequency of generation and the frequency of handling, recording, and reporting.

Variety: Variety describes the types and formats of the data. Knowing them helps the people who analyze the data make effective use of the resulting insight. Big data draws on different formats and completes missing pieces through data fusion, the technique of integrating multiple data sources to produce more consistent, accurate, and useful information than any individual source provides.

Veracity: Veracity describes the quality of the data and the data value. The quality of the data obtained can greatly affect the accuracy of the analyzed results.

In the next subsection we will discuss some big data architectures. A comprehensive study of this topic can be found in the application architecture guide of the Microsoft technical documentation.
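The data fusion idea mentioned under Variety can be illustrated with a minimal sketch. It is not a method from the book: the function name `fuse_records`, the two hypothetical sources `station_a` and `station_b`, and the rule of averaging numeric values reported by more than one source are all assumptions chosen for illustration. The sketch only shows how combining sources can fill gaps that each source has on its own.

```python
from statistics import mean


def fuse_records(*sources):
    """Merge records from several sources that share a common key.

    Each source maps key -> {field: value}; None marks a missing value.
    Assumed fusion rule (illustrative only): numeric values reported by
    more than one source are averaged, other values keep the first
    non-missing entry.
    """
    collected = {}
    for source in sources:
        for key, record in source.items():
            fields = collected.setdefault(key, {})
            for field, value in record.items():
                if value is not None:
                    fields.setdefault(field, []).append(value)

    fused = {}
    for key, fields in collected.items():
        fused[key] = {}
        for field, values in fields.items():
            if all(isinstance(v, (int, float)) for v in values):
                fused[key][field] = mean(values)
            else:
                fused[key][field] = values[0]
    return fused


# Two hypothetical sources, each with gaps the other can fill.
station_a = {
    "s1": {"temp_c": 20.0, "city": None},
    "s2": {"temp_c": None, "city": "Lima"},
}
station_b = {
    "s1": {"temp_c": 22.0, "city": "Quito"},
    "s2": {"temp_c": 18.5},
}

fused = fuse_records(station_a, station_b)
print(fused)
```

In this sketch neither source alone has a complete record for `s1` or `s2`, while the fused result does. Real data fusion pipelines typically also have to reconcile conflicting values, units, and timestamps, which this example deliberately leaves out.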