Digital Transformation for Chiefs and Owners. Volume 1. Immersion - Джимшер Челидзе

Chapter 2. Technology. Pros, Cons, Personal Opinions
Big Data

Big data is the collective name for structured and unstructured data in volumes that are simply impossible to process manually.

The term is also often used for the tools and approaches for working with such data: how to structure it, analyze it and use it for specific tasks and purposes.

Unstructured data is information that has no predefined structure or is not organized in a specific order.

Fields of Application

– Process optimization. For example, large banks use big data to train chatbots: programs that can replace a live employee for simple questions and, if necessary, hand the conversation over to a specialist. Another example is detecting the losses these processes generate.

– Forecasting. By analysing large volumes of sales data, companies can predict customer behaviour and demand depending on the season or on where goods are placed on the shelf. Big data is also used to predict equipment failures.

– Model construction. Analysing equipment data helps build models of the most profitable operating modes, or economic models of production activities.
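As a toy illustration of seasonal demand forecasting, the sketch below predicts each month's demand as the average of past sales in that same month. This is a deliberately simple seasonal-average baseline chosen for clarity, not the method any particular company uses; the `seasonal_forecast` function and the `(month, units_sold)` input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_forecast(history: list[tuple[int, float]]) -> dict[int, float]:
    """Forecast demand per month as the mean of past sales in that month.

    history: hypothetical (month, units_sold) pairs, month in 1..12.
    """
    by_month: dict[int, list[float]] = defaultdict(list)
    for month, units in history:
        by_month[month].append(units)
    # Average the observations for each month, returned in calendar order.
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Two years of hypothetical sales for January, June and December.
sales = [(1, 100.0), (1, 120.0), (6, 300.0), (6, 340.0), (12, 500.0), (12, 540.0)]
forecast = seasonal_forecast(sales)
print(forecast)  # {1: 110.0, 6: 320.0, 12: 520.0}
```

Real forecasting on big data would add trend, shelf location, promotions and many more features; the point here is only that seasonality becomes visible once enough history is grouped by season.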

Sources of Big Data

– Social: all uploaded photos, sent messages and calls; in general, everything a person does on the Internet.

– Machine: data generated by machines, sensors and the «Internet of Things», such as smartphones, smart speakers, light bulbs and smart home systems, street video cameras and weather satellites.

– Transactions: purchases, money transfers, deliveries of goods and ATM operations.

– Corporate databases and archives. Some sources do not classify them as big data, and this point is disputed. The main objection is that they do not meet the criterion of data «renewability». More on this below.

Big Data Categories

– Structured data. Such data has a defined structure of related tables and tags, for example, Excel tables linked to one another.

– Semi-structured, or loosely structured, data. Such data does not fit a strict structure of tables and relationships, but it has «labels» that separate semantic elements and give the records a hierarchical structure. Information in e-mails is an example.

– Unstructured data. Such data has no structure, order or hierarchy at all. Examples include plain text (like the text of this book), image files, audio and video.
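To make the three categories concrete, here is a minimal sketch using only the Python standard library: a CSV table stands in for structured data, an e-mail with labelled header fields for semi-structured data, and a free-text sentence for unstructured data. The sample values are invented for illustration.

```python
import csv
import io
from email import message_from_string

# Structured: every row has the same named columns, like a linked table.
table_src = "customer_id,amount\n1,250\n2,90\n"
rows = list(csv.DictReader(io.StringIO(table_src)))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: labelled header fields ("From", "Subject") separate the
# semantic elements, but the body that follows has no fixed schema.
raw_email = "From: client@example.com\nSubject: Order status\n\nWhere is my order?\n"
message = message_from_string(raw_email)
subject = message["Subject"]

# Unstructured: plain text; any structure has to be extracted by the analyst.
text = "The delivery was late, but the product is great."
words = [w.strip(".,").lower() for w in text.split()]

print(total, subject, len(words))  # 340 Order status 9
```

Note how the effort shifts: the table can be summed directly, the e-mail needs only a lookup by label, and the sentence must first be tokenised before any analysis is possible.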

Such data is processed with special algorithms. First, the data is filtered according to conditions set by the researcher, sorted and distributed among individual computers (nodes). The nodes then process their data blocks in parallel and pass the results of the computation on to the next stage.
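The filter, sort, distribute, compute-in-parallel, combine sequence described above can be sketched on a single machine. This is an illustration under stated assumptions, not a real cluster: threads stand in for separate computers, records are hypothetical `(product, amount)` pairs, blocks are assigned round-robin, and each "node" counts purchases per product. The helper names `split_into_blocks`, `process_block` and `run_pipeline` are invented for this sketch.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, n_nodes):
    """Distribute records round-robin into one block per simulated node."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def process_block(block):
    """Work done on one node: count purchases per product in its own block."""
    return Counter(product for product, _amount in block)


def run_pipeline(records, min_amount, n_nodes=3):
    # 1. Filter by a condition set by the researcher.
    filtered = [r for r in records if r[1] >= min_amount]
    # 2. Sort, then distribute among nodes.
    filtered.sort(key=lambda r: r[0])
    blocks = split_into_blocks(filtered, n_nodes)
    # 3. Nodes process their blocks in parallel (threads stand in for computers).
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(process_block, blocks))
    # 4. Next stage: combine the partial results into one answer.
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


purchases = [("milk", 2.5), ("bread", 1.0), ("milk", 3.0), ("tea", 0.5), ("bread", 4.0)]
print(run_pipeline(purchases, min_amount=1.0))
```

Because the same product can land on several nodes, the final stage has to merge partial counts; frameworks in the MapReduce family avoid part of this by routing all records with the same key to the same node.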

Big Data Characteristics

Depending on the source, big data is described as having three, four or, by some accounts, five, six or even eight components. Let’s focus on what I consider the most sensible concept: four components.

– Volume: there must be a lot of information. Volumes of 2 terabytes and more are usually mentioned. Companies can collect huge amounts of information, and its size becomes a critical factor in analytics.

– Velocity: data must be constantly updated; otherwise it becomes obsolete and loses value. Almost everything happening around us (search queries, social networks) produces new data, much of which can be used for analysis.

– Variety: the generated information is heterogeneous and comes in different formats: video, text, tables, numerical sequences, sensor readings.

– Veracity: the reliability and quality of the data being analysed. Data must be reliable and valuable for analysis so that it can be trusted. Low-quality data contains a high share of meaningless information, called noise, which has no value.
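One way to make the four components tangible is to check a dataset description against each of them. The sketch below is hypothetical throughout: only the 2-terabyte figure comes from the text (decimal terabytes assumed), while the one-day freshness limit, the "more than one format" rule and the 10% noise ceiling are arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TWO_TB = 2 * 10**12  # "from 2 terabytes" rule of thumb, decimal bytes assumed


@dataclass
class DatasetProfile:
    size_bytes: int
    last_update: datetime
    formats: set[str]
    total_records: int
    invalid_records: int  # records judged to be noise


def four_v_report(p: DatasetProfile, now: datetime,
                  max_age: timedelta = timedelta(days=1),  # assumed limit
                  max_noise_share: float = 0.1) -> dict[str, bool]:
    """Check a profile against each V; all thresholds except Volume are assumptions."""
    noise_share = p.invalid_records / p.total_records if p.total_records else 1.0
    return {
        "volume": p.size_bytes >= TWO_TB,
        "velocity": now - p.last_update <= max_age,
        "variety": len(p.formats) > 1,
        "veracity": noise_share <= max_noise_share,
    }


now = datetime(2024, 1, 2, 12, 0)
profile = DatasetProfile(
    size_bytes=3 * 10**12,
    last_update=datetime(2024, 1, 2, 9, 0),
    formats={"video", "text", "sensor"},
    total_records=1_000_000,
    invalid_records=50_000,
)
print(four_v_report(profile, now))
# {'volume': True, 'velocity': True, 'variety': True, 'veracity': True}
```

In practice each criterion is a judgement call rather than a boolean; encoding it this way mainly forces the team to state its thresholds explicitly.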

Limitations on Big Data Implementation

The main limitations are the quality of the raw data, critical thinking (what do we want to see? What pain are we addressing? Ontological models are used for this) and the right selection of competencies. And, most importantly, people. Data scientists are the ones who work with the data, and there is a common joke about them: 90% of data scientists are data satanists.
