Big Data - Seifedine Kadry

2 Big Data Storage Concepts

CHAPTER OBJECTIVE

This chapter gives a brief overview of the main storage concepts of big data, namely clusters and file systems. It explains data replication, which makes big data storage fault tolerant, through master-slave and peer-to-peer replication. It briefly describes the various types of on-disk storage and gives an overview of the scalability techniques, scaling up and scaling out, adopted by various database systems.

In big data storage architecture, data reaches users through multiple organizational data structures. The big data revolution has brought significant improvements to data storage architecture. New tools have been developed, such as Hadoop, an open-source framework for storing data on clusters of commodity hardware, which allow organizations to store and analyze large volumes of data effectively.
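To make "storing data on clusters of commodity hardware" more concrete, the toy Python sketch below mimics the idea behind HDFS, Hadoop's file system: a file is split into fixed-size blocks and each block is copied to several machines, so the failure of a cheap machine does not lose data. The 128 MiB block size and replication factor of 3 mirror common HDFS defaults; the round-robin placement and the function names `split_into_blocks`, `place_replicas`, and `survives_failures` are hypothetical simplifications (real HDFS uses a rack-aware placement policy and a NameNode to track where blocks live).

```python
# Toy model of block storage on a cluster (hypothetical, not the HDFS API).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, a common HDFS default
REPLICATION = 3                 # copies per block, a common HDFS default


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes splits into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified assumption: block b goes to nodes b, b+1, ..., wrapping around.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(num_blocks)
    }


def survives_failures(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MiB file
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes)
    print([size // (1024 * 1024) for size in blocks])    # block sizes in MiB
    print(placement)
    print(survives_failures(placement, {"node1", "node2"}))
```

With three copies of every block, any two machines can fail and each block remains readable; losing three machines may make a block unavailable, which is why the replication factor is a trade-off between storage cost and fault tolerance.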

In Figure 2.1, data from the sources flows through Hadoop, which acts as an online archive. Hadoop is highly suitable for unstructured and semi-structured data, but it is also suitable for some structured data that is expensive to store and process in traditional storage engines (e.g., call center records). The data stored in Hadoop is then fed into a data warehouse, which distributes it to data marts and other downstream systems, where end users can query and analyze the data using query tools.
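The flow in Figure 2.1 can be sketched as a chain of plain Python functions, one per stage: an archive that keeps every raw record, a warehouse that keeps only records fitting a schema, data marts split by department, and an end-user query against one mart. This is an illustrative sketch only; the sample records, the stage functions (`archive`, `load_warehouse`, `build_marts`, `total_call_time`), and the rule that unparseable records stay only in the archive are all hypothetical assumptions, not part of any real Hadoop or warehouse product.

```python
import json
from collections import defaultdict

# Hypothetical raw records from source systems (e.g., call center logs).
SOURCE_RECORDS = [
    '{"dept": "sales", "call_secs": 120}',
    '{"dept": "support", "call_secs": 300}',
    "free-text note: customer asked about pricing",  # unstructured record
    '{"dept": "sales", "call_secs": 60}',
]


def archive(records):
    """Online archive (Hadoop's role here): keep every record as-is."""
    return list(records)


def load_warehouse(archived):
    """Warehouse: keep only records that parse into the expected schema.

    Assumption of this sketch: unstructured records stay in the archive only.
    """
    rows = []
    for raw in archived:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue
        rows.append({"dept": rec["dept"], "call_secs": int(rec["call_secs"])})
    return rows


def build_marts(rows):
    """Data marts: one subset of warehouse rows per department."""
    marts = defaultdict(list)
    for row in rows:
        marts[row["dept"]].append(row)
    return dict(marts)


def total_call_time(mart):
    """An end-user query against a single data mart."""
    return sum(row["call_secs"] for row in mart)


if __name__ == "__main__":
    marts = build_marts(load_warehouse(archive(SOURCE_RECORDS)))
    print(total_call_time(marts["sales"]))
```

The point of the layering is that the archive never discards data, while each downstream stage narrows it to what its users need.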

In modern business intelligence (BI) architecture, the raw data stored in Hadoop can be analyzed using MapReduce programs. MapReduce is Hadoop's programming paradigm, used to write applications that process the massive volumes of data stored in Hadoop.
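A MapReduce job has a map phase that turns each input record into key-value pairs and a reduce phase that combines all values sharing a key, with the framework grouping (shuffling) the pairs by key in between. The single-process Python sketch below imitates that flow with the classic word-count example. It is only an illustration: it does not use Hadoop's actual Java API, where the map and reduce tasks run in parallel across the cluster's nodes, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle, and reduce sequentially over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data big", "data storage"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be distributed across machines independently, which is what lets the paradigm scale to the data volumes Hadoop stores.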

Figure 2.1 Big data storage architecture.
