Data Lakes For Dummies - Alan R. Simon

Building the best data pipelines inside your data lake


A data pipeline is an end-to-end flow of data from the original sources all the way to the end users of analytics. Figure 2-7 shows a data pipeline overlaying the journey from source systems, through the data lake’s bronze zone (the home of raw data), through the cleansing of that data into the silver zone, into the gold zone that consists of curated “packages” of data, and then finally to the users who consume the data-driven insights.


FIGURE 2-7: A data pipeline into, through, and then out of the data lake.
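To make the bronze → silver → gold journey more concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not how any particular data lake platform implements its zones. The source-system names (`pos`, `web`), the order fields, the cleansing rules, and the functions `ingest_to_bronze`, `cleanse_to_silver`, and `curate_to_gold` are all hypothetical. A real data lake would land data in object storage and process it with a distributed engine rather than Python lists.

```python
from collections import defaultdict

# Hypothetical raw records from two hypothetical source systems (illustrative only).
SOURCE_SYSTEMS = {
    "pos": [
        {"order_id": "1", "region": "east", "amount": "19.99"},
        {"order_id": "2", "region": "West ", "amount": "5.00"},
    ],
    "web": [
        {"order_id": "2", "region": "west", "amount": "5.00"},  # duplicate of a POS order
        {"order_id": "3", "region": "east", "amount": None},    # missing amount
        {"order_id": "4", "region": "north", "amount": "12.50"},
    ],
}


def ingest_to_bronze(sources):
    """Bronze zone: land raw data unchanged, tagged with the system it came from."""
    bronze = []
    for system, records in sources.items():
        for record in records:
            bronze.append({**record, "_source": system})
    return bronze


def cleanse_to_silver(bronze):
    """Silver zone: drop incomplete rows, normalize values, de-duplicate by order_id."""
    silver = {}
    for row in bronze:
        if row.get("amount") is None or not row.get("region"):
            continue  # incomplete record; a real pipeline might quarantine it instead
        order_id = row["order_id"]
        if order_id in silver:
            continue  # keep only the first copy of each order
        silver[order_id] = {
            "order_id": order_id,
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        }
    return list(silver.values())


def curate_to_gold(silver):
    """Gold zone: one curated 'package' of data, here total sales per region."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["region"]] += row["amount"]
    return {region: round(total, 2) for region, total in sorted(totals.items())}


def run_pipeline(sources):
    """Move data from the source systems through each zone in order."""
    bronze = ingest_to_bronze(sources)
    silver = cleanse_to_silver(bronze)
    gold = curate_to_gold(silver)
    return bronze, silver, gold


if __name__ == "__main__":
    bronze, silver, gold = run_pipeline(SOURCE_SYSTEMS)
    print(len(bronze), len(silver))  # raw rows landed vs. rows that survived cleansing
    print(gold)                      # what the end users of analytics would consume
```

Running the sketch lands all 5 raw rows in the bronze zone, keeps 3 cleansed rows in the silver zone, and produces `{'east': 19.99, 'north': 12.5, 'west': 5.0}` as the gold-zone package. Keeping bronze untouched is the design choice that matters here: if a cleansing rule turns out to be wrong, the silver and gold zones can be rebuilt from the raw data.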

You can think of a data pipeline in much the same way that you think of shopping. Suppliers sell and ship their products to wholesalers, who then resell and ship some of those products to a distributor. The distributor then resells and ships the products yet again to a retailer, which is where you come to buy whatever you’re looking for. Figure 2-8 shows how this paradigm can apply to data pipelines within a data lake.


FIGURE 2-8: An easy way to understand data pipelines and data lakes.

