Data Lakes For Dummies - Alan R. Simon

LINKING THE DATA LAKE ZONES TOGETHER


The following figure shows the progressive pipelines of data among the various zones, including the sandbox. Notice that not every piece or group of data is cleansed and then sent from the bronze zone to the silver zone. You’ll spend time refurbishing, refining, and transmitting to the silver zone only the data that you definitely or likely need for analytics.


Likewise, only select data sets are sent from the silver zone to the gold zone. Remember that another name for the gold zone is the curated zone, meaning that you’ve specifically selected certain data to be consolidated and then placed in “packages” within the gold zone.

You might transmit raw, uncleansed data from the bronze zone into the sandbox along with data from the silver zone, depending on the specifics of your experimental or short-term analytical needs.
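The flows described so far can be sketched in a few lines of Python. This is only an illustration: the names `ALLOWED_FLOWS`, `can_flow`, and `promote_to_silver` are hypothetical, the sample records are made up, and the sketch assumes that any flow not mentioned in the text is simply not allowed.

```python
# Hypothetical flow rules between zones, based only on the flows described above.
# Assumption: any pair not listed here is treated as "not allowed" in this sketch.
ALLOWED_FLOWS = {
    ("bronze", "silver"),   # selected data, cleansed and refined
    ("silver", "gold"),     # selected data sets, consolidated into packages
    ("bronze", "sandbox"),  # raw, uncleansed data for experiments
    ("silver", "sandbox"),  # refined data for experiments
}


def can_flow(source, target):
    """Return True if data may move directly from source to target."""
    return (source, target) in ALLOWED_FLOWS


def promote_to_silver(bronze_records, is_needed, cleanse):
    """Cleanse and return only the bronze records that analytics needs."""
    return [cleanse(record) for record in bronze_records if is_needed(record)]


# Made-up example records: only the sales record is promoted.
bronze = [
    {"source": "sales", "amount": " 100 "},
    {"source": "web_logs", "amount": " n/a "},
]
silver = promote_to_silver(
    bronze,
    is_needed=lambda r: r["source"] == "sales",
    cleanse=lambda r: {**r, "amount": r["amount"].strip()},
)
```

The point of passing `is_needed` and `cleanse` in as functions is that the selection rule and the cleansing rule are yours to define; the web log record simply stays behind in the bronze zone.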

You will almost certainly replicate data across the various gold zone packages, but that’s fine. As long as you carefully control the data flows and the replicated data, you’re unlikely to run into uncontrolled data proliferation.
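One way to keep replicated data under control is to record which gold zone packages hold a copy of each data set. The following is a minimal sketch under that assumption; the `ReplicationRegistry` class and the data set and package names are hypothetical and aren't taken from any particular data lake product.

```python
from collections import defaultdict


class ReplicationRegistry:
    """Hypothetical registry of which gold zone packages hold each data set."""

    def __init__(self):
        # Maps a data set name to the set of package names that contain it.
        self._packages_by_dataset = defaultdict(set)

    def register(self, dataset, package):
        """Record that a package contains a copy of a data set."""
        self._packages_by_dataset[dataset].add(package)

    def copies(self, dataset):
        """Return the sorted package names that hold the data set."""
        return sorted(self._packages_by_dataset.get(dataset, set()))

    def replicated(self):
        """Return only the data sets that live in more than one package."""
        return {
            dataset: sorted(packages)
            for dataset, packages in self._packages_by_dataset.items()
            if len(packages) > 1
        }


# Made-up example: customer data is deliberately copied into two packages.
registry = ReplicationRegistry()
registry.register("customers", "sales_package")
registry.register("customers", "marketing_package")
registry.register("orders", "sales_package")
```

With a record like this, `registry.replicated()` tells you exactly where the deliberate copies live, so replication stays a controlled choice and not a surprise.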
