Data Lakes For Dummies - Alan R. Simon

More than just the water


Think of a data lake as being closer to a lake resort than to just the lake (the body of water) in its natural state. If you were a real estate developer, you might buy the property that includes the lake itself, along with plenty of acreage surrounding the lake. You’d then develop the overall property by building cabins, restaurants, boat docks, and other facilities. The lake might be the centerpiece of the overall resort, but its value is dramatically enhanced by all the additional assets that you’ve built around it.

A data lake is an entire environment, not just a gigantic collection of data that is stored within a data service such as Amazon S3 or Microsoft ADLS.

In addition to data storage, a data lake also includes the following:

 One or (usually) more mechanisms to move data from one part of the data lake to another.

 A catalog or directory that helps keep track of what data is where, as well as the associated rules that apply to different groups of data; this descriptive information is known as metadata.

 Capabilities that help unify meanings and business rules for key data subjects that may come into the data lake from different applications and systems; this is known as master data management.

 Monitoring services that track data quality and accuracy, as well as response time when users access data, along with billing services that charge different organizations for their usage of the data lake, and plenty more.
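To make the catalog idea a little more concrete, here is a minimal Python sketch of a directory that tracks where each group of data lives and which rules apply to it, and that gets updated when data moves from one part of the lake to another. Everything in it is an assumption made for illustration: the `CatalogEntry` and `Catalog` names, the zone labels, the rule text, and the `s3://example-lake/...` paths are hypothetical and don't come from any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one group of data: where it is and which rules apply."""
    name: str
    location: str  # hypothetical storage path
    zone: str      # hypothetical label for a part of the lake, such as "raw"
    rules: list[str] = field(default_factory=list)


class Catalog:
    """A toy in-memory catalog; a real one would be a managed service."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Record a new group of data so that it can be found later.
        self._entries[entry.name] = entry

    def locate(self, name: str) -> str:
        # Answer the "what data is where" question.
        return self._entries[name].location

    def rules_for(self, name: str) -> list[str]:
        # Return a copy so that callers can't alter the stored rules.
        return list(self._entries[name].rules)

    def record_move(self, name: str, new_location: str, new_zone: str) -> None:
        # Keep the catalog in step after data moves within the lake.
        entry = self._entries[name]
        entry.location = new_location
        entry.zone = new_zone


catalog = Catalog()
catalog.register(
    CatalogEntry(
        name="customers",
        location="s3://example-lake/raw/customers/",
        zone="raw",
        rules=["mask email addresses"],
    )
)
catalog.record_move("customers", "s3://example-lake/curated/customers/", "curated")
print(catalog.locate("customers"))
```

The point of the sketch is the division of labor: the storage service holds the data, and the catalog holds the information about the data, so the two have to be kept in step whenever something moves.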
