Data Lakes For Dummies - Alan R. Simon
The Data Lake Olympics
Suppose you head off for a weeklong vacation to your favorite lake resort. The people who run the resort have divided the lake into different zones, each for a different recreational purpose. One zone is set aside for water-skiing; a second zone is for speedboats, but no water-skiing is permitted in that zone; a third zone is only for boats without motors; and a fourth zone allows only swimming but no water vessels at all.
The operators of the resort could’ve said, “What the heck, let’s just have a free-for-all out on the lake and hope for the best.” Instead, they wisely established different zones for different purposes, resulting in orderly, peaceful vacations (hopefully!) rather than chaos.
A data lake is also divided into different zones. The exact number of zones may vary from one organization’s data lake to another’s, but you’ll always find at least three zones in use — bronze, silver, and gold — and sometimes a fourth zone, the sandbox.
Bronze, silver, and gold aren’t “official” standardized names, but they are catchy and easy to remember. Other names that you may find are shown in Table 1-1.
TABLE 1-1 Data Lake Zones
Recommended Zone Name | Other Names |
---|---|
Bronze zone | Raw zone, landing zone |
Silver zone | Cleansed zone, refined zone |
Gold zone | Performance zone, curated zone, data model zone |
Sandbox | Experimental zone, short-term analytics zone |
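
If your organization uses one of the alternative names, it can help to map every name back to a single recommended one. Here is a minimal Python sketch of that idea, using only the names in Table 1-1. The `Zone` enum, the `ZONE_ALIASES` dictionary, and the `normalize_zone` helper are hypothetical names made up for illustration; they don't come from any data lake product.

```python
from enum import Enum


class Zone(Enum):
    """Recommended zone names from Table 1-1."""
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    SANDBOX = "sandbox"


# Alternative names from Table 1-1, each mapped to its recommended zone.
ZONE_ALIASES = {
    "raw": Zone.BRONZE,
    "landing": Zone.BRONZE,
    "cleansed": Zone.SILVER,
    "refined": Zone.SILVER,
    "performance": Zone.GOLD,
    "curated": Zone.GOLD,
    "data model": Zone.GOLD,
    "experimental": Zone.SANDBOX,
    "short-term analytics": Zone.SANDBOX,
}


def normalize_zone(name: str) -> Zone:
    """Return the recommended zone for any name in Table 1-1 (hypothetical helper)."""
    key = name.strip().lower()
    # Accept names with or without the trailing word "zone".
    if key.endswith(" zone"):
        key = key[: -len(" zone")]
    # Recommended names match the enum values directly.
    for zone in Zone:
        if key == zone.value:
            return zone
    # Otherwise, fall back to the table of alternative names.
    try:
        return ZONE_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown data lake zone: {name!r}") from None
```

With this sketch, `normalize_zone("Landing zone")` and `normalize_zone("Raw zone")` both resolve to `Zone.BRONZE`, and a name that isn't in the table raises a `ValueError` rather than being guessed at.
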
All the data lake zones, including the sandbox, are discussed in more detail in Part 2, but the following sections provide a brief overview.
The boundaries and borders between your data lake zones can be fluid (Fluid? Get it?), especially with streaming data, as I explain in Part 2.