Data Lakes For Dummies - Alan R. Simon
What Is a Data Lake?
Ask a friend this question: “What’s a lake?” Your friend thinks for a moment, and then gives you this answer: “Well, it’s a big hole in the ground that’s filled with water.”
Technically, your friend is correct, but that answer also is far from detailed enough to really tell you what a lake actually is. You need more specifics, such as:
How big, dimension-wise (how long and how wide)
How deep that “big hole in the ground” goes
How much variability there is from one lake to another in terms of those length, width, and depth dimensions (the Great Lakes, anyone?)
How much water you’ll find in the lake and how much that amount of water may vary among different lakes
Whether a lake contains freshwater or saltwater
Some follow-up questions may pop into your mind as well:
A pond is also a big hole in the ground that’s filled with water, so is a lake the same as a pond?
What distinguishes a lake from an ocean or a sea?
Can a lake be physically connected to another lake?
Can the dividing line between two states or two countries be in the middle of a lake?
If a lake is empty, is it still considered a lake?
If one lake leaves Chicago heading east and travels at 100 miles per hour, and another lake heads west from New York … oh wait, wrong kind of word problem, never mind.
So many missing pieces of the puzzle, all arising from one simple question!
You’ll find the exact same situation if you ask someone this question: “What’s a data lake?” In fact, go ahead and ask your favorite search engine that question. You’ll find dozens of high-level definitions that will almost certainly spur plenty of follow-up questions as you try to get your arms around the idea of a data lake.
Here’s a better idea: Instead of filtering through all that varying — and even conflicting — terminology and then trying to consolidate all of it into a single comprehensive definition, just think of a data lake as the following:
A solidly architected, logically centralized, highly scalable environment filled with different types of analytic data that are sourced from both inside and outside your enterprise with varying latency, and which will be the primary go-to destination for your organization’s data-driven insights.
Wow, that’s a mouthful! No worries: Just as if you were eating a gourmet fireside meal while camping at your favorite lake, you can break up that definition into bite-size pieces.