Data Lakes For Dummies

Genre: Databases. Publisher: John Wiley & Sons Limited. ISBN: 9781119786184.


Book Description

Take a dive into data lakes

“Data lakes” is the latest buzzword in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer to the question: “What exactly is a data lake and do I need one for my business?” Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can’t) achieve, and what you need to do to create the lake that best suits your particular needs.

With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you’ve got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for interpreting what you’ve stored.

Understand and build data lake architecture
Store, clean, and synchronize new and existing data
Compare the best data lake vendors
Structure raw data and produce usable analytics

Whatever your business, data lakes are going to form an ever more prominent part of the information universe that every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible, and make sure your business isn’t left standing on the shore.


Alan R. Simon. Data Lakes For Dummies

Data Lakes For Dummies®

To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Data Lakes For Dummies Cheat Sheet” in the Search box.

Table of Contents


Introduction

About This Book

Foolish Assumptions

Icons Used in This Book

Beyond the Book

Where to Go from Here

Getting Started with Data Lakes

Jumping into the Data Lake

What Is a Data Lake?

Rock-solid water

A really great lake

Expanding the data lake

More than just the water

Different types of data

Structured data: Staying in your own lane

Unstructured data: A picture may be worth ten million words

Semi-structured data: Stuck in the middle of the lake

Different water, different data

Refilling the data lake

Everyone visits the data lake

The Data Lake Olympics

The bronze zone

The silver zone

The gold zone

LINKING THE DATA LAKE ZONES TOGETHER

The sandbox

Data Lakes and Big Data

THE THREE (OR FOUR OR FIVE OR MORE) VS OF BIG DATA AND DATA LAKES

The Data Lake Water Gets Murky

BACK TO THE FUTURE WITH NAME CHANGES

Planning Your Day (and the Next Decade) at the Data Lake

Carpe Diem: Seizing the Day with Big Data

Managing Equal Opportunity Data

BACK TO THE FUTURE, PART 2

Building Today’s — and Tomorrow’s — Enterprise Analytical Data Environment

Constructing a bionic data environment

Strengthening the analytics relationship between IT and the business

Reducing Existing Stand-Alone Data Marts

Dealing with the data fragmentation problem

Decision point: Retire, isolate, or incorporate?

Data mart retirement

Data mart isolation

Data mart incorporation

Eliminating Future Stand-Alone Data Marts

Establishing a blockade

Providing a path of least resistance

Establishing a Migration Path for Your Data Warehouses

Sending a faithful data warehouse off to a well-deserved retirement

Resettling a data warehouse into your data lake environment

Aligning Data with Decision Making

Deciding what your organization wants out of analytics

Mapping your analytics needs to your data lake road map

Building the best data pipelines inside your data lake

Addressing future gaps and shortfalls

Speedboats, Canoes, and Lake Cruises: Traversing the Variable-Speed Data Lake

Managing Overall Analytical Costs

Break Out the Life Vests: Tackling Data Lake Challenges

That’s Not a Data Lake, This Is a Data Lake!

Dealing with conflicting definitions and boundaries

Data lake cousins

The cloud database

A nice house at the lake

Data hubs

Data fabric and data mesh

Exposing Data Lake Myths and Misconceptions

Misleading data lake campaign slogans

The single-platform misconception

No upfront data analysis required

The false tale of the tortoise and the data lake

Navigating Your Way through the Storm on the Data Lake

Building the Data Lake of Dreams

DATA DUMP OR DATA SWAMP?

Performing Regular Data Lake Tune-ups — Or Else!

Technology Marches Forward

Building the Docks, Avoiding the Rocks

Imprinting Your Data Lake on a Reference Architecture

Playing Follow the Leader

Guiding Principles of a Data Lake Reference Architecture

A Reference Architecture for Your Data Lake Reference Architecture

Incoming! Filling Your Data Lake

Supporting the Fleet Sailing on Your Data Lake

Objects floating in your data lake

SOME SCOPING FOR ADLS

Mixing it up

The Old Meets the New at the Data Lake

Keeping the shiny parts of the data warehouse

Flooding the data warehouse

Using your data lake as a supersized staging layer

Split-streaming your inbound data along two paths

Which is the bigger breadbox?

Bringing Outside Water into Your Data Lake

Streaming versus batch external data feeds

Ingestion versus as-needed external data access

FISHING IN THE AWS DATA EXCHANGE

Playing at the Edge of the Lake

Anybody Hungry? Ingesting and Storing Raw Data in Your Bronze Zone

Ingesting Data with the Best of Both Worlds

Row, row, row your data, gently down the stream

Supplementing your streaming data with batch data

The gray area between streaming and batch

Joining the Data Ingestion Fraternity

Following the Lambda architecture

Using the Kappa architecture

Storing Data in Your Bronze Zone

Implementing a monolithic bronze zone

Building a multi-component bronze zone

Coordinating your bronze zone with your silver and gold zones

Just Passing Through: The Cross-Zone Express Lane

Taking Inventory at the Data Lake

Bringing Analytics to Your Bronze Zone

Turning your experts loose

Taking inventory in the bronze zone

Getting a leg up on data governance

Your Data Lake’s Water Treatment Plant: The Silver Zone

Funneling Data further into the Data Lake

Sprucing up your raw data

Refining your raw data

Enriching your raw data

Bringing Master Data into Your Data Lake

Impacting the Bronze Zone

Deciding whether to leave a forwarding address

Deciding whether to retain your raw data

Getting Clever with Your Storage Options

Working Hand-in-Hand with Your Gold Zone

Bottling Your Data Lake Water in the Gold Zone

Laser-Focusing on the Purpose of the Gold Zone

Looking Inside the Gold Zone

Object stores

Databases

Persistent streaming data

Specialized data stores

Deciding What Data to Curate in Your Gold Zone

Seeing What Happens When Your Curated Data Becomes Less Useful

Playing in the Sandbox

Developing New Analytical Models in Your Sandbox

Comparing Different Data Lake Architectural Options

Experimenting and Playing Around with Data

Fishing in the Data Lake

Starting with the Latest Guidebook

Setting up role-based data lake access

Setting up usage-style data lake access

Taking It Easy at the Data Lake

Staying in Your Lane

Doing a Little Bit of Exploring

Putting on Your Gear and Diving Underwater

Rowing End-to-End across the Data Lake

Keeping versus Discarding Data Components

Getting Started with Your Data Lake

Shifting Your Focus to Data Ingestion

Breaking through the ingestion congestion

Cranking up the data refinery

Adding to your data pipelines

Finishing Up with the Sandbox

Evaporating the Data Lake into the Cloud

A Cloudy Day at the Data Lake

Rushing to the Cloud

The pendulum swings back and forth

CLOUD DATA LAKES IN THE DISCO ERA (SORT OF)

Dealing with the challenges of on-premises hosting

The case for the cloud

Running through Some Cloud Computing Basics

Public, private, and hybrid clouds

Different “as a service” models

The Big Guys in the Cloud Computing Game

Building Data Lakes in Amazon Web Services

The Elite Eight: Identifying the Essential Amazon Services

Amazon S3

AWS Glue

AWS Lake Formation

Amazon Kinesis Data Streams

Amazon Kinesis Data Firehose

Amazon Athena

Amazon Redshift

Amazon Redshift Spectrum

Looking at the Rest of the Amazon Data Lake Lineup

AWS Lambda

Amazon EMR

Amazon SageMaker

Amazon Aurora

Amazon DynamoDB

Even more AWS databases

WHY SO MANY AWS DATABASES?

Building Data Pipelines in AWS

Building Data Lakes in Microsoft Azure

Setting Up the Big Picture in Azure

The Azure infrastructure

The 50,000-foot view of Azure data lakes

The Magnificent Seven, Azure Style

Azure Data Lake Storage Gen 2

BEWARE THE BLOB!

Azure Data Factory

Azure Databricks

Azure Event Hubs

Azure IoT Hub

Azure Cosmos DB

Azure ML

Filling Out the Azure Data Lake Lineup

Azure Stream Analytics

Microsoft Azure SQL Database

SQL Server Integration Services

Azure Analysis Services

Power BI

Azure HDInsight

Assembling the Building Blocks

General IoT analytics

Predictive maintenance for industrial IoT

DATA LAKES AND BUSINESS PROCESSES

Defect analysis and prevention

Rideshare company forecasting

Cleaning Up the Polluted Data Lake

Figuring Out If You Have a Data Swamp Instead of a Data Lake

Designing Your Report Card and Grading System

Looking at the Raw Data Lockbox

Knowing What to Do When Your Data Lake Is Out of Order

Too Fast, Too Slow, Just Right: Dealing with Data Lake Velocity and Latency

Dividing the Work in Your Component Architecture

Tallying Your Scores and Analyzing the Results

Defining Your Data Lake Remediation Strategy

Setting Your Key Objectives

Going back to square one

Determining your enterprise analytics goals

Doing Your Gap Analysis

Identifying shortfalls and hot spots

Prioritizing issues and shortfalls

Identifying Resolutions

Knowing where your data lake needs to expand

Repairing the data lake boat docks

Linking analytics to data lake improvements

Establishing Timelines

Identifying critical business deadlines

Sequencing your upcoming data lake repairs

Looking for dependency and resource clashes

Defining Your Critical Success Factors

What does “success” mean?

What must be in place to enable success?

Refilling Your Data Lake

The Three S’s: Setting the Stage for Success

Refining and Enriching Existing Raw Data

Starting slowly

Adding more complexity

Making Better Use of Existing Refined Data

Building New Pipelines with Newly Ingested Raw Data

Making Trips to the Data Lake a Tradition

Checking Your GPS: The Data Lake Road Map

Getting an Overhead View of the Road to the Data Lake

Assessing Your Current State of Data and Analytics

Snorkeling through your enterprise analytics

Scoring your analytics continuum

Grading your breadth of data usage

Writing data-driven prescriptions

Receiving your final grades

Diving deep into your data architecture and governance

Scoring your analytical data landscape

Checking off the rules and regulations

Tallying up the score

Putting Together a Lofty Vision

Hot off the presses, straight from the lake: Writing a press release

Designing a slick sales brochure

Polishing the lenses of your data lake vision

Building Your Data Lake Architecture

Conceptual architecture

Implementation architecture

Deciding on Your Kickoff Activities

Expanding Your Data Lake

Booking Future Trips to the Data Lake

Searching for the All-in-One Data Lake

ACID EATS AWAY AT YOUR DATA CHALLENGES

Spreading Artificial Intelligence Smarts throughout Your Data Lake

Lining up your data

Shining a light into your analytics innards

Playing traffic cop

The Part of Tens

Top Ten Reasons to Invest in Building a Data Lake

Supporting the Entire Analytics Continuum

Bringing Order to Your Analytical Data throughout Your Enterprise

Retiring Aging Data Marts

Bringing Unfulfilled Analytics Ideas out of Dry Dock

Laying a Foundation for Future Analytics

Providing a Region for Experimentation

Improving Your Master Data Efforts

Opening Up New Business Possibilities

Keeping Up with the Competition

Getting Your Organization Ready for the Next Big Thing

Ten Places to Get Help for Your Data Lake

Cloud Provider Professional Services

Major Systems Integrators

Smaller Systems Integrators

Individual Consultants

Training Your Internal Staff

Industry Analysts

Data Lake Bloggers

Data Lake Groups and Forums

Data-Oriented Associations

Academic Resources

Ten Differences between a Data Warehouse and a Data Lake

Types of Data Supported

Data Volumes

Different Internal Data Models

Architecture and Topology

ETL versus ELT

Data Latency

Analytical Uses

Incorporating New Data Sources

User Communities

Hosting

Index

About the Author

Dedication

Author’s Acknowledgments

WILEY END USER LICENSE AGREEMENT

Excerpt from the Book

In December 1995, I wrote an article for Database Programming & Design magazine entitled “I Want a Data Warehouse, So What Is It Again?” A few months later, I began writing Data Warehousing For Dummies (Wiley), building on the article’s content to help readers make sense of first-generation data warehousing.

Fast-forward a quarter of a century, and I could very easily write an article entitled “I Want a Data Lake, So What Is It Again?” This time, I’m cutting right to the chase with Data Lakes For Dummies. To quote a famous former baseball player named Yogi Berra, it’s déjà vu all over again!

.....

The operators of the resort could’ve said, “What the heck, let’s just have a free-for-all out on the lake and hope for the best.” Instead, they wisely established different zones for different purposes, resulting in orderly, peaceful vacations (hopefully!) rather than chaos.

A data lake is also divided into different zones. The exact number of zones may vary from one organization’s data lake to another’s, but you’ll always find at least three zones in use — bronze, silver, and gold — and sometimes a fourth zone, the sandbox.
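To make the zone idea concrete, here is a minimal Python sketch, assuming a toy in-memory pipeline rather than any particular data lake product. The record layout, the sample values, and the function names ingest_bronze, refine_silver, and curate_gold are all hypothetical. Raw records land unchanged in the bronze zone, the silver zone cleanses and standardizes them, and the gold zone holds curated, analysis-ready results. The optional sandbox zone is not modeled here.

```python
from collections import defaultdict

# Hypothetical raw records, exactly as they might arrive from a source system.
RAW_EVENTS = [
    {"store": "north", "amount": "19.99"},
    {"store": "North ", "amount": "5.01"},          # inconsistent spelling and spacing
    {"store": "south", "amount": "not-a-number"},   # malformed: kept in bronze, dropped in silver
    {"store": "south", "amount": "12.50"},
]


def ingest_bronze(records):
    """Bronze zone (hypothetical helper): store raw data as-is, with no cleansing."""
    return [dict(r) for r in records]


def refine_silver(bronze):
    """Silver zone (hypothetical helper): cleanse and standardize the raw data.

    Here that means trimming and lowercasing store names and parsing amounts,
    skipping any record whose amount cannot be parsed as a number.
    """
    silver = []
    for r in bronze:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # leave the bad record behind in bronze only
        silver.append({"store": r["store"].strip().lower(), "amount": amount})
    return silver


def curate_gold(silver):
    """Gold zone (hypothetical helper): curated, analysis-ready data.

    Here that is total sales per store, rounded to cents.
    """
    totals = defaultdict(float)
    for r in silver:
        totals[r["store"]] += r["amount"]
    return {store: round(total, 2) for store, total in totals.items()}


bronze = ingest_bronze(RAW_EVENTS)
silver = refine_silver(bronze)
gold = curate_gold(silver)
print(len(bronze), len(silver), gold)
```

In a real data lake each zone would typically live in separate storage (for example, separate folders or buckets in an object store) rather than in Python lists. Keeping the untouched bronze copy means the silver and gold steps can be rerun if the cleansing rules change later.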

.....
