The Informed Company

The Informed Company
Автор книги: id книги: 2186966     Оценка: 0.0     Голосов: 0     Отзывы, комментарии: 0 2174,48 руб.     (21,2$) Читать книгу Купить и скачать книгу Электронная книга Жанр: Маркетинг, PR, реклама Правообладатель и/или издательство: John Wiley & Sons Limited Дата добавления в каталог КнигаЛит: ISBN: 9781119748014 Скачать фрагмент в формате   fb2   fb2.zip Возрастное ограничение: 0+ Оглавление Отрывок из книги

Реклама. ООО «ЛитРес», ИНН: 7719571260.

Описание книги

Learn how to manage a modern data stack and get the most out of data in your organization! Thanks to the emergence of new technologies and the explosion of data in recent years, we need new practices for managing and getting value out of data. In the modern, data driven competitive landscape the «best guess» approach—reading blog posts here and there and patching together data practices without any real visibility—is no longer going to hack it. The Informed Company provides definitive direction on how best to leverage the modern data stack, including cloud computing, columnar storage, cloud ETL tools, and cloud BI tools. You'll learn how to work with Agile methods and set up processes that's right for your company to use your data as a key weapon for your success . . . You'll discover best practices for every stage, from querying production databases at a small startup all the way to setting up data marts for different business lines of an enterprise. In their work at Chartio, authors Fowler and David have learned that most businesspeople are almost completely self-taught when it comes to data. If they are using resources, those resources are outdated, so they're missing out on the latest cloud technologies and advances in data analytics. This book will firm up your understanding of data and bring you into the present with knowledge around what works and what doesn't. Discover the data stack strategies that are working for today's successful small, medium, and enterprise companies Learn the different Agile stages of data organization, and the right one for your team Learn how to maintain Data Lakes and Data Warehouses for effective, accessible data storage Gain the knowledge you need to architect Data Warehouses and Data Marts Understand your business's level of data sophistication and the steps you can take to get to «level up» your data The Informed Company is the definitive data book for anyone who wants to work faster and more nimbly, armed with actionable decision-making data.

Оглавление

Dave Fowler. The Informed Company

Table of Contents

List of Tables

List of Illustrations

Guide

Pages

The Informed Company. How to Build Modern Agile Data Stacks that Drive Winning Insights

About This Book. Why Write This Book

Who This Book Is For

Who This Book Is Not For

Who Wrote the Book

Who Edited the Book

Influences

How This Book Was Written

How to Read This Book

Foreword

Introduction

Merging Business Context with Data Information

The Four Stages of Agile Data Organization

Stage 1. Sources

Stage 2. Lake

Stage 3. Warehouse (Single Source of Truth)

Warehouse

Stage 4. Marts

Mart

STAGE 1 SOURCE aka Siloed Data

Source Stage Overview

Chapter One Starting with Source Data

Common Options for Analyzing Source Data

Get to Know Data with Application Dashboards

Ask Interesting Questions with Excel

Query Databases with an SQLIDE

Visualize Data in the Office with Dashboards

Toying Around with Analytics Products

Look at Data Through a Business Intelligence Tool

Chapter Two The Need to Replicate Source Data

Replicate Sources

Create Read‐Only Access

Chapter Three Source Data Best Practices

Keep a Complexity Wiki Page

Snippet Dictionary

Use a BI Product

Double Check Results

Keep Short Dashboards

Design Before Building

STAGE 2 DATA LAKE aka Data Combined

Lake Stage Overview

Chapter Four Why Build a Data Lake?

What Is a Data Lake?

Reasons to Build a Data Lake Summarized. One Technology

JOINing

Robust Query Writing

Performance

Future Development

Chapter Five Choosing an Engine for the Data Lake

Modern Columnar Warehouse Engines

Columnar Storage Data Lakes

Deciding Factors

Modern Warehouse Engine Products

Amazon Redshift. Pros

Cons

Google BigQuery. Pros

Cons

Snowflake. Pros

Cons

Database Engines

PostgreSQL. Pros

Cons

Recommendation

Chapter Six Extract and Load (EL) Data

ETL versus ELT

EL/ETL Vendors

Extract Options

Complete Extraction

Incremental Extraction

Load Options

Push Changes

Store Separate

Multiple Schemas

Other Extract and Load Routes. Traditional ETL

DIY

Chapter Seven Data Lake Security

Access in Central Place

Permission Tiers

Chapter Eight Data Lake Maintenance

Why SQL?

Data Sources

Adding New Data Sources

Data Source Updates

Fixing Broken Connections

Performance

Optimize Individual Queries

Caching

Create Limits

Scheduling

Upgrade Snippets to Views

STAGE 3 DATA WAREHOUSE aka the Single Source of Truth

Warehouse Stage Overview

ZenHubInterAppForceDB

Agile Warehouse Building with Layers

Chapter Nine The Power of Layers and Views

Make Readable Views

Layer Views on Views

Start with a Single View

Chapter Ten Staging Schemas

Orient to the Schemas

Pick a Table and Clean It

Eliminate Columns

Ensure Table and Column Names Are Understandable

Ensure Values Are Readable

Standardize Values

Extract Fields from Structured Columns (JSON)

Extract Data Using Wildcards

Decide What to Do about Null Fields

Filter Unwanted Rows

Put It All Together

Other Staging Modeling Considerations. NULLs

Ways of Replacing Nulls. String Values

Numbers

Dates

When Not to Replace

Date and Time

SQL Comments

Run the Model

Building on Top of Staging Schemas

Chapter Eleven Model Data with dbt

Version Control

Modularity and Reusability

Package Management

Organizing Files

Macros

Incremental Tables

Testing

Uniqueness

Not Null

Relationships

Accepted Values

Chapter Twelve Deploy Modeling Code

Branch Using Version Control Software

Commit Message

Test Locally

Code Review

Schedule Runs

Chapter Thirteen Implementing the Data Warehouse

Manage Dependencies

Combine Tables Within Schemas

Combine Tables Across Schemas

Keep the Grain Consistent

Create Business Metrics

Adding Calculated Fields to Tables

Avoid Aggregated Metric Tables

Create a Standard Metrics Dashboard

Keeping Accurate History

Snapshots

Keep a Data Log

Chapter Fourteen Managing Data Access

How to Secure Sensitive Data in the Data Warehouse

Read‐Only

Custom User Groups

Hash Sensitive Data

Pseudo‐Anonymize Data

How to Secure Sensitive Data in a BI Tool

Consistent Account Audit/Cleanup

Chapter Fifteen Maintaining the Source of Truth

Track New Metrics

Three Considerations for Adding New Calculated Fields

1. Where It Should Live

2. How to Backfill It

3. What to Call It

Deprecate Old Metrics

Explicitly Deprecate Data

Staging Schemas Make Source Changes Easy

Deprecate Old Schemas

Resolve Conflicting Numbers

Handling Ongoing Requests and Ongoing Feedback

Updating Modeling Code

Manage Access

How to Handle Permissions Correctly

Tuning to Optimize

Add Clusters

Materialize Expensive Views

Code Review All Modeling

Maintenance Checklist

STAGE 4 DATA MARTS aka Data Democratized

Mart Stage Overview

Stage 4: Data Marts

Overview of Building Data Marts

Center Around Business Use Cases

Chapter Sixteen Data Mart Implementation

Views on the Data Warehouse

Segment Tables

Access Update

Chapter Seventeen Data Mart Maintenance

Educate Team

Identifies Issues

Spikes

No Spikes

No New Data

All Queries on a Data Source Are Erroring Out

Slow Queries

Identify New Needs

Help Track Success

Chapter Eighteen Modern versus Traditional Data Stacks: What's Changed? What's Changed?

Storage and Compute Continues to Drop in Price Rapidly

Data Lakes Were Added to the Stack and ELT Replaced ETL

Data Is Increasingly Democratized

Chapter Nineteen Row‐ versus Column‐Oriented Database

Row‐Oriented Databases

Writing to Row Store Databases

Reading from Row Store Databases

Extra Data into Memory

Column‐Oriented Databases

Writing to a Column Store Databases

Reading from a Column Store Database

Coding the Data into More Compact Forms

Compressing the Data

Ordering the Data

Summary

Chapter Twenty Style Guide Example

Simplify

Only Include Fields That Have an Apparent Analytical Purpose

Extract Relevant Data from Complex Data Tpes

Change Flags and Cryptic Abbreviations to Meaningful Values

Denormalize Where Beneficial

Clean

Attempt to Eliminate NULLs

Fix Common Data Inconsistencies

Remove Irrelevant Data

Standardize Data Types

Naming Conventions

Plural Table Names

Singular Column Names

id as Primary Key

Lowercase, Underscored Naming

Share It

Chapter Twenty-One Building an SST Example

First Attempt—Same Tables with Prefixes

What Tables Did We Keep?

What Tables Did We Not Include?

What Columns Did We Keep?

What Was Not Ideal About It?

What Was the Feedback?

Second Attempt—Operational Schema (Source Agnostic)

What Tables Did We Keep?

How Did We Make These Decisions?

What Was Not Ideal About It?

What Was the Feedback?

Third Attempt—Application Separate, Other Sources Smashed

What Tables Did We Keep?

What Columns Did We Keep?

What Was the Feedback?

Less Planning, More Implementing

Acknowledgments and Contributions

Thank‐yous

Index

WILEY END USER LICENSE AGREEMENT

Отрывок из книги

Dave FowlerMatt David

— Dave Fowler

.....

This book aims to provide a broad overview and general guidelines on how to set up a data stack. We intentionally gloss over the details of launching a Redshift instance, writing SQL, or using various BI products. That would clutter the text, repeat what's already on the internet, and make the read quite stale.

The book starts with a quick overview and decision charts about what the stages are and what stage is appropriate for you. This book is structured with a section for each of the four stages, and if you'd like, you can jump ahead to the stage you're at.

.....

Добавление нового отзыва

Комментарий Поле, отмеченное звёздочкой  — обязательно к заполнению

Отзывы и комментарии читателей

Нет рецензий. Будьте первым, кто напишет рецензию на книгу The Informed Company
Подняться наверх