Official Google Cloud Certified Professional Data Engineer Study Guide

Genre: Foreign computer literature. Publisher: John Wiley & Sons Limited. ISBN: 9781119618454. Age restriction: 0+.

Book description

The proven study guide that prepares you for this new Google Cloud exam.

The Google Cloud Certified Professional Data Engineer Study Guide provides everything you need to prepare for this important exam and master the skills necessary to earn the Google Cloud Professional Data Engineer certification. The book begins with an assessment quiz that evaluates what you know before you start. Each chapter features exam objectives and review questions, and the online learning environment includes additional complete practice tests. Written by Dan Sullivan, a popular and experienced online course author for machine learning, big data, and cloud topics, the Google Cloud Certified Professional Data Engineer Study Guide is your ace in the hole for deploying and managing analytics and machine learning applications.

• Build and operationalize storage systems, pipelines, and compute infrastructure
• Understand machine learning models and learn how to select pre-built models
• Monitor and troubleshoot machine learning models
• Design analytics and machine learning applications that are secure, scalable, and highly available

This exam guide is designed to help you develop an in-depth understanding of data engineering and machine learning on Google Cloud Platform.

Table of contents

Dan Sullivan. Official Google Cloud Certified Professional Data Engineer Study Guide

Official Google Cloud Certified Professional Data Engineer. Study Guide

Acknowledgments

About the Author

About the Technical Editor

CONTENTS

List of Tables

List of Illustrations

Guide

Pages

Introduction

What Does This Book Cover?

Interactive Online Learning Environment and TestBank

Additional Resources

Objective Map

Assessment Test

Answers to Assessment Test

Chapter 1 Selecting Appropriate Storage Technologies

From Business Requirements to Storage Systems

Ingest

Application Data

Streaming Data

Batch Data

Store

Data Access Patterns

Access Controls

Time to Store

Process and Analyze

Data Transformations

Data Analysis

Explore and Visualize

Technical Aspects of Data: Volume, Velocity, Variation, Access, and Security

Volume

Velocity

Variation in Structure

Data Access Patterns

Security Requirements

Types of Structure: Structured, Semi-Structured, and Unstructured

Structured: Transactional vs. Analytical

Semi-Structured: Fully Indexed vs. Row Key Access

Fully Indexed, Semi-Structured Data

Row Key Access

Unstructured Data

Google’s Storage Decision Tree

Schema Design Considerations

Relational Database Design

OLTP

OLAP

NoSQL Database Design

Key-Value Data Stores

Document Databases

Wide-Column Databases

Graph Databases

Exam Essentials

Review Questions

Chapter 2 Building and Operationalizing Storage Systems

Cloud SQL

Configuring Cloud SQL

Improving Read Performance with Read Replicas

Importing and Exporting Data

Cloud Spanner

Configuring Cloud Spanner

Replication in Cloud Spanner

Database Design Considerations

Importing and Exporting Data

Cloud Bigtable

Configuring Bigtable

Database Design Considerations

Importing and Exporting

Cloud Firestore

Cloud Firestore Data Model

Indexing and Querying

Importing and Exporting

BigQuery

BigQuery Datasets

Loading and Exporting Data

Clustering, Partitioning, and Sharding Tables

Streaming Inserts

Monitoring and Logging in BigQuery

BigQuery Cost Considerations

Tips for Optimizing BigQuery

Cloud Memorystore

Cloud Storage

Organizing Objects in a Namespace

Storage Tiers

Cloud Storage Use Cases

Data Retention and Lifecycle Management

Unmanaged Databases

Exam Essentials

Review Questions

Chapter 3 Designing Data Pipelines

Overview of Data Pipelines

Data Pipeline Stages

Ingestion

Transformation

Storage

Analysis

Types of Data Pipelines

Data Warehousing Pipelines

Extract, Transformation, and Load

Extract, Load, and Transformation

Extraction and Load

Change Data Capture

Stream Processing Pipelines

Event Time and Processing Time

Sliding and Tumbling Windows

Late Arriving and Watermarks

Hot Path and Cold Path Ingestion

Machine Learning Pipelines

GCP Pipeline Components

Cloud Pub/Sub

Working with Messaging Queues

Open Source Alternative: Kafka

Cloud Dataflow

Cloud Dataflow Concepts

Jobs and Templates

Cloud Dataproc

Managing Data in Cloud Dataproc

Configuring a Cloud Dataproc Cluster

Submitting a Job

Cloud Composer

Migrating Hadoop and Spark to GCP

Exam Essentials

Review Questions

Chapter 4 Designing a Data Processing Solution

Designing Infrastructure

Choosing Infrastructure

Compute Engine

Kubernetes Engine

App Engine

Cloud Functions

Availability, Reliability, and Scalability of Infrastructure

Making Compute Resources Available, Reliable, and Scalable

Compute Engine

Kubernetes Engine

App Engine and Cloud Functions

Making Storage Resources Available, Reliable, and Scalable

Making Network Resources Available, Reliable, and Scalable

Hybrid Cloud and Edge Computing

Analytics Hybrid Cloud

Edge Cloud

Designing for Distributed Processing

Distributed Processing: Messaging

Message Brokers

Message Queues

Event Processing Models

Distributed Processing: Services

Service-Oriented Architectures

Microservices

Serverless Functions

Migrating a Data Warehouse

Assessing the Current State of a Data Warehouse

Technical Requirements

Business Benefits

Designing the Future State of a Data Warehouse

Migrating Data, Jobs, and Access Controls

Validating the Data Warehouse

Exam Essentials

Review Questions

Chapter 5 Building and Operationalizing Processing Infrastructure

Provisioning and Adjusting Processing Resources

Provisioning and Adjusting Compute Engine

Provisioning Single VM Instances

Provisioning Managed Instance Groups

Adjusting Compute Engine Resources to Meet Demand

Provisioning and Adjusting Kubernetes Engine

Overview of Kubernetes Architecture

Provisioning a Kubernetes Engine Cluster

Adjusting Kubernetes Engine Resources to Meet Demand

Autoscaling Applications in Kubernetes Engine

Autoscaling Clusters in Kubernetes Engine

Kubernetes YAML Configurations

Provisioning and Adjusting Cloud Bigtable

Provisioning Bigtable Instances

Replication in Bigtable

Provisioning and Adjusting Cloud Dataproc

Configuring Cloud Dataflow

Configuring Managed Serverless Processing Services

Configuring App Engine

Configuring Cloud Functions

Monitoring Processing Resources

Stackdriver Monitoring

Stackdriver Logging

Stackdriver Trace

Exam Essentials

Review Questions

Chapter 6 Designing for Security and Compliance

Identity and Access Management with Cloud IAM

Predefined Roles

Custom Roles

Using Roles with Service Accounts

Access Control with Policies

Using IAM with Storage and Processing Services

Cloud Storage and IAM

Cloud Bigtable and IAM

BigQuery and IAM

Cloud Dataflow and IAM

Data Security

Encryption

Encryption at Rest

Encryption in Transit

Key Management

Default Key Management

Customer-Managed Encryption Keys

Customer-Supplied Encryption Keys

Ensuring Privacy with the Data Loss Prevention API

Detecting Sensitive Data

Running Data Loss Prevention Jobs

Inspection Best Practices

Legal Compliance

Health Insurance Portability and Accountability Act (HIPAA)

Children’s Online Privacy Protection Act

FedRAMP

General Data Protection Regulation

Exam Essentials

Review Questions

Chapter 7 Designing Databases for Reliability, Scalability, and Availability

Designing Cloud Bigtable Databases for Scalability and Reliability

Data Modeling with Cloud Bigtable

Designing Row-keys

Row-key Design Best Practices

Antipatterns for Row-key Design

Key Visualizer

Designing for Time Series

Use Replication for Availability and Scalability

Designing Cloud Spanner Databases for Scalability and Reliability

Relational Database Features

Interleaved Tables

Primary Keys and Hotspots

Database Splits

Secondary Indexes

Query Best Practices

Use Query Parameters

Use EXPLAIN PLAN to Understand Execution Plans

Avoid Long Locks

Designing BigQuery Databases for Data Warehousing

Schema Design for Data Warehousing

Types of Analytical Datastores

Projects, Datasets, and Tables

Clustered and Partitioned Tables

Partitioning

Clustering

Querying Data in BigQuery

External Data Access

Querying Cloud Bigtable Data from BigQuery

Querying Cloud Storage Data from BigQuery

Querying Google Drive Data from BigQuery

BigQuery ML

Exam Essentials

Review Questions

Chapter 8 Understanding Data Operations for Flexibility and Portability

Cataloging and Discovery with Data Catalog

Searching in Data Catalog

Tagging in Data Catalog

Data Preprocessing with Dataprep

Cleansing Data

Discovering Data

Enriching Data

Importing and Exporting Data

Structuring and Validating Data

Visualizing with Data Studio

Connecting to Data Sources

Visualizing Data

Sharing Data

Exploring Data with Cloud Datalab

Jupyter Notebooks

Managing Cloud Datalab Instances

Adding Libraries to Cloud Datalab Instances

Orchestrating Workflows with Cloud Composer

Airflow Environments

Creating DAGs

Airflow Logs

Exam Essentials

Review Questions

Chapter 9 Deploying Machine Learning Pipelines

Structure of ML Pipelines

Data Ingestion

Batch Data Ingestion

Streaming Data Ingestion

Data Preparation

Data Exploration

Data Transformation

Feature Engineering

Data Segregation

Training Data

Validation Data

Test Data

Model Training

Feature Selection

Underfitting, Overfitting, and Regularization

Model Evaluation

Individual Evaluation Metrics

K-Fold Cross Validation

Confusion Matrices

Bias and Variance

Model Deployment

Model Monitoring

GCP Options for Deploying Machine Learning Pipeline

Cloud AutoML

BigQuery ML

Kubeflow

Spark Machine Learning

Exam Essentials

Review Questions

Chapter 10 Choosing Training and Serving Infrastructure

Hardware Accelerators

Graphics Processing Units

Tensor Processing Units

Choosing Between CPUs, GPUs, and TPUs

Distributed and Single Machine Infrastructure

Single Machine Model Training

Distributed Model Training

Serving Models

Edge Computing with GCP

Edge Computing Overview

Edge Computing Components and Processes

Edge TPU

Cloud IoT

Exam Essentials

Review Questions

Chapter 11 Measuring, Monitoring, and Troubleshooting Machine Learning Models

Three Types of Machine Learning Algorithms

Supervised Learning

Classification

Regression

Unsupervised Learning

Anomaly Detection

Reinforcement Learning

Deep Learning

Engineering Machine Learning Models

Model Training and Evaluation

Data Collection and Preparation

Feature Engineering

Training Models

Evaluating Models

Accuracy

Precision

Recall

F1 Score

Operationalizing ML Models

Deploying Models

Model Serving

Monitoring

Retraining

Common Sources of Error in Machine Learning Models

Data Quality

Unbalanced Training Sets

Types of Bias

Exam Essentials

Review Questions

Chapter 12 Leveraging Prebuilt Models as a Service

Sight

Vision AI

Video AI

Conversation

Dialogflow

Cloud Text-to-Speech API

Cloud Speech-to-Text API

Language

Translation

Natural Language

Structured Data

Recommendations AI API

Cloud Inference API

Exam Essentials

Review Questions

Appendix Answers to Review Questions

Chapter 1: Selecting Appropriate Storage Technologies

Chapter 2: Building and Operationalizing Storage Systems

Chapter 3: Designing Data Pipelines

Chapter 4: Designing a Data Processing Solution

Chapter 5: Building and Operationalizing Processing Infrastructure

Chapter 6: Designing for Security and Compliance

Chapter 7: Designing Databases for Reliability, Scalability, and Availability

Chapter 8: Understanding Data Operations for Flexibility and Portability

Chapter 9: Deploying Machine Learning Pipelines

Chapter 10: Choosing Training and Serving Infrastructure

Chapter 11: Measuring, Monitoring, and Troubleshooting Machine Learning Models

Chapter 12: Leveraging Prebuilt Models as a Service

Index

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W–Z

Online Test Bank

WILEY END USER LICENSE AGREEMENT

Excerpt from the book

Dan Sullivan

Carole Jelen, vice president of Waterside Productions, and Jim Minatel, associate publisher at John Wiley & Sons, continue to lead the effort to create Google Cloud certification guides. It was a pleasure to work with Gary Schwartz, project editor, who managed the process that got us from outline to a finished manuscript. Thanks to Christine O’Connor, senior production editor, for making the last stages of book development go as smoothly as they did.

.....

Different storage systems offer different levels of access control. Cloud Storage, for example, supports access controls at the bucket level and the object level. Anyone who has access to a file in Cloud Storage has access to all the data in that file. If some users should have access to only a subset of a dataset, the data could instead be stored in a relational database, with a view created that includes only the data those users are allowed to access.
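The view-based approach can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a relational database; the table name `orders`, the view name `emea_orders`, and the sample rows are hypothetical and do not come from the book. SQLite has no per-user permissions, so the sketch only shows how a view narrows the visible rows and columns. In a managed relational database, one would typically also grant users read access to the view while withholding access to the underlying table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a base table and a restricted view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical base table holding data for every region.
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            region   TEXT NOT NULL,
            customer TEXT NOT NULL,
            amount   REAL NOT NULL
        );
        INSERT INTO orders (region, customer, amount) VALUES
            ('EMEA', 'Acme GmbH', 120.0),
            ('APAC', 'Kita KK', 75.5),
            ('EMEA', 'Nord AS', 310.0);

        -- The view exposes only EMEA rows and omits the region column.
        CREATE VIEW emea_orders AS
            SELECT order_id, customer, amount
            FROM orders
            WHERE region = 'EMEA';
        """
    )
    return conn


def visible_rows(conn: sqlite3.Connection) -> list:
    """Return what a user limited to the view would be able to read."""
    query = "SELECT customer, amount FROM emea_orders ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(visible_rows(build_demo_db()))
```

A user who queries only `emea_orders` never sees the APAC row, whereas sharing the same data as a single file in Cloud Storage would expose every row to anyone who can read that file.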

Encrypting data at rest is an important requirement for many use cases; fortunately, all Google Cloud storage services encrypt data at rest.

.....
