Data Science

Genre: Mathematics. Rights holder and/or publisher: John Wiley & Sons Limited. ISBN: 9781119544173. Age restriction: 0+.


Book description

Tap into the power of data science with this comprehensive resource for non-technical professionals.

Data Science: The Executive Summary – A Technical Book for Non-Technical Professionals is a comprehensive resource for people in non-engineering roles who want to fully understand data science and analytics concepts. Accomplished data scientist and author Field Cady describes both the “business side” of data science, including what problems it solves and how it fits into an organization, and the technical side, including analytical techniques and key technologies.

Data Science: The Executive Summary covers topics like:

Assessing whether your organization needs data scientists, and what to look for when hiring them
When Big Data is the best approach to use for a project, and when it actually ties analysts’ hands
Cutting-edge Artificial Intelligence, as well as classical approaches that work better for many problems
How many techniques rely on dubious mathematical idealizations, and when you can work around them

Perfect for executives who make critical decisions based on data science and analytics, as well as managers who hire and assess the work of data scientists, Data Science: The Executive Summary also belongs on the bookshelves of salespeople and marketers who need to explain what a data analytics product does. Finally, data scientists themselves will improve their technical work with insights into the goals and constraints of the business situation.

Contents

Field Cady. Data Science

Table of Contents

List of Tables

List of Illustrations


Data Science: The Executive Summary. A Technical Book for Non-Technical Professionals

Copyright

1 Introduction

1.1 Why Managers Need to Know About Data Science

1.2 The New Age of Data Literacy

1.3 Data‐Driven Development

1.4 How to Use this Book

2 The Business Side of Data Science

2.1 What Is Data Science?

2.1.1 What Data Scientists Do

2.1.2 History of Data Science

2.1.3 Data Science Roadmap

2.1.4 Demystifying the Terms: Data Science, Machine Learning, Statistics, and Business Intelligence

2.1.4.1 Machine Learning

2.1.4.2 Statistics

2.1.4.3 Business Intelligence

2.1.5 What Data Scientists Don't (Necessarily) Do

2.1.5.1 Working Without Data

2.1.5.2 Working with Data that Can't Be Interpreted

2.1.5.3 Replacing Subject Matter Experts

2.1.5.4 Designing Mathematical Algorithms

2.2 Data Science in an Organization

2.2.1 Types of Value Added

2.2.1.1 Business Insights

2.2.1.2 Intelligent Products

2.2.1.3 Building Analytics Frameworks

2.2.1.4 Offline Batch Analytics

2.2.2 One‐Person Shops and Data Science Teams

2.2.3 Related Job Roles

2.2.3.1 Data Engineer

2.2.3.2 Data Analyst

2.2.3.3 Software Engineer

2.3 Hiring Data Scientists

2.3.1 Do I Even Need Data Science?

2.3.2 The Simplest Option: Citizen Data Scientists

2.3.3 The Harder Option: Dedicated Data Scientists

2.3.4 Programming, Algorithmic Thinking, and Code Quality

2.3.5 Hiring Checklist

2.3.6 Data Science Salaries

2.3.7 Bad Hires and Red Flags

2.3.8 Advice with Data Science Consultants

2.4 Management Failure Cases

2.4.1 Using Them as Devs

2.4.2 Inadequate Data

2.4.3 Using Them as Graph Monkeys

2.4.4 Nebulous Questions

2.4.5 Laundry Lists of Questions Without Prioritization

Glossary

3 Working with Modern Data

3.1 Unstructured Data and Passive Collection

3.2 Data Types and Sources

3.3 Data Formats

3.3.1 CSV Files

3.3.2 JSON Files

3.3.3 XML and HTML

3.4 Databases

3.4.1 Relational Databases and Document Stores

3.4.2 Database Operations

3.5 Data Analytics Software Architectures

3.5.1 Shared Storage

3.5.2 Shared Relational Database

3.5.3 Document Store + Analytics RDB

3.5.4 Storage + Parallel Processing

Glossary

Notes

4 Telling the Story, Summarizing Data

4.1 Choosing What to Measure

4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers

4.3 Experiments, Correlation, and Causality

4.4 Summarizing One Number

4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails

4.5.1 Measuring Central Tendency

4.5.1.1 Mean

4.5.1.2 Median

4.5.1.3 Mode

4.5.2 Measuring Spread

4.5.2.1 Standard Deviation

4.5.2.2 Percentiles

4.5.3 Advanced Material: Managing Heavy Tails

4.6 Summarizing Two Numbers: Correlations and Scatterplots

4.6.1 Correlations

4.6.1.1 Pearson Correlation

4.6.1.2 Ordinal Correlations

4.6.2 Mutual Information

4.7 Advanced Material: Fitting a Line or Curve

4.7.1 Effects of Outliers

4.7.2 Optimization and Choosing Cost Functions

4.8 Statistics: How to Not Fool Yourself

4.8.1 The Central Concept: The p‐Value

4.8.2 Reality Check: Picking a Null Hypothesis and Modeling Assumptions

4.8.3 Advanced Material: Parameter Estimation and Confidence Intervals

4.8.4 Advanced Material: Statistical Tests Worth Knowing

4.8.4.1 χ²‐Test

4.8.4.2 T‐test

4.8.4.3 Fisher's Exact Test

4.8.4.4 Multiple Hypothesis Testing

4.8.5 Bayesian Statistics

4.9 Advanced Material: Probability Distributions Worth Knowing

4.9.1 Probability Distributions: Discrete and Continuous

4.9.2 Flipping Coins: Bernoulli Distribution

4.9.3 Adding Coin Flips: Binomial Distribution

4.9.4 Throwing Darts: Uniform Distribution

4.9.5 Bell‐Shaped Curves: Normal Distribution

4.9.6 Heavy Tails 101: Log‐Normal Distribution

4.9.7 Waiting Around: Exponential Distribution and the Geometric Distribution

4.9.8 Time to Failure: Weibull Distribution

4.9.9 Counting Events: Poisson Distribution

Glossary

5 Machine Learning

5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers

5.1.1 Reality Check: Getting Labeled Data and Assuming Independence

5.1.2 Feature Extraction and the Limitations of Machine Learning

5.1.3 Overfitting

5.1.4 Cross‐Validation Strategies

5.2 Measuring Performance

5.2.1 Confusion Matrices

5.2.2 ROC Curves

5.2.3 Area Under the ROC Curve

5.2.4 Selecting Classification Cutoffs

5.2.5 Other Performance Metrics

5.2.6 Lift Curves

5.3 Advanced Material: Important Classifiers

5.3.1 Decision Trees

5.3.2 Random Forests

5.3.3 Ensemble Classifiers

5.3.4 Support Vector Machines

5.3.5 Logistic Regression

5.3.6 Lasso Regression

5.3.7 Naive Bayes

5.3.8 Neural Nets

5.4 Structure of the Data: Unsupervised Learning

5.4.1 The Curse of Dimensionality

5.4.2 Principal Component Analysis and Factor Analysis

5.4.2.1 Scree Plots and Understanding Dimensionality

5.4.2.2 Factor Analysis

5.4.2.3 Limitations of PCA

5.4.3 Clustering

5.4.3.1 Real‐World Assessment of Clusters

5.4.3.2 k‐means Clustering

5.4.3.3 Advanced Material: Other Clustering Algorithms

Gaussian Mixture Models

Agglomerative Clustering

5.4.3.4 Advanced Material: Evaluating Cluster Quality

Silhouette Score

Rand Index and Adjusted Rand Index

Mutual Information

5.5 Learning as You Go: Reinforcement Learning

5.5.1 Multi‐Armed Bandits and ε‐Greedy Algorithms

5.5.2 Markov Decision Processes and Q‐Learning

Glossary

6 Knowing the Tools

6.1 A Note on Learning to Code

6.2 Cheat Sheet

6.3 Parts of the Data Science Ecosystem

6.3.1 Scripting Languages

6.3.2 Technical Computing Languages

6.3.2.1 Python's Technical Computing Stack

6.3.2.2 R

6.3.2.3 Matlab and Octave

6.3.2.4 Mathematica

6.3.2.5 SAS

6.3.2.6 Julia

6.3.3 Visualization

6.3.3.1 Tableau

6.3.3.2 Excel

6.3.3.3 D3.js

6.3.4 Databases

6.3.5 Big Data

6.3.5.1 Types of Big Data Technologies

6.3.5.2 Spark

6.3.6 Advanced Material: The Map‐Reduce Paradigm

6.4 Advanced Material: Database Query Crash Course

6.4.1 Basic Queries

6.4.2 Groups and Aggregations

6.4.3 Joins

6.4.4 Nesting Queries

Glossary

7 Deep Learning and Artificial Intelligence

7.1 Overview of AI

7.1.1 Don't Fear the Skynet: Strong and Weak AI

7.1.2 System 1 and System 2

7.2 Neural Networks

7.2.1 What Neural Nets Can and Can't Do

7.2.2 Enough Boilerplate: What's a Neural Net?

7.2.3 Convolutional Neural Nets

7.2.4 Advanced Material: Training Neural Networks

7.2.4.1 Manual Versus Automatic Feature Extraction

7.2.4.2 Dataset Sizes and Data Augmentation

7.2.4.3 Batches and Epochs

7.2.4.4 Transfer Learning

7.2.4.5 Feature Extraction

7.2.4.6 Word Embeddings

7.3 Natural Language Processing

7.3.1 The Great Divide: Language Versus Statistics

7.3.2 Save Yourself Some Trouble: Consider Regular Expressions

7.3.3 Software and Datasets

7.3.4 Key Issue: Vectorization

7.3.5 Bag‐of‐Words

7.4 Knowledge Bases and Graphs

Glossary

Postscript

Index

WILEY END USER LICENSE AGREEMENT

Excerpt from the book

Field Cady

And for my son Cyrus, who entered shortly thereafter.

.....

So where is all of this leading? Cutting out hyperbole and speculation, what does it look like for an organization to make full use of modern data technologies and what are the benefits? The goal that we are pushing toward is what I call “data‐driven development” (DDD). In an organization that uses DDD, all stages in a business process have their data gathered, modeled, and deployed to enable better decision making. Overall business goals and workflows are crafted by human experts, but after that every part of the system can be monitored and optimized, hypotheses can be tested rigorously and retroactively, and large‐scale trends can be identified and capitalized on. Data greases the wheels of all parts of the operation and provides a constant pulse on what's happening on the ground.

I break the benefits of DDD into three major categories:

.....
