Becoming a Data Head
Contents
Alex J. Gutman. Becoming a Data Head
PRAISE FOR BECOMING A DATA HEAD
Becoming a Data Head. How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning
About the Authors
About the Technical Editors
Acknowledgments
Table of Contents
List of Tables
List of Illustrations
Foreword
NOTE
Introduction
THE DATA SCIENCE INDUSTRIAL COMPLEX
WHY WE CARE
The Subprime Mortgage Crises
The 2016 United States General Election
Our Hypothesis
DATA IN THE WORKPLACE
The Boardroom Scene
YOU CAN UNDERSTAND THE BIG PICTURE
Classifying Restaurants
So What?
WHO THIS BOOK IS WRITTEN FOR
WHY WE WROTE THIS BOOK
WHAT YOU'LL LEARN
HOW THIS BOOK IS ORGANIZED
ONE LAST THING BEFORE WE BEGIN
NOTES
PART I Thinking Like a Data Head
CHAPTER 1 What Is the Problem?
QUESTIONS A DATA HEAD SHOULD ASK
Why Is This Problem Important?
Who Does This Problem Affect?
What If We Don't Have the Right Data?
When Is the Project Over?
What If We Don't Like the Results?
UNDERSTANDING WHY DATA PROJECTS FAIL
Customer Perception
Discussion
WORKING ON PROBLEMS THAT MATTER
CHAPTER SUMMARY
NOTES
CHAPTER 2 What Is Data?
DATA VS. INFORMATION
An Example Dataset
Know Your Audience
DATA TYPES
HOW DATA IS COLLECTED AND STRUCTURED
Observational vs. Experimental Data
Structured vs. Unstructured Data
Is Data One or Many?
BASIC SUMMARY STATISTICS
CHAPTER SUMMARY
NOTES
CHAPTER 3 Prepare to Think Statistically
ASK QUESTIONS
Comment on “Statistical Thinking”
THERE IS VARIATION IN ALL THINGS
Scenario: Customer Perception (The Sequel)
Case Study: Kidney-Cancer Rates
PROBABILITIES AND STATISTICS
Probability vs. Intuition
Discovery with Statistics
Statistical Thinking Resources
CHAPTER SUMMARY
NOTES
PART II Speaking Like a Data Head
CHAPTER 4 Argue with the Data
WHAT WOULD YOU DO?
Missing Data Disaster
NOTE
Alex's Comment on the Challenger Data
TELL ME THE DATA ORIGIN STORY
Who Collected the Data?
How Was the Data Collected?
IS THE DATA REPRESENTATIVE?
Is There Sampling Bias?
What Did You Do with Outliers?
WHAT DATA AM I NOT SEEING?
How Did You Deal with Missing Values?
Can the Data Measure What You Want It to Measure?
ARGUE WITH DATA OF ALL SIZES
CHAPTER SUMMARY
NOTES
CHAPTER 5 Explore the Data
EXPLORATORY DATA ANALYSIS AND YOU
Are You a Manager or Leader?
EMBRACING THE EXPLORATORY MINDSET
Questions to Guide You
The Setup
CAN THE DATA ANSWER THE QUESTION?
Set Expectations and Use Common Sense
Do the Values Make Intuitive Sense?
Data Visualization Refresher
Watch Out: Outliers and Missing Values
DID YOU DISCOVER ANY RELATIONSHIPS?
Understanding Correlation
Watch Out: Misinterpreting Correlation
Not Correlated but Still Interesting
Watch Out: Correlation Does Not Imply Causation
Smoking and Lung Cancer
DID YOU FIND NEW OPPORTUNITIES IN THE DATA?
CHAPTER SUMMARY
NOTES
CHAPTER 6 Examine the Probabilities
TAKE A GUESS
THE RULES OF THE GAME
Notation
Using “==” Instead of “=”
Conditional Probability and Independent Events
The Probability of Multiple Events
Two Things That Happen Together
One Thing or the Other
Remember the Overlap
PROBABILITY THOUGHT EXERCISE
Next Steps
BE CAREFUL ASSUMING INDEPENDENCE
Don't Fall for the Gambler's Fallacy
ALL PROBABILITIES ARE CONDITIONAL
Don't Swap Dependencies
Bayes' Theorem
ENSURE THE PROBABILITIES HAVE MEANING
Calibration
Rare Events Can, and Do, Happen
Do Not Needlessly Multiply Probabilities
CHAPTER SUMMARY
NOTES
CHAPTER 7 Challenge the Statistics
QUICK LESSONS ON INFERENCE
Give Yourself Some Wiggle Room
More Data, More Evidence
Challenge the Status Quo
Evidence to the Contrary
Balance Decision Errors
THE PROCESS OF STATISTICAL INFERENCE
THE QUESTIONS YOU SHOULD ASK TO CHALLENGE THE STATISTICS
What Is the Context for These Statistics?
What Is the Sample Size?
What Are You Testing?
What Is the Null Hypothesis?
Assuming Equivalence
What Is the Significance Level?
How Many Tests Are You Doing?
Can I See the Confidence Intervals?
Is This Practically Significant?
Are You Assuming Causality?
CHAPTER SUMMARY
NOTES
PART III Understanding the Data Scientist's Toolbox
CHAPTER 8 Search for Hidden Groups
UNSUPERVISED LEARNING
DIMENSIONALITY REDUCTION
Creating Composite Features
PRINCIPAL COMPONENT ANALYSIS
Principal Components in Athletic Ability
PCA Summary
Potential Traps
CLUSTERING
K-MEANS CLUSTERING
Clustering Retail Locations
Potential Traps
Hierarchical Clustering
CHAPTER SUMMARY
NOTES
CHAPTER 9 Understand the Regression Model
SUPERVISED LEARNING
LINEAR REGRESSION: WHAT IT DOES
Least Squares Regression: Not Just a Clever Name
LINEAR REGRESSION: WHAT IT GIVES YOU
Extending to Many Features
LINEAR REGRESSION: WHAT CONFUSION IT CAUSES
Omitted Variables
Multicollinearity
Data Leakage
Extrapolation Failures
Many Relationships Aren't Linear
Are You Explaining or Predicting?
Regression Performance
OTHER REGRESSION MODELS
CHAPTER SUMMARY
NOTES
CHAPTER 10 Understand the Classification Model
INTRODUCTION TO CLASSIFICATION
What You'll Learn
Classification Problem Setup
LOGISTIC REGRESSION
Logistic Regression: So What?
What to Watch Out for When Working with Logistic Regression
DECISION TREES
ENSEMBLE METHODS
Random Forests
Gradient Boosted Trees
Interpretability of Ensemble Models
WATCH OUT FOR PITFALLS
Misapplication of the Problem
Data Leakage
Not Splitting Your Data
Choosing the Right Decision Threshold
MISUNDERSTANDING ACCURACY
Confusion Matrices
Confusing Terms for Confusion Matrices
CHAPTER SUMMARY
NOTES
CHAPTER 11 Understand Text Analytics
EXPECTATIONS OF TEXT ANALYTICS
HOW TEXT BECOMES NUMBERS
A Big Bag of Words
Quick Thoughts on Word Clouds
N-Grams
Word Embeddings
TOPIC MODELING
TEXT CLASSIFICATION
Naïve Bayes
A Deeper Look
Sentiment Analysis
What About Tree-Based Methods on Text?
PRACTICAL CONSIDERATIONS WHEN WORKING WITH TEXT
Big Tech Has the Upper Hand
CHAPTER SUMMARY
NOTES
CHAPTER 12 Conceptualize Deep Learning
NEURAL NETWORKS
How Are Neural Networks Like the Brain?
A Simple Neural Network
How a Neural Network Learns
A Slightly More Complex Neural Network
APPLICATIONS OF DEEP LEARNING
The Benefits of Deep Learning
How Computers “See” Images
Convolutional Neural Networks
Deep Learning on Language and Sequences
DEEP LEARNING IN PRACTICE
Do You Have Data?
Transfer Learning (or How to Work with Small Datasets)
Is Your Data Structured?
What Will the Network Look Like?
Deep Learning for Practitioners
ARTIFICIAL INTELLIGENCE AND YOU
Big Tech Has the Upper Hand
Ethics in Deep Learning
CHAPTER SUMMARY
NOTES
PART IV Ensuring Success
CHAPTER 13 Watch Out for Pitfalls
BIASES AND WEIRD PHENOMENA IN DATA
Survivorship Bias
Regression to the Mean
Simpson's Paradox
Confirmation Bias
Effort Bias (aka the “Sunk Cost Fallacy”)
Algorithmic Bias
Uncategorized Bias
THE BIG LIST OF PITFALLS
Statistical and Machine Learning Pitfalls
Project Pitfalls
CHAPTER SUMMARY
NOTES
CHAPTER 14 Know the People and Personalities
SEVEN SCENES OF COMMUNICATION BREAKDOWNS
The Postmortem
Storytime
The Telephone Game
Into the Weeds
The Reality Check
The Takeover
The Blowhard
DATA PERSONALITIES
Data Enthusiasts
Data Cynics
Data Heads
CHAPTER SUMMARY
NOTES
CHAPTER 15 What's Next?
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Big Data, Data Science, Machine Learning, Artificial Intelligence, Neural Networks, Deep Learning … It can be buzzword bingo, but make no mistake, everything is becoming “datafied” and an understanding of data problems and the data science toolset is becoming a requirement for every business person. Alex and Jordan have put together a must-read whether you are just starting your journey or already in the thick of it. They made this complex space simple by breaking down the “data process” into understandable patterns and using everyday examples and events over our history to make the concepts relatable.
—Milen Mahadevan, President of 84.51°
.....
We discovered a middle ground between data workers and business professionals where honest discussions about data can take place without being too technical or too simplified. It involves both sides thinking more critically about data problems, large or small. That's what this book is about.
To become better at understanding and working with data, you will need to be open to learning seemingly complicated data concepts. And, even if you already know these concepts, we'll teach you how to translate them to your audience of stakeholders.
.....