Data Science For Dummies
Contents
Lillian Pierson. Data Science For Dummies
Data Science For Dummies®
To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Data Science For Dummies Cheat Sheet” in the Search box.
Table of Contents
List of Tables
List of Illustrations
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Getting Started with Data Science
Wrapping Your Head Around Data Science
Seeing Who Can Make Use of Data Science
Inspecting the Pieces of the Data Science Puzzle
Collecting, querying, and consuming data
Applying mathematical modeling to data science tasks
Deriving insights from statistical methods
Coding, coding, coding — it’s just part of the game
Applying data science to a subject area
Communicating data insights
Exploring Career Alternatives That Involve Data Science
The data implementer
The data leader
The data entrepreneur
Tapping into Critical Aspects of Data Engineering
Defining Big Data and the Three Vs
Grappling with data volume
Handling data velocity
Dealing with data variety
Identifying Important Data Sources
Grasping the Differences among Data Approaches
Defining data science
Defining machine learning engineering
Defining data engineering
Comparing machine learning engineers, data scientists, and data engineers
Storing and Processing Data for Data Science
Storing data and doing data science directly in the cloud
Using serverless computing to execute data science
Containerizing predictive applications within Kubernetes
Sizing up popular cloud-warehouse solutions
Introducing NoSQL databases
Storing big data on-premise
Reminiscing about Hadoop
Incorporating MapReduce, the HDFS, and YARN
Storing data on the Hadoop distributed file system (HDFS)
Putting it all together on the Hadoop platform
Introducing massively parallel processing (MPP) platforms
Processing big data in real-time
Using Data Science to Extract Meaning from Your Data
Machine Learning Means … Using a Machine to Learn from Data
Defining Machine Learning and Its Processes
Walking through the steps of the machine learning process
Becoming familiar with machine learning terms
Considering Learning Styles
Learning with supervised algorithms
Learning with unsupervised algorithms
Learning with reinforcement
Seeing What You Can Do
Selecting algorithms based on function
Using Spark to generate real-time big data analytics
Math, Probability, and Statistical Modeling
Exploring Probability and Inferential Statistics
Probability distributions
Conditional probability with Naïve Bayes
Quantifying Correlation
Calculating correlation with Pearson’s r
Ranking variable-pairs using Spearman’s rank correlation
Reducing Data Dimensionality with Linear Algebra
Decomposing data to reduce dimensionality
Reducing dimensionality with factor analysis
Decreasing dimensionality and removing outliers with PCA
Modeling Decisions with Multiple Criteria Decision-Making
Turning to traditional MCDM
Focusing on fuzzy MCDM
Introducing Regression Methods
Linear regression
Logistic regression
Ordinary least squares (OLS) regression methods
Detecting Outliers
Analyzing extreme values
Detecting outliers with univariate analysis
Detecting outliers with multivariate analysis
Introducing Time Series Analysis
Identifying patterns in time series
Modeling univariate time series data
Grouping Your Way into Accurate Predictions
Starting with Clustering Basics
Getting to know clustering algorithms
Examining clustering similarity metrics
Identifying Clusters in Your Data
Clustering with the k-means algorithm
Estimating clusters with kernel density estimation (KDE)
Clustering with hierarchical algorithms
Dabbling in the DBScan neighborhood
Categorizing Data with Decision Tree and Random Forest Algorithms
Drawing a Line between Clustering and Classification
Introducing instance-based learning classifiers
Getting to know classification algorithms
Making Sense of Data with Nearest Neighbor Analysis
Classifying Data with Average Nearest Neighbor Algorithms
Classifying with K-Nearest Neighbor Algorithms
Understanding how the k-nearest neighbor algorithm works
Knowing when to use the k-nearest neighbor algorithm
Exploring common applications of k-nearest neighbor algorithms
Solving Real-World Problems with Nearest Neighbor Algorithms
Seeing k-nearest neighbor algorithms in action
Seeing average nearest neighbor algorithms in action
Coding Up Data Insights and Decision Engines
Seeing Where Python and R Fit into Your Data Science Strategy
Using Python for Data Science
Sorting out the various Python data types
Numbers in Python
Strings in Python
Lists in Python
Tuples in Python
Sets in Python
Dictionaries in Python
Putting loops to good use in Python
Having fun with functions
Keeping cool with classes
Checking out some useful Python libraries
Saying hello to the NumPy library
Getting up close and personal with the SciPy library
Peeking into the Pandas offering
Bonding with MatPlotLib for data visualization
Learning from data with Scikit-learn
Using Open Source R for Data Science
Comprehending R’s basic vocabulary
Delving into functions and operators
Iterating in R
Observing how objects work
Sorting out R's popular statistical analysis packages
Examining packages for visualizing, mapping, and graphing in R
Visualizing R statistics with ggplot2
Analyzing networks with statnet and igraph
Mapping and analyzing spatial point patterns with spatstat
Generating Insights with Software Applications
Choosing the Best Tools for Your Data Science Strategy
Getting a Handle on SQL and Relational Databases
Investing Some Effort into Database Design
Defining data types
Designing constraints properly
Normalizing your database
Narrowing the Focus with SQL Functions
MINING TEXT WITH SQL
Making Life Easier with Excel
Using Excel to quickly get to know your data
Filtering in Excel
Using conditional formatting
Excel charting to visually identify outliers and trends
Reformatting and summarizing with PivotTables
Automating Excel tasks with macros
Telling Powerful Stories with Data
Data Visualizations: The Big Three
Data storytelling for decision makers
Data showcasing for analysts
Designing data art for activists
Designing to Meet the Needs of Your Target Audience
Step 1: Brainstorm (All about Eve)
Step 2: Define the purpose
Step 3: Choose the most functional visualization type for your purpose
Picking the Most Appropriate Design Style
Inducing a calculating, exacting response
Eliciting a strong emotional response
Selecting the Appropriate Data Graphic Type
Standard chart graphics
Comparative graphics
Statistical plots
Topology structures
Spatial plots and maps
Testing Data Graphics
Adding Context
Creating context with data
Creating context with annotations
Creating context with graphical elements
KNOWING WHEN TO GET PERSUASIVE
Taking Stock of Your Data Science Capabilities
Developing Your Business Acumen
Bridging the Business Gap
Contrasting business acumen with subject matter expertise
Defining business acumen
Traversing the Business Landscape
Seeing how data roles support the business in making money
Leveling up your business acumen
Fortifying your leadership skills
Surveying Use Cases and Case Studies
Documentation for data leaders
Documentation for data implementers
Improving Operations
Establishing Essential Context for Operational Improvements Use Cases
Exploring Ways That Data Science Is Used to Improve Operations
Making major improvements to traditional manufacturing operations
Optimizing business operations with data science
An AI case study: Automated, personalized, and effective debt collection processes
The solution
The result
Gaining logistical efficiencies with better use of real-time data
Another AI case study: Real-time optimized logistics routing
The solution
The result
Modernizing media and the press with data science and AI
Generating content with the click of a button
A SAMPLE OF GPT-3 GENERATED CONTENT
Yet another case study: Increasing content generation rates
The problem
The solution
The result
Making Marketing Improvements
Exploring Popular Use Cases for Data Science in Marketing
Turning Web Analytics into Dollars and Sense
Getting acquainted with omnichannel analytics
Mapping your channels
Building analytics around channel performance
Scoring your company’s channels
HEEDING THE DEMAND FOR DATA PRIVACY
Building Data Products That Increase Sales-and-Marketing ROI
Increasing Profit Margins with Marketing Mix Modeling
Collecting data on the four Ps
Inspecting important product features
Playing with the price aspect
Placing your product
Promoting your offer
Implementing marketing mix modeling
Increasing profitability with MMM
Enabling Improved Decision-Making
Improving Decision-Making
Barking Up the Business Intelligence Tree
Using Data Analytics to Support Decision-Making
Types of analytics
Common challenges in analytics
Data wrangling
Increasing Profit Margins with Data Science
Seeing which kinds of data are useful when using data science for decision support
Directing improved decision-making for call center agents
Case study: Improving call center operations
THE NEED
THE ACTION
THE OUTCOME
Discovering the tipping point where the old way stops working
Decreasing Lending Risk and Fighting Financial Crimes
Decreasing Lending Risk with Clustering and Classification
Preventing Fraud Via Natural Language Processing (NLP)
Monetizing Data and Data Science Expertise
Setting the Tone for Data Monetization
Monetizing Data Science Skills as a Service
Data preparation services
Model building services
Selling Data Products
Direct Monetization of Data Resources
Coupling data resources with a service and selling it
Making money with data partnerships
MONETIZING A PRODUCT THAT’S BUILT SOLELY FROM PARTNERS’ DATA RESOURCES
Pricing Out Data Privacy
Assessing Your Data Science Options
Gathering Important Information about Your Company
Unifying Your Data Science Team Under a Single Business Vision
Framing Data Science around the Company’s Vision, Mission, and Values
Taking Stock of Data Technologies
Inventorying Your Company’s Data Resources
Requesting your data dictionary and inventory
Confirming what’s officially on file
Unearthing data silos and data quality issues
People-Mapping
Requesting organizational charts
Surveying the skillsets of relevant personnel
Avoiding Classic Data Science Project Pitfalls
Staying focused on the business, not on the tech
Drafting best practices to protect your data science project
Tuning In to Your Company’s Data Ethos
Collecting the official data privacy policy
Taking AI ethics into account
Making Information-Gathering Efficient
Narrowing In on the Optimal Data Science Use Case
Reviewing the Documentation
Selecting Your Quick-Win Data Science Use Cases
Zeroing in on the quick win
Producing a POTI model
Picking between Plug-and-Play Assessments
Carrying out a data skill gap analysis for your company
Assessing the ethics of your company’s AI projects and products
Illustrating the need for ethical AI
Proving accountability for AI solutions
Vouching for your company’s AI
Unbiasing AI
Assessing data governance and data privacy policies
Planning for Future Data Science Project Success
Preparing an Implementation Plan
Supporting Your Data Science Project Plan
Analyzing your alternatives
Interviewing intended users and designing accordingly
POTI modeling the future state
Executing On Your Data Science Project Plan
Blazing a Path to Data Science Career Success
Navigating the Data Science Career Matrix
Landing Your Data Scientist Dream Job
Leaning into data science implementation
Acing your accreditations
Making the grade with coding bootcamps and data science career accelerators
Networking and building authentic relationships
Developing your own thought leadership in data science
Building a public data science project portfolio
Showcasing your data science skills
Deciding which data science activities to publish
Taking inspiration from the data science greats
Leading with Data Science
BECOMING YOUR COMPANY’S DATA SCIENCE LEADER: A TRUE STORY
Starting Up in Data Science
Choosing a business model for your data science business
Selecting a data science start-up revenue model
Taking inspiration from Kam Lee’s success story
Following in the footsteps of the data science entrepreneurs
The Part of Tens
Ten Phenomenal Resources for Open Data
Digging Through data.gov
Checking Out Canada Open Data
Diving into data.gov.uk
Checking Out US Census Bureau Data
Accessing NASA Data
Wrangling World Bank Data
Getting to Know Knoema Data
Queuing Up with Quandl Data
Exploring Exversion Data
Mapping OpenStreetMap Spatial Data
Ten Free or Low-Cost Data Science Tools and Applications
Scraping, Collecting, and Handling Data Tools
Sourcing and aggregating image data with ImageQuilts
Wrangling data with DataWrangler
Data-Exploration Tools
Getting up to speed in Gephi
Machine learning with the WEKA suite
Designing Data Visualizations
Getting Shiny by RStudio
Mapmaking and spatial data analytics with CARTO
Talking about Tableau Public
Using RAWGraphs for web-based data visualization
Communicating with Infographics
Making cool infographics with Infogram
Making cool infographics with Piktochart
Index. Symbols and Numerics
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Y
Z
About the Author
Dedication
Author’s Acknowledgments
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
This book was written as much for expert data scientists as it was for aspiring ones. Its content represents a new approach to doing data science, one that puts business vision and profitability at the heart of our work as data scientists.
Data science and artificial intelligence (AI, for short) have disrupted the business world so radically that it's nearly unrecognizable compared to what things were like just 10 or 15 years ago. The good news is that most of these changes have made everyone’s lives and businesses more efficient, more fun, and dramatically more interesting. The bad news is that if you don’t yet have at least a modicum of data science competence, your business and employment prospects are growing dimmer by the moment.
.....
You have a number of products to choose from when it comes to cloud-warehouse solutions. The following list looks at the most popular options:
A traditional RDBMS isn’t equipped to handle big data demands. That’s because it’s designed to handle only relational datasets constructed of data that’s stored in clean rows and columns and thus is capable of being queried via SQL. RDBMSs are incapable of handling unstructured and semistructured data. Moreover, RDBMSs simply lack the processing and handling capabilities that are needed for meeting big data volume-and-velocity requirements.
.....