Read the book Data Science - Field Cady - Page 2
Table of Contents
Cover
Data Science: The Executive Summary
1 Introduction
  1.1 Why Managers Need to Know About Data Science
  1.2 The New Age of Data Literacy
  1.3 Data‐Driven Development
  1.4 How to Use this Book
2 The Business Side of Data Science
  2.1 What Is Data Science?
  2.2 Data Science in an Organization
  2.3 Hiring Data Scientists
  2.4 Management Failure Cases
3 Working with Modern Data
  3.1 Unstructured Data and Passive Collection
  3.2 Data Types and Sources
  3.3 Data Formats
  3.4 Databases
  3.5 Data Analytics Software Architectures
  Notes
4 Telling the Story, Summarizing Data
  4.1 Choosing What to Measure
  4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers
  4.3 Experiments, Correlation, and Causality
  4.4 Summarizing One Number
  4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails
  4.6 Summarizing Two Numbers: Correlations and Scatterplots
  4.7 Advanced Material: Fitting a Line or Curve
  4.8 Statistics: How to Not Fool Yourself
  4.9 Advanced Material: Probability Distributions Worth Knowing
5 Machine Learning
  5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers
  5.2 Measuring Performance
  5.3 Advanced Material: Important Classifiers
  5.4 Structure of the Data: Unsupervised Learning
  5.5 Learning as You Go: Reinforcement Learning
6 Knowing the Tools
  6.1 A Note on Learning to Code
  6.2 Cheat Sheet
  6.3 Parts of the Data Science Ecosystem
  6.4 Advanced Material: Database Query Crash Course
7 Deep Learning and Artificial Intelligence
  7.1 Overview of AI
  7.2 Neural Networks
  7.3 Natural Language Processing
  7.4 Knowledge Bases and Graphs
Postscript
Index
List of Tables
Chapter 2
  Table 2.1 Data science work can largely be divided into producing human‐under...
  Table 2.2 Data engineers specialize in creating software systems to store and...
  Table 2.3 BI analysts generally lack the ability to create mathematically com...
  Table 2.4 Software engineers create products of a scale and complexity far gr...
Chapter 6
  Table 6.1 These functions – which are present in most SQL‐like languages – ta...
  Table 6.2 Common SQL aggregation functions.
Chapter 7
  Table 7.1 Feature of regular expressions.
List of Illustrations
Chapter 2
  Figure 2.1 The process of data science is deeply iterative, with the questio...
Chapter 4
  Figure 4.1 Anscombe's quartet is a famous demonstration of the limitations o...
  Figure 4.2 Mean, median, and mode are the most common measures of central te...
  Figure 4.3 Box‐and‐whisker plots capture the median, the 25% and 75% percent...
  Figure 4.4 Box‐and‐whisker plots allow you to visually compare several data ...
  Figure 4.5 The histograms of two datasets, plotted for comparison on (a) a n...
  Figure 4.6 In both of these plots the correlation between x and y will be cl...
  Figure 4.7 This dataset will have ordinal correlation of 1, since y consiste...
  Figure 4.8 Residuals measure the accuracy of a model. Here the gray points a...
  Figure 4.9 A degenerative form of “curve fitting” is used as a base of compa...
  Figure 4.10 Large residuals can come from two sources: either that data we a...
  Figure 4.11 The most intuitive way to think of a probability distribution is...
  Figure 4.12 The area under the curve of a continuous probability is distribu...
  Figure 4.13 The Bernoulli distribution is just the flipping of a biased coin...
  Figure 4.14 The uniform distribution gives constantly probability density ov...
  Figure 4.15 The normal distribution, aka Gaussian, is the prototypical “bell...
  Figure 4.16 The exponential distribution is often used to estimate the lengt...
  Figure 4.17 Say there are many independent events that could happen (there a...
Chapter 5
  Figure 5.1 K‐fold cross‐validation breaks the dataset into k partitions. Eac...
  Figure 5.2 The performance of a classifier can't really be boiled down to a ...
  Figure 5.3 The ROC curve plots the true/false positive rate for a classifier...
  Figure 5.4 For this cutoff the fraction of all 0s that get incorrectly flagg...
  Figure 5.5 For this cutoff a small change in your classification threshold w...
  Figure 5.6 In a lift curve the x‐axis (the “reach”) is the fraction of all d...
  Figure 5.7 A decision tree classifier is somewhat like a flow chart. Every n...
  Figure 5.8 Support vector machines look for a hyperplane that divides your t...
  Figure 5.9 The key weakness of support vector machines is that often there i...
  Figure 5.10 Sometimes you can fix the linear separability problem by mapping...
  Figure 5.11 The Sigmoid function shows up many places in machine learning. E...
  Figure 5.12 A perceptron is a neural network with a single hidden layer.
  Figure 5.13 The “curse of dimensionality” describes how high‐dimensional spa...
  Figure 5.14 If many fields in your data move in lock‐step then in a sense th...
  Figure 5.15 A Scree plot shows how much of a dataset's variability is accoun...
  Figure 5.16 The “clusters” identified by k‐means clustering are really just ...
  Figure 5.17 The indicated point is closer to the middle of the other cluster...
Chapter 6
  Figure 6.1 The map‐reduce paradigm is one of the building blocks of the Big ...
Chapter 7
  Figure 7.1 A neural network consists of “nodes” arranged into “layers.” Each...
  Figure 7.2 Convolutional neural networks are stars in image processing. The ...