Data Science - Field Cady

Table of Contents

Cover

Data Science: The Executive Summary

Copyright

Dedication

1 Introduction
1.1 Why Managers Need to Know About Data Science
1.2 The New Age of Data Literacy
1.3 Data‐Driven Development
1.4 How to Use this Book

2 The Business Side of Data Science
2.1 What Is Data Science?
2.2 Data Science in an Organization
2.3 Hiring Data Scientists
2.4 Management Failure Cases

3 Working with Modern Data
3.1 Unstructured Data and Passive Collection
3.2 Data Types and Sources
3.3 Data Formats
3.4 Databases
3.5 Data Analytics Software Architectures
Notes

4 Telling the Story, Summarizing Data
4.1 Choosing What to Measure
4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers
4.3 Experiments, Correlation, and Causality
4.4 Summarizing One Number
4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails
4.6 Summarizing Two Numbers: Correlations and Scatterplots
4.7 Advanced Material: Fitting a Line or Curve
4.8 Statistics: How to Not Fool Yourself
4.9 Advanced Material: Probability Distributions Worth Knowing

5 Machine Learning
5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers
5.2 Measuring Performance
5.3 Advanced Material: Important Classifiers
5.4 Structure of the Data: Unsupervised Learning
5.5 Learning as You Go: Reinforcement Learning

6 Knowing the Tools
6.1 A Note on Learning to Code
6.2 Cheat Sheet
6.3 Parts of the Data Science Ecosystem
6.4 Advanced Material: Database Query Crash Course

7 Deep Learning and Artificial Intelligence
7.1 Overview of AI
7.2 Neural Networks
7.3 Natural Language Processing
7.4 Knowledge Bases and Graphs

Postscript

Index

End User License Agreement

List of Tables

Chapter 2
Table 2.1 Data science work can largely be divided into producing human‐under...
Table 2.2 Data engineers specialize in creating software systems to store and...
Table 2.3 BI analysts generally lack the ability to create mathematically com...
Table 2.4 Software engineers create products of a scale and complexity far gr...

Chapter 6
Table 6.1 These functions – which are present in most SQL‐like languages – ta...
Table 6.2 Common SQL aggregation functions.

Chapter 7
Table 7.1 Feature of regular expressions.

List of Illustrations

Chapter 2
Figure 2.1 The process of data science is deeply iterative, with the questio...

Chapter 4
Figure 4.1 Anscombe's quartet is a famous demonstration of the limitations o...
Figure 4.2 Mean, median, and mode are the most common measures of central te...
Figure 4.3 Box‐and‐whisker plots capture the median, the 25% and 75% percent...
Figure 4.4 Box‐and‐whisker plots allow you to visually compare several data ...
Figure 4.5 The histograms of two datasets, plotted for comparison on (a) a n...
Figure 4.6 In both of these plots the correlation between x and y will be cl...
Figure 4.7 This dataset will have ordinal correlation of 1, since y consiste...
Figure 4.8 Residuals measure the accuracy of a model. Here the gray points a...
Figure 4.9 A degenerative form of “curve fitting” is used as a base of compa...
Figure 4.10 Large residuals can come from two sources: either that data we a...
Figure 4.11 The most intuitive way to think of a probability distribution is...
Figure 4.12 The area under the curve of a continuous probability is distribu...
Figure 4.13 The Bernoulli distribution is just the flipping of a biased coin...
Figure 4.14 The uniform distribution gives constantly probability density ov...
Figure 4.15 The normal distribution, aka Gaussian, is the prototypical “bell...
Figure 4.16 The exponential distribution is often used to estimate the lengt...
Figure 4.17 Say there are many independent events that could happen (there a...

Chapter 5
Figure 5.1 K‐fold cross‐validation breaks the dataset into k partitions. Eac...
Figure 5.2 The performance of a classifier can't really be boiled down to a ...
Figure 5.3 The ROC curve plots the true/false positive rate for a classifier...
Figure 5.4 For this cutoff the fraction of all 0s that get incorrectly flagg...
Figure 5.5 For this cutoff a small change in your classification threshold w...
Figure 5.6 In a lift curve the x‐axis (the “reach”) is the fraction of all d...
Figure 5.7 A decision tree classifier is somewhat like a flow chart. Every n...
Figure 5.8 Support vector machines look for a hyperplane that divides your t...
Figure 5.9 The key weakness of support vector machines is that often there i...
Figure 5.10 Sometimes you can fix the linear separability problem by mapping...
Figure 5.11 The Sigmoid function shows up many places in machine learning. E...
Figure 5.12 A perceptron is a neural network with a single hidden layer.
Figure 5.13 The “curse of dimensionality” describes how high‐dimensional spa...
Figure 5.14 If many fields in your data move in lock‐step then in a sense th...
Figure 5.15 A Scree plot shows how much of a dataset's variability is accoun...
Figure 5.16 The “clusters” identified by k‐means clustering are really just ...
Figure 5.17 The indicated point is closer to the middle of the other cluster...

Chapter 6
Figure 6.1 The map‐reduce paradigm is one of the building blocks of the Big ...

Chapter 7
Figure 7.1 A neural network consists of “nodes” arranged into “layers.” Each...
Figure 7.2 Convolutional neural networks are stars in image processing. The ...

