Читать книгу Data Science For Dummies - Lillian Pierson - Страница 84

Detecting outliers with multivariate analysis

Оглавление

Sometimes outliers show up only within combinations of data points from disparate variables. These outliers wreak havoc on machine learning algorithms, so it’s important to detect and remove them. You can use multivariate analysis of outliers to do this. A multivariate approach to outlier detection involves considering two or more variables at a time and inspecting them together for outliers. You can use one of several methods, including:

 A scatter-plot matrix

 Boxplotting

 Density-based spatial clustering of applications with noise (DBScan) — as discussed in Chapter 5

 Principal component analysis (PCA, as shown in Figure 4-8)


Credit: Python for Data Science Essential Training Part 2, LinkedIn.com

FIGURE 4-8: Using PCA to spot outliers.

Data Science For Dummies

Подняться наверх