Advanced Analytics and Deep Learning Models (multiple authors), page 39

2.3.4 Data Handling

2.3.4.1 Missing Values and Data Cleaning

The first step is cleaning the data, which requires finding the null values in the dataset. Figures 2.2 and 2.3 show the number of null values in every column. There are two common ways to handle null values: we can drop every row that contains a null value, which results in data loss, or we can replace each null value with the mean of the non-null values in its column. Before handling the remaining null values, we drop the columns that contain many null values, such as society and balcony. We also drop columns such as area type and availability, as our main goal is to predict the price.
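The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the authors' code: the toy rows are invented for the example, and the column names (`area_type`, `availability`, `society`, `balcony`, `bath`, and so on) are assumptions based on the column descriptions in the text.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; values and column names are assumptions.
df = pd.DataFrame({
    "area_type": ["Super built-up Area", "Plot Area", "Built-up Area", "Plot Area"],
    "availability": ["Ready To Move", "19-Dec", "Ready To Move", "Ready To Move"],
    "size": ["2 BHK", "3 BK", "3 BHK", None],
    "society": [None, "ExampleSociety", None, None],
    "total_sqft": ["1056", "2600", "1440", "1200"],
    "bath": [2.0, np.nan, 2.0, 4.0],
    "balcony": [1.0, np.nan, np.nan, 1.0],
    "price": [39.07, 120.0, 62.0, 95.0],
})

# Count the null values in every column.
null_counts = df.isnull().sum()
print(null_counts)

# Drop the columns that the text removes before handling the remaining nulls.
df = df.drop(columns=["society", "balcony", "area_type", "availability"])

# Option 1: drop every row that still contains a null value (loses data).
dropped = df.dropna()

# Option 2: replace nulls in numeric columns with the column mean.
imputed = df.copy()
numeric_cols = imputed.select_dtypes(include="number").columns
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())
```

Mean imputation only applies to numeric columns, so in this sketch a null in a text column such as `size` is left in place under option 2 and would need separate handling.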

In the size column, the same kind of value is written with different labels, such as 3 BHK and 3 BK; to generalize them, we create a new column, BHK. To fill this column, we apply a function that tokenizes each entry into words, keeps the number, and discards the other words. In the total square feet column, some entries give a range rather than an exact number; in that case, we replace the entry with the average of the two numbers in the range.
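The two transformations described above might look like the following sketch. The helper names `extract_bhk` and `sqft_to_number`, the column names `size` and `total_sqft`, the sample rows, and the assumption that ranges are written as two numbers separated by a hyphen are all hypothetical choices made for illustration.

```python
import pandas as pd

# Toy data for illustration only; values and column names are assumptions.
df = pd.DataFrame({
    "size": ["3 BHK", "3 BK", "2 BHK"],
    "total_sqft": ["1056", "1133 - 1384", "1200"],
})

def extract_bhk(size):
    """Tokenize the entry on whitespace and return the first numeric token."""
    for token in str(size).split():
        if token.isdigit():
            return int(token)
    return None  # no numeric token found

def sqft_to_number(value):
    """Convert an entry to a float, averaging the two ends of a 'low - high' range."""
    parts = str(value).split("-")
    try:
        if len(parts) == 2:
            return (float(parts[0]) + float(parts[1])) / 2
        return float(value)
    except ValueError:
        return None  # entry is neither a number nor a simple range

df["BHK"] = df["size"].apply(extract_bhk)
df["total_sqft"] = df["total_sqft"].apply(sqft_to_number)
print(df)
```

Returning `None` for entries that cannot be parsed keeps them visible as missing values, so they can be dropped or imputed in a later step rather than silently coerced.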

Figure 2.2 Missing values.

Figure 2.3 Visualizing missing values using heatmap.

