Machine Learning Algorithms and Applications - Group of authors - Page 24

1.4 Results and Discussions

The open data is provided by the OpenAQ organization [13], whose aim is to help people fight air pollution by providing open data and open-source tools. OpenAQ aggregates data obtained from government bodies and research groups. We used the OpenAQ API to fetch the latest data into a data frame and saved it in .csv format for computation. Figure 1.3 shows a screenshot of the data fetched on 6th June, 2020 for Visakhapatnam, India.
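The fetch-and-save step can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout in `sample_records` (field names and nesting) and the numeric values are hypothetical stand-ins for whatever the API returns, and an in-memory buffer stands in for the .csv file.

```python
import io

import pandas as pd

# Hypothetical records standing in for an API response.
# Field names and values are assumptions made for illustration only.
sample_records = [
    {
        "location": "GVMC Ram Nagar",
        "city": "Visakhapatnam",
        "measurements": [
            {"parameter": "pm25", "value": 41.0, "unit": "µg/m³"},
            {"parameter": "no2", "value": 18.5, "unit": "µg/m³"},
            {"parameter": "o3", "value": 22.1, "unit": "µg/m³"},
        ],
    }
]


def records_to_frame(records):
    """Flatten nested station records into one row per measurement."""
    return pd.json_normalize(
        records, record_path="measurements", meta=["location", "city"]
    )


frame = records_to_frame(sample_records)

# Write the frame as CSV text; a file path could be passed instead of a buffer.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
```

Keeping one row per measurement makes it straightforward to filter by parameter or station later on.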

Table 1.1 Range of AQI categories (concentrations in µg/m³, except CO in mg/m³).

AQI category (range)	PM10 (24hr)	PM2.5 (24hr)	NO2 (24hr)	O3 (8hr)	CO (8hr)	SO2 (24hr)	NH3 (24hr)	Pb (24hr)
Good (0–50)	0–50	0–30	0–40	0–50	0–1.0	0–40	0–200	0–0.5
Satisfactory (51–100)	51–100	31–60	41–80	51–100	1.1–2.0	41–80	201–400	0.5–1.0
Moderately polluted (101–200)	101–250	61–90	81–180	101–168	2.1–10	81–380	401–800	1.1–2.0
Poor (201–300)	251–350	91–120	181–280	169–208	10–17	381–800	801–1200	2.1–3.0
Very poor (301–400)	351–430	121–250	281–400	209–748	17–34	801–1600	1200–1800	3.1–3.5
Severe (401–500)	430+	250+	400+	748+	34+	1600+	1800+	3.5+
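As an illustration of how Table 1.1 can be used to label a set of readings, the sketch below looks up the category of each pollutant and reports the worst one. Two assumptions are made here that the text does not state: a value falling between two listed ranges (for example, PM2.5 of 30.5) is assigned to the higher category, and the overall label is taken from the worst pollutant. The function names are hypothetical.

```python
from bisect import bisect_left

CATEGORIES = [
    "Good",
    "Satisfactory",
    "Moderately polluted",
    "Poor",
    "Very poor",
    "Severe",
]

# Upper bound of every category except "Severe", copied from Table 1.1
# for the six parameters used in this chapter.
UPPER_BOUNDS = {
    "PM10": [50, 100, 250, 350, 430],
    "PM2.5": [30, 60, 90, 120, 250],
    "NO2": [40, 80, 180, 280, 400],
    "O3": [50, 100, 168, 208, 748],
    "CO": [1.0, 2.0, 10, 17, 34],
    "SO2": [40, 80, 380, 800, 1600],
}


def category_index(pollutant, concentration):
    """Position in CATEGORIES of the range containing the concentration."""
    return bisect_left(UPPER_BOUNDS[pollutant], concentration)


def overall_category(readings):
    """Label a dict of {pollutant: concentration} by its worst pollutant."""
    worst = max(category_index(p, c) for p, c in readings.items())
    return CATEGORIES[worst]
```

For example, `overall_category({"PM10": 120, "CO": 0.8})` evaluates to "Moderately polluted", because PM10 at 120 falls in the 101–250 range while CO stays in the Good range.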

1. K-Means Clustering Outcomes: As explained in the methodology section, we applied K-means clustering to derive classes, in the form of clusters, for our unlabeled data. To find the optimal number of clusters, the Silhouette coefficient was calculated. It is computed for each sample from the mean intra-cluster distance (a) and the mean nearest-cluster distance (b), where b is the mean distance between the sample and the points of the nearest cluster that the sample is not a part of. The Silhouette coefficient for a sample is (b − a)/max(a, b). For our experiments, we set the number of clusters to 7 using the Elbow method. After clustering, the clustered data were assigned air quality labels using the AQI table. The required ranges for the different air quality parameters are shown in Table 1.1.
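The Silhouette formula can be made concrete with a small sketch in plain Python. The four points and their labels are toy values chosen for illustration, and the convention of returning 0 for a sample that is alone in its cluster is an assumption borrowed from common practice. At least two clusters are required.

```python
from math import dist


def silhouette(index, points, labels):
    """Silhouette coefficient (b - a) / max(a, b) for one sample."""
    own = labels[index]
    distances = {}  # cluster label -> distances from the sample to its members
    for j, (point, label) in enumerate(zip(points, labels)):
        if j != index:
            distances.setdefault(label, []).append(dist(points[index], point))
    if own not in distances:  # the sample is the only member of its cluster
        return 0.0
    a = sum(distances[own]) / len(distances[own])
    b = min(sum(d) / len(d) for label, d in distances.items() if label != own)
    return (b - a) / max(a, b)


def mean_silhouette(points, labels):
    """Average Silhouette coefficient over all samples."""
    return sum(silhouette(i, points, labels) for i in range(len(points))) / len(points)


# Toy data: two tight pairs of points that are far apart from each other.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_labels = [0, 0, 1, 1]  # each pair forms its own cluster
bad_labels = [0, 1, 0, 1]   # each cluster mixes the two pairs
```

A labeling that matches the natural grouping gives a mean coefficient close to 1, while a labeling that mixes the groups gives a negative value, which is why the coefficient is useful for comparing candidate numbers of clusters.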

We worked on six parameters, namely, NO2, O3, PM10, PM2.5, SO2, and CO. K-means clustering was applied first.

2. SVM outcomes: The 1,870 data values were divided into a training set (80%) and a testing set (20%). The SVM was trained on the clustered data against the air quality labels so that air quality could be determined from the values of all parameters. The scikit-learn library was used for this [14]. The SVM was cross-validated with GridSearchCV using 10-fold cross-validation. Results on the 374 test samples are shown in Table 1.2. The best parameter set found was {C: 0.1, gamma: 0.001, kernel: linear}.
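A minimal sketch of this workflow with scikit-learn is given below. It is not the authors' code: synthetic blobs stand in for the clustered air quality data, the candidate values in `param_grid` are assumptions (the text reports only the best set), and `tune_svm` is a hypothetical helper name.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def tune_svm(features, labels, seed=0):
    """Split 80/20, grid-search an SVM with 10-fold CV, score on the test set."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # Candidate values are assumptions made for this sketch.
    param_grid = {
        "C": [0.1, 1, 10],
        "gamma": [0.001, 0.01],
        "kernel": ["linear", "rbf"],
    }
    search = GridSearchCV(SVC(), param_grid, cv=10)
    search.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, search.predict(x_test))
    return search.best_params_, accuracy


# Synthetic stand-in: 1,870 samples, six features, five classes.
features, labels = make_blobs(
    n_samples=1870, centers=5, n_features=6, random_state=0
)
best_params, accuracy = tune_svm(features, labels)
```

Note that in scikit-learn the `gamma` setting has no effect when the kernel is linear, so a best set that pairs a linear kernel with a particular `gamma` is effectively determined by `C` alone.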

3. LSTM outcomes: To build the LSTM models, we trained a model for each of 14 different places in India, namely, Visakhapatnam (GVMC Ram Nagar), Ajmer (Civil Lines), Alwar, Vasundhara (Ghaziabad), Gurgaon (Vikas Sadan), Bandra (Maharashtra), Bhiwadi Industrial Area, Bengaluru (BWSSB Kadabesanaha), Amritsar (Golden Temple), Anand Vihar, R K Puram, Punjabi Bagh, NSIT (Dwarka), and Sector 62 Noida. Five thousand samples were used for training and 500 samples for testing each model.
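One common way to prepare a time series for an LSTM is to pair each window of consecutive readings with the reading that follows it, and then split the pairs in time order. The sketch below shows that idea under stated assumptions: the text does not describe the framing the authors used, the window length of 24 is hypothetical, and a simple counter stands in for real sensor data.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value after it."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


def chronological_split(inputs, targets, n_train, n_test):
    """Use the first n_train pairs for training and the next n_test for testing."""
    train = (inputs[:n_train], targets[:n_train])
    test = (
        inputs[n_train:n_train + n_test],
        targets[n_train:n_train + n_test],
    )
    return train, test


# Synthetic series long enough to yield 5,500 windowed samples.
WINDOW = 24  # hypothetical window length
series = list(range(5500 + WINDOW))
inputs, targets = make_windows(series, WINDOW)
train, test = chronological_split(inputs, targets, n_train=5000, n_test=500)
```

Splitting in time order rather than at random keeps the test samples strictly later than the training samples, which avoids leaking future readings into training.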

Each model had different values for hyperparameters such as kernel initializer, batch size, and number of epochs during hyperparameter tuning. We used the Keras library in Python [15]. Performance was evaluated with two metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Table 1.3 shows the MAE and RMSE values obtained. MAE is calculated as (∑|y − x|)/n, and RMSE as √((∑(y − x)²)/n), where y is the predicted value, x is the actual value, and n is the number of samples.
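The two metrics can be written directly from their definitions. The sketch below uses plain Python and toy numbers chosen for illustration; the function names are hypothetical.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: the average of |y - x| over all samples."""
    return sum(abs(y - x) for y, x in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error: square root of the average of (y - x)^2."""
    return sqrt(sum((y - x) ** 2 for y, x in zip(predicted, actual)) / len(actual))


# Toy values: errors of 2, 0, and -2.
predicted = [3, 5, 2]
actual = [1, 5, 4]
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

Because squaring gives large errors more weight, RMSE is never smaller than MAE on the same data, which agrees with every pair of values in Table 1.3.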

Figure 1.4 shows the predicted values for Bengaluru at the present hour as well as for the 2 days and 3 hours after 13th December, 2017. Figure 1.5 shows the predicted values for the 2 days and 3 hours after 6th June, 2020. We observed that, on average, Bengaluru is a cleaner city than the other cities, even during November and December. This could be due to rainy weather: Bengaluru receives rain almost every day, which washes down the majority of air pollutants and thus reduces air pollution.

Figure 1.6 shows the predicted values at the present hour and for the following 1 day and 3 hours for Anand Vihar, New Delhi, after 13th December, 2017. New Delhi suffers from heavy pollution, and the observed air quality was therefore very poor. The PM2.5 level remains high, making the air toxic and likely to cause breathing problems. We also generated an advisory for the users of the app. Figure 1.7 shows the predicted values for 1 day and 3 hours for Anand Vihar, New Delhi, after 6th June, 2020. Pollution levels are drastically lower and air quality is better, which we attribute to the imposed lockdown, with its reduced traffic and industrial emissions.

The experiments were performed for batch sizes of 10, 24, 15, 8, and 6 with 10 and 100 epochs. The MAE scores for the LSTM hyperparameters for NO2, O3, PM10, PM2.5, and SO2 are shown in Table 1.4. After careful analysis of these scores, we zeroed in on the batch size with the minimum error.
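One way to compare the rows of Table 1.4 is sketched below. The scores are copied from the table, but the selection rule is an assumption: the text does not say how the scores were weighed, so averaging MAE across pollutants is only one plausible criterion, and it lets pollutants with larger values (such as PM10) dominate.

```python
# MAE scores copied from Table 1.4, keyed by (batch size, epochs).
POLLUTANTS = ["NO2", "O3", "PM10", "PM2.5", "SO2"]
MAE_SCORES = {
    (10, 10): [22, 52, 142, 64, 14],
    (24, 100): [17, 22, 142, 52, 13],
    (15, 100): [13, 19, 139, 51, 13],
    (8, 10): [16.8, 25.4, 124, 44.8, 13],
    (6, 10): [13, 25, 119.7, 44, 13],
}


def mean_mae(scores):
    """Average MAE across pollutants for one configuration."""
    return sum(scores) / len(scores)


def best_configuration(table):
    """Configuration with the lowest average MAE (assumed criterion)."""
    return min(table, key=lambda config: mean_mae(table[config]))


def best_per_pollutant(table):
    """Lowest-MAE configuration for each pollutant; the first listed wins ties."""
    return {
        name: min(table, key=lambda config: table[config][column])
        for column, name in enumerate(POLLUTANTS)
    }


best = best_configuration(MAE_SCORES)
per_pollutant = best_per_pollutant(MAE_SCORES)
```

Under this averaging rule the configuration with batch size 6 and 10 epochs comes out lowest, although the per-pollutant view shows that batch size 15 with 100 epochs does better on O3.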

4. Data Visualization: One of the main objectives of the project was to provide better visualizations for lay users who may not be able to interpret the relations between the values of different air pollutants. We therefore generated heat maps of the parameters, both individually and combined.
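A heat map of this kind needs the readings arranged as a grid, for example one row per day and one column per hour. The sketch below shows that arrangement in plain Python; the timestamps follow the dates discussed in this section, but the values are invented for illustration and `hourly_grid` is a hypothetical helper.

```python
from datetime import datetime


def hourly_grid(readings):
    """Arrange (ISO timestamp, value) pairs into {date: [24 hourly values]}."""
    grid = {}
    for timestamp, value in readings:
        moment = datetime.fromisoformat(timestamp)
        row = grid.setdefault(moment.date().isoformat(), [None] * 24)
        row[moment.hour] = value  # hours with no reading stay None
    return grid


# Invented readings for illustration only.
readings = [
    ("2017-12-12T00:00", 12.0),
    ("2017-12-12T18:00", 70.5),
    ("2017-12-13T00:00", 10.0),
    ("2017-12-13T18:00", 95.0),
]
grid = hourly_grid(readings)
```

Once missing hours are filled or masked, the rows of the grid can be handed to a plotting routine such as matplotlib's `imshow` to draw the map.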

Figure 1.8 shows the heat map for ozone (O3) for 12th and 13th December, 2017. The map shows that O3 fluctuates most between day and night. O3 levels fall at midnight and are very high on the evening of 13th December. This could be due to heavy vehicular traffic during evening hours. Figure 1.9 shows the heat map for O3 for 6th to 8th June, 2020, which shows reduced O3 levels during a period of less vehicular traffic and reduced industrial emissions.

Figure 1.10 shows the heat map for all the parameters for 11th, 12th, and 13th December, 2017, at Sector 62, Noida. The heat maps show that PM2.5 is the main pollutant in the air and that it remains at dangerous levels on all three days, both day and night. Figure 1.11 shows the heat map for all the parameters for 6th, 7th, and 8th June, 2020, at Sector 62, Noida. The reduced levels of all pollutants, a result of the imposed lockdown, can clearly be seen. However, PM2.5 still remains the top contributor to pollution in the area.

Figure 1.12 shows the predicted values of O3 for Anand Vihar, New Delhi, in December, 2017; a decline in O3 levels can be observed. Figure 1.13 shows the predicted values of PM10 for Sector 62, Noida, in June, 2020; a decline in PM10 levels can likewise be observed.

The air quality of the major cities of India can also be observed and predicted, as shown in Figure 1.14. This helps the user study air quality throughout the country. The figure marks air quality on the map with colored dots: severe (magenta), very poor (yellow), poor (cyan), moderate (red), satisfactory (green), and good (blue). We found that smaller cities, towns, and villages in India have good air quality; it is only the metropolitan cities and their surrounding areas that suffer from the worst air quality.

Our project had some major findings. The values of the air parameters depend on recent records (the past few days to a month) rather than on many previous months. When retrieving real-time values through the API, null or zero values sometimes occur for some parameters. This might be due to malfunctioning sensors or unsuitable weather conditions. Zero or very low values might also occur at night because certain parameters such as O3 react with other chemical compounds to form new compounds, which reduces their measured values. NO2 and SO2 also sometimes interact, which may explain their abrupt values. Visualizations make the raw data much easier for a layperson to understand. Finally, the results suggest that lockdown could be an effective alternative measure for controlling air pollution.

Figure 1.3 Screenshot of fetched data.

Table 1.2 Precision, recall, and F1-score.

Classes	Precision	Recall	F1-Score
Moderate	1.0	0.99	0.99
Poor	1.0	0.95	0.97
Satisfactory	0.98	1.0	0.99
Severe	1.0	1.0	1.0
Very Poor	1.0	1.0	1.0
Avg/total	0.99	0.99	0.99
Final Accuracy: 0.9893

Table 1.3 MAE and RMSE scores for different epochs.

No.	Test MAE	Test RMSE
1	8.864	12.122
2	17.996	35.390
3	23.820	35.938
4	6.021	9.269

Figure 1.4 Predicted values in Bengaluru in December, 2017.

Figure 1.5 Predicted values in Bengaluru in June, 2020.

Figure 1.6 Predicted values in New Delhi in December, 2017.

Figure 1.7 Predicted values in New Delhi in June, 2020.

Table 1.4 MAE scores for LSTM hyper parameters.

Batch size	Epochs	NO2	O3	PM10	PM2.5	SO2
10	10	22	52	142	64	14
24	100	17	22	142	52	13
15	100	13	19	139	51	13
8	10	16.8	25.4	124	44.8	13
6	10	13	25	119.7	44	13

Figure 1.8 Heat map for ozone O₃ for day and night in December, 2017.

Figure 1.9 Heat map for ozone O₃ for day and night in June, 2020.

Figure 1.10 Heat map for all parameters for 3 days and nights in December, 2017.

Figure 1.11 Heat map for all parameters for 3 days and nights in June, 2020.

Figure 1.12 Predicted values for O₃ for Anand Vihar, New Delhi.

Figure 1.13 Predicted values for PM₁₀ for Sector 62, Noida.

Figure 1.14 Pollution levels in major Indian cities.
