Applying Machine-Learning Methods Based on Causality Analysis to Determine Air Quality in China
Bocheng Wang 1  
Communication University of Zhejiang, Hangzhou, China
Bocheng Wang   

Communication University of Zhejiang, No. 998 Xueyuan Street, Hangzhou 310018, Zhejiang Province, China
Submission date: 2018-08-29
Final revision date: 2018-10-27
Acceptance date: 2018-11-07
Online publication date: 2019-05-29
Publication date: 2019-07-08
Pol. J. Environ. Stud. 2019;28(5):3877–3885
A novel method is proposed for classifying air quality in China. Causality-analysis-based significance tests were combined with several machine-learning algorithms to achieve automated and accurate classification. The 100 most developed cities in China were selected as study areas. Using time series drawn from a large volume of air-monitoring data, we analyzed meteorological factors (temperature, humidity, precipitation, wind speed, air pressure, sunshine duration, evaporation and ground surface temperature) and the individual industrial pollutants NO2, SO2, CO and O3, focusing on the causal influence that the accumulation of each pollutant has on PM2.5. To better clarify how haze forms, joint regression models were established to quantify the degree to which each factor contributes to PM2.5. Several classification models, including KNN, SVM, ensemble and decision tree classifiers, were trained and tested to predict air quality. The ensemble (boosted trees) classifier achieved an accuracy of 90.2%. The feature-selection and classification results both indicated that NO2 played an important role in PM2.5 concentrations in China during 2015-2017.
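The causality-based significance testing described above can be illustrated with a minimal sketch. The abstract does not name the specific test, so the example assumes a Granger-style comparison: it asks whether lagged values of one series improve a regression of another series beyond that series' own lags, and summarizes the improvement as an F statistic. The data are synthetic, and the function names (`ols_rss`, `lagged_f_stat`), the lag of one step and the coefficients are illustrative assumptions, not values from the study.

```python
import random

def ols_rss(rows, targets):
    """Fit ordinary least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def lagged_f_stat(x, y, lag=1):
    """F statistic: do lagged values of x help predict y beyond y's own lags?"""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    # Restricted model: intercept plus y's own lags
    restricted = [[1.0] + [y[t - j] for j in range(1, lag + 1)] for t in times]
    # Unrestricted model: additionally includes lags of x
    unrestricted = [row + [x[t - j] for j in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic stand-ins (assumption): "no2" drives "pm25" with a one-step delay.
random.seed(0)
no2 = [random.gauss(0, 1) for _ in range(300)]
pm25 = [0.0]
for t in range(1, 300):
    pm25.append(0.5 * pm25[t - 1] + 0.8 * no2[t - 1] + random.gauss(0, 1))

f_no2_to_pm25 = lagged_f_stat(no2, pm25)
f_pm25_to_no2 = lagged_f_stat(pm25, no2)
print(f"NO2 -> PM2.5 F = {f_no2_to_pm25:.1f}")
print(f"PM2.5 -> NO2 F = {f_pm25_to_no2:.1f}")
```

On data built this way, the F statistic in the NO2 to PM2.5 direction should be much larger than in the reverse direction. In practice the statistic would be compared against an F distribution to obtain a p-value, and a statistics library would normally replace the hand-written solver.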