Investigating China’s Urban Air Quality Using Big Data, Information Theory, and Machine Learning
Sheng Chen (1, 2), Guangyuan Kan (1, 3), Jiren Li (1), Ke Liang (2), Yang Hong (3, 4)
(1) State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, Research Center on Flood and Drought Disaster Reduction of the Ministry of Water Resources, China Institute of Water Resources and Hydropower Research, Beijing 100038, P.R. China
(2) College of Hydrology and Water Resources, Hohai University, Nanjing 210098, P.R. China
(3) State Key Laboratory of Hydroscience and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing 100084, P.R. China
(4) Department of Civil Engineering and Environmental Science, University of Oklahoma, Norman, OK, USA
Online publish date: 2017-12-28
Publish date: 2018-01-26
Submission date: 2017-06-08
Final revision date: 2017-06-20
Acceptance date: 2017-06-20
Pol. J. Environ. Stud. 2018;27(2):565–578
With economic development and industrial construction, air quality in China has deteriorated dramatically and now seriously threatens public health. To investigate which factors most affect air quality, and to provide a tool to assist the prediction and early warning of air pollution in urban areas, this paper applies sensor-observed air quality big data, information theory-based identification of predictor significance, and PEK-based machine learning to the analysis and prediction of the air quality index (AQI). We found that the stability of air quality is strongly related to its absolute level, and that improving air quality also improves its stability. Air quality in southern and western cities is better than in northern and eastern cities. The AQI time series of geographically closer cities are more closely related to each other. PM2.5, PM10, and SO2 are the most important influencing factors. The machine learning-based model is useful for AQI prediction and early warning. This tool could be applied to air quality monitoring and early warning in other cities to further verify its effectiveness and robustness. Finally, we suggest using higher-quality, more representative training data samples to further improve the performance of AQI prediction models in future research.
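The abstract refers to information theory-based identification of predictor significance but does not say which estimator is used. The sketch below is one plausible way such a ranking could work and is not presented as the authors' method. It assumes mutual information I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))], estimated from an equal-width 2-D histogram. The function names (`discretize`, `mutual_information`, `rank_predictors`), the bin count, and the toy series in the usage block are all hypothetical and are not observations from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def discretize(values: Sequence[float], bins: int = 10) -> List[int]:
    """Map each value to an equal-width bin index in [0, bins - 1] (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant series carries no information
    width = (hi - lo) / bins
    # min(..., bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 10) -> float:
    """Plug-in histogram estimate of I(X;Y) in bits."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    xb, yb = discretize(x, bins), discretize(y, bins)
    n = len(xb)
    joint = Counter(zip(xb, yb))   # joint bin counts
    px, py = Counter(xb), Counter(yb)  # marginal bin counts
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi


def rank_predictors(
    predictors: Dict[str, Sequence[float]], aqi: Sequence[float], bins: int = 10
) -> List[Tuple[str, float]]:
    """Score each candidate predictor by its MI with AQI, highest first (hypothetical helper)."""
    scores = {name: mutual_information(series, aqi, bins) for name, series in predictors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, hypothetical series for illustration only; not data from the study.
    aqi = [50, 80, 120, 160, 200, 90, 60, 140]
    predictors = {
        "PM2.5": [30, 55, 90, 120, 150, 65, 40, 105],
        "constant": [1, 1, 1, 1, 1, 1, 1, 1],
    }
    for name, score in rank_predictors(predictors, aqi, bins=4):
        print(f"{name}: {score:.3f} bits")
```

With histogram estimators, the scores depend on the bin count and sample size. A careful analysis would check that the ranking stays stable across reasonable binning choices before drawing conclusions about which pollutants matter most.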