Investigating China’s Urban Air Quality Using Big Data, Information Theory, and Machine Learning

Sheng Chen; Guangyuan Kan; Jiren Li; Ke Liang; Yang Hong

doi:10.15244/pjoes/75159

2/2018 vol. 27

CC BY-NC 4.0


ORIGINAL RESEARCH


Sheng Chen¹,², Guangyuan Kan¹,³, Jiren Li¹, Ke Liang², Yang Hong³,⁴


¹State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, Research Center on Flood and Drought Disaster Reduction of the Ministry of Water Resources, China Institute of Water Resources and Hydropower Research, Beijing 100038, P.R. China
²College of Hydrology and Water Resources, Hohai University, Nanjing 210098, P.R. China
³State Key Laboratory of Hydroscience and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing 100084, P.R. China
⁴Department of Civil Engineering and Environmental Science, University of Oklahoma, Norman, OK, USA

Submission date: 2017-06-08

Final revision date: 2017-06-20

Acceptance date: 2017-06-20

Online publication date: 2017-12-28

Publication date: 2018-01-26

Pol. J. Environ. Stud. 2018;27(2):565-578

DOI: https://doi.org/10.15244/pjoes/75159


TOPICS

Atmospherical pollution control

Pollution prevention

ABSTRACT

With economic development and industrial construction, air quality in China has deteriorated dramatically and seriously threatens people’s health. To investigate which factors most affect air quality and to provide a useful tool for the prediction and early warning of air pollution in urban areas, we applied sensor-observed air quality big data, information theory-based predictor significance identification, and PEK-based machine learning to air quality index (AQI) analysis and prediction. We found that the stability of air quality is strongly related to absolute air quality, and that improving air quality also improves its stability. Air quality in southern and western cities is better than in northern and eastern cities. The AQI time series of cities that are geographically closer are more closely related to one another. PM2.5, PM10, and SO₂ are the most important impact factors. The machine learning-based prediction is useful for AQI prediction and early warning. This tool could be applied to air quality monitoring and early warning in other cities to further verify its effectiveness and robustness. Finally, we suggest using training data samples of better quality and representativeness to further improve the performance of the AQI prediction model in future research.
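
The abstract does not spell out how the information theory-based predictor significance identification is computed. One common way to score a candidate predictor against AQI is a histogram estimate of mutual information, sketched below. This is an illustrative assumption, not the authors' procedure: the function names `mutual_information` and `rank_predictors`, the bin count, and the synthetic series are all hypothetical and stand in for real sensor observations.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of the mutual information I(X; Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint probability table
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    nz = pxy > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def rank_predictors(predictors, target, bins=16):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, target, bins)
              for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data for illustration only (NOT the paper's observations).
rng = np.random.default_rng(0)
n = 5000
pm25 = rng.gamma(2.0, 30.0, n)               # hypothetical PM2.5-like series
unrelated = rng.normal(0.0, 1.0, n)          # series independent of AQI
aqi = 1.2 * pm25 + rng.normal(0.0, 10.0, n)  # AQI driven mostly by pm25

ranking = rank_predictors({"pm25": pm25, "unrelated": unrelated}, aqi)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

On this synthetic data the series that drives AQI scores higher than the independent one. With real observations the estimate depends on the bin count and the sample size, so the scores are best read as a relative ranking.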


eISSN:	2083-5906
ISSN:	1230-1485