ORIGINAL RESEARCH
Reconstruction of Missing Values at PM2.5 Monitoring Sites Combining K-Shape Clustering and Conditional Score-Based Diffusion Models for Imputation
1 School of Geomatics, Anhui University of Science and Technology, Huainan 232001, China
 
 
Submission date: 2024-11-01
 
 
Final revision date: 2025-02-23
 
 
Acceptance date: 2025-03-17
 
 
Online publication date: 2025-05-09
 
 
Corresponding author
Zhen Zhang   

School of Geomatics, Anhui University of Science and Technology, Huainan 232001, China
 
 
 
ABSTRACT
PM2.5 is a major contributor to air pollution, and complete air quality monitoring data are essential for the effective prevention and control of PM2.5. However, real-time monitoring data contain many missing values caused by monitoring-system instability, instrument failures, or human error. Taking the Yangtze River Delta (YRD) region as a case study, this study compared how well several algorithms imputed missing values in ground-based PM2.5 concentration monitoring data, selected the best-performing algorithm, and combined it with K-Shape clustering partitions to fill in the missing PM2.5 concentration values. The results showed that the Conditional Score-based Diffusion Models for Imputation (CSDI) achieved higher imputation accuracy than Autoregressive Integrated Moving Average (ARIMA), K-Nearest Neighbors (KNN), and Multiple Imputation (MI). When CSDI was applied to historical YRD PM2.5 data within the K-Shape partitions, Partition III achieved the highest accuracy and Partition II the lowest. This difference arose both from the clustering accuracy and from the inherent PM2.5 fluctuation characteristics of each partition. An analysis of the diurnal variation of PM2.5 concentrations in the different partitions identified approximately 9 am, 3 pm, and 9 pm as the three main times of day at which CSDI imputation errors were large in the YRD region. These findings have significant implications for air quality monitoring and PM2.5 concentration prediction.
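The abstract uses K-Shape clustering to partition monitoring sites. K-Shape compares time series with a shape-based distance (SBD): both series are z-normalized, their cross-correlation is computed at every time shift and divided by the product of their norms, and SBD = 1 - (maximum normalized cross-correlation). Two sites whose PM2.5 curves share a shape but are offset in time therefore come out close together. The sketch below is a minimal NumPy illustration of SBD only, under stated assumptions. It is not the authors' implementation, it omits K-Shape's centroid (shape-extraction) step and the CSDI imputer, it computes the cross-correlation directly instead of via FFT, and the function names (`z_normalize`, `sbd`) and the synthetic daily profile are hypothetical.

```python
import numpy as np


def z_normalize(x):
    """Z-normalize a 1-D series (zero mean, unit variance), as K-Shape assumes."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        # A constant series has no shape; map it to all zeros.
        return np.zeros_like(x)
    return (x - x.mean()) / std


def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all shifts.

    Returns (distance, shift). The distance lies in [0, 2]; 0 means the two
    series have the same shape at the best alignment. A positive shift means
    x lags y by that many samples.
    """
    x = z_normalize(x)
    y = z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0, 0
    # "full" mode covers shifts from -(len(y) - 1) to len(x) - 1;
    # output index i corresponds to shift i - (len(y) - 1).
    cc = np.correlate(x, y, mode="full")
    idx = int(np.argmax(cc))
    shift = idx - (len(y) - 1)
    return 1.0 - cc[idx] / denom, shift


# Hypothetical hourly PM2.5 profile with a single morning peak at 08:00.
hours = np.arange(24)
profile_a = np.exp(-((hours - 8) ** 2) / (2 * 1.5 ** 2))
# Same shape, peaking three hours later (circular shift of the day).
profile_b = np.roll(profile_a, 3)

dist, lag = sbd(profile_b, profile_a)
print(f"SBD = {dist:.3f}, lag = {lag} h")
```

Because SBD maximizes over shifts, the two profiles above receive a small distance and a reported lag of three hours, whereas a Euclidean distance at fixed alignment would treat them as quite different. This shift invariance is what lets shape-based clustering group sites with similar but time-offset PM2.5 patterns.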
eISSN:2083-5906
ISSN:1230-1485