ORIGINAL RESEARCH
Reconstruction of Missing Values at PM2.5
Monitoring Sites Combining K-Shape
Clustering and Conditional Score-Based
Diffusion Models for Imputation
1
School of Geomatics, Anhui University of Science and Technology, Huainan 232001, China
Submission date: 2024-11-01
Final revision date: 2025-02-23
Acceptance date: 2025-03-17
Online publication date: 2025-05-09
Corresponding author
Zhen Zhang
School of Geomatics, Anhui University of Science and Technology, Huainan 232001, China
ABSTRACT
PM2.5 is a significant contributor to air pollution, and complete air quality monitoring data are
key to its effective prevention and control. However, real-time monitoring data contain many missing
values owing to monitoring system instability, machine failures, or human error.
Taking the Yangtze River Delta (YRD) region as an example, this study compared the imputation performance
of several algorithms on ground monitoring data with missing PM2.5 concentrations, then selected the
best-performing algorithm and combined it with K-Shape clustering partitions to fill the missing
PM2.5 concentration values. The results showed that Conditional Score-based Diffusion Models
for Imputation (CSDI) achieved higher imputation accuracy than Autoregressive Integrated Moving Average
(ARIMA), K-Nearest Neighbors (KNN), and Multiple Imputation (MI) in the missing-value imputation
task. When historical PM2.5 data from the YRD were imputed using CSDI combined with K-Shape clustering,
Partition III had the highest accuracy and Partition II the lowest. This difference was
due to both the clustering accuracy and the inherent PM2.5 fluctuation characteristics of each partition.
Analyzing the daily variation of PM2.5 concentrations in the different partitions
revealed approximately 9 am, 3 pm, and 9 pm as the three main times of day with large CSDI imputation
errors in the YRD region. These findings have significant implications for air quality monitoring and
PM2.5 concentration prediction.
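
The comparison of imputation algorithms summarized above can be illustrated with a mask-and-score evaluation: hide a fraction of the observed values, impute them, and measure the error against the hidden truth. The sketch below is an assumption about how such a comparison might be set up, not the authors' protocol. The function names (mask_and_score, mean_fill, linear_fill), the 20% missing rate, and the synthetic hourly series are all hypothetical, and the two simple baselines stand in for the ARIMA, KNN, MI, and CSDI models, which are not implemented here.

```python
import numpy as np

def mask_and_score(series, impute_fn, missing_rate=0.2, seed=0):
    """Hide a random share of observed values, impute them, and return MAE/RMSE."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    observed = np.flatnonzero(~np.isnan(series))
    # Keep the first and last time steps visible so interpolation has anchors.
    candidates = observed[(observed > 0) & (observed < len(series) - 1)]
    n_hide = max(1, int(missing_rate * len(candidates)))
    hidden = rng.choice(candidates, size=n_hide, replace=False)

    masked = series.copy()
    masked[hidden] = np.nan          # artificial gaps with known ground truth
    filled = impute_fn(masked)

    err = filled[hidden] - series[hidden]
    return {"mae": float(np.mean(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2)))}

def mean_fill(x):
    """Baseline: replace every gap with the mean of the observed values."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def linear_fill(x):
    """Baseline: linear interpolation in time between observed neighbours."""
    out = x.copy()
    idx = np.arange(len(x))
    known = ~np.isnan(x)
    out[~known] = np.interp(idx[~known], idx[known], x[known])
    return out

# Synthetic hourly series with a daily cycle (illustrative values, not real PM2.5 data).
hours = np.arange(240)
series = 50 + 20 * np.sin(2 * np.pi * hours / 24)

for name, fn in [("mean", mean_fill), ("linear", linear_fill)]:
    print(name, mask_and_score(series, fn))
```

Because every candidate method is scored on the same hidden positions (fixed seed), the resulting errors are directly comparable. The same harness could in principle be run per cluster partition or per hour of day to examine where imputation errors concentrate.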