ORIGINAL RESEARCH
Multivariate Analysis for Characterization of Air Pollution Sources: Part 1 Prior Data Screening and Underlying Assumptions
 
1 Faculty of Public and Environmental Health, Department of Environmental Health & Environmental Studies, University of Khartoum, Khartoum, 205, Sudan
2 College of Health Sciences, Department of Public Health, Saudi Electronic University, Riyadh, 11673, Kingdom of Saudi Arabia
3 International Joint Research Center for Persistent Toxic Substances (IJRC-PTS), State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
 
 
Submission date: 2023-10-02
 
 
Final revision date: 2023-12-05
 
 
Acceptance date: 2024-01-11
 
 
Online publication date: 2024-04-18
 
 
Corresponding author
Mohammed O.A. Mohammed   

Faculty of Public and Environmental Health, Department of Environmental Health & Environmental Studies, University of Khartoum, Khartoum, 205, Sudan
 
 
 
ABSTRACT
Findings obtained from different multivariate methods, which rest on different assumptions and differ in their sensitivity to data errors, need to be comparable and consistent. This study investigates essential aspects of data screening prior to analysis, particularly outlier detection, communalities, multicollinearity, and the Kaiser-Meyer-Olkin (KMO) and Bartlett's tests. It also examines how changing test parameters, such as the number of convergence, the number of bootstrap runs, the FPEAK value, and the minimum coefficient of determination (R²), influences model results. Positive matrix factorization (PMF) and Unmix were applied to monitoring data collected at a receptor site. The communality estimates and multicollinearity checks indicated possible data errors in Ca, Cu, Na, and Mn, which affected the stability of the source profiles. PMF detected biomass burning, coal combustion, traffic, industrial emissions, Mn-enriched sources, and secondary aerosols. The Unmix model identified similar sources with comparable profiles; only the profiles of vehicle exhaust and industrial emissions differed slightly. Compared with PMF, Unmix was more strongly influenced by outliers and multicollinearity and, to a lesser extent, by changes in sample size. We recommend interpreting the bootstrapping results rather than the basic runs for both PMF and Unmix. We also recommend screening the data before any further modeling, and checking multicollinearity with more than one statistical measure, particularly variance inflation factor (VIF) values together with tolerance values.
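The recommendation to check VIF together with tolerance can be illustrated with a minimal, dependency-free sketch. It is not the authors' procedure: it assumes the standard definitions, where the VIF of variable j is the j-th diagonal element of the inverse correlation matrix and tolerance is 1/VIF. The species names and the synthetic concentrations are hypothetical, chosen so that Na nearly duplicates Ca while Mn is independent.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(columns):
    """Return {name: (VIF, tolerance)} for a dict of equal-length columns."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: (inv[i][i], 1.0 / inv[i][i]) for i, name in enumerate(names)}


# Hypothetical synthetic data: Na is built to be almost collinear with Ca.
rng = random.Random(0)
ca = [rng.gauss(0, 1) for _ in range(200)]
data = {
    "Ca": ca,
    "Na": [0.9 * c + 0.1 * rng.gauss(0, 1) for c in ca],
    "Mn": [rng.gauss(0, 1) for _ in range(200)],
}

result = vif_and_tolerance(data)
for name, (vif, tol) in result.items():
    print(f"{name}: VIF = {vif:.2f}, tolerance = {tol:.3f}")
```

In this toy data set, Ca and Na should show high VIF and low tolerance, while Mn stays close to 1. A VIF above about 10 (tolerance below about 0.1) is a commonly cited rule of thumb for problematic collinearity, but the appropriate cutoff depends on the data set and is not taken from this study.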
eISSN:2083-5906
ISSN:1230-1485