IJEI

Applying Hybrid Feature Selection Methods for Statistical Modelling of Roadside Particle Concentrations (PM2.5 and PNC)

A. Suleiman | M.R. Tight | A.D. Quinn

Department of Civil Engineering, Faculty of Engineering, Bayero University Kano, Nigeria

Department of Civil Engineering, School of Engineering, University of Birmingham, UK

OPEN ACCESS

https://www.witpress.com/elibrary/ei-volumes/3/2/2661

Abstract:

Selecting which predictor variables to include in a statistical model is a demanding task. A model built with fewer predictor variables can be more interpretable and less expensive than one built with many input variables. This study investigates the effect of hybrid feature selection methods (genetic algorithms [GA] and simulated annealing [SA], each combined with random forests [RF]) in improving the efficiency of five variants of multiple linear regression models for predicting roadside PM2.5 and particle number count (PNC) concentrations. The GA-RF and SA-RF selected 9 and 16 variables, respectively, of the 27 predictor variables in the PM2.5 training data. Of the 25 possible variables in the PNC training data, the GA-RF selected 13 and the SA-RF also selected 13. The methods selected nearly the same variables, especially for predicting PNC, while for the PM2.5 models the SA-RF selected 16 variables and the GA-RF selected only 10 variables. The hybrid feature selection methods eliminated most of the correlated variables, especially the background pollutants and the traffic variables, whereas the temporal variables and the meteorological variable were selected in all the cases considered. The statistical performance of the linear models built with the selected variables is similar to that of the models developed using the full set of predictor variables. The practical benefit demonstrated by this study is the reduction in the number of predictor variables by more than half in most of the cases considered. This reduction should lower the operational and computational cost of the models, possibly without compromising their predictive performance, and should also enhance interpretability.

Keywords:

air quality, genetic algorithms (GA), particulate matter, random forests (RF), simulated annealing (SA), statistical modelling
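
The two search strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, parameter values and the scoring function are assumptions made for illustration. In the study, each candidate subset is evaluated with a random forest; here a hypothetical stand-in scorer (`toy_score`) replaces it so that the example runs with the standard library alone. Each candidate subset is encoded as a list of booleans, one per predictor variable.

```python
import math
import random


def toy_score(mask):
    """Hypothetical stand-in for a random-forest error (lower is better).

    Assumption for illustration: features 0, 3 and 7 are informative, and
    every selected feature adds a small cost to favour compact subsets.
    """
    informative = (0, 3, 7)
    missed = sum(1 for i in informative if not mask[i])
    return missed + 0.05 * sum(mask)


def simulated_annealing(score, n_features, iters=500, t0=1.0, cooling=0.99, seed=0):
    """Search feature subsets by toggling one feature per step."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_err = score(current)
    best, best_err = current[:], current_err
    temp = t0
    for _ in range(iters):
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]  # perturb: include or drop one feature
        cand_err = score(candidate)
        delta = cand_err - current_err
        # Accept improvements always; accept worse subsets with prob. exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = current[:], current_err
        temp *= cooling  # geometric cooling schedule
    return best, best_err


def genetic_algorithm(score, n_features, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Evolve a population of feature subsets with crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each gene with probability p_mut.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    best = min(pop, key=score)
    return best, score(best)


sa_mask, sa_err = simulated_annealing(toy_score, n_features=10)
ga_mask, ga_err = genetic_algorithm(toy_score, n_features=10)
print("SA selected:", [i for i, keep in enumerate(sa_mask) if keep], sa_err)
print("GA selected:", [i for i, keep in enumerate(ga_mask) if keep], ga_err)
```

To approximate the hybrid methods described above, `toy_score` would be replaced by a function that fits a random forest on the selected columns and returns a resampled prediction error; the search code itself would not need to change.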
