Transformer-Based Encoder-Decoder Model For Enhanced Air Quality Prediction
DOI: https://doi.org/10.70454/IJMRE.2025.50411
Keywords: Air quality, encoder-decoder, LSTM, deep learning, Air Quality Prediction, Feature Selection, Environmental Monitoring, Attention Mechanism, Spatial-Temporal Analysis, Pollution Forecasting, PM2.5, Interpretable AI, Smart City
Abstract
Air quality prediction remains a critical challenge due to the complex spatiotemporal dependencies inherent in pollutant data. We propose a transformer-based encoder-decoder model to address this challenge, focusing on accurate and robust air quality index (AQI) forecasting. The proposed method processes historical pollutant measurements, including PM2.5, PM10, and NO2, through a multi-head self-attention mechanism to capture long-range dependencies and nonlinear interactions. The model employs a sliding window approach to generate sequential input-output pairs, which are then normalized and fed into stacked transformer encoders for feature extraction. A global average pooling layer condenses the temporal information into a fixed-length representation, enabling precise AQI prediction through a dense output layer. The architecture incorporates residual connections and layer normalization to stabilize training, while dropout regularization mitigates overfitting. Experiments on the Delhi air quality dataset demonstrate the model's effectiveness, achieving competitive performance in terms of mean squared error and mean absolute error. Furthermore, the transformer's ability to model intricate temporal patterns without recurrent structures offers computational advantages over traditional sequence models. The results highlight the potential of attention-based architectures for environmental monitoring tasks, particularly in scenarios where interpretability and scalability are paramount. This work contributes to the growing body of research on deep learning for air quality prediction, providing a framework that balances accuracy, efficiency, and generalizability.
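To make the described pipeline concrete, the sketch below shows one plausible Keras implementation of the components named in the abstract: sliding-window pair generation, stacked transformer encoders with multi-head self-attention, residual connections, layer normalization, dropout, global average pooling, and a dense regression head. The helper names (make_windows, transformer_encoder, build_model) and all hyperparameters (window length, head count, model width) are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the described architecture, assuming a TensorFlow/Keras
# implementation. All hyperparameters are illustrative, not reported values.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def make_windows(series: np.ndarray, window: int = 24):
    """Sliding-window input/output pairs: past `window` steps -> next AQI value."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])      # (window, n_features) of PM2.5, PM10, NO2, ...
        y.append(series[t + window, 0])     # assume the AQI target is column 0
    return np.asarray(X, dtype="float32"), np.asarray(y, dtype="float32")

def transformer_encoder(x, num_heads=4, key_dim=32, ff_dim=128, dropout=0.1):
    """One encoder block: self-attention and feed-forward sublayers,
    each wrapped with a residual connection and layer normalization."""
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim,
                                     dropout=dropout)(x, x)
    x = layers.LayerNormalization(epsilon=1e-6)(x + attn)
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(x.shape[-1])(ff)
    ff = layers.Dropout(dropout)(ff)
    return layers.LayerNormalization(epsilon=1e-6)(x + ff)

def build_model(window=24, n_features=3, d_model=64, n_blocks=2):
    inp = layers.Input(shape=(window, n_features))   # normalized pollutant history
    x = layers.Dense(d_model)(inp)                   # project features to model width
    for _ in range(n_blocks):                        # stacked transformer encoders
        x = transformer_encoder(x)
    x = layers.GlobalAveragePooling1D()(x)           # fixed-length temporal summary
    x = layers.Dropout(0.1)(x)
    out = layers.Dense(1)(x)                         # dense AQI regression head
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

In this sketch positional information is left implicit; a sinusoidal or learned positional encoding could be added before the encoder stack, and inputs would be min-max or z-score normalized before windowing, consistent with the preprocessing described above.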
License
Copyright (c) 2025 Hemant Kumar Pandey, Dr. Kaneez Zainab (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, permitting all use, distribution, and reproduction in any medium, provided the work is properly cited.
