Development and machine-learning-based calibration of a low-cost multiparametric station for the measurement of CO2, CH4, (or H2S and SO2) in the air: an innovative approach for investigating the impact on air quality of natural and anthropogenic contaminant sources

Biagi, R.; Venturi, S.; Ferrari, M.; Montegrossi, G.; Sacco, M.; Frezzi, F.; Tassi, F.

Atmospheric pollutants harm human health, ecosystems, and public infrastructure. Most anthropogenic and natural environments, e.g., urban areas and hydrothermal manifestations, emit toxic mixtures of greenhouse gases (GHGs) and volatile sulfur species into the air. Among GHGs, CO2 and CH4 are climate forcers of major concern, while sulfur species (e.g., H2S and SO2) are toxic gases that may contribute to acid rain. Reliable, affordable, and high-density air pollution measurements are therefore a key goal. However, the high set-up and maintenance costs of air quality stations make it unfeasible to deploy traditional instruments at multiple sites, resulting in data with poor resolution in both time and space. This study presents the development of a low-cost station prototype that houses (i) a non-dispersive infrared sensor for CO2 concentrations, (ii) a solid-state metal oxide sensor for CH4 concentrations or, alternatively, electrochemical sensors for H2S and SO2, and (iii) sensors for air temperature and relative humidity. The main limitation of this low-cost approach is the in-field accuracy of the sensors, which depends significantly on (i) cross-sensitivities to other atmospheric pollutants, (ii) environmental parameters, and (iii) detector stability over time. An in-field, machine-learning-based calibration method was developed for the CO2, CH4, H2S, and SO2 sensors using Linear Random Forest (LRF) regression. The calibration model was built from measurements carried out in parallel with the low-cost sensors and two reference instruments (i.e., a Cavity Ring-Down Spectroscopy analyzer for CO2 and CH4, and a Pulsed Fluorescence analyzer for H2S and SO2). The raw concentrations (raw_conc) of these compounds recorded by the sensors, together with the measured air temperature (T) and relative humidity (RH), were used as the features of the models: y = f(raw_conc, T, RH).
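As an illustration of the feature set y = f(raw_conc, T, RH) described above, the following minimal Python sketch shows how co-located records could be arranged into a feature matrix and a target vector before any regression model is fitted. The names `Sample` and `to_features` are hypothetical, the target y is assumed here to be the reference-analyzer concentration, and the numeric values are invented placeholders, not measurements from the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """One co-located record (hypothetical layout, not the authors' format)."""
    raw_conc: float  # raw concentration reported by the low-cost sensor
    temp: float      # air temperature T
    rh: float        # relative humidity RH
    ref_conc: float  # reference-analyzer concentration (assumed target y)


def to_features(samples: List[Sample]) -> Tuple[List[List[float]], List[float]]:
    """Arrange records as rows [raw_conc, T, RH] and a target vector y."""
    X = [[s.raw_conc, s.temp, s.rh] for s in samples]
    y = [s.ref_conc for s in samples]
    return X, y


# Invented placeholder values for illustration only.
samples = [
    Sample(raw_conc=431.0, temp=18.2, rh=55.0, ref_conc=420.5),
    Sample(raw_conc=468.0, temp=24.9, rh=41.0, ref_conc=452.1),
]
X, y = to_features(samples)
```

Any regressor that accepts a feature matrix and a target vector could then be trained on `X` and `y`; the sketch does not implement the LRF model itself.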
The dataset consisted of measurements performed in different environments and seasons, so that the calibration model was trained on a wide range of concentrations and ambient conditions. The data were split as follows: 70% of the measurements for training the model, 15% for validating it, and 15% for testing the accuracy of its predictions. The LRF regression model performed very well in predicting CO2 and CH4 concentrations, with R² values on the test data of 0.9978 and 0.9260 and mean absolute errors of 1.34 and 0.014, respectively. H2S and SO2 predictions were less satisfactory, possibly because of the poor sensitivity of the sensors at low concentrations. To overcome this issue, further investigations may focus on a calibration model that includes the CO2/H2S and CO2/SO2 ratios. The encouraging results obtained for the carbon species lay the basis for adding to the station sensors for other contaminants (e.g., PM, NOx, CO), to be calibrated with the same procedure.
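The 70/15/15 partition and the two reported scores (R² and mean absolute error) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the abstract does not say how the measurements were assigned to each subset, so the seeded random shuffle is an assumption, and `split_70_15_15`, `mean_absolute_error`, and `r2_score` are hypothetical helper names written for this example.

```python
import random
from typing import List, Sequence, Tuple


def split_70_15_15(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Return shuffled index lists for training, validation, and test sets.

    Random shuffling is an assumption; the source gives only the proportions.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n * 15 // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train_idx, val_idx, test_idx = split_70_15_15(100)
```

In such a scheme the validation subset would typically guide model selection, while the scores reported for a model are computed only on the held-out test subset.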

Development and machine-learning-based calibration of a low-cost multiparametric station for the measurement of CO2, CH4, (or H2S and SO2) in the air: an innovative approach for investigating the impact on air quality of natural and anthropogenic contaminant sources / Biagi R., Venturi S., Ferrari M., Montegrossi G., Sacco M., Frezzi F., Tassi F. - Electronic. - (2023), pp. 0-0. (Paper presented at the conference Congresso congiunto SIMP, SGI, SOGEI, AIV "The Geoscience paradigm: Resources, Risks and future perspectives").