Integrating Compositional Data Analysis (CoDA) and Random Forest for lithology-specific geochemical baseline determination / Meloni, Federica; Gozzi, Caterina; Cabassi, Jacopo; Nisi, Barbara; Rappuoli, Daniele; Vaselli, Orlando. - In: SCIENCE OF THE TOTAL ENVIRONMENT. - ISSN 0048-9697. - Electronic. - 1011:(2026), pp. 181169.1-181169.16. [10.1016/j.scitotenv.2025.181169]
Integrating Compositional Data Analysis (CoDA) and Random Forest for lithology-specific geochemical baseline determination
Meloni, Federica; Gozzi, Caterina; Cabassi, Jacopo; Nisi, Barbara; Rappuoli, Daniele; Vaselli, Orlando
2026
Abstract
This study presents an improved methodology for defining lithology-specific geochemical baseline values in geologically complex areas with a long history of mining activity, such as the eastern sector of the Mt. Amiata volcanic complex (Tuscany, Italy). Establishing accurate baselines is crucial for distinguishing natural from anthropogenic inputs of potentially toxic elements in soil and for supporting regulatory decisions and remediation strategies. The proposed approach combines Compositional Data Analysis with the Random Forest algorithm to achieve robust lithological classification and to minimize the influence of outliers. More than 300 topsoil and subsoil samples were analyzed for mineralogical composition and for Hg, As, Sb, Cr, Cu, Co, V, and Ni. The statistical analysis identified the main compositional patterns, allowing a reliable estimation of geochemical baselines across the volcanic and sedimentary domains. Baseline values were computed with robust statistics (median ± 2MAD) for Hg, As, and Sb, which showed strong lithological control; baselines for Cr, Cu, Co, V, and Ni were not calculated because their concentrations were similar to or lower than background levels. The resulting baselines highlight the variability linked to the geological setting, provide insight into the stability and resilience of geochemically distinct domains, and help distinguish natural anomalies from legacy contamination. Some baseline values exceeded the Italian legal thresholds (e.g., up to 15.6 mg/kg Hg in sedimentary subsoils and up to 85.7 mg/kg As in volcanic soils), underscoring the need for site-specific regulatory limits. The proposed workflow can be extended to other mining-affected regions to better characterize natural background conditions.

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.
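The two numerical ingredients named in the abstract, a compositional (log-ratio) view of the data and a robust median ± 2MAD baseline per lithology, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the centred log-ratio (clr) is shown as one standard CoDA transform (which log-ratio the study used is not stated here); the MAD is multiplied by 1.4826 by default, a common convention that makes it comparable to a standard deviation under normality (an assumption; pass `scale=1.0` for the raw MAD); and each sample is assumed to already carry a lithology label (in the study this comes from the Random Forest classification, which the sketch does not implement). The function names and the sample dictionary layout are hypothetical.

```python
from collections import defaultdict
from math import log
from statistics import median


def clr(composition):
    """Centred log-ratio transform (a standard CoDA transform).

    Each part is expressed as the log of its ratio to the geometric mean of
    all parts. All parts must be strictly positive, so zeros or values below
    detection limit must be replaced beforehand.
    """
    logs = [log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def mad(values, scale=1.4826):
    """Median absolute deviation from the median.

    `scale` = 1.4826 is an assumed convention (SD-consistent MAD);
    use scale=1.0 for the unscaled MAD.
    """
    m = median(values)
    return scale * median(abs(v - m) for v in values)


def baseline_range(values, k=2.0, scale=1.4826):
    """Return the (lower, upper) baseline range as median -/+ k * MAD."""
    m = median(values)
    d = mad(values, scale)
    return m - k * d, m + k * d


def baselines_by_lithology(samples, element, k=2.0, scale=1.4826):
    """Compute median +/- k*MAD separately for each lithology group.

    `samples` is a hypothetical list of dicts, each with a 'lithology' label
    and element concentrations (e.g., mg/kg) keyed by element symbol.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[s["lithology"]].append(s[element])
    return {lith: baseline_range(vals, k, scale) for lith, vals in groups.items()}


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not data from the study).
    toy = [
        {"lithology": "volcanic", "As": 10.0},
        {"lithology": "volcanic", "As": 20.0},
        {"lithology": "volcanic", "As": 30.0},
        {"lithology": "sedimentary", "As": 5.0},
        {"lithology": "sedimentary", "As": 7.0},
    ]
    print(baselines_by_lithology(toy, "As", scale=1.0))
    print(clr([0.2, 0.3, 0.5]))
```

Grouping before computing the statistics is what makes the baselines lithology-specific: a single median ± 2MAD over the pooled samples would mix geochemically distinct domains. Because the median and MAD are insensitive to a few extreme values, isolated contaminated samples shift the range far less than they would a mean ± 2SD estimate. Note that the lower bound can fall below zero for skewed concentration data; some workflows log-transform concentrations first, which is an option this sketch does not apply.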