This study aimed to develop an integrated metabolomics and machine learning (ML) framework to identify robust chemical markers of the authenticity of Tuscan virgin olive oil (VOO) while minimizing the confounding influence of milling. A multi-platform analytical workflow, combining ¹H NMR, HS-SPME GC–MS, HPLC-DAD-FLD and sensory evaluation, was applied to 40 VOOs from seven cultivars (Frantoio, Leccino, Leccio del Corno, Moraiolo, Seggianese, Morcone della Valtiberina, Canino) and blends. Quantitative variables generated by the three analytical platforms were integrated into a single matrix to train ML models, including Random Forest (RF), TreeNet gradient boosting and Multivariate Adaptive Regression Splines. A two-stage RF procedure was implemented: the first stage removed variables dominated by mill-related effects (13% of total chemical variance), and the second re-modelled the filtered dataset to identify mill-independent geographical markers. This approach enabled: (i) accurate geographical classification (≈80% cross-validated accuracy), with the most informative markers comprising secoiridoids, sterols, triterpenoids, aldehydes, terpenoids, and lipid-derived metabolites; (ii) prediction of sensory attributes from linoleic and linolenic acids and their lipoxygenase-derived C6 aldehydes and alcohols, through ML models with high predictive performance (R² = 0.95 artichoke, 0.92 fruity, 0.90 tomato-like, 0.83 resinous, 0.93 heated, 0.95 rancid); and (iii) identification of specific molecules, notably margaric acid, strongly associated with the olive cultivars examined. These results highlight links between cultivar traits, lipid precursors and aroma expression. This study introduces the first multi-platform ML framework that minimizes mill-related confounding, improving the reliability of VOO geographical authentication and sensory prediction and laying the groundwork for future applications in quality assessment and regional characterization.
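The two-stage filtering logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps Random Forest importance for a much simpler score, eta squared (the share of a variable's variance explained by a grouping factor), and the function names, the `mill_cutoff` value of 0.5 and the toy data are all hypothetical choices made only for illustration.

```python
from statistics import fmean


def eta_squared(values, groups):
    """Share of a variable's variance explained by a grouping factor (0 to 1)."""
    grand_mean = fmean(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        return 0.0
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    between = sum(
        len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return between / total


def two_stage_filter(table, mills, regions, mill_cutoff=0.5):
    """Stage 1 drops mill-dominated variables; stage 2 ranks the rest by region.

    `mill_cutoff` is an assumed threshold, not a value taken from the study.
    """
    # Stage 1: keep only variables whose variance is not dominated by the mill.
    kept = {
        name: column
        for name, column in table.items()
        if eta_squared(column, mills) <= mill_cutoff
    }
    # Stage 2: rank the surviving variables by their association with region.
    return sorted(
        kept, key=lambda name: eta_squared(kept[name], regions), reverse=True
    )


# Toy data (invented): 8 oils, 2 mills, 2 regions, 3 measured variables.
mills = ["A", "A", "B", "B", "A", "A", "B", "B"]
regions = ["north"] * 4 + ["south"] * 4
table = {
    "mill_driven": [1.0, 1.1, 5.0, 5.1, 1.0, 0.9, 5.0, 4.9],
    "region_marker": [2.0, 2.1, 2.0, 1.9, 6.0, 6.1, 5.9, 6.0],
    "noise": [3.0, 3.4, 3.1, 3.3, 3.2, 3.0, 3.3, 3.1],
}
ranked = two_stage_filter(table, mills, regions)
print(ranked)
```

In this toy example the variable that tracks the mill is discarded in the first stage, and the remaining variables are ordered by how strongly they separate the two regions. A model-based importance measure, such as the RF importance used in the study, would replace `eta_squared` in both stages.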
Machine learning supported olive oil compound profiling for assessing geographic and cultivar authenticity / Meoni, Gaia; Vita, Chiara; Tenori, Leonardo; Venturini, Lorenzo; Ascolese, Miriam; Mattei, Alissa; Tacconi, Lucia; Ceccherini, Maria Teresa; Luchinat, Claudio; Tommasini, Simone; Moretti, Sandro; Pelacani, Samuel. - In: FOOD RESEARCH INTERNATIONAL. - ISSN 0963-9969. - Electronic. - 233:(2026), pp. Part 2.0-Part 2.0. [10.1016/j.foodres.2026.118890]
Machine learning supported olive oil compound profiling for assessing geographic and cultivar authenticity
Meoni, Gaia; Tenori, Leonardo; Ascolese, Miriam; Mattei, Alissa; Ceccherini, Maria Teresa; Luchinat, Claudio; Tommasini, Simone; Moretti, Sandro; Pelacani, Samuel
2026
| File | Size | Format |
|---|---|---|
| Meoni_Ceccherini_2026_Machine_learning_olive_oil.pdf (open access; type: publisher's PDF, version of record; license: Open Access) | 15.07 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.



