Machine learning supported olive oil compound profiling for assessing geographic and cultivar authenticity / Meoni, Gaia; Vita, Chiara; Tenori, Leonardo; Venturini, Lorenzo; Ascolese, Miriam; Mattei, Alissa; Tacconi, Lucia; Ceccherini, Maria Teresa; Luchinat, Claudio; Tommasini, Simone; Moretti, Sandro; Pelacani, Samuel. - In: FOOD RESEARCH INTERNATIONAL. - ISSN 0963-9969. - Electronic. - 233:(2026), Part 2. [10.1016/j.foodres.2026.118890]

Machine learning supported olive oil compound profiling for assessing geographic and cultivar authenticity

Meoni, Gaia; Tenori, Leonardo; Ascolese, Miriam; Mattei, Alissa; Ceccherini, Maria Teresa; Luchinat, Claudio; Tommasini, Simone; Moretti, Sandro; Pelacani, Samuel
2026

Abstract

This study aimed to develop an integrated metabolomics and machine learning (ML) framework to identify robust chemical markers of Tuscan virgin olive oil (VOO) authenticity while minimizing the confounding influence of milling. A multi-platform analytical workflow combining ¹H NMR, HS-SPME GC–MS, HPLC-DAD-FLD and sensory evaluation was applied to 40 VOOs from seven cultivars (Frantoio, Leccino, Leccio del Corno, Moraiolo, Seggianese, Morcone della Valtiberina, Canino) and blends. Quantitative variables generated by the three analytical platforms were integrated into a single matrix to train ML models, including Random Forest (RF), TreeNet gradient boosting and Multivariate Adaptive Regression Splines. A two-stage RF procedure was implemented: variables dominated by mill-related effects (13% of total chemical variance) were removed first, and the filtered dataset was then re-modelled to identify mill-independent geographical markers. This approach enabled: (i) accurate geographical classification (≈80% cross-validated accuracy), with the most informative markers comprising secoiridoids, sterols, triterpenoids, aldehydes, terpenoids, and lipid-derived metabolites; (ii) prediction of sensory attributes from linoleic and linolenic acids and their lipoxygenase-derived C6 aldehydes and alcohols, through ML models with high predictive performance (R² = 0.95 artichoke, 0.92 fruity, 0.90 tomato-like, 0.83 resinous, 0.93 heated, 0.95 rancid); and (iii) identification of specific molecules, notably margaric acid, strongly associated with the olive cultivars examined. These results highlight links between cultivar traits, lipid precursors and aroma expression. This study introduces the first multi-platform ML framework that minimizes mill-related confounding, improving the reliability of VOO geographical authentication and sensory prediction, and laying the groundwork for future applications in quality assessment and regional characterization.
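The two-stage RF idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state how "mill-dominated" variables were identified, so the rule below (dropping the variables with the highest importance in an RF trained to predict the mill) is an assumption, as are the function name `two_stage_rf`, the `drop_quantile` parameter, and the synthetic data standing in for the integrated NMR/GC–MS/HPLC matrix. It uses scikit-learn's `RandomForestClassifier` and `cross_val_score`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def two_stage_rf(X, origin, mill, drop_quantile=0.9, seed=0):
    """Filter mill-dominated variables, then model origin on the rest.

    Hypothetical helper: the filtering rule is an assumption, not the
    criterion used in the paper.
    """
    # Stage 1: rank variables by how strongly they separate the mills.
    rf_mill = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf_mill.fit(X, mill)
    cutoff = np.quantile(rf_mill.feature_importances_, drop_quantile)
    keep = rf_mill.feature_importances_ <= cutoff  # assumed filtering rule

    # Stage 2: cross-validate and refit an origin classifier on the
    # filtered matrix, then read off the marker importances.
    rf_geo = RandomForestClassifier(n_estimators=200, random_state=seed)
    accuracy = cross_val_score(rf_geo, X[:, keep], origin, cv=5).mean()
    rf_geo.fit(X[:, keep], origin)
    return keep, rf_geo.feature_importances_, accuracy


# Synthetic stand-in for the integrated matrix (40 oils, 30 variables).
rng = np.random.default_rng(0)
n_samples, n_features = 40, 30
origin = np.repeat([0, 1], n_samples // 2)    # two hypothetical areas
mill = np.tile([0, 1, 2, 3], n_samples // 4)  # four hypothetical mills
X = rng.normal(size=(n_samples, n_features))
X[:, 0] += 3.0 * mill    # variable driven by the mill
X[:, 1] += 3.0 * origin  # variable driven by geographical origin

keep, importances, accuracy = two_stage_rf(X, origin, mill)
print("variables kept:", int(keep.sum()), "of", n_features)
print("cross-validated accuracy:", round(float(accuracy), 2))
```

In this toy setup the mill-driven variable (column 0) is removed in the first stage, so it cannot appear among the origin markers ranked in the second stage. A real analysis would need a filtering criterion justified by the data, and the filtering step would ideally sit inside the cross-validation loop.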
Files in this item:
Meoni_Ceccherini_2026_Machine_learning_olive_oil.pdf
Access: open access
Type: Publisher's PDF (Version of record)
License: Open Access
Size: 15.07 MB
Format: Adobe PDF

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1462132