A Machine Learning approach to the classification of chemo-structural determinants in label-free SERS detection of proteins

Barucci, A; D'Andrea, C; Farnesi, E; Banchelli, M; Amicucci, C; De Angelis, M; Marzi, C; Pini, R; Hwang, B; Matteini, P

doi:10.1109/ICOP56156.2022.9911735

Establishing standardized methods for the consistent analysis of spectral data remains a largely underexplored aspect of surface-enhanced Raman spectroscopy (SERS), particularly when applied to biological and biomedical research. Here we propose a Machine Learning (ML) based approach for the classification of protein species. Principal Component Analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) were used for dimensionality reduction, along with supervised and unsupervised methods to quantify how closely the SERS spectral profiles of different species (Albumin from bovine serum, Albumin from human serum, Lysozyme, Human holo-transferrin, Human apo-transferrin) resemble one another. In particular, ML algorithms such as Support Vector Machine, K-Nearest Neighbours and Linear Discriminant Analysis, and the unsupervised K-means, were applied to the original SERS spectra and to multipeak fits of the spectra, respectively. This strategy simultaneously ensures fast, complete and successful discrimination of the proteins and a thorough characterization of the chemo-structural differences among them, ultimately opening up new routes for the evolution of SERS toward sensing applications and diagnostics of interest in the life sciences.
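
The kind of workflow the abstract describes (dimensionality reduction followed by a classifier) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it uses only PCA (computed here by power iteration) and K-Nearest Neighbours, and it runs on synthetic spectra. The class names, band positions, noise level and every function name below are assumptions made for illustration. In practice a library such as scikit-learn would typically supply these steps.

```python
import math
import random
from collections import Counter


def make_spectrum(peak_centers, rng, n_points=60, width=3.0, noise=0.05):
    """Toy spectrum: a sum of Gaussian bands plus Gaussian noise (synthetic data)."""
    return [
        sum(math.exp(-((i - c) ** 2) / (2 * width ** 2)) for c in peak_centers)
        + rng.gauss(0.0, noise)
        for i in range(n_points)
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pca(rows, rng, n_components=2, n_iter=100):
    """Return (feature means, leading principal axes) found by power iteration."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    columns = list(zip(*centered))
    axes = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in means]
        for _ in range(n_iter):
            scores = [dot(row, v) for row in centered]
            # w = X^T (X v): one multiplication by the unnormalised covariance
            w = [dot(scores, col) for col in columns]
            # Deflation: project out the axes that were already found
            for axis in axes:
                proj = dot(w, axis)
                w = [wi - proj * ai for wi, ai in zip(w, axis)]
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        axes.append(v)
    return means, axes


def project(row, means, axes):
    """Coordinates of one spectrum along the principal axes."""
    centered = [v - m for v, m in zip(row, means)]
    return [dot(centered, axis) for axis in axes]


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def run_demo(seed=0, per_class=20):
    """Train on 75% of the synthetic spectra and return held-out accuracy."""
    rng = random.Random(seed)
    # Hypothetical band positions; not taken from any measured protein spectrum.
    classes = {"protein_A": [15, 40], "protein_B": [20, 40], "protein_C": [15, 45]}
    data = [
        (make_spectrum(peaks, rng), name)
        for name, peaks in classes.items()
        for _ in range(per_class)
    ]
    rng.shuffle(data)
    split = int(0.75 * len(data))
    train, test = data[:split], data[split:]
    # PCA is fitted on the training spectra only, then reused for the test set
    means, axes = fit_pca([s for s, _ in train], rng)
    train_points = [project(s, means, axes) for s, _ in train]
    train_labels = [label for _, label in train]
    hits = sum(
        knn_predict(train_points, train_labels, project(s, means, axes)) == label
        for s, label in test
    )
    return hits / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

Fitting the projection on the training spectra alone keeps the held-out accuracy from being inflated by information from the test set. The same structure would apply if the classifier were swapped for one of the other methods named in the abstract.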

A Machine Learning approach to the classification of chemo-structural determinants in label-free SERS detection of proteins / Barucci, A; D'Andrea, C; Farnesi, E; Banchelli, M; Amicucci, C; De Angelis, M; Marzi, C; Pini, R; Hwang, B; Matteini, P. - Print. - (2022), pp. 1-4. (Italian Conference on Optics and Photonics, ICOP 2022) [10.1109/ICOP56156.2022.9911735].