Artificial Intelligence for Diagnosing Meibomian Gland Dysfunction: A Systematic Review and Meta-Analysis of Diagnostic Test Accuracy Studies / Liu, Su-Hsun; Shah, Margi; Leslie, Louis; Lo, Jui-En; Ansah-Asiedu, Enoch; Harnke, Ben; Hauswirth, Scott G.; Virgili, Gianni; Li, Tianjing. - In: OPHTHALMIC AND PHYSIOLOGICAL OPTICS. - ISSN 0275-5408. - ELECTRONIC. - (2026), pp. 000-000. [10.1097/ICO.0000000000004151]

Artificial Intelligence for Diagnosing Meibomian Gland Dysfunction: A Systematic Review and Meta-Analysis of Diagnostic Test Accuracy Studies

Virgili, Gianni;
2026

Abstract

PURPOSE: To identify, appraise, and synthesize the performance of artificial intelligence-based meibography reading as compared with human graders in diagnosing meibomian gland dysfunction.
METHODS: We followed Cochrane methodology and reporting guidelines for diagnostic test accuracy reviews. To assess potential risk of bias and applicability, we used a modified Quality Assessment of Diagnostic Accuracy Studies-2 checklist. We applied bivariate logistic models to estimate summary sensitivity and specificity when appropriate and used the GRADE framework to rate the certainty of the evidence.
RESULTS: We identified 14 eligible studies involving 5511 predominantly middle-aged participants (average age: 27-55 years) who were primarily female (≥54.5%). A total of 18,926 meibography images were obtained through noncontact infrared imaging (11 studies) or in vivo confocal microscopy (three studies). Two studies reported external validation of deep learning models, 11 reported internally validated models, and one reported both. All but one study had high risk of bias in at least one domain; 12 studies raised high or intermediate concern about applicability. Based on three external evaluations, the summary sensitivity and specificity for distinguishing meibomian gland dysfunction from normal glands were 97.5% (95% confidence interval: 77.5%-99.8%) and 85.5% (95% confidence interval: 47.3%-97.5%), respectively. Sources of heterogeneity in internally validated models included study population, case mix, and other factors. The overall evidence was of very low to low certainty because of imprecision, high risk of bias, and concerns about applicability.
CONCLUSIONS: Artificial intelligence-based meibography grading appears less accurate than human graders. Future studies should adopt rigorous designs, including a more diverse participant pool (or image set) and external validation.
Liu, Su-Hsun; Shah, Margi; Leslie, Louis; Lo, Jui-En; Ansah-Asiedu, Enoch; Harnke, Ben; Hauswirth, Scott G.; Virgili, Gianni; Li, Tianjing

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1470757