Operationalizing Calibration for Fair Educational Artificial Intelligence / M. Mancini, D. Merlini, M. C. Verri. - ELETTRONICO. - 16438:(2026), pp. 183-198. [10.1007/978-3-032-17604-2_17]
Operationalizing Calibration for Fair Educational Artificial Intelligence
M. Mancini; D. Merlini; M. C. Verri
2026
Abstract
The growing use of artificial intelligence in education calls for rigorous methods to evaluate fairness in algorithmic decision-making. This paper focuses on calibration as an operationalization of the sufficiency criterion and develops a method to compute it in educational contexts, where prediction is often defined in terms of individual scores rather than a simple success or failure outcome. We first discuss the conceptual foundations of sufficiency and its specific relevance for predictions expressed as scores or probabilities. We then propose a procedure for measuring fairness through calibration, addressing key challenges such as the aggregation of outcomes across multiple predicted values, the ordering of groups in difference measures, and the treatment of cases where data availability is unbalanced across groups. The proposed procedure is grounded in design choices that aim to preserve the interpretive meaning of data in terms of fairness, while relying on established and transparent statistical methods. The method is empirically applied to two real-world student performance datasets using different classification algorithms. The results illustrate both the feasibility of the approach and the methodological implications of the design choices required to operationalize calibration. The contribution of this study lies primarily in providing a structured framework for measuring sufficiency through calibration, enabling researchers and practitioners to better assess fairness in artificial intelligence systems for education.

| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| Wails_2025_camera_ready.pdf (open access) | Operationalizing Calibration for Fair Educational Artificial Intelligence | Publisher's PDF (Version of record) | Open Access | 426.71 kB | Adobe PDF |
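The abstract describes measuring fairness by comparing calibration across groups: for each predicted score level, the observed outcome rate should not depend on group membership (sufficiency). The sketch below is purely illustrative and is not the authors' procedure; the function name, the equal-width binning of scores, and the max-difference aggregation are all assumptions made for this example.

```python
# Illustrative sketch of a group-wise calibration check (sufficiency):
# within each predicted-score bin, compare observed outcome rates across
# groups and report the largest between-group gap. Binning strategy and
# aggregation are simplifying assumptions, not the paper's design choices.
from collections import defaultdict

def calibration_gap(scores, outcomes, groups, n_bins=5):
    """Largest absolute between-group difference in observed outcome
    rates, taken over score bins where at least two groups have data."""
    buckets = defaultdict(list)  # (bin, group) -> list of outcomes
    for s, y, g in zip(scores, outcomes, groups):
        b = min(int(s * n_bins), n_bins - 1)  # assumes scores in [0, 1]
        buckets[(b, g)].append(y)
    gap = 0.0
    for b in range(n_bins):
        rates = [sum(v) / len(v) for (bb, _), v in buckets.items() if bb == b]
        if len(rates) >= 2:  # skip bins covered by only one group
            gap = max(gap, max(rates) - min(rates))
    return gap

# Toy data: two groups with probability-like scores and binary outcomes.
scores = [0.1, 0.15, 0.8, 0.85, 0.1, 0.2, 0.9, 0.8]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
outcomes = [0, 0, 1, 1, 0, 1, 1, 0]
print(calibration_gap(scores, outcomes, groups))  # → 0.5
```

The skipped single-group bins in this toy illustrate one of the challenges the abstract raises: unbalanced data availability across groups forces an explicit design choice about which bins can contribute to the fairness measure.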
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.