Machine learning for the estimation of the propensity score: a simulation study

Arpino, B; Cannas, M

Despite the extensive literature on propensity score methods, several open questions remain about their implementation. Based on the results of an extensive simulation exercise, we try to address some of these questions and provide guidelines for practitioners. The first question we consider is which method should be preferred for estimating the propensity score (PS). Using Monte Carlo simulations, we compare machine learning algorithms (MLA) with the standard logit model by analyzing the performance of the different PS estimators in matching (PSM) and weighting (PSW). Second, we take advantage of the simulation framework to assess how well several measures of covariate balance predict the quality of the propensity score weighting and matching estimators. With few exceptions, we found that weighting estimators outperform matching estimators in terms of bias reduction in all simulation scenarios. For both PSM and PSW, random forests, followed by logit, gave the lowest bias. Tree methods were competitive only for weighting, and neural networks and naive Bayes only with large data sets. The balance diagnostic most strongly associated with bias was the average standardized absolute mean difference in covariates (ASAM) computed with the inclusion of interaction terms, but this association was not substantially different from that of the classic ASAM. Less commonly used metrics (the AUC and the variance ratio) were only weakly associated with bias.
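The following sketch shows the weighting side of the comparison and the ASAM diagnostic. It is written in Python using only the standard library. It estimates the propensity score with a logit model, forms weights, and computes ASAM before and after weighting. Because the true effect is known, it can also report the estimator's bias. Several choices here are hypothetical and are not the authors' simulation design: the data-generating process (two Gaussian confounders and a constant treatment effect of 2), the ATT estimand, the use of the treated-group standard deviation in the ASAM denominator, and all function names. In an MLA comparison, `fit_logit` is the step that would be replaced by another classifier, for example scikit-learn's `RandomForestClassifier` and its `predict_proba` method.

```python
import math
import random

TRUE_EFFECT = 2.0  # hypothetical constant treatment effect


def simulate(n, seed=1):
    """Hypothetical design: two confounders, logistic treatment, linear outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = 1 / (1 + math.exp(-(-0.5 + 0.8 * x[0] - 0.5 * x[1])))
        t = 1 if rng.random() < p else 0
        y = 1 + TRUE_EFFECT * t + x[0] + x[1] + rng.gauss(0, 1)
        data.append((x, t, y))
    return data


def propensity(beta, x):
    """Logit propensity score e(x) = P(T = 1 | x), with an intercept."""
    z = [1.0] + x
    return 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, z))))


def fit_logit(data, iters=200, lr=1.0):
    """Fit the logit PS model by gradient ascent on the mean log-likelihood."""
    beta = [0.0, 0.0, 0.0]  # intercept, x1, x2
    n = len(data)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for x, t, _ in data:
            z = [1.0] + x
            e = propensity(beta, x)
            for j in range(3):
                grad[j] += (t - e) * z[j] / n
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


def att_weights(data, beta):
    """ATT weights (an assumption here): 1 for treated, e/(1-e) for controls."""
    w = []
    for x, t, _ in data:
        e = propensity(beta, x)
        w.append(1.0 if t == 1 else e / (1 - e))
    return w


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def asam(data, weights):
    """Mean over covariates of |treated mean - weighted control mean| / treated SD."""
    k = len(data[0][0])
    total = 0.0
    for j in range(k):
        xt = [x[j] for x, t, _ in data if t == 1]
        xc = [x[j] for x, t, _ in data if t == 0]
        wc = [w for (x, t, _), w in zip(data, weights) if t == 0]
        mt = sum(xt) / len(xt)
        sd = math.sqrt(sum((v - mt) ** 2 for v in xt) / (len(xt) - 1))
        total += abs(mt - weighted_mean(xc, wc)) / sd
    return total / k


def att_ipw(data, weights):
    """Weighting estimator: treated mean minus weighted control mean of y."""
    yt = [y for _, t, y in data if t == 1]
    yc = [y for _, t, y in data if t == 0]
    wc = [w for (_, t, _), w in zip(data, weights) if t == 0]
    return sum(yt) / len(yt) - weighted_mean(yc, wc)


data = simulate(2000)
beta = fit_logit(data)
w = att_weights(data, beta)
before = asam(data, [1.0] * len(data))  # unweighted balance
after = asam(data, w)  # balance after PS weighting
att = att_ipw(data, w)
bias = att - TRUE_EFFECT
print(f"ASAM before {before:.3f}, after {after:.3f}; ATT {att:.3f}, bias {bias:.3f}")
```

In a simulation the true effect is known, so `bias` can be observed directly. Repeating this over many replications and PS estimators is what allows balance metrics such as ASAM to be related to the bias of the estimator.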

Machine learning for the estimation of the propensity score: a simulation study / Arpino B; Cannas M. - Electronic. - (2016). (Paper presented at the 48th Scientific Meeting of the Italian Statistical Society, held in Salerno, 8-10 June 2016).