The specification of the propensity score in multilevel observational studies / Bruno Arpino; Fabrizia Mealli. - (2009).

The specification of the propensity score in multilevel observational studies

Bruno Arpino; Fabrizia Mealli
2009

Abstract

Propensity score matching (PSM) has become a popular approach to estimating causal effects. It relies on the assumption that selection into treatment can be explained purely in terms of observable characteristics (the unconfoundedness assumption) and on the property that balancing on the propensity score is equivalent to balancing on the observed covariates. PSM is widely applied, with empirical examples in fields as diverse as the evaluation of labour market policies, the assessment of educational projects, and the evaluation of the effect of demographic events on socioeconomic phenomena. These applications often involve hierarchically structured data, where units are clustered in groups (workers in local market areas, pupils in schools, individuals in communities). The specific challenges that this kind of setting poses for causal inference remain largely unaddressed in the literature. In this paper we focus on the specification of the propensity score when some relevant cluster-level information is missing. We concentrate on large-scale observational studies (e.g., national surveys), where the typical data structure, a relatively large number of small clusters (few observations per cluster), makes it difficult to implement the matching algorithm within clusters, which would otherwise solve the omitted variable problem. We distinguish between two assignment mechanisms that differ in the way cluster effects enter the selection-into-treatment process. In the first, cluster characteristics affect the selection probability along with individual ones. In the second, the selection process itself differs by cluster, so that belonging to one cluster rather than another changes an individual's probability of being selected. This distinction matters not only conceptually but also for the practical implementation of the matching procedure.
In the first case, within-cluster matching is not needed: what we require is that matched treated and control units belong to similar clusters, not necessarily to the same cluster. This paper focuses on that setting and explores the use of multilevel models to estimate the propensity score, in order to address, at least partially, the omission of relevant cluster-level variables. We compare this approach with alternatives such as a single-level model with cluster dummies. Using Monte Carlo evidence, we show that multilevel specifications usually achieve reasonably good balance in unobserved cluster-level covariates and consequently reduce the omitted variable bias. The same holds for the dummy model. However, when the number of clusters is large and/or the clusters are small, the dummy approach leads to less efficient estimates of the average treatment effect on the treated (ATT).
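
The abstract states the two assignment mechanisms and the competing propensity score specifications only in words. One possible way to write them down is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: unit i in cluster j has treatment indicator T_ij, observed individual covariates X_ij, potential outcomes Y_ij(1) and Y_ij(0), and the cluster has a (possibly unobserved) covariate Z_j. The logit link is likewise an assumed, commonly used choice.

```latex
\begin{align*}
% Mechanism 1 (assumed form): cluster characteristics enter alongside individual ones
\operatorname{logit}\Pr(T_{ij}=1 \mid X_{ij}, Z_j)
  &= \alpha + X_{ij}^{\top}\beta + Z_j^{\top}\gamma \\
% Mechanism 2 (assumed form): the selection process itself differs by cluster
\operatorname{logit}\Pr(T_{ij}=1 \mid X_{ij}, j)
  &= \alpha_j + X_{ij}^{\top}\beta_j \\
% Multilevel (random-intercept) specification when Z_j is not observed
\operatorname{logit} e_{ij}
  &= \alpha + X_{ij}^{\top}\beta + u_j,
  \qquad u_j \sim N(0, \sigma_u^2) \\
% Single-level specification with one dummy (fixed intercept) per cluster
\operatorname{logit} e_{ij}
  &= \alpha_j + X_{ij}^{\top}\beta \\
% Estimand: average treatment effect on the treated
\mathrm{ATT}
  &= \mathbb{E}\bigl[\,Y_{ij}(1) - Y_{ij}(0) \mid T_{ij} = 1\,\bigr]
\end{align*}
```

Under this reading, both the random intercept u_j and the cluster-specific intercept alpha_j act as stand-ins for the omitted term in Z_j. The dummy model estimates one free parameter per cluster, which is consistent with the efficiency loss the abstract reports when clusters are many and small.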

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1161451