Most causal inference studies rely on the assumption of overlap to estimate population or sample average causal effects. When data suffer from non-overlap, estimation of these estimands requires reliance on model specifications, due to poor data support. All existing methods to address non-overlap, such as trimming or down-weighting data in regions of poor data support, change the estimand so that inference cannot be made on the sample or the underlying population. In environmental health research settings, where study results are often intended to influence policy, population-level inference may be critical, and changes in the estimand can diminish the impact of the study results, because estimates may not be representative of effects in the population of interest to policymakers. Researchers may be willing to make additional, minimal modeling assumptions in order to preserve the ability to estimate population average causal effects. We seek to make two contributions on this topic. First, we propose a flexible, data-driven definition of propensity score overlap and non-overlap regions. Second, we develop a novel Bayesian framework to estimate population average causal effects with minor model dependence and appropriately large uncertainties in the presence of non-overlap and causal effect heterogeneity. In this approach, the tasks of estimating causal effects in the overlap and non-overlap regions are delegated to two distinct models, suited to the degree of data support in each region. Tree ensembles are used to non-parametrically estimate individual causal effects in the overlap region, where the data can speak for themselves. In the non-overlap region, where insufficient data support means reliance on model specification is necessary, individual causal effects are estimated by extrapolating trends from the overlap region via a spline model. The promising performance of our method is demonstrated in simulations. Finally, we utilize our method to perform a novel investigation of the causal effect of natural gas compressor station exposure on cancer outcomes. Code and data to implement the method and reproduce all simulations and analyses are available on GitHub (https://github.com/rachelnethery/overlap).

Keywords: Overlap; Propensity Score; Bayesian Additive Regression Trees; Splines; Natural Gas; Cancer Mortality.
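The abstract's two-stage idea can be illustrated with a small toy sketch in Python. First, flag units that lack propensity-score overlap. Second, fill in their effects by extrapolating a trend learned in the overlap region. This is not the authors' estimator. All of the following are hypothetical simplifications chosen for illustration:

- the neighbor-count overlap rule;
- the `caliper` and `min_count` parameters;
- the use of the propensity score as the only extrapolation variable;
- the least-squares linear spline.

The overlap-region effects passed in are placeholders for the tree-ensemble (BART) estimates. Unlike the Bayesian framework described above, this sketch returns point estimates only and does not propagate the larger uncertainty appropriate in the non-overlap region. The authors' actual implementation is in the GitHub repository cited in the abstract.

```python
"""Toy two-stage sketch: flag propensity-score overlap, then extrapolate effects.

Illustrative only; NOT the estimator from the paper. The overlap rule, the
linear spline and all parameter names here are hypothetical simplifications.
"""
from bisect import bisect_left, bisect_right


def flag_overlap(ps, treated, caliper=0.1, min_count=2):
    """True for unit i if at least `min_count` units of the opposite group
    have propensity scores within `caliper` of ps[i] (hypothetical rule)."""
    groups = {g: sorted(p for p, t in zip(ps, treated) if t == g) for g in (0, 1)}
    flags = []
    for p, t in zip(ps, treated):
        other = groups[1 - t]
        # count opposite-group scores in the closed window [p - caliper, p + caliper]
        n_close = bisect_right(other, p + caliper) - bisect_left(other, p - caliper)
        flags.append(n_close >= min_count)
    return flags


def spline_basis(x, knots):
    """Truncated-power basis of a linear spline: 1, x, (x - k)_+ for each knot."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]


def solve(a, b):
    """Solve the small dense system a @ beta = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def fit_spline(x, y, knots):
    """Ordinary least squares fit of the linear spline via the normal equations."""
    rows = [spline_basis(xi, knots) for xi in x]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def extrapolate_effects(ps, effects, in_overlap, knots):
    """Keep overlap-region effect estimates; replace non-overlap units with
    predictions from a spline fitted on the overlap units only."""
    x = [p for p, o in zip(ps, in_overlap) if o]
    y = [e for e, o in zip(effects, in_overlap) if o]
    beta = fit_spline(x, y, knots)
    return [
        e if o else sum(b * z for b, z in zip(beta, spline_basis(p, knots)))
        for p, e, o in zip(ps, effects, in_overlap)
    ]
```

The following example uses propensity scores `[0.05, 0.08, 0.11, 0.33, 0.37, 0.52, 0.31, 0.36, 0.41, 0.55, 0.93]`, with the first six units as controls and the last five as treated. With its default parameters, `flag_overlap` marks only the five units with scores between 0.31 and 0.41 as overlap. Suppose their effects are exactly `1 + 2 * ps`. Then `extrapolate_effects(..., knots=[0.35])` reproduces that line at the remaining units, giving approximately 2.86 at a score of 0.93. A linear spline extrapolates linearly outside the fitted range. This keeps the sketch simple, but the non-overlap estimates then depend entirely on that modeling choice, which is the model dependence the abstract highlights.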

Estimating population average causal effects in the presence of non-overlap: The effect of natural gas compressor station exposure on cancer mortality / Rachel C. Nethery, Fabrizia Mealli, Francesca Dominici. In: The Annals of Applied Statistics, ISSN 1932-6157 (print), vol. 13 (2019), pp. 1242-1267. DOI: 10.1214/18-AOAS1231

Estimating population average causal effects in the presence of non-overlap: The effect of natural gas compressor station exposure on cancer mortality

Rachel C. Nethery, Fabrizia Mealli, Francesca Dominici
2019

Files in this item:
1805.09736v2.pdf (open access)
Description: article
Type: refereed final version (postprint, accepted manuscript)
License: Open Access
Format: Adobe PDF, 1.17 MB

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/1147735