An Imputation Method For Missing Covariate Data

Bocci, Chiara; Rocco, Emilia

The paper deals with the problem of applying a geoadditive model to produce estimates for some geographical domains in the absence of point-referenced geographical data. Geoadditive models, introduced by Kammann and Wand (2003), make it possible to analyze the spatial distribution of the study variable while accounting for possible linear or non-linear covariate effects, by merging an additive model (Hastie and Tibshirani, 1990) and a kriging model (Cressie, 1993) and expressing both as a linear mixed model. Therefore, when data are spatially located and explicit consideration is given to the possible importance of their spatial distribution in the analysis, geoadditive models represent a powerful geostatistical methodology. Implementing the model requires the statistical units to be referenced at point locations, and if the model is used to produce model-based estimates of a parameter of interest for some geographical domains, the spatial location is required for all the population units. However, the exact location of all the population units is often unknown, especially when socio-economic data are involved. Typically, we know the coordinates of the sampled units (which may be collected specifically for the analysis), but not the exact location of the non-sampled population units: for these we know only the areas to which they belong, such as census districts, blocks, municipalities, etc. In such a situation, the classic approach is to locate all the units belonging to the same area at the coordinates (latitude and longitude) of the geographical center, or centroid, of the area. This is obviously an approximation, induced by nothing but a geometrical property, and its effect on the estimates can be strong, depending on the degree of nonlinearity in the spatial pattern and on the size of the area.
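For reference, one common way to write a geoadditive model with a single smooth covariate effect is sketched below. The notation is assumed for illustration and is not taken from the paper: y_i is the response, t_i a covariate, x_i the point location of unit i, and Z_t and Z_x are low-rank spline basis matrices.

```latex
% Illustrative notation (assumed, not from the paper).
% Additive part f(t_i) plus spatial (kriging-type) part h(x_i):
y_i = \beta_0 + f(t_i) + h(\mathbf{x}_i) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)

% Both smooth terms expressed as random effects in one linear mixed model:
\mathbf{y} = \mathbf{X}\boldsymbol{\beta}
  + \mathbf{Z}_t \mathbf{u}_t + \mathbf{Z}_x \mathbf{u}_x + \boldsymbol{\varepsilon},
\qquad \mathbf{u}_t \sim N(\mathbf{0}, \sigma_t^2 \mathbf{I}),
\quad \mathbf{u}_x \sim N(\mathbf{0}, \sigma_x^2 \mathbf{I})
```

Because the rows of X and Z_x are functions of the locations x_i, predicting for a non-sampled unit requires a value for its coordinates, which is why the missing locations matter.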
In this paper we propose to fill the gaps in the geographical information with a stochastic imputation approach instead of the classic deterministic one based on centroids. In particular, we suggest handling the lack of geographical information by imposing a distribution on the locations inside each area. This is achieved through a hierarchical Bayesian formulation of the geoadditive model in which a prior distribution on the spatial coordinates is defined. The performance of our imputation approach is evaluated through various Markov Chain Monte Carlo (MCMC) experiments.
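The contrast between deterministic and stochastic imputation of locations can be illustrated with a minimal sketch. This is not the authors' hierarchical Bayesian model: it assumes, purely for illustration, a rectangular area, a uniform distribution for the locations inside it, and a hypothetical nonlinear spatial surface (`spatial_surface`). All function names are hypothetical.

```python
import numpy as np

def centroid_locations(bounds, n_units):
    """Deterministic imputation: every unit is placed at the area's centroid."""
    xmin, ymin, xmax, ymax = bounds
    centroid = np.array([(xmin + xmax) / 2.0, (ymin + ymax) / 2.0])
    return np.tile(centroid, (n_units, 1))

def sampled_locations(bounds, n_units, rng):
    """Stochastic imputation: each unit's location is drawn uniformly
    inside the (assumed rectangular) area."""
    xmin, ymin, xmax, ymax = bounds
    xs = rng.uniform(xmin, xmax, size=n_units)
    ys = rng.uniform(ymin, ymax, size=n_units)
    return np.column_stack([xs, ys])

def spatial_surface(coords):
    """Hypothetical nonlinear spatial effect, used only for illustration."""
    return coords[:, 0] ** 2 + coords[:, 1] ** 2

rng = np.random.default_rng(0)
bounds = (0.0, 0.0, 2.0, 2.0)  # xmin, ymin, xmax, ymax of one area
n_units = 10_000               # non-sampled units known only by area

# Area mean of the spatial effect under each imputation scheme.
at_centroid = spatial_surface(centroid_locations(bounds, n_units)).mean()
at_draws = spatial_surface(sampled_locations(bounds, n_units, rng)).mean()

print("centroid imputation:", at_centroid)
print("uniform imputation: ", at_draws)
```

Under these assumptions the centroid-based mean is exactly 2, while the mean over uniformly drawn locations approximates the analytic value 8/3, so the gap grows with the curvature of the surface and the size of the area. In a full Bayesian treatment the locations would be drawn within the MCMC scheme rather than once, as here.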

An Imputation Method For Missing Covariate Data / Bocci, Chiara; Rocco, Emilia. - ELECTRONIC. - (2012), pp. 4996-5001. (Paper presented at the 58th Congress of the International Statistical Institute - ISI 2011, held in Dublin (Ireland), 21-26 August 2011).