Geoadditive Models for Data with Spatial Information / C. Bocci. - (2010).

Geoadditive Models for Data with Spatial Information

BOCCI, CHIARA
2010

Abstract

Geostatistics is concerned with producing a map of a quantity of interest over a geographical region from measurements, usually noisy, taken at a set of locations in that region. The aim of such a map is to describe and analyze the geographical pattern of the phenomenon of interest. Geostatistical methodologies originated and are applied in areas such as environmental studies and epidemiology, where spatial information is traditionally recorded and available. In recent years, however, the availability of spatially detailed statistical data has increased considerably, and these kinds of procedures, possibly with appropriate modifications, can also be used in other fields of application, for example to study the demographic and socio-economic characteristics of a population living in a certain region. To obtain a surface estimate, we can exploit exact knowledge of the spatial coordinates (latitude and longitude) of the studied phenomenon by using bivariate smoothing techniques such as kernel estimation or kriging (Cressie, 1993; Ruppert et al., 2003). Usually, however, the spatial information alone does not adequately explain the pattern of the response variable, and covariates need to be introduced in a more complex model. Geoadditive models, introduced by Kammann and Wand (2003), address this problem: they analyze the spatial distribution of the study variable while accounting for possible non-linear covariate effects. They do so by merging an additive model (Hastie and Tibshirani, 1990), which accounts for the non-linear relationship between the variables, with a kriging model, which accounts for the spatial correlation, and by expressing both as a linear mixed model. The linear mixed model representation is useful because it allows estimation with mixed model methodology and software.
Moreover, the geoadditive model can be extended to include generalized responses, small area estimation, longitudinal data, missing data and so on (Ruppert et al., 2009). A first aim of this work was to present the application of geoadditive models in fields other than environmental and epidemiological studies. In particular, a geoadditive small area estimation model is applied to estimate the district-level mean of household log per-capita consumption expenditure for the Republic of Albania. As noted above, geographical information is now more widely available in socio-economic data. Sometimes, however, we do not know the exact location of all the population units, only the areas to which they belong (census districts, blocks, municipalities, etc.), while we do know the coordinates of the sampled units. How can we continue to use the geoadditive model under these circumstances? The classic approach is to locate all the units belonging to the same area at the coordinates (latitude and longitude) of the area center. This is obviously an approximation, induced by nothing but a geometrical property, and its effect on the estimates can be strong and increases with the size of the area. We decided to proceed differently, treating the lack of geographical information as a particular measurement error problem: instead of using the same coordinates for all the units, we impose a distribution on the locations inside each area. To analyze the performance of this approach, various MCMC experiments are implemented under different scenarios: missing variable (univariate and bivariate), distribution (uniform and beta) and data (simulated and real). The results show that, under the right hypotheses, the estimates obtained under the measurement error assumption are better than those obtained under the classic approach.
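As a rough illustration of the structure described above, a geoadditive model with one smooth covariate effect and one spatial component can be written as follows. The notation is assumed for this sketch and is not taken from the thesis: t_i is a covariate, x_i the location of unit i, f a smooth function, h a spatial (kriging-type) surface, X the fixed-effects design matrix (intercept and linear terms), and Z_t, Z_x the spline and low-rank kriging basis matrices.

```latex
% Illustrative notation only; the symbols are assumptions, not the thesis's own.
\begin{align*}
y_i &= \beta_0 + f(t_i) + h(\mathbf{x}_i) + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2), \\
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
  \qquad \mathbf{Z} = [\,\mathbf{Z}_t \;\; \mathbf{Z}_x\,], \quad
  \mathbf{u} = \begin{bmatrix}\mathbf{u}_t \\ \mathbf{u}_x\end{bmatrix}
  \sim N\!\left(\mathbf{0},\;
  \begin{bmatrix}\sigma_t^2 \mathbf{I} & \mathbf{0} \\
                 \mathbf{0} & \sigma_x^2 \mathbf{I}\end{bmatrix}\right).
\end{align*}
```

In this form the smoothing parameters correspond to ratios of variance components, which is what makes standard mixed model software usable for estimation.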
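The difference between the two treatments of unknown locations can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the MCMC procedure of the thesis: the area is a hypothetical rectangle, `surface` is a made-up smooth spatial trend, and all function names are invented for this example. It compares placing every unit at the area center with averaging over locations drawn from a distribution inside the area (scaled beta; uniform when a = b = 1).

```python
import numpy as np


def centroid_locations(bounds, n_units):
    """Classic approach: place every unit of the area at the area center."""
    (xmin, xmax), (ymin, ymax) = bounds
    center = np.array([(xmin + xmax) / 2.0, (ymin + ymax) / 2.0])
    return np.tile(center, (n_units, 1))


def sampled_locations(bounds, n_units, rng, a=1.0, b=1.0):
    """Draw locations from independent scaled Beta(a, b) distributions.

    With a = b = 1 this is a uniform distribution over the rectangle.
    """
    (xmin, xmax), (ymin, ymax) = bounds
    u = rng.beta(a, b, size=(n_units, 2))
    lower = np.array([xmin, ymin])
    width = np.array([xmax - xmin, ymax - ymin])
    return lower + u * width


def surface(coords):
    """Hypothetical non-linear spatial trend, used only for illustration."""
    return np.sin(3.0 * coords[:, 0]) + coords[:, 1] ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bounds = ((0.0, 1.0), (0.0, 1.0))  # assumed rectangular area

    at_center = surface(centroid_locations(bounds, 5000)).mean()
    averaged = surface(sampled_locations(bounds, 5000, rng)).mean()

    # Exact area mean of the toy surface under uniformly spread units.
    exact = (1.0 - np.cos(3.0)) / 3.0 + 1.0 / 3.0

    print("center plug-in error:", abs(at_center - exact))
    print("averaged-draws error:", abs(averaged - exact))
```

Because the toy surface is non-linear, evaluating it at the center does not reproduce its mean over the area, which is the kind of approximation error the abstract attributes to the classic approach.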
2010
Alessandra Petrucci
ITALY
C. Bocci
Files in this item:
File: thesis_bocci.pdf
Access: open access
Type: Doctoral thesis
License: Open Access
Size: 1.99 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/547657