Multiple Imputation in the Anthrax Vaccine Research Program / Baccini M.; Cook S.; Frangakis C.E.; Li F.; Mealli F.; Rubin D.B.; Zell E.R. - In: CHANCE. - ISSN 0933-2480. - Print. - 1:(2010), pp. 16-23.
Multiple Imputation in the Anthrax Vaccine Research Program
BACCINI, MICHELA;MEALLI, FABRIZIA;
2010
Abstract
Since 2000, the CDC has been planning and conducting a clinical trial, the Anthrax Vaccine Research Program (AVRP), to evaluate a reduced schedule for Anthrax Vaccine Adsorbed (AVA) and a change in its route of administration in humans. The AVA trial is a 43-month prospective, randomized, double-blind, placebo-controlled trial comparing the immunogenicity (i.e., immunity) and reactogenicity (i.e., side effects) elicited by AVA given by different routes of administration and dosing regimens. The AVA study has been significant because, as a result of the interim analysis, the FDA approved the change in the route of AVA administration from subcutaneous (SQ) to intramuscular (IM). However, as with other complex experimental and observational data, the AVRP data create various challenges for statistical evaluation. One such challenge is how to handle the missing data generated by dropouts, missed visits, and missing responses. The simplest approach, a complete-data analysis that drops any subject with missing data, is not applicable here: although the overall missing rate is low at this time (3.4%), only 56 of the approximately 2,000 variables are fully observed and only 208 subjects have complete data on every variable. Filling in missing values by copying the last recorded value for a subject on a particular variable (the “last observation carried forward” approach) is unlikely to be a good idea in this situation, because side effects and immune response are likely to vary over time rather than remain constant. Randomly choosing a case with observed data to serve as a donor of values to a case with missing data (the “hot-deck” strategy) could be problematic due to the high degree of missing values and the need to express uncertainty after imputation. During the last two decades, multiple imputation (MI) has become a standard statistical technique for dealing with missing data. It has been further popularized by several software packages (e.g., PROC MI in SAS, IVEware, SOLAS, and MICE).
MI generally involves specifying a joint distribution for all variables in a data set. In the Bayesian setting, the data model is often supplemented by a prior distribution for the model parameters. Multiple imputations of the missing values are then created as random draws from the posterior predictive distribution of the missing data, given the observed data. MI has been successfully implemented in many large applications. Two such applications are described in “Filling in the Blanks: Some Guesses Are Better Than Others” and “Healthy for Life: Accounting for Transcription Errors Using Multiple Imputation,” both published in CHANCE, Vol. 21, No. 3. MI for the AVRP is substantially more challenging, mostly due to the large number and different types of variables in the data set, the limited number of units within each treatment arm (imputations should be done independently across treatment arms to avoid cross-contamination among groups), and, most important, theoretical incompatibility in the imputation algorithms used by currently available packages such as IVEware and MICE. Another important issue is how to evaluate the imputations, a question that has been largely neglected in most MI applications.
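As a rough illustration of the posterior predictive draws described above, the following minimal Python sketch imputes a single continuous variable under a normal model. It is a toy example under stated assumptions, not the procedure used for the AVRP: the function name `impute_normal`, the univariate normal model, and the noninformative prior are all assumptions made for illustration. The actual data involve roughly 2,000 variables of mixed types and would need a joint model fitted separately within each treatment arm.

```python
import random
import statistics


def impute_normal(values, m=5, seed=0):
    """Return m completed copies of `values`; None marks a missing entry.

    Hypothetical helper: assumes a univariate normal model with the
    standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two observed values")
    mean = statistics.fmean(observed)
    var = statistics.variance(observed)  # sample variance, n - 1 denominator

    completed = []
    for _ in range(m):
        # Step 1: draw the parameters from their posterior distribution.
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square, n - 1 d.f.
        sigma2 = (n - 1) * var / chi2
        mu = rng.gauss(mean, (sigma2 / n) ** 0.5)
        # Step 2: draw each missing value from the model given those
        # parameters; observed values are copied unchanged.
        sd = sigma2 ** 0.5
        completed.append(
            [v if v is not None else rng.gauss(mu, sd) for v in values]
        )
    return completed


data = [4.1, None, 5.0, 3.7, None, 4.4]
imputations = impute_normal(data, m=3)
```

Because the parameters are redrawn for every completed data set, the imputed values differ from one copy to the next, and that variation is what carries the uncertainty due to missingness into later analyses.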



