Comments on: missing data
http://hea-www.harvard.edu/AstroStat/slog/2008/missing-data/
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders
Fri, 01 Jun 2012 18:47:52 +0000

By: Alex
Mon, 27 Oct 2008 22:47:01 +0000
http://hea-www.harvard.edu/AstroStat/slog/2008/missing-data/comment-page-1/#comment-814

In the astrostats group, we usually use a data augmentation approach for problems of incompleteness. That is, we alternate between drawing the missing data from its distribution given the most recent parameter draw, and drawing the parameters from their distribution given the complete (observed + missing) data. Vinay is correct in that it does not fit into the previously mentioned categories very well; data augmentation is really a Bayesian procedure, whereas your lists are more classically focused (and less focused on simulation methods).
Data augmentation is essentially multiple imputation, but we typically do not use that term.

For a nice example of the data augmentation approach to handling missing data, I would take a look at Nondas’s poster on the Log(N)-Log(S) estimation problem. It’s a nice, relatively direct application of the procedure to a problem involving incompleteness.
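
The alternating scheme described in this comment can be sketched on a toy problem. The sketch below is only an illustration under strong assumptions that are not from the comment: the data are Normal with unknown mean `mu` and known unit variance, the prior on `mu` is flat, and the missing values are missing completely at random. The function name `data_augmentation` and all of its parameters are hypothetical; it is not the Log(N)-Log(S) analysis mentioned above.

```python
import random
import statistics


def data_augmentation(observed, n_missing, n_iter=2000, burn_in=500, seed=0):
    """Toy data augmentation sampler for the mean of N(mu, 1) data.

    Assumptions (illustrative only): unit variance, flat prior on mu,
    and n_missing values that are missing completely at random.
    Returns the post-burn-in draws of mu.
    """
    rng = random.Random(seed)
    n_total = len(observed) + n_missing
    mu = statistics.fmean(observed)  # starting value for the chain
    draws = []
    for iteration in range(n_iter):
        # Step 1: draw the missing data given the last parameter draw.
        missing = [rng.gauss(mu, 1.0) for _ in range(n_missing)]
        # Step 2: draw the parameter given the complete data.
        # With a flat prior, mu | complete data ~ N(complete mean, 1 / n_total).
        complete_mean = (sum(observed) + sum(missing)) / n_total
        mu = rng.gauss(complete_mean, (1.0 / n_total) ** 0.5)
        if iteration >= burn_in:
            draws.append(mu)
    return draws
```

In this toy setting the missing values carry no extra information, so the draws of `mu` should scatter around the mean of the observed values; the interest of the method is in cases such as thresholded data, where step 1 would draw from a distribution that encodes the selection effect.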

By: vlk
Mon, 27 Oct 2008 16:46:36 +0000
http://hea-www.harvard.edu/AstroStat/slog/2008/missing-data/comment-page-1/#comment-812

Very useful summary, Hyunsook, lots of food for thought. A couple of comments from an astronomical perspective. We usually have data missing due to thresholding of some observable (e.g., intensity: faint sources will be missed; see Eddington Bias), which seems to me to fall under none of the categories you mentioned. We also have MCAR (missing completely at random) in the time domain when an observation drops off because the object sets for the day, or because the telescope goes into Earth eclipse, etc.

btw, wavdetect uses iterative mean imputation to determine the background under a source. Other wavelet-based astro detection algorithms (such as pwdetect) use zero imputation. (i.e., find outliers, excise them from the data, replace them with either the mean of the surrounding data or with zero, and iterate until convergence.)
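
The find, excise, replace, iterate loop in the parenthetical can be sketched in one dimension. This is a toy illustration only, not the wavdetect or pwdetect implementation: it has no wavelets, it assumes a flat background, and it uses the mean of all unflagged values as a stand-in for "the mean of the surrounding". The function name `iterative_imputation`, the threshold `k`, and the `mode` switch are hypothetical.

```python
import statistics


def iterative_imputation(data, k=3.0, mode="mean", max_iter=50):
    """Flag outliers, replace them, and repeat until no new ones appear.

    mode="mean" fills flagged entries with the mean of the unflagged data;
    mode="zero" fills them with 0. Toy sketch under a flat-background
    assumption. Returns (imputed values, sorted flagged indices).
    """
    values = [float(v) for v in data]
    flagged = set()
    for _ in range(max_iter):
        # Statistics are computed on the current, already imputed, data.
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        new = {i for i, v in enumerate(values)
               if i not in flagged and v > mean + k * sd}
        if not new:
            break  # converged: no further outliers found
        flagged |= new
        if mode == "mean":
            fill = statistics.fmean(
                v for i, v in enumerate(data) if i not in flagged)
        else:
            fill = 0.0
        for i in flagged:
            values[i] = fill
    return values, sorted(flagged)
```

Called on a flat run of counts with one bright entry, the function returns that entry's index in the flagged list and a copy of the data in which it has been replaced by the fill value.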
