Comments on: [ArXiv] SDSS DR6, July 23, 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-sdss-dr6/
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders
Fri, 01 Jun 2012 18:47:52 +0000

By: hlee, Tue, 31 Jul 2007 19:16:26 +0000
http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-sdss-dr6/comment-page-1/#comment-58

I'm in Salt Lake City for the Joint Statistical Meetings (JSM). By accident (I had wanted to go to Don Rubin's Causal Inference talk, which was at the same time), I ended up listening to a speaker whose work is motivated by an astronomer interested in regression and clustering on SDSS data. Sadly, he only applied well-known classical statistics to simulated bivariate data. In astronomy, I personally believe that simulated data and actual data behave quite differently, partly because uncertainty enters during the calibration procedure. This uncertainty is hard to model with a simple probabilistic theory.

Another challenge is the computation time of the methods the speaker introduced. Model-based clustering and k-means both require iterative computation. With hundreds of millions of objects, I am suspicious about their feasibility.
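To make the cost concern about k-means concrete, here is a minimal sketch of plain Lloyd's k-means on simulated bivariate data, written in pure Python. It is not the speaker's code: the function names `simulate` and `kmeans`, the two-blob toy data, and the distance counter `n_dist` are all assumptions made for illustration. The point is only that every pass over the data evaluates one distance per object per cluster centre.

```python
import random


def simulate(n, seed=1):
    """Hypothetical toy data: n bivariate points drawn from two Gaussian blobs."""
    rng = random.Random(seed)
    points = []
    for i in range(n):
        cx, cy = (0.0, 0.0) if i % 2 == 0 else (10.0, 10.0)
        points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
    return points


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means on 2-D points.

    Returns (centers, labels, n_iter, n_dist), where n_dist counts the
    point-to-centre distance evaluations actually performed.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres: k distinct data points
    labels = [-1] * len(points)
    n_dist = 0
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        changed = False
        # Assignment step: every point is compared with every centre.
        for i, (x, y) in enumerate(points):
            best, best_d = 0, float("inf")
            for j, (cx, cy) in enumerate(centers):
                d = (x - cx) ** 2 + (y - cy) ** 2
                n_dist += 1
                if d < best_d:
                    best, best_d = j, d
            if labels[i] != best:
                labels[i] = best
                changed = True
        if not changed:
            break  # converged: no label changed in this pass
        # Update step: move each centre to the mean of its assigned points.
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for (x, y), lab in zip(points, labels):
            sums[lab][0] += x
            sums[lab][1] += y
            sums[lab][2] += 1
        centers = [
            (sx / c, sy / c) if c else centers[j]  # keep an empty cluster's centre
            for j, (sx, sy, c) in enumerate(sums)
        ]
    return centers, labels, n_iter, n_dist


if __name__ == "__main__":
    pts = simulate(1000)
    centers, labels, n_iter, n_dist = kmeans(pts, k=2)
    print(n_iter, n_dist, centers)
```

In this sketch the work per pass is exactly n × k distance evaluations. Taking n = 10^8 objects and k = 10 clusters as illustrative numbers, that is 10^9 evaluations per pass, repeated until the labels stop changing, and real survey data would have more than two dimensions.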
