I'm in Salt Lake City for the Joint Statistical Meetings (JSM). By accident (I had wanted to go to Don Rubin's causal inference talk, which was at the same time), I ended up listening to a speaker whose work was motivated by an astronomer interested in regression and clustering on SDSS data. Sadly, he only applied well-known classical statistics to simulated bivariate data. In astronomy, I personally believe that simulated data and actual data behave quite differently, partly because uncertainty enters during the calibration procedure. This uncertainty is hard to model with simple probabilistic theory. Another challenge is the computation time of the methods the speaker introduced. Model-based clustering and k-means both require iterative computation. With hundreds of millions of objects, I am skeptical about their feasibility.
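To make the cost concern a bit more concrete, here is a minimal sketch of plain Lloyd-style k-means in pure Python. It is not the speaker's method or code; the function name `kmeans`, the distance counter, and the toy two-blob bivariate data are my own assumptions for illustration. The point is only that every pass compares each object with each center, so the work per pass grows as (number of objects) × (number of clusters).

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means on 2-D points (illustrative sketch).

    Returns (centers, labels, n_iter, n_dist), where n_dist counts
    point-to-center distance evaluations.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # initial centers: k random points
    labels = [-1] * len(points)
    n_dist = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: every point is compared with every center.
        new_labels = []
        for x, y in points:
            d = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
            n_dist += k
            new_labels.append(d.index(min(d)))
        if new_labels == labels:
            break                        # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, labels, n_iter, n_dist

# Toy simulated bivariate data: two Gaussian blobs (hypothetical example).
data_rng = random.Random(1)
points = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
          + [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(500)])

centers, labels, n_iter, n_dist = kmeans(points, k=2)
print("passes:", n_iter, "distance evaluations:", n_dist)
```

By construction the counter equals passes × objects × clusters. Under that count, 10^8 objects and 10 clusters would need 10^9 distance evaluations per pass, before multiplying by the number of passes needed for convergence, which is what makes me doubt feasibility at survey scale without some form of subsampling or approximation.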
