Comments on: Astrostatistics: Goodness-of-Fit and All That!
http://hea-www.harvard.edu/AstroStat/slog/2007/astrostatistics-goodness-of-fit-and-all-that/
Weaving together Astronomy + Statistics + Computer Science + Engineering + Instrumentation, far beyond the growing borders

By: Joseph Hilbe (Sat, 14 Mar 2009 21:32:06 +0000)

Everyone interested in astrostatistics is invited to attend the Astrostatistics Interest Group business and papers meetings at the International Statistical Institute's biennial meetings, August 16-22, 2009. We are scheduled to have the business/organizational meeting on August 21 at the Durban, South Africa Convention Center at 11:15 AM. If interested, please contact me at either hilbe -at- asu.edu or j.m.hilbe -at- gmail.com.

Joseph Hilbe
Chair, ISI Astrostatistics Interest Group
JPL/CalTech and Arizona State University

By: Gracie (Wed, 19 Nov 2008 17:15:18 +0000)

"If one wants to summarize data (extract information through statistics), one should use correct statistics." Agree. Don't compare apples to oranges.

By: hlee (Mon, 01 Oct 2007 18:07:17 +0000)

Dr. Babu (http://www.stat.psu.edu/~babu/), Director of the Center for Astrostatistics (http://astrostatistics.psu.edu/), kindly sent a comment on his paper for the slog:

Chi^2 only looks at binned data, so in the process some information is lost. Whereas the K-S test looks at the whole curve: we are essentially estimating the whole distribution function. The confidence band in the K-S test refers to the unknown distribution, not to the parameter.
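
The binned-versus-whole-curve contrast can be seen in a toy sketch (my illustration, not from the paper or from Dr. Babu): the K-S statistic compares the model CDF to the empirical CDF at every data point, while a Pearson chi^2 only sees counts in bins. The N(0,1) null, sample size, and 10 equal-probability bins are all assumptions chosen for the demo.

```python
# Toy comparison: K-S uses every point via the empirical CDF;
# binned chi^2 discards within-bin information. Fully specified
# N(0,1) null, no parameters estimated (so both tests are valid).
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(1)
x = sorted(random.gauss(0.0, 1.0) for _ in range(500))
n = len(x)

# K-S statistic: max distance between empirical and model CDFs.
ks = max(max((i + 1) / n - norm_cdf(v), norm_cdf(v) - i / n)
         for i, v in enumerate(x))

# Pearson chi^2 on 10 equal-probability bins: only the counts matter.
k = 10
probs = [norm_cdf(v) for v in x]   # probability transform of the data
counts = [0] * k
for p in probs:
    counts[min(int(p * k), k - 1)] += 1
expected = n / k
chi2 = sum((c - expected) ** 2 / expected for c in counts)

print(f"K-S statistic D = {ks:.4f}")
print(f"Pearson chi^2 ({k} bins, {k - 1} dof) = {chi2:.2f}")
```

Any rearrangement of points within a bin leaves chi^2 unchanged but moves the K-S statistic, which is the information loss being described.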

By: hlee (Mon, 20 Aug 2007 02:34:41 +0000)

The K-S tests were built upon the null hypothesis F_n(x) = F(x), not on \hat{\theta} = \theta_o (a parameter). The confidence band is laid upon the empirical distribution; to put it in simple words, it is a confidence band on the estimated p-value. I didn't see that they included any parameter estimation and its confidence band.

The χ^2 method provides a best fit based on least squares. If the errors of the response variable are normally distributed, then the least squares solution equals the maximum likelihood estimator. Thanks to Wald (1949) and a series of later works, we know that this ML estimator is consistent and asymptotically normal, so that the χ^2 number +1 can lead to a 68% confidence interval (for one parameter). This interval is only valid when the ML estimator is consistent. Protassov et al. (2001) showed some of the regularity conditions the model must satisfy to reach a consistent estimator. I'm not sure all astronomical models satisfy these conditions, which would justify adopting the χ^2 method to get the desired confidence interval. Many suffer from problems of identifiability, the type of parameter space (compact, open, closed), continuity, or differentiability.
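
The "χ^2 number +1" rule can be sketched numerically for the simplest well-behaved case (my example; the Gaussian mean model, sample size, and known σ = 1 are assumptions): scan χ^2(μ) for the region where χ^2 ≤ χ^2_min + 1 and compare its half-width to the exact 1σ answer σ/√n.

```python
# Sketch of the Delta-chi^2 = 1 rule for one parameter with Gaussian
# errors: the set {mu : chi^2(mu) <= chi^2_min + 1} is a 68% confidence
# interval. Checked here against the exact width sigma/sqrt(n).
import math
import random

random.seed(2)
sigma = 1.0
data = [random.gauss(5.0, sigma) for _ in range(200)]
n = len(data)

def chi2(mu):
    return sum(((x - mu) / sigma) ** 2 for x in data)

mu_hat = sum(data) / n          # least squares = ML estimate of the mean
c_min = chi2(mu_hat)

# Scan outward for the Delta-chi^2 = 1 boundaries.
step = 1e-4
lo = mu_hat
while chi2(lo) < c_min + 1.0:
    lo -= step
hi = mu_hat
while chi2(hi) < c_min + 1.0:
    hi += step

exact = sigma / math.sqrt(n)    # exact 1-sigma half-width for this model
print(f"half-width from Delta-chi^2 = 1: {(hi - lo) / 2:.4f}")
print(f"exact sigma/sqrt(n):             {exact:.4f}")
```

The agreement here is exact because this model satisfies every regularity condition; the point of the comment above is that nothing guarantees the same for an arbitrary astrophysical model.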

Beyond the least squares and maximum likelihood methods, there are numerous ways to obtain a best fit, depending on the conditions of the target model. Although I used strong words to say your statement was not fair, if you can say that those physical models satisfy the regularity conditions, or that empirically the chosen model is always physically right, or that the χ^2 number is absolutely right due to physics, then I have no reason to object to the traditional practice of χ^2.

However, I want to make the point that parameter estimation and hypothesis testing (including confidence intervals) are better kept separate. Unless you have very strong evidence of consistency, I don't think it is suitable to use your estimated parameter value for testing (or a confidence interval) based upon the χ^2 method. You may use a restricted maximum likelihood (REML) estimator if you know the range (in general, many observations and parameters should be positive). You can penalize the χ^2. You can use the best linear unbiased estimator (BLUE, the least squares equivalent for simple linear models). You can use M-estimators, and so on. By showing that these methods produce a consistent estimator of the true (unknown) parameter value, the modified χ^2 (modification due to a penalty, a change of norms, a restriction on the parameter space, etc.) leads to a confidence interval for the parameter of interest.

I know I beat around the bush. The cited paper does not provide an absolute solution, but it shows how to improve the current practice of goodness-of-fit tests and why the bootstrap works to improve the K-S tests. The more complicated (physical) models become, the more careful the choice of statistics must be, a point I consider to have been ignored in the astronomical community. If one wants to summarize data (extract information through statistics), one should use correct statistics. This hasn't happened with a 100% success rate. I hope the rate goes up, close to 100%.
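
The bootstrap recalibration of the K-S test referred to here can be sketched as follows. This is an illustration in the spirit of Babu & Feigelson, not their code: a normal model whose mean and scale are fitted to the same data being tested (my assumed setup), so the standard K-S critical values no longer apply, and a parametric bootstrap with refitting rebuilds the null distribution of D.

```python
# Parametric bootstrap for the K-S test when the model parameters are
# estimated from the data under test: simulate from the fitted model,
# REFIT on each simulated sample, and recompute D to get its true null
# distribution (the naive table value is too conservative here).
import math
import random

def norm_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def ks_stat(sample, mu, sd):
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - norm_cdf(v, mu, sd),
                   norm_cdf(v, mu, sd) - i / n) for i, v in enumerate(xs))

def fit_normal(sample):
    n = len(sample)
    mu = sum(sample) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    return mu, sd

random.seed(3)
data = [random.gauss(10.0, 2.0) for _ in range(100)]
mu0, sd0 = fit_normal(data)
d_obs = ks_stat(data, mu0, sd0)

boot = []
for _ in range(500):
    sim = [random.gauss(mu0, sd0) for _ in range(len(data))]
    mu_b, sd_b = fit_normal(sim)      # refit: this step is the recalibration
    boot.append(ks_stat(sim, mu_b, sd_b))

p_value = sum(d >= d_obs for d in boot) / len(boot)
boot.sort()
print(f"observed D = {d_obs:.4f}, bootstrap p-value = {p_value:.3f}")
print(f"bootstrap 95% critical value = {boot[int(0.95 * len(boot))]:.4f}")
print(f"naive table value 1.36/sqrt(n) = {1.36 / math.sqrt(len(data)):.4f}")
```

The bootstrap critical value comes out well below the naive table value, which is exactly the estimation-then-testing bias discussed in this thread.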

[Comment: Regularity conditions, consistency, asymptotics, and other arcane terminology drive astronomers away from classical statistics and toward Bayesian statistics, as long as physical models exist.]

By: vlk (Fri, 17 Aug 2007 23:27:53 +0000)

OK, I see that they replace the usual goodness-of-fit with the Kullback-Leibler distance measure. But otherwise, intention or not, Babu & Feigelson do say in the abstract that they "combine K-S statistics with bootstrap resampling to achieve unbiased parameter confidence bands." That reads to me as recalibrating their preferred statistic on the fly. Would you please therefore explain why my summary was not fair?

On the separate question of using the chi^2, please note that I am not suggesting the use of the standard chisq distribution, but the calculation of chi^2, the number, and a recalibration of chi^2, the distribution, the same way as B&F do for the K-S statistic. What is wrong with that?

By: hlee (Fri, 17 Aug 2007 22:40:00 +0000)

First, the K-S test is a distribution-free test. Second, χ^2 methods produce results from an approximation based on the χ^2 distribution. For that approximation to hold, the data and models have to meet regularity conditions, which in general are never discussed in astronomical papers. To avoid checking the regularity conditions, it's convenient to use a distribution-free test. Yet these distribution-free tests are not always efficient.

Another point I'd like to make is that these distribution-free tests are just tests (Ho: your model is a good fit vs. Ha: it is not). Parameter estimation is another process. It is well known that using the data to estimate parameters and then using the same data for testing introduces bias. The astronomers' χ^2 methods haven't been paying attention to this bias. The way a confidence interval is constructed from the χ^2 is built on a no-bias regime.

In conclusion, the summary is not fair. Your summary was not the authors’ intention.

By: vlk (Fri, 17 Aug 2007 22:22:49 +0000)

Is it a fair summary of their paper to say that "you can use K-S, C-vM, and A-D to do parameter estimation and goodness-of-fit, provided that you recalibrate the statistic used in each case with a parametric or non-parametric bootstrap"? If so, surely you can do the same with chi^2? What is the specific advantage of K-S, etc., over chi^2?
