Joseph Hilbe

Chair, ISI Astrostatistics Interest Group

JPL/CalTech and Arizona State University

Chi^2 looks only at binned data, so some information is lost in the process, whereas the K-S test looks at the whole curve. We are essentially estimating the whole distribution function. The confidence band in the K-S test refers to the unknown distribution, not to the parameter.
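To make the information-loss point concrete, here is a minimal sketch, assuming a standard normal model, 200 simulated points, and four arbitrary bins (all of these choices are illustrative and not from the discussion above). It moves every point to the midpoint of its bin: the binned chi^2 cannot see the change, while the K-S distance, which uses the full empirical distribution function, does.

```python
import random
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF steps from i/n to (i+1)/n at x, so check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def binned_chi2(sample, cdf, edges):
    """Pearson chi^2 between observed bin counts and model-expected counts."""
    n = len(sample)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(lo <= x < hi for x in sample)
        expected = n * (cdf(hi) - cdf(lo))
        total += (observed - expected) ** 2 / expected
    return total

def bin_midpoint(x, edges):
    """Replace x by the midpoint of the bin that contains it."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo <= x < hi:
            return 0.5 * (lo + hi)
    raise ValueError("x lies outside the binning range")

rng = random.Random(0)                      # fixed seed, illustrative only
model = NormalDist(0.0, 1.0)                # assumed model
edges = [-10.0, -1.0, 0.0, 1.0, 10.0]       # assumed (arbitrary) binning
raw = [rng.gauss(0.0, 1.0) for _ in range(200)]
collapsed = [bin_midpoint(x, edges) for x in raw]

chi2_raw = binned_chi2(raw, model.cdf, edges)
chi2_collapsed = binned_chi2(collapsed, model.cdf, edges)
d_raw = ks_statistic(raw, model.cdf)
d_collapsed = ks_statistic(collapsed, model.cdf)

print("chi^2:", chi2_raw, chi2_collapsed)   # identical: same counts per bin
print("K-S D:", d_raw, d_collapsed)         # different: the ECDF has changed
```

The two chi^2 values agree exactly because only the bin counts enter the statistic; anything that happens inside a bin is invisible to it.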

The χ^2 method provides a best fit by least squares. If the errors of the response variable are normally distributed, the least-squares solution coincides with the maximum likelihood estimator. From Wald (1949) and a series of later works, we know that this ML estimator is consistent and asymptotically normal, so that the minimum χ^2 plus 1 yields a 68% confidence interval (for one parameter). This interval is valid only when the ML estimator is consistent. Protassov et al. (2001) described some of the regularity conditions a model must meet for the estimator to be consistent. I’m not sure that all astronomical models satisfy these conditions, so that the χ^2 method gives the desired confidence interval. Many run into problems with identifiability, the type of parameter space (compact, open, closed), continuity, or differentiability.
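The chain of reasoning behind the "+1" rule can be written out as follows. This is a sketch under assumptions that are mine, not stated above: data y_i with independent Gaussian errors of known σ_i, a model f(x_i; θ), and a single scalar parameter θ with true value θ_0. The last line is the asymptotic step that depends on the regularity conditions.

```latex
% Assumed setup: data y_i, model f(x_i;\theta), independent Gaussian
% errors with known \sigma_i, one scalar parameter \theta.
\begin{align*}
-2\ln L(\theta)
  &= \sum_{i=1}^{n} \frac{\bigl(y_i - f(x_i;\theta)\bigr)^2}{\sigma_i^2}
     + \sum_{i=1}^{n} \ln\bigl(2\pi\sigma_i^2\bigr)
   = \chi^2(\theta) + \text{const}, \\
\hat\theta_{\mathrm{ML}}
  &= \arg\min_{\theta}\, \chi^2(\theta), \\
\Delta\chi^2(\theta_0)
  &= \chi^2(\theta_0) - \chi^2(\hat\theta_{\mathrm{ML}})
   \;\xrightarrow{d}\; \chi^2_1
   \quad\Longrightarrow\quad
   P\bigl(\Delta\chi^2(\theta_0) \le 1\bigr) \approx 0.683 .
\end{align*}
```

The first two lines are exact under the Gaussian assumption. Only the third needs consistency and asymptotic normality, and that is where a model with identifiability or boundary problems can fail.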

Beyond least squares and maximum likelihood, there are numerous ways to obtain a best fit, depending on the conditions of the target model. I used strong words in saying your statement is not fair; still, if you say that those physical models satisfy the regularity conditions, or that empirically the chosen model is always physically right, or that the χ^2 number is absolutely right because of the physics, I have no reason to object to the traditional practice of χ^2.

However, I want to make the point that parameter estimation and hypothesis testing (including confidence intervals) are better kept separate. Unless you have very strong evidence of consistency, I don’t think it is suitable to use your estimated parameter value for testing (or for a confidence interval) based on the χ^2 method. You may use a restricted maximum likelihood (REML) estimator if you know the range (in general, many observations and parameters should be positive). You can penalize the χ^2. You can use the best linear unbiased estimator (BLUE, the least-squares equivalent for simple linear models). You can use M-estimators, and so on. Once these methods are shown to produce a consistent estimator of the true (unknown) parameter value, the modified χ^2 (modified by a penalty, a change of norm, a restriction on the parameter space, etc.) leads to a confidence interval for the parameter of interest.

I know I have been beating around the bush. The cited paper does not provide an absolute solution, but it shows how to improve the current practice of goodness-of-fit tests and why the bootstrap improves the K-S test. The more complicated (physical) models become, the more careful the choice of statistics has to be, which I think has been ignored in the astronomical community. If one wants to summarize data (extract information through statistics), one should use the correct statistics. This hasn’t happened with a 100% success rate. I hope the rate goes up, close to 100%.

[Comment: Regularity conditions, consistency, asymptotics, and other arcane terminology drive astronomers away from classical statistics and toward Bayesian statistics, as long as physical models exist.]

On the separate question of using the chi^2, please note that I am not suggesting the use of the standard chi^2 *distribution*, but the calculation of chi^2, *the number*, and a recalibration of chi^2, *the distribution*, in the same way as B&F do for the K-S statistic. What is wrong with that?

Another point I’d like to make is that these distribution-free tests are just tests (H0: your model is a good fit vs. Ha: it is not). Parameter estimation is a separate process. It is well known that estimating parameters from the data and then testing with the same data introduces bias. The astronomers’ χ^2 methods haven’t paid attention to this bias. The way the confidence interval is constructed from the χ^2 assumes a no-bias regime.
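The bias from fitting and testing on the same data can be seen in a small simulation. What follows is a minimal sketch of a parametric bootstrap recalibration of the K-S statistic, assuming a normal model with both parameters fitted by maximum likelihood. The model, sample size, seed, and function names are illustrative assumptions, and this is not claimed to be the exact procedure of the cited paper.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def fitted_ks(sample):
    """Fit a normal to `sample` by ML, then measure its K-S distance to the same sample."""
    mu = fmean(sample)
    sigma = pstdev(sample, mu)              # ML estimate of the standard deviation
    return ks_statistic(sample, NormalDist(mu, sigma).cdf)

def bootstrap_critical_value(sample, n_boot, level, rng):
    """Parametric bootstrap: repeat 'simulate from the fit, refit, compute K-S'."""
    mu = fmean(sample)
    sigma = pstdev(sample, mu)
    n = len(sample)
    stats = sorted(
        fitted_ks([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_boot)
    )
    index = min(n_boot - 1, int(level * n_boot))
    return stats[index]                     # empirical quantile of the refitted statistic

rng = random.Random(1)                      # fixed seed, illustrative only
data = [rng.gauss(5.0, 2.0) for _ in range(100)]

d_obs = fitted_ks(data)
crit_boot = bootstrap_critical_value(data, n_boot=500, level=0.95, rng=rng)
# Asymptotic 5% K-S critical value, valid only for a fully specified model.
crit_table = 1.358 / math.sqrt(len(data))

print("observed D:", d_obs)
print("bootstrap 5% critical value:", crit_boot)
print("tabulated 5% critical value:", crit_table)
```

Because the fitted curve is pulled toward the data, the distances are systematically smaller than the tabulated distribution expects, so the bootstrap critical value comes out well below the tabulated one. A test that used the tabulated value would reject far less often than its nominal 5%.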

In conclusion, the summary is not fair. Your summary was not the authors’ intention.
