The AstroStat Slog » chisq
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

chi-square distribution [Eqn]
http://hea-www.harvard.edu/AstroStat/slog/2008/eotw-chisq-distribution/
Wed, 16 Jul 2008 17:00:16 +0000, vlk

The χ² distribution plays an incredibly important role in astronomical data analysis, but it is pretty much a black box to most astronomers. How many people know, for instance, that its form is exactly the same as that of the γ distribution? A χ² distribution with ν degrees of freedom is

p(z|ν) = (1/Γ(ν/2)) (1/2)^{ν/2} z^{ν/2−1} e^{−z/2} ≡ γ(z; ν/2, 1/2) , where z = χ².

Its more familiar usage is in the cumulative form, which is just the incomplete gamma function. This is where you count off how much area is enclosed in [0, χ²) to tell at what point the 68%, 95%, etc., thresholds are met. For example, for ν=1,

∫_0^Z dχ² p(χ²|ν=1) = 0.68 when Z = 1.

This is the origin of the Δχ²=1 method to determine error bars on best-fit parameters.
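As a quick numerical check (not part of the original post), the following Python sketch uses scipy to verify both statements: the cumulative χ² probability is the regularized lower incomplete gamma function, and for ν=1 the 68% point is reached at Z=1, which is where Δχ²=1 comes from.

```python
# A minimal sketch (not from the original post) verifying the chi-square /
# incomplete-gamma relation and the 68% threshold for nu = 1 dof.
from scipy.stats import chi2
from scipy.special import gammainc

nu = 1.0
Z = 1.0

# Cumulative chi-square probability in [0, Z) ...
p_cdf = chi2.cdf(Z, df=nu)

# ... is the regularized lower incomplete gamma function with shape nu/2
# evaluated at Z/2, i.e., the gamma(nu/2, 1/2) CDF.
p_gamma = gammainc(nu / 2.0, Z / 2.0)

print(p_cdf, p_gamma)                 # both ~0.6827, the familiar 68% threshold

# 95% and 99% thresholds for nu = 1, via the inverse CDF:
print(chi2.ppf([0.95, 0.99], df=nu))  # ~[3.84, 6.63]
```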

Mmm.. donuts
http://hea-www.harvard.edu/AstroStat/slog/2007/far-side/
Wed, 22 Aug 2007 17:01:06 +0000, vlk

Mmm.. chi-square!

The withering criticisms Hyunsook has been directing towards the faulty use of chi-square by astronomers bring to mind this classic comment by [astronomer] Jeremy Drake during the 2005 Chandra Calibration Workshop:

Remember that classic Far Side cartoon of the dog being admonished and only hearing its name? Well, that’s what happens when statisticians admonish astronomers. They might say “don’t use chisq”, and all we hear is “chisq”.

Update [6/23/2010]: We had the modified cartoons here to illustrate the issue a bit more figuratively, but apparently Gary Larson doesn’t like that, so we have removed them.

All your bias are belong to us
http://hea-www.harvard.edu/AstroStat/slog/2007/astroph-07054199/
Mon, 04 Jun 2007 21:42:38 +0000, vlk

Leccardi & Molendi (2007) have a paper in A&A (astro-ph/0705.4199) discussing the biases in parameter estimation when spectral fitting is confronted with low-count data. Not surprisingly, they find that the bias is higher for lower counts, for standard chisq compared to C-stat, and for grouped data compared to ungrouped. Peter Freeman talked about something like this at the 2003 X-ray Astronomy School at Wallops Island (pdf1, pdf2), and no doubt part of the problem also has to do with the (un)reliability of the fitting process when the chisq surface gets complicated.

Anyway, they propose an empirical method to reduce the bias by computing the probability distribution functions (pdfs) for various simulations, and then averaging the pdfs in groups of 3. Seems to work, for reasons that escape me completely.

[Update: links to Peter's slides corrected]

On the unreliability of fitting
http://hea-www.harvard.edu/AstroStat/slog/2007/on-the-unreliability-of-fitting/
Fri, 25 May 2007 20:30:34 +0000, vlk

Despite some recent significant advances in Statistics and its applications to Astronomy (Cash 1976, Cash 1979, Gehrels 1984, Schmitt 1985, Isobe et al. 1986, van Dyk et al. 2001, Protassov et al. 2002, etc.), there still exist numerous problems and limitations in the standard statistical methodologies that are routinely applied to astrophysical data. For instance, the basic algorithms used in non-linear curve-fitting of spectra and images have remained unchanged since the 1960s: the downhill simplex method of Nelder & Mead (1965) modified by Powell, and gradient-based methods exemplified by Levenberg-Marquardt (Marquardt 1963). All non-linear curve-fitting programs currently in general use (Sherpa, XSPEC, MPFIT, PINTofALE, etc.), with the exception of Monte Carlo and MCMC methods, are implementations based on these algorithms and thus share their limitations.
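To make that lineage concrete, here is a small illustrative sketch (Python/scipy; the toy exponential model, data, and starting guess are invented for illustration, and scipy’s optimizers stand in for the fitting engines named above) applying the two workhorse algorithms, the Nelder & Mead downhill simplex and Levenberg-Marquardt, to the same problem.

```python
# Illustrative sketch (not from the post): the two workhorse algorithms,
# downhill simplex (Nelder-Mead) and Levenberg-Marquardt, on a toy problem.
import numpy as np
from scipy.optimize import minimize, least_squares

rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
truth = (2.0, 1.3)                       # invented "true" amplitude and decay rate
sigma = 0.05
y = truth[0] * np.exp(-truth[1] * x) + sigma * rng.standard_normal(x.size)

def residuals(p):
    """Weighted residuals; chi^2 is the sum of their squares."""
    amp, rate = p
    return (amp * np.exp(-rate * x) - y) / sigma

p0 = [1.0, 1.0]                          # starting guess

# Downhill simplex (Nelder & Mead 1965): minimizes chi^2 directly.
simplex = minimize(lambda p: np.sum(residuals(p) ** 2), p0, method="Nelder-Mead")

# Levenberg-Marquardt (Marquardt 1963): works on the residual vector.
lm = least_squares(residuals, p0, method="lm")

print("simplex :", simplex.x, "chi^2 =", simplex.fun)
print("LM      :", lm.x, "chi^2 =", 2.0 * lm.cost)   # cost = 0.5 * sum(res**2)
```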

It has long been known that non-linear curve-fitting, for all its apparent objectivity and the statistical theoretical foundation that lets us evaluate the goodness of fits, is still something of an art. Leaving aside issues such as the correct specification of the asymmetrical errors that usually prevail in astrophysical situations (mostly due to the underlying Poisson distribution), fitting a model curve to the data is never a simple and straightforward process. More often than not, the parameters returned by the programs as those that “best fit” the data are not those that truly maximize the statistical likelihood, usually because the programs get trapped in local minima of the chi^2 surface. Furthermore, numerical errors also creep into the outputs depending on how stopping rules are implemented: cases abound where repeated applications of the fitting program move the “best-fit” parameters steadily closer to the true best-fit values but never quite reach them. These problems are compounded when the number of parameters in the model is large (>10) and there are significant correlations between the parameters.

A typical strategy adopted to ensure that the fitting process does result in the best possible fit is to start the fit with parameter values that are already very close to the true best-fit values. This is done by first carrying out the fitting in multiple stages, with only a few parameters fit at a time while the others are held fixed, until the parameter values returned by the program show that the solution has converged. Only then is a fit to all the parameters attempted. The errors on each parameter are then computed by taking projections that cut through the chi^2 surface along each parameter axis; if the best-fit values do not change during this process, one can be confident that the best possible fit has been obtained.
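In packages like Sherpa or XSPEC this staging is usually done by freezing and thawing parameters; the sketch below (Python/scipy, with an invented Gaussian-plus-background model, not taken from any of those packages) illustrates the same idea by optimizing only a subset of the parameters at a time while holding the rest fixed.

```python
# Sketch of staged fitting (not from the post): fit a few parameters at a
# time while holding the others fixed, then release everything at the end.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 200)

def model(p, x):
    amp, mu, sig, bkg = p                 # invented Gaussian + constant background
    return amp * np.exp(-0.5 * ((x - mu) / sig) ** 2) + bkg

y = model([10.0, 0.5, 1.2, 2.0], x) + rng.normal(0.0, 0.3, x.size)

def chisq(p):
    return np.sum(((model(p, x) - y) / 0.3) ** 2)

def fit_subset(p, free):
    """Optimize only the parameters whose indices are listed in `free`."""
    p = np.array(p, dtype=float)          # work on a copy
    def wrapped(sub):
        trial = p.copy()
        trial[free] = sub
        return chisq(trial)
    res = minimize(wrapped, p[free], method="Nelder-Mead")
    p[free] = res.x
    return p

p = np.array([5.0, 0.0, 2.0, 0.0])        # crude starting guess
p = fit_subset(p, [0, 3])                  # stage 1: amplitude and background only
p = fit_subset(p, [1, 2])                  # stage 2: position and width only
p = fit_subset(p, [0, 1, 2, 3])            # final stage: everything free
print("staged best fit:", p, "chi^2 =", chisq(p))
```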

A best-fit is not the best fit.

What this means in practice is that studies which carry out automated fits to a large number of data sets, with no possibility of human intervention and thus no means of verifying that the best-fit parameters have been found, are unreliable. As an example, we show in the Figure above the case where a Gaussian model was fit 100 times to profiles generated from a Poisson distribution with mean mu=1000 (the Poisson profile is indistinguishable from a Gaussian for mu >~ 100, cf. Park et al. 2006) and with a total normalization of 1000. In all cases, the fits were started with initial values at the nominal best-fit values. The Figure shows a scatter plot of the initial chi^2, obtained at the end of the first fit attempt, versus the final chi^2, obtained after going through the process described above. Changes as large as Δχ² ≈ 25 are seen, which is enough to change the character of the fit significantly.
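The bookkeeping behind such an experiment looks roughly like the sketch below (Python/scipy; the binning, the crude sqrt(N) errors, and the restart-from-perturbed-values recipe are simplified stand-ins for the actual procedure, so the Δχ² values it produces will not match the Figure): simulate Poisson realizations of a Gaussian profile, record the χ² from a single fit started at the nominal values, then refit from perturbed starting points and keep the best result.

```python
# Sketch (not the original code): chi^2 from a single fit started at the
# nominal values versus the best chi^2 found after restarting the fit from
# several perturbed starting points, over many Poisson realizations.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 41)

def gauss(x, norm, mu, sigma):
    dx = x[1] - x[0]                      # expected counts per bin
    return norm * dx * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

nominal = (1000.0, 0.0, 1.0)              # total counts ~1000, as in the post

def chisq(p, y, err):
    return np.sum(((gauss(x, *p) - y) / err) ** 2)

first, final = [], []
for _ in range(100):
    y = rng.poisson(gauss(x, *nominal)).astype(float)
    err = np.sqrt(np.clip(y, 1.0, None))  # crude sqrt(N) errors, floored at 1

    popt, _ = curve_fit(gauss, x, y, p0=nominal, sigma=err)
    first.append(chisq(popt, y, err))

    best = first[-1]
    for _ in range(5):                    # restart from perturbed starting points
        p0 = np.array(popt) * rng.uniform(0.8, 1.2, 3)
        try:
            p1, _ = curve_fit(gauss, x, y, p0=p0, sigma=err)
            best = min(best, chisq(p1, y, err))
        except RuntimeError:
            pass                          # a restart that fails to converge
    final.append(best)

delta = np.array(first) - np.array(final)
print("max delta chi^2 over 100 simulations:", delta.max())
```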
