The AstroStat Slog » chi-square minimization
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

a century ago (hlee, Thu, 07 May 2009)

Almost 100 years ago, A. S. Eddington stated in his book Stellar Movements (1914) that

…in calculating the mean error of a series of observations it is preferable to use the simple mean residual irrespective of sign rather than the mean square residual

Such an eminent astronomer was already recommending least absolute deviation over chi-square, if I map his "simple mean residual" and "mean square residual" to the corresponding modern methodologies.

I suppose there is a reason why everything is done with chi-square, although a small fraction of the astronomy community is aware that chi-square minimization is not the only objective function for finding best fits. The assumption that the residuals (Observed – Expected)/sigma entering chi-square methods are (asymptotically) normal, i.e. Gaussian, is taken for granted by astronomers analyzing these data, mainly because of their high volume. Worse, even though procedures for checking Gaussianity are available in the statistical literature, applying them to astronomical data is either difficult or simply skipped. That said, if one is sure that the data or parameters of interest are sampled from a normal distribution, Eddington's statement should be reversed, on the grounds of sufficiency. We also know the asymptotic efficiency of the sample standard deviation when the probability density function satisfies regularity conditions more general than the Gaussian density.

As a statistician, it is easy to say, "assume that the data are iid standard normal, without loss of generality," then develop a statistic, investigate its characteristics, and compare it with other statistics. If the statistic does not show promising results in the comparison and suffers badly under the normality assumption, statisticians will try to make it robust by checking and relaxing the assumptions and the math. On the other hand, I am not sure how comfortable astronomers are with the Gaussianity assumption in their data, most of which are wedded to statistics or statistical tools built on the normal assumption. How often have such efforts of devising a statistic and trying different statistics taken place?

Without imposing the Gaussianity assumption, I find Eddington's comment extremely insightful. Commonly cited statistical methods in astronomy, like chi-square methods, are built on the Gaussianity assumption, under which the sample standard deviation is used for σ, the scale parameter of the normal distribution that maps to 68% coverage; multiples of the sample standard deviation correspond to the well-known percentiles given in Numerical Recipes. In the end, I think statistical analysis in the astronomy literature suffers from a chicken-and-egg dilemma. I also feel a setback that such an insightful comment from one of the most renowned astrophysicists has not gained much weight after many decades. My impression that Eddington's suggestion was ignored comes from reading only a fraction of the publications in the major astronomical journals, so I might be wrong. Perhaps astronomers use least absolute deviation (LAD) and do robust inference more often than I think.

Unless one is sure about the Gaussianity of the data (covering the sampling distribution, the residuals between observed and expected, and any transformations), the sample standard deviation may not be appropriate for error bars with matching coverage in inference problems. Estimating posterior distributions is a well-received approach among some astronomers, and there are good tutorials and textbooks on Bayesian data analysis for astronomers. For those familiar with the basics of statistics, pyMC and its tutorial (or another link from python.org) will be very useful for proper statistical inference. If Bayesian computation sounds too cumbersome, then for simplicity follow Eddington's advice: instead of the sample standard deviation, use the mean absolute deviation (the simple mean residual, in Eddington's words) to quantify uncertainty. One may well want to compare the best fits and error bars from both strategies.
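As a concrete illustration of that comparison, here is a minimal sketch (plain numpy on simulated residuals; the sample size, the seed, and the sqrt(pi/2) rescaling to a Gaussian-equivalent scale are my own choices for illustration, not anything Eddington prescribed):

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.normal(loc=0.0, scale=1.0, size=200)   # simulated "residuals"

    # Mean-square-based scale: the usual sample standard deviation.
    sd = np.std(x, ddof=1)

    # Eddington's "simple mean residual": mean absolute deviation from the mean.
    mad = np.mean(np.abs(x - np.mean(x)))

    # Under a Gaussian, E|X - mu| = sigma * sqrt(2/pi), so this puts the mean
    # absolute deviation on the same footing as sigma.
    sd_from_mad = mad * np.sqrt(np.pi / 2.0)

    print(f"sample standard deviation          : {sd:.3f}")
    print(f"mean absolute deviation            : {mad:.3f}")
    print(f"Gaussian-equivalent scale from MAD : {sd_from_mad:.3f}")

On a clean Gaussian sample the two scales agree closely; with heavier tails or outliers they start to disagree, which is the sort of disagreement that motivates the comparison above.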

——————————————————
This story was inspired by Studies in the History of Probability and Statistics. XXXII: Laplace, Fisher, and the discovery of the concept of sufficiency by Stigler (1973), Biometrika, 60(3), p. 439. The Eddington quote is adapted from that article. Another quote from the article I would like to share:

Fisher went on to observe that this property of σ2[1] is quite dependent on the assumption that the population is normal, and showed that indeed σ1[2] is preferable to σ2, at least in large samples, for estimating the scale parameter of the double exponential distribution, providing both estimators are appropriately rescaled

By assuming that each observation is normally (Gaussian) distributed with mean (mu) and variance (sigma^2), and that the object is to estimate sigma, Fisher proved that the sample standard deviation (the mean square residual) is more efficient than the mean deviation from the sample mean (the simple mean residual); Laplace had proved it as well. The catch is that the assumptions come first: the sample standard deviation is not automatically the right estimator of the error (sigma) of an unknown distribution. A quick simulation of this efficiency comparison is sketched after the footnotes below.

  1. sample standard deviation
  2. mean deviation from the sample mean
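Here is that simulation sketch (everything is simulated and only indicative; the coefficient of variation across replications stands in for asymptotic efficiency, and the sample size and number of replications are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 100, 20000

    def spread(estimates):
        """Relative spread (coefficient of variation) of an estimator across replications."""
        return np.std(estimates) / np.mean(estimates)

    for name, sampler in [("Gaussian", lambda: rng.normal(size=(reps, n))),
                          ("double exponential", lambda: rng.laplace(size=(reps, n)))]:
        x = sampler()
        xbar = x.mean(axis=1, keepdims=True)
        mean_abs_dev = np.mean(np.abs(x - xbar), axis=1)   # "simple mean residual"
        sample_sd = x.std(axis=1, ddof=1)                  # "mean square residual"
        print(f"{name:19s}  CV(mean abs dev) = {spread(mean_abs_dev):.4f}  "
              f"CV(sample sd) = {spread(sample_sd):.4f}")

Under the Gaussian, the sample standard deviation should show the smaller relative spread; under the double exponential, the mean absolute deviation should win, in line with Fisher's observation quoted above.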
4754 d.f. (hlee, Tue, 17 Mar 2009)

I couldn't believe my eyes when I saw 4754 degrees of freedom (d.f.) and a chi-square test statistic of 4859. I have seen large degrees of freedom in astronomy journals often enough, several hundreds to a few thousands, but I never felt comfortable with these big numbers. Then, with a great shock, 4754 d.f. appeared. I had to find out why such huge degrees of freedom bother me so much.

When I was learning statistics, I never confronted such huge degrees of freedom. Given that only a small amount of class time is spent on the chi-square goodness-of-fit test, that the chi-square distribution is a special case of the gamma distribution, and that statisticians do not handle a hundred thousand photons from an X-ray telescope (there are far more low-count spectra, but I will explain later why I chose this big number), almost surely no statistician would confront such huge degrees of freedom.

Degrees of freedom in spectral fitting are the combined result of binning (grouping into n classes) and the number of free parameters p, i.e. n-p-1. The parameters of interest, the targets of the optimization, come from physical source models determined by the laws of physics. There is little to discuss about these source models from a statistical point of view, except on the model selection and assessment side, which seems an almost unexplored area. Binning and the degrees of freedom that follow from it, on the other hand, are something I would like to know more about.

A few binning schemes I often see in spectral analysis require each bin to hold more than 25 counts (playing the same role as the 30 in statistics textbooks for the CLT, or the last row of a t-table), or require the counts in each bin to satisfy a given signal-to-noise (S/N) level. The latter is equivalent to requiring sqrt(expected counts) to exceed the given S/N level, since photon counts are Poisson distributed. There are more sophisticated adaptive binning strategies, but I have not found mathematical, statistical, or algorithmic justifications for them. They look to me like empirical procedures discovered after much trial and error on particular types of spectra (I often wonder whether I could reproduce the reported goodness-of-fit results from the same ObsIDs). The point is that, simple or complex, if a data file contains a large number of photons, n ends up larger than for observations with sparse photons. This is why degrees of freedom inconceivable to a statistician, like 4754, turn up in some papers.
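For concreteness, here is a minimal sketch of the simplest such scheme, grouping consecutive channels until each bin reaches a minimum count (25 is just the assumed threshold; standard tools such as grppha or CIAO's dmgroup offer this kind of grouping with many more options, and nothing below is meant to mirror their implementations):

    import numpy as np

    def group_min_counts(counts, min_counts=25):
        """Group consecutive channels until each bin holds at least `min_counts` events.

        Returns the grouped counts and the index of the first channel in each group.
        Any under-filled leftover channels are folded into the last bin.
        """
        grouped, starts, acc, start = [], [], 0, 0
        for i, c in enumerate(counts):
            acc += c
            if acc >= min_counts:
                grouped.append(acc)
                starts.append(start)
                acc, start = 0, i + 1
        if acc > 0 and grouped:          # fold the leftover tail into the last bin
            grouped[-1] += acc
        return np.array(grouped), np.array(starts)

    # toy spectrum: 1024 channels of Poisson counts with a low mean rate
    rng = np.random.default_rng(1)
    counts = rng.poisson(lam=3.0, size=1024)
    grouped, starts = group_min_counts(counts, min_counts=25)
    print(len(counts), "channels ->", len(grouped), "bins; smallest grouped count:", grouped.min())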

First, the chi-square goodness-of-fit test was designed for agricultural data (or biology, considering Pearson's eugenics), where the sample size is not on the scale of scores of thousands. Please note that what astronomers call a bin is called a cell (class, interval, partition) in statistical papers and books on applications of the chi-square goodness-of-fit test.

I would also like to point out that the chi-square goodness-of-fit test is different from chi-square minimization even though they share the same equation: the former is for hypothesis testing and the latter is for optimization (finding the best-fit solution). Using the same data for optimization and for testing introduces bias; that is one of the reasons why, with large numbers of data points, cross-validation techniques are employed in statistics and machine learning[1]. Since I consider binning a form of smoothing, the optimal number of bins and their sizes depend on data quality and on the source model, just as in kernel density estimation or, by extension, in the various versions of chi-square tests and distance-based nonparametric tests (the K-S test, for example).

Although published many decades ago, this paper is worth checking out for a proper rule of thumb on the number of bins:
"On the choice of the number of class intervals in the application of the chi square test" (JSTOR link) by Mann and Wald, The Annals of Mathematical Statistics, Vol. 13, No. 3 (Sep., 1942), pp. 306-317. They showed that the number of classes should grow in proportion to N^(2/5) (the underlying ideas about the chi-square goodness-of-fit test, the detailed derivation, and the exact expression for the number of classes are given there), and this is why I chose a spectrum of 10^5 photons at the beginning. Ignoring the other factors in their equation, 10^5 counts roughly yields 100 bins, and about 4000 bins implies more than a billion photons, an unthinkable number in X-ray spectral analysis. Furthermore, many later reports found that Mann and Wald's criterion produces too many bins and a loss of power, so n should probably be even smaller than 100 for 10^5 photons.
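The back-of-the-envelope arithmetic behind those two numbers uses only the N^(2/5) proportionality (the full Mann-Wald expression carries a multiplicative constant that depends on the chosen significance level, which I am ignoring here, as the text above does):

    # number of classes ~ N^(2/5), and the inversion from a class count back to N
    n_photons = 1e5
    print(n_photons ** 0.4)        # ~100 bins for 1e5 photons
    n_bins = 4000
    print(n_bins ** 2.5)           # ~1e9 photons needed to justify ~4000 bins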

The other issue with statistical analysis of X-ray spectra is that, although the photons in each channel/bin can be treated as an independent sample, the expected numbers of photons across bins are related through the physical source model, the analogue of the link function in generalized linear models. However, the well-studied link functions in statistics do not match source models in high energy astrophysics: source models are typically not analytic but non-linear, numerical, tabulated, or black-box, and therefore incompatible with the link functions of generalized linear models, a well developed, diverse, and robust branch of statistics for inference problems. So far, then, binning the data and minimizing chi-square seems to be the only general strategy for statistical inference about the parameters of source models (for some specific statistical or physical models this is not true, but that is not the topic of this discussion). Moreover, Mann and Wald's prescription for the number of classes assumes equiprobable bins, whereas channel or bin probabilities in astronomy do not satisfy that condition: the probability vector of the multinomial distribution depends on the binning, the detector sensitivity, and the source model rather than on an equiprobability constraint. In short, it is hard to devise a purely statistically optimal binning/grouping method for X-ray spectral analysis.

Nevertheless, instead of bin-by-bin smoothing (S/N > 3 grouping, for example), I wish for binning/grouping schemes based on the total sample size N, particularly when N is large. I am afraid that, with the chi-square test currently embedded in data analysis packages, the power of the chi-square statistic is so low that one will almost always obtain a good reduced chi-square value (the astronomers' simple model assessment tool: the chi-square statistic divided by the degrees of freedom, whose expected value is one; if the reduced chi-square is close to one, the chosen source model and parameter solution are declared the best fit). The question of a suitable number of bins is equivalent to the optimal bandwidth problem in kernel density estimation, whose objective is to bring out the information in the data via smoothing; methodology developed for kernel density estimation may therefore suggest how to bin/group a spectrum while preserving most of the information and improving efficiency. A modified strategy for binning and for applying the chi-square statistic to assess model adequacy should be conceived, instead of reporting thousands of degrees of freedom.
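For what it is worth, plugging the numbers that triggered this post into the chi-square survival function shows how undramatic such a fit looks. This is a sketch with scipy, and reading the result as a goodness-of-fit p-value assumes the usual regularity conditions, which is precisely what is in question here:

    from scipy.stats import chi2, norm

    stat, dof = 4859.0, 4754
    p = chi2.sf(stat, dof)                            # upper-tail probability
    print(f"reduced chi-square = {stat / dof:.3f}")
    print(f"p-value            = {p:.3f}")
    print(f"equivalent sigma   = {norm.isf(p):.2f}")  # one-sided Gaussian equivalent

The reported statistic sits only about one standard deviation above its expected value, so the test raises no alarm despite the enormous number of bins.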

I think I must quit before this gets too boring. I would only like to mention some quite interesting papers that cite Mann and Wald (1942) and explore the chi-square goodness of fit, including Johnson's A Bayesian chi-square test for Goodness-of-Fit (linked to the arXiv pdf), which might appeal to astronomers who would like to modify their chi-square methods in a Bayesian way. The chapter "On the Use and Misuse of Chi-Square" (link to Google Books excerpt) by K. L. Delucchi in A Handbook for Data Analysis in the Behavioral Sciences (1993) also makes intriguing reading, although the discussion is aimed at behavioral scientists.

Lastly, I am quite sure astronomers have explored the properties of the chi-square statistic and chi-square type tests with their own data sets. I admit I have not made an expedition for such works, since they are a few needles in a haystack. I would be delighted to see an astronomers' version of "use and misuse of chi-square," or a statistical account of whether the chi-square test with huge degrees of freedom is powerful enough; any advice on the matter would be much appreciated.

  1. a rough sketch of cross validation: split the data into a training set and a test set, get the best fit from the training set, evaluate the goodness of fit of that best fit on the test set, then swap the training and test sets and repeat (a minimal sketch in code follows below). wiki:cross_validation
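Here is that sketch, with toy data and a toy chi-square score (the straight-line model, the noise level, and the two-fold split are all arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 1.0, 200)
    sigma = 0.1
    y = 2.0 * x + 0.5 + rng.normal(scale=sigma, size=x.size)   # toy "observations"

    def chisq(y_obs, y_mod, sigma):
        return np.sum(((y_obs - y_mod) / sigma) ** 2)

    # two-fold cross validation: fit a straight line on one half, score it on the other
    idx = rng.permutation(x.size)
    halves = np.array_split(idx, 2)
    scores = []
    for k in range(2):
        train, test = halves[k], halves[1 - k]
        coef = np.polyfit(x[train], y[train], deg=1)           # "best fit" from the training set
        scores.append(chisq(y[test], np.polyval(coef, x[test]), sigma))
    print("held-out chi-square per fold:", [f"{s:.1f}" for s in scores])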
[ArXiv] Particle Physics (hlee, Fri, 20 Feb 2009)

[stat.AP:0811.1663]
Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he is very kind and attentive. I was delighted to see his introductory article about particle physics and its statistical challenges through an [arxiv:stat] email subscription.

Descriptions of the various particles of modern particle physics are briefly given (I like such brevity: concise, yet delivering the necessities. If you want more physics, look for the famous bestsellers such as The First Three Minutes, A Brief History of Time, The Elegant Universe, or Feynman, and for undergraduate textbooks on modern physics and on particle physics). The Large Hadron Collider (LHC hereafter; LHC-related slog postings: LHC first beam, The Banff challenge, Quote of the week, Phystat – LHC 2008) is introduced along with the statistical challenges of collecting and processing its data, since it is expected to collect 10^10 events. Visit the LHC website to find out more.

My one-line summary of the article: it addresses particle physics problems through hypothesis testing and, more broadly, classical statistical inference. I most enjoyed reading sections 5 and 6, particularly the subsection titled Why 5σ? Here are some excerpts I would like to share with you from the article:

It is hoped that the approaches mentioned in this article will be interesting or outrageous enough to provoke some Statisticians either to collaborate with Particle Physicists, or to provide them with suggestions for improving their analyses. It is to be noted that the techniques described are simply those used by Particle Physicists; no claim is made that they are necessarily optimal (Personally, I like such openness and candidness.).

… because we really do consider that our data are representative as samples drawn according to the model we are using (decay time distributions often are exponential; the counts in repeated time intervals do follow a Poisson distribution, etc.), and hence we want to use a statistical approach that allows the data “to speak for themselves,” rather than our analysis being dominated by our assumptions and beliefs, as embodied in Bayesian priors.

Because experimental detectors are so expensive to construct, the time-scale over which they are built and operated is so long, and they have to operate under harsh radiation conditions, great care is devoted to their design and construction. This differs from the traditional statistical approach for the design of agricultural tests of different fertilisers, but instead starts with a list of physics issues which the experiment hopes to address. The idea is to design a detector which will provide answers to the physics questions, subject to the constraints imposed by the cost of the planned detectors, their physical and mechanical limitations, and perhaps also the limited available space. (My personal belief is that what separates the physical sciences from other sciences requiring statistical thinking is that uncontrolled circumstances are common in physics and astronomy, whereas many statistical methodologies were developed under assumptions of controllable circumstances, traceable subjects, and the possibility of collecting additional samples.)

…that nothing was found, it is more useful to quote an upper limit on the sought-for effect, as this could be useful in ruling out some theories.

… the nuisance parameters arise from the uncertainties in the background rate b and the acceptance ε. These uncertainties are usually quoted as σb and σε, and the question arises of what these errors mean. … they would express the width of the Bayesian posterior or of the frequentist interval obtained for the nuisance parameter. … they may involve Monte Carlo simulations, which have systematic uncertainties as well as statistical errors …

Particle physicists usually convert p into the number of standard deviations σ of a Gaussian distribution, beyond which the one-sided tail area corresponds to p. Thus, 5σ corresponds to a p-value of 3e-7. This is done simply because it provides a number which is easier to remember, and not because Gaussians are relevant for every situation.
Unfortunately, p-values are often misinterpreted as the probability of the theory being true, given the data. It sometimes helps colleagues clarify the difference between p(A|B) and p(B|A) by reminding them that the probability of being pregnant, given the fact that you are female, is considerably smaller than the probability of being female, given the fact that you are pregnant.
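(As an aside, the conversion described above is a one-liner with the Gaussian survival function; a minimal sketch with scipy, purely the bookkeeping and not an endorsement of 5σ:)

    from scipy.stats import norm

    for n_sigma in (3, 5):
        p = norm.sf(n_sigma)                       # one-sided tail area beyond n_sigma
        print(f"{n_sigma} sigma  ->  p = {p:.1e}")
    print("p = 3e-7  ->  sigma =", f"{norm.isf(3e-7):.2f}")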

… the situation is much less clear for nuisance parameters, where error estimates may be less rigorous, and their distribution is often assumed to be Gaussian (or truncated Gaussian) by default. The effect of these uncertainties on very small p-values needs to be investigated case-by-case.
We also have to remember that p-values merely test the null hypothesis. A more sensitive way to look for new physics is via the likelihood ratio or the differences in χ2 for the two hypotheses, that is, with and without the new effect. Thus, a very small p-value on its own is usually not enough to make a convincing case for discovery.

If we are in the asymptotic regime, and if the hypotheses are nested, and if the extra parameters of the larger hypothesis are defined under the smaller one, and in that case do not lie on the boundary of their allowed region, then the difference in χ2 should itself be distributed as a χ2, with the number of degrees of freedom equal to the number of extra parameters (I have seen many papers in astronomy ignore these caveats when applying likelihood ratio tests.)

The standard method loved by Particle Physicists (astronomers alike) is χ2. This, however, is only applicable to binned data (i.e., in a one or more dimensional histogram). Furthermore, it loses its attractive feature that its distribution is model independent when there are not enough data, which is likely to be so in the multi-dimensional case. (High energy astrophysicists deal with low-count data over multi-dimensional parameter spaces; the total number of bins is larger than the number of parameters, but to me the binning/grouping seems to be done so aggressively to reach a good S/N that detailed information about the parameters is lost.)

…, the σi are supposed to be the true accuracies of the measurements. Often, all that we have available are estimates of their values (I have also noticed that astronomers confuse the true σ with the estimated σ). Problems arise in situations where the error estimate depends on the measured value a (the parameter of interest). For example, in counting experiments with Poisson statistics, it is typical to set the error as the square root of the observed number. Then a downward fluctuation in the observation results in an overestimated weight, and the best-fit is biased downward. If instead the error is estimated as the square root of the expected number a, the combined result is biased upward – the increased error reduces S at large a. (I think astronomers are aware of this problem but have not yet taken action to rectify it. Unfortunately not all astronomers take the problem seriously, and some blindly apply 3*sqrt(N) as a threshold for 99.7% (two-sided) or 99.9% (one-sided) coverage.)
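(The bias described in this excerpt is easy to reproduce with a toy counting experiment. In the sketch below, a constant is fitted to simulated Poisson counts by minimizing chi-square with the error set to the square root of the observed versus the expected number; the closed-form minimizers follow from setting the derivative of each chi-square to zero, while the mean rate, number of bins, and number of replications are arbitrary choices for illustration, not the author's example.)

    import numpy as np

    rng = np.random.default_rng(3)
    mu, nbins, reps = 25.0, 50, 5000          # mu is large enough that zero counts are essentially impossible
    n = rng.poisson(mu, size=(reps, nbins)).astype(float)

    # Fit a constant `a` to each simulated spectrum by minimizing chi-square with
    # two choices of the error term (closed-form minimizers from d(chi2)/da = 0):
    a_obs = nbins / np.sum(1.0 / n, axis=1)   # sigma_i = sqrt(observed): harmonic mean
    a_exp = np.sqrt(np.mean(n ** 2, axis=1))  # sigma_i = sqrt(expected): root mean square
    a_ml = n.mean(axis=1)                     # Poisson maximum likelihood: plain mean

    print(f"true mean                : {mu}")
    print(f"sqrt(observed) weighting : {a_obs.mean():.3f}  (biased low)")
    print(f"sqrt(expected) weighting : {a_exp.mean():.3f}  (biased high)")
    print(f"Poisson likelihood       : {a_ml.mean():.3f}")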

Background estimation, particularly when the observed n is less than the expected background b, is discussed in the context of upper limits derived from both statistical streams, Bayesian and frequentist. The statistical focus of particle physicists is on classical inference problems, like hypothesis testing or estimating confidence intervals (not necessarily closed intervals), under extreme physical circumstances. The author discusses various approaches, with modern touches from both statistical disciplines, to obtaining upper limits with statistically meaningful and accountable quantification.

As described, many physicists undertake the grand challenge of finding a new particle, and this challenge is framed concisely in statistical terms: p-values, upper limits, null hypotheses, test statistics, and confidence intervals with awkward nuisance parameters or not-so-straightforward priors, leading to lengthy discussions among scientists and a stream of research papers. In contrast, the challenge for astronomers is not just establishing the existence of new particles but going beyond, or alongside, that. Astronomers like to parameterize sources by selecting suitable source models, of which the collected photons are the outcome, modified along their journey by the obstacles in their path. Such parameterization lets them explain the mechanisms driving photon emission and absorption, and it enables predictions of other important quantities: temperature to luminosity, magnitude to metallicity, and many other rules of conversion.

Because the objectives differ (one is finding a needle that looks like hay in a haystack; the other is characterizing photon-generating mechanisms, which may in turn lead to discovering a new kind of celestial object), this article may not interest astronomers. Yet, given the common ground of physics and statistics, it is an enlightening survey of the statistical methods applied to physical data analysis in pursuit of refining physics. My earlier posts on coverage and the references therein might also be helpful: interval estimation in exponential families and [arxiv] classical confidence interval.

From their papers, I sense that some astronomers are not aware of the problems with χ2 minimization, nor of the assumptions underlying the method. This paper conveys some of the dangers of χ2 with real examples from physics, which is more convincing for astronomers than statisticians' hypothetical examples built on controlled Monte Carlo simulations.

And there are more reasons to check this paper out!

[ArXiv] Spectroscopic Survey, June 29, 2007 (hlee, Mon, 02 Jul 2007)

From arXiv/astro-ph:0706.4484

Spectroscopic Surveys: Present, by Yip, C., overviews recent spectroscopic sky surveys and spectral analysis techniques oriented toward Virtual Observatories (VO). Besides spectroscopic redshift measurements increasing at a Moore's-law pace, the surveys tend to go deeper and to aim for completeness. Elliptical galaxy formation has been studied mostly, owing to their greater abundance in the samples compared with spirals, and the galaxy bimodality in color-color or color-magnitude diagrams is attributed to gas-rich mergers of blue galaxies building up the red sequence. Principal component analysis has incorporated ratios of emission-line strengths for classifying Type-II AGN and star-forming galaxies. Lyα identifies high-z quasars, and other spectral patterns across z reveal the history of the early universe and the characteristics of quasars. The recent discovery of 10 satellites of the Milky Way is also mentioned.

Spectral analyses take two approaches: the model-based approach uses theoretical templates, which is known for its flaws but allows straightforward extraction of physical parameters, while the empirical approach is useful for making discoveries but harder to interpret. Neither has a substantial advantage over the other. When it comes to fitting, chi-square minimization has been dominant, but new methodologies are under development. For spectral classification problems, principal component analysis (the Karhunen-Loève transformation), artificial neural networks, and other machine learning techniques have been applied.
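As a minimal sketch of the Karhunen-Loève idea applied to spectra (a random toy flux matrix stands in for real survey spectra; nothing here reproduces the paper's analysis), the principal components are simply the leading right singular vectors of the mean-subtracted flux matrix:

    import numpy as np

    rng = np.random.default_rng(4)
    n_spec, n_pix = 500, 300

    # toy "spectra": two latent components mixed with noise, one spectrum per row
    comp = rng.normal(size=(2, n_pix))
    weights = rng.normal(size=(n_spec, 2))
    flux = weights @ comp + 0.1 * rng.normal(size=(n_spec, n_pix))

    # Karhunen-Loeve / PCA via SVD of the mean-subtracted flux matrix
    centered = flux - flux.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenspectra = Vt[:5]                          # leading principal components ("eigenspectra")
    explained = (s ** 2) / np.sum(s ** 2)
    print("variance explained by the first 5 components:", np.round(explained[:5], 3))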

In the end, the author reports the statistical and astrophysical challenges of present-day massive spectroscopic data: 1. modeling galaxies, 2. parameterizing star formation history, 3. modeling quasars, 4. multi-catalog based calibration (separating systematic and statistical errors), and 5. estimating parameters. Meeting these would benefit the VO, whose objective is the unification of data access.
