Archive for the ‘Stat’ Category.

Poisson vs Gaussian, Part 2

Probability density functions are another way of summarizing the consequences of assuming a Gaussian error distribution when the true distribution is Poisson. We can compute the posterior probability density of the intensity of a source when some number of counts is observed in a source region and the background is estimated from counts observed in a separate region. We can then compare it to the equivalent Gaussian.

The figure below (AAS 472.09) compares the pdfs for the Poisson intensity (red curves) and the Gaussian equivalent (black curves) for two cases: 50 counts in the source region (top) and 8 counts (bottom). In both cases the background is 200 counts collected in an area 40 times larger than the source area. The hatched region is the 68% equal-tailed interval for the Poisson case, and the solid horizontal line is the ±1σ width of the equivalent Gaussian.

The support of the Poisson posterior is bounded below at zero, but that of the Gaussian is not, and for small counts this matters. It introduces a visibly large bias in the interval coverage as well as in the normalization. Even at high counts, the Poisson is skewed, so larger values are slightly more likely to occur by chance than in the Gaussian case. This skew can be quite critical for marginal results. Continue reading ‘Poisson vs Gaussian, Part 2’ »
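The post does not spell out the model behind the red curves. The following is therefore only a minimal sketch under stated assumptions. Source counts are taken as N_src ~ Poisson(s + b) and background counts as N_bkg ~ Poisson(r·b), where r = 40 is the area ratio. The priors on s ≥ 0 and b ≥ 0 are flat, and b is integrated out numerically on a grid. The "equivalent Gaussian" is assumed to have mean N_src − N_bkg/r and variance N_src + N_bkg/r². That form is a common choice, not necessarily the one used for the figure. All function names are hypothetical.

```python
import math

def posterior_source(n_src, n_bkg, area_ratio, ds=0.05, nb=200):
    """Marginal posterior density of the source intensity s on a grid.

    Assumed model (not necessarily the one behind the figure):
      n_src ~ Poisson(s + b), n_bkg ~ Poisson(area_ratio * b),
      flat priors on s >= 0 and b >= 0; b is integrated out numerically.
    """
    b_hat = n_bkg / area_ratio
    b_sd = math.sqrt(n_bkg + 1) / area_ratio
    b_max = b_hat + 10 * b_sd + 1.0
    db = b_max / nb
    b_grid = [(j + 0.5) * db for j in range(nb)]  # midpoints, so b > 0
    # The background likelihood term depends on b only, so precompute it.
    log_bkg = [n_bkg * math.log(area_ratio * b) - area_ratio * b for b in b_grid]

    s_max = n_src + 10 * math.sqrt(n_src + 1) + 10
    s_grid = [i * ds for i in range(int(s_max / ds) + 1)]
    log_post = []
    for s in s_grid:
        terms = [n_src * math.log(s + b) - (s + b) + lb
                 for b, lb in zip(b_grid, log_bkg)]
        m = max(terms)  # log-sum-exp over b to avoid overflow
        log_post.append(m + math.log(sum(math.exp(t - m) for t in terms)))
    peak = max(log_post)
    dens = [math.exp(lp - peak) for lp in log_post]
    norm = sum(dens) * ds
    return s_grid, [d / norm for d in dens]

def quantile(s_grid, dens, q):
    """Smallest grid value whose cumulative probability reaches q."""
    ds = s_grid[1] - s_grid[0]
    cum = 0.0
    for s, d in zip(s_grid, dens):
        cum += d * ds
        if cum >= q:
            return s
    return s_grid[-1]

def gaussian_equivalent(n_src, n_bkg, area_ratio):
    """Background-subtracted estimate and its 1-sigma error (assumed form)."""
    mean = n_src - n_bkg / area_ratio
    sd = math.sqrt(n_src + n_bkg / area_ratio ** 2)
    return mean, sd

def gaussian_prob_negative(mean, sd):
    """Probability mass the Gaussian puts on unphysical s < 0."""
    return 0.5 * math.erfc(mean / (sd * math.sqrt(2.0)))

if __name__ == "__main__":
    for n_src in (50, 8):
        s, p = posterior_source(n_src, 200, 40.0)
        lo, med, hi = (quantile(s, p, q) for q in (0.16, 0.5, 0.84))
        mu, sd = gaussian_equivalent(n_src, 200, 40.0)
        print(f"N={n_src}: Poisson ~68% [{lo:.2f}, {hi:.2f}], median {med:.2f}; "
              f"Gaussian {mu:.2f} +/- {sd:.2f}, P(s<0) = {gaussian_prob_negative(mu, sd):.3f}")
```

For 8 source counts, this equivalent Gaussian is 3 ± 2.85. It puts roughly 15% of its probability on negative intensities. The grid posterior cannot do that, and its upper tail is longer than its lower tail.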

Poisson vs Gaussian

We astronomers are rather fond of approximating our counting statistics with Gaussian error distributions, and a lot of ink has been spilled justifying and/or denigrating this habit. But just how bad is the approximation anyway?

I ran a simple Monte Carlo test to compute the expected bias between a Poisson sample and the “equivalent” Gaussian sample. The result is shown in the plot below.

The jagged red line is the fractional expected bias relative to the true intensity. The typical recommendation in high-energy astronomy is to bin up events until there are about 25 counts per bin. This leads to an average bias of about 2% in the estimate of the true intensity. The bias drops below 1% for counts >50. Continue reading ‘Poisson vs Gaussian’ »
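The excerpt does not say how the “equivalent” Gaussian estimate was formed. The sketch below is therefore one plausible illustration, not the test behind the plot. It draws Poisson counts in several bins with a constant true intensity λ. It then compares two estimates of λ. The first is the Poisson maximum-likelihood estimate, which is the plain mean and is unbiased. The second is an inverse-variance weighted mean that uses the Gaussian approximation σ_i² = N_i (data-weighted, Neyman-style). Zero-count bins get σ² = 1, which is an ad hoc assumption. The function names are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for lam up to a few hundred."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fractional_bias(lam, n_bins=50, n_trials=400, seed=1):
    """Return (Poisson-MLE bias, Gaussian-weighted bias), both relative to lam."""
    rng = random.Random(seed)
    poisson_est, gauss_est = 0.0, 0.0
    for _ in range(n_trials):
        counts = [poisson_draw(lam, rng) for _ in range(n_bins)]
        # Poisson MLE of a constant intensity: the plain mean.
        poisson_est += sum(counts) / n_bins
        # Gaussian approximation with sigma_i^2 = N_i (zero counts -> sigma^2 = 1).
        weights = [1.0 / max(c, 1) for c in counts]
        gauss_est += sum(w * c for w, c in zip(weights, counts)) / sum(weights)
    return ((poisson_est / n_trials - lam) / lam,
            (gauss_est / n_trials - lam) / lam)

if __name__ == "__main__":
    for lam in (10.0, 25.0, 50.0, 100.0):
        pb, gb = fractional_bias(lam)
        print(f"lambda={lam:6.1f}  Poisson MLE {pb:+.4f}  Gaussian-weighted {gb:+.4f}")
```

With this estimator the weighted mean is pulled low by roughly one count, so its fractional bias is about −1/λ, or about −4% at 25 counts. That differs from the ~2% quoted above, which suggests the post used a different construction. The point of the sketch is only to show how a Gaussian treatment of counts can produce a bias that fades as counts grow.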

[MADS] Chernoff face

I cannot remember when I first came across Chernoff faces, but they hooked me instantly. I always hoped to meet multivariate data from astronomy suited to this charming EDA method. Then, without my realizing it, that eagerness faded. Sadly, this was mainly due to my absent-mindedness. Continue reading ‘[MADS] Chernoff face’ »

Use and Misuse of Chi-square

Before using any adaptation of the chi-square statistic, please spend a minute or two pondering whether your use of chi-square falls into one of these categories.

1. Lack of independence among the single events or measures
2. Small theoretical frequencies
3. Neglect of frequencies of non-occurrence
4. Failure to equalize \sum O_i (the sum of the observed frequencies) and \sum M_i (the sum of the theoretical frequencies)
5. Indeterminate theoretical frequencies
6. Incorrect or questionable categorizing
7. Use of non-frequency data
8. Incorrect determination of the number of degrees of freedom
9. Incorrect computations (including a failure to weight by N when proportions instead of frequencies are used in the calculations)

From “Chapter 10: On the Use and Misuse of Chi-square” by K. L. Delucchi in A Handbook for Data Analysis in the Behavioral Sciences (1993). Delucchi credits these nine principal sources of error to Lewis and Burke (1949), “The Use and Misuse of the Chi-square,” published in Psychological Bulletin. Continue reading ‘Use and Misuse of Chi-square’ »
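Only a few of the nine pitfalls can be caught mechanically. The following is a minimal sketch of a Pearson chi-square, X² = Σ (O_i − M_i)² / M_i, that guards against four of them:

* item 2: small theoretical frequencies, using the common rule-of-thumb threshold of 5, which is an assumption;
* item 4: unequal totals;
* item 7: non-count data;
* item 8: the degrees of freedom, taken as the number of cells minus one minus the number of parameters fitted from the data.

The function name and its parameters are hypothetical. Independence, categorization and non-occurrence (items 1, 3, 5 and 6) still need human judgment.

```python
import math

def pearson_chi_square(observed, expected, n_fitted_params=0, min_expected=5.0):
    """Pearson's X^2 with checks for some of the listed pitfalls.

    Returns (statistic, degrees_of_freedom, notes); raises ValueError for
    problems that make the statistic meaningless.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(m <= 0 for m in expected):
        raise ValueError("theoretical frequencies must be positive")
    notes = []
    # Item 7: the statistic is defined for frequencies, not measurements or proportions.
    if any(o < 0 or o != int(o) for o in observed):
        notes.append("observed values are not non-negative integer counts")
    # Item 4: sum O_i and sum M_i must agree.
    if not math.isclose(sum(observed), sum(expected), rel_tol=1e-9):
        raise ValueError("sums of observed and theoretical frequencies differ")
    # Item 2: small theoretical frequencies (rule of thumb: M_i >= 5).
    small = [i for i, m in enumerate(expected) if m < min_expected]
    if small:
        notes.append(f"theoretical frequency below {min_expected} in cells {small}")
    stat = sum((o - m) ** 2 / m for o, m in zip(observed, expected))
    # Item 8: one d.f. lost to the fixed total, one per fitted parameter.
    dof = len(observed) - 1 - n_fitted_params
    if dof < 1:
        raise ValueError("no degrees of freedom left")
    return stat, dof, notes

if __name__ == "__main__":
    print(pearson_chi_square([18, 22, 20, 40], [25, 25, 25, 25]))
```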

[Announce] Heidelberg Summer School

From Christian Fendt comes this announcement:

——————————————————————
First Announcement and Call for Applications
——————————————————————

The “International Max Planck Research School for Astronomy & Cosmic Physics at the University of Heidelberg” (IMPRS-HD)

announces the

— 4th Heidelberg Summer School:

— Statistical Inferences from Astrophysical Data

— August 10-14, 2009

Continue reading ‘[Announce] Heidelberg Summer School’ »

[Announce] AstroStat Summer School at Penn State

From Jogesh Babu comes this announcement:

Summer School in Statistics for Astronomers V
June 1-6, 2009
Penn State University
http://astrostatistics.psu.edu/su09/

Continue reading ‘[Announce] AstroStat Summer School at Penn State’ »

Web Seminar

After checking the sites several times, I was disappointed that no video, audio, or handout files were available from the research program “Statistical Theory and Methods for Complex High-Dimensional Data,” held at the Isaac Newton Institute for Mathematical Sciences during the first half of last year. Wow… they are now there~ Continue reading ‘Web Seminar’ »

4754 d.f.

I couldn’t believe my eyes when I saw 4754 degrees of freedom (d.f.) and a chi-square test statistic of 4859. I have often seen large degrees of freedom in astronomy journals, from several hundred to a few thousand, but I have never felt comfortable with such big numbers. Then, to my great shock, 4754 d.f. appeared. I must find out why these huge degrees of freedom bother me so. Continue reading ‘4754 d.f.’ »
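A chi-square distribution with k d.f. has mean k and variance 2k, and for large k it is close to normal. The Wilson–Hilferty cube-root transformation gives a more accurate normal approximation. The sketch below applies both to the numbers quoted above. It only evaluates the test as stated. It cannot tell whether 4754 is itself a justified count of degrees of freedom. The helper name is hypothetical.

```python
import math

def chi_square_upper_tail(stat, dof):
    """Approximate P(X^2 >= stat) for large dof via the Wilson-Hilferty transform."""
    k = float(dof)
    z = ((stat / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))

stat, dof = 4859.0, 4754
reduced = stat / dof                          # "reduced chi-square"
z_naive = (stat - dof) / math.sqrt(2.0 * dof)  # N(k, 2k) approximation
z_wh, p_value = chi_square_upper_tail(stat, dof)
print(f"reduced chi^2 = {reduced:.4f}, z (naive) = {z_naive:.3f}, "
      f"z (Wilson-Hilferty) = {z_wh:.3f}, p = {p_value:.3f}")
```

Both approximations put 4859 only about one standard deviation above its expected value of 4754. So a large d.f. can make a deviation of about 100 look alarming when it is not.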

[MADS] Mahalanobis distance

It bears the name of its inventor, Prasanta Chandra Mahalanobis. Unlike the Euclidean distance, a household name, this distance is rarely called by its own name, but many pseudonyms exist, with variations adapted to a broad range of scientific disciplines and applications. Therefore, I believe that the Mahalanobis distance is frequently applied, under different names, in exploring and analyzing astronomical data. Continue reading ‘[MADS] Mahalanobis distance’ »
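For reference, the Mahalanobis distance of a point x from a distribution with mean μ and covariance Σ is D(x) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)). It reduces to the Euclidean distance when Σ is the identity. The minimal pure-Python sketch below avoids forming Σ⁻¹ explicitly. It factors Σ = L Lᵀ (Cholesky) and solves L y = x − μ, so that D² = yᵀy. The function names are illustrative.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)) via forward substitution."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    L = cholesky(cov)
    y = []
    for i in range(len(d)):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return math.sqrt(sum(v * v for v in y))

if __name__ == "__main__":
    cov = [[1.0, 0.5], [0.5, 1.0]]  # positively correlated variables
    print(mahalanobis([1.0, 1.0], [0.0, 0.0], cov))   # along the correlation
    print(mahalanobis([1.0, -1.0], [0.0, 0.0], cov))  # against the correlation
```

The two points in the usage lines are equally far from the origin in Euclidean terms (√2). Point (1, −1), which goes against the correlation, is much farther in Mahalanobis distance (2 versus about 1.155). This is exactly the property that makes the distance useful for spotting outliers in correlated data.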

systematic errors

Ah ha~ I once asked, “what is systematic error?” (see [Q] systematic error). Thanks to L. Lyons’ work, discussed in [ArXiv] Particle Physics, I found this paper, titled Systematic Errors, which describes the concept of systematic errors and the related statistical inference in particle physics. Happily, it shares a lot with high energy astrophysics. Continue reading ‘systematic errors’ »

[ArXiv] Particle Physics

[stat.AP:0811.1663]
Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he is very kind and a good listener. I was delighted to see his introductory article on particle physics and its statistical challenges in my [arxiv:stat] email subscription. Continue reading ‘[ArXiv] Particle Physics’ »

Guinness, Gosset, Fisher, and Small Samples

Student’s t-distribution is somewhat underrepresented in the astronomical community. An article full of nice stories seems to me the best way to introduce the t distribution. This article describes historic anecdotes about monumental statistical developments that occurred about 100 years ago.

Guinness, Gosset, Fisher, and Small Samples by Joan Fisher Box
Source: Statist. Sci. Volume 2, Number 1 (1987), 45-52.

No time to read the whole article? I hope you have a few minutes to read the following quotes, which I find quite enchanting. Continue reading ‘Guinness, Gosset, Fisher, and Small Samples’ »
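The "small samples" in the title are where Student's t matters. With n measurements, the standardized mean uses the sample standard deviation, so it follows a t distribution with n − 1 d.f. rather than a normal distribution. The sketch below compares 95% half-widths for a hypothetical sample of five measurements. It uses the tabulated t quantile for 4 d.f. (2.776) and the normal quantile from the standard library. The data are invented for illustration.

```python
import math
import statistics

T_975_DF4 = 2.7764451  # tabulated 97.5% quantile of Student's t with 4 d.f.
Z_975 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

sample = [9.8, 10.4, 10.1, 9.6, 10.6]  # hypothetical measurements, n = 5
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 denominator

half_z = Z_975 * se       # what a Gaussian treatment would quote
half_t = T_975_DF4 * se   # what Student's t gives for n - 1 = 4 d.f.
print(f"mean = {mean:.3f}, 95% half-width: normal {half_z:.3f}, t {half_t:.3f}")
```

With only five points, the t interval is about 42% wider than the naive normal interval. The two converge as n grows.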

A book by David Freedman

A continuation of my post titled circumspect frequentist.

Title: Statistical Models: Theory and Practice
My one-line review, or rather comment, from several months ago was

Bias in asymptotic standard errors is not a familiar topic for astronomers

and I am not sure why I wrote it, but I think the comment came from my pursuit of modeling measurement errors in astronomical research. Continue reading ‘A book by David Freedman’ »

[MADS] Semiparametric

There were (only) four articles in ADS whose abstracts contain the word semiparametric (none in titles). Therefore, semiparametric is not exactly [MADS] but almost [MADS]; one might say it is virtually [MADS] or quasi [MADS]. By introducing the term and providing the rare examples from astronomy, I hope this scarce term will be used properly, rather than leading astronomers into inappropriate statistical inference with their data. Continue reading ‘[MADS] Semiparametric’ »

[ArXiv] Special Issue from Annals of Applied Statistics

When I was studying astronomy, I once became the subject of a social science survey about life in a department with extreme gender bias (I was the only female). In those days people often asked me how to forecast the weather or how to predict the future (boys also got questions about becoming astronauts, on top of being taken for weathermen and astrologers). Astronomy is still being related to earth science. Statisticians I met at conferences often tried to associate my work on astronomical data with that of geologists and meteorologists, who often use stochastic models and spatio-temporal models, the dimensional extensions of time series models. Because of this confusion between astronomy and meteorology/geology/oceanography, and because statistics has a longer history of wide application in the latter subjects, I have paid attention from time to time to various applications and models in those fields. My hope is to find a thread leading to similar applications in astronomy. (A good counterexample is Gauss’s least squares method, but I cannot think of other examples that contradict my claim that statistics has a rich history of wide use among earth scientists.) Although I don’t like the misconception that astronomy equals meteorology or geoscience, those scientific fields do share at least one commonality: statistical methods are applied to analyzing satellite data. Continue reading ‘[ArXiv] Special Issue from Annals of Applied Statistics’ »