The AstroStat Slog » PDF

Poisson vs Gaussian, Part 2

vlk — Fri, 10 Apr 2009 19:16:31 +0000

Probability density functions are another way of summarizing the consequences of assuming a Gaussian error distribution when the true distribution is Poisson. We can compute the posterior probability of the intensity of a source, when some number of counts are observed in a source region, and the background is estimated using counts observed in a different region. We can then compare it to the equivalent Gaussian.

The figure below (AAS 472.09) compares the pdfs for the Poisson intensity (red curves) and the Gaussian equivalent (black curves) for two cases: when the number of counts in the source region is 50 (top) and 8 (bottom) respectively. In both cases a background of 200 counts collected in an area 40x the source area is used. The hatched region represents the 68% equal-tailed interval for the Poisson case, and the solid horizontal line is the ±1σ width of the equivalent Gaussian.

Clearly, the support of the Poisson distribution is bounded below at zero, but that of the Gaussian is not, and at small counts this matters. It introduces a visibly large bias in the interval coverage as well as in the normalization properties. Even at high counts, the Poisson is skewed such that larger values are slightly more likely to occur by chance than in the Gaussian case. This skew can be quite critical for marginal results.

Figure: Poisson and Gaussian probability densities

No simple IDL code this time, but for reference, the Poisson posterior probability density curves were generated with the PINTofALE routine ppd_src().
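For readers without PINTofALE at hand, the comparison can be roughly sketched in Python. This is not the algorithm used by ppd_src(); it is a brute-force grid integration under assumptions that are mine: flat priors on the source intensity s ≥ 0 and the background intensity b ≥ 0, source counts drawn from Poisson(s + b), background counts drawn from Poisson(r·b) with r the area ratio, and an "equivalent Gaussian" with mean n_src − n_bkg/r and variance n_src + n_bkg/r². All function names are hypothetical.

```python
import math

def poisson_posterior(n_src, n_bkg, area_ratio, n_s=400, n_b=200):
    """Grid estimate of p(s | n_src, n_bkg), marginalized over b.

    Assumes flat priors on s >= 0 and b >= 0 (an illustrative choice).
    Returns the s grid, the normalized density on it, and the grid step.
    """
    b_hat = n_bkg / area_ratio
    b_max = b_hat + 10.0 * math.sqrt(n_bkg + 1.0) / area_ratio
    s_max = n_src + 10.0 * math.sqrt(n_src + 1.0)
    ds, db = s_max / n_s, b_max / n_b
    # Midpoint grids keep s + b and b strictly positive, so log() is safe.
    s_grid = [(i + 0.5) * ds for i in range(n_s)]
    b_grid = [(j + 0.5) * db for j in range(n_b)]
    # Background-region log-likelihood, computed once per b value.
    log_bkg = [n_bkg * math.log(area_ratio * b) - area_ratio * b for b in b_grid]
    log_post = []
    for s in s_grid:
        terms = [n_src * math.log(s + b) - (s + b) + lb
                 for b, lb in zip(b_grid, log_bkg)]
        m = max(terms)  # log-sum-exp over b for numerical stability
        log_post.append(m + math.log(sum(math.exp(t - m) for t in terms)))
    top = max(log_post)
    dens = [math.exp(lp - top) for lp in log_post]
    norm = sum(dens) * ds
    return s_grid, [d / norm for d in dens], ds

def equal_tailed_interval(s_grid, dens, ds, level=0.68):
    """Equal-tailed interval read off the cumulative sum of the gridded density."""
    tail = (1.0 - level) / 2.0
    cum, lo, hi = 0.0, None, None
    for s, d in zip(s_grid, dens):
        cum += d * ds
        if lo is None and cum >= tail:
            lo = s
        if cum >= 1.0 - tail:
            hi = s
            break
    return lo, hi

def gaussian_equivalent(n_src, n_bkg, area_ratio):
    """Mean and sigma of the background-subtracted Gaussian approximation."""
    mean = n_src - n_bkg / area_ratio
    sigma = math.sqrt(n_src + n_bkg / area_ratio ** 2)
    return mean, sigma

def gaussian_mass_below_zero(mean, sigma):
    """Probability the Gaussian assigns to negative (unphysical) intensities."""
    return 0.5 * math.erfc(mean / (sigma * math.sqrt(2.0)))

if __name__ == "__main__":
    for n_src in (50, 8):  # the two cases in the figure
        s, dens, ds = poisson_posterior(n_src, 200, 40.0)
        lo, hi = equal_tailed_interval(s, dens, ds)
        mean, sigma = gaussian_equivalent(n_src, 200, 40.0)
        print(n_src, "Poisson 68%:", (lo, hi),
              "Gaussian:", (mean - sigma, mean + sigma),
              "Gaussian mass below zero:", gaussian_mass_below_zero(mean, sigma))
```

The last printed quantity is the share of the Gaussian that falls on negative intensities, which is one way to see the normalization problem at small counts.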

my first AAS. VI. Normalization

hlee — Sat, 21 Jun 2008 03:58:34 +0000

One realization I had during the meeting concerns a cultural difference, so this post is not related to any presentation at the 212th AAS. Please correct me if you find wrong statements. I cannot cover all perspectives from both disciplines, but I think there are two distinct fashions in practicing normalization.

$$\frac{1}{N(\theta)}\int_{\Omega} f(x;\theta)dx=1$$
$$\frac{1}{N(x)}\int_{\Theta} f(x;\theta)d\theta=1$$
If you are Bayesian, Θ is the focus; otherwise, Ω. Regardless, finding N that satisfies the above relations is called normalization. The difference between astronomers and statisticians lies in how Θ or Ω is treated.

For astronomers, in general, the integration runs from the observed minimum (or zero, depending on what physics tells you) to the observed maximum. Statisticians generally integrate over the whole space on which the measure is defined; for example, if f(x;θ) has the form of a Gaussian distribution, over (-∞, ∞).
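As a worked instance of the two conventions, take an unnormalized Gaussian kernel and compare the constant obtained over the whole real line with the one obtained over a finite range [a, b]. The kernel, the limits a and b, and the subscripts on N are illustrative assumptions, not notation from the meeting; Φ denotes the standard normal cumulative distribution function.

$$f(x;\theta)=\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],\quad \theta=(\mu,\sigma)$$
$$N_{\mathbb{R}}(\theta)=\int_{-\infty}^{\infty} f(x;\theta)dx=\sigma\sqrt{2\pi}$$
$$N_{[a,b]}(\theta)=\int_{a}^{b} f(x;\theta)dx=\sigma\sqrt{2\pi}\left[\Phi\left(\frac{b-\mu}{\sigma}\right)-\Phi\left(\frac{a-\mu}{\sigma}\right)\right]$$

The two constants agree only when [a, b] contains essentially all of the mass. Otherwise the bracketed factor is less than one and depends on θ, so the two normalized functions are different densities.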

This difference arises from different viewpoints on f(x;θ)/N (hereafter, f). For statisticians, the integration occurs over a properly defined measure space and f is a proper density function. For astronomers, the integration happens over a physically meaningful space and f is a viable model subject to the laws of physics.

However, I want to warn astronomers who carry an f defined on a physically meaningful space over into statistical inference. Sometimes statistical inference is performed on f even though this model f is the result of fitting and not a probability density function, because the astronomers’ f was not defined based on measure theory. It is clear that the variance of a truncated normal distribution differs from that of the regular normal distribution. This gives a different confidence interval at a given confidence level, and the size of the interval, which astronomers care so much about, will vary.
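The remark about the truncated normal can be checked with the standard closed-form moments. The sketch below assumes a normal distribution with parameters mu and sigma restricted to [a, b]; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _z_phi(z):
    """z * phi(z), defined as 0 at infinite z to avoid inf * 0."""
    return 0.0 if math.isinf(z) else z * phi(z)

def truncated_normal_moments(mu, sigma, a, b):
    """Mean and variance of Normal(mu, sigma^2) restricted to [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    mass = Phi(beta) - Phi(alpha)            # probability kept by the truncation
    ratio = (phi(alpha) - phi(beta)) / mass
    mean = mu + sigma * ratio
    var = sigma ** 2 * (1.0 + (_z_phi(alpha) - _z_phi(beta)) / mass - ratio ** 2)
    return mean, var

if __name__ == "__main__":
    mu, sigma = 1.0, 1.0                     # illustrative values
    mean_t, var_t = truncated_normal_moments(mu, sigma, 0.0, 3.0)
    print("untruncated sd:", sigma)
    print("truncated mean and sd on [0, 3]:", mean_t, math.sqrt(var_t))
```

Because the truncated variance is smaller than sigma², an interval built from it is narrower than the one from the untruncated normal at the same nominal level.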

The astronomers’ f is not necessarily a pmf or pdf unless f is defined on a proper probability measure space. Often, without checking whether the normalized f is measurable, the second derivative of f is taken to obtain the Fisher information or a covariance matrix, from which error bars at a given confidence level are built. Because the astronomers’ f may not satisfy the basic probability axioms, the resulting intervals can fall short of the nominal coverage that one likes to compare with other results (in my opinion, this is the primary reason for claims of improved results in astronomical papers thanks to narrow intervals; however, one cannot claim such a victory because the underlying assumptions are inconsistent).

Since astronomers are so keen on error bars and coverage, I wish they would attend, in their normalization process, to the fundamentals of probability theory on which statistical inference is built.

My disclaimer is that, as I often see, astronomers are in general well aware of the properties of pdfs. Narrow error bars from other types of statistical analysis are most likely legitimate.

— This is my last posting on my first AAS

[ArXiv] 4th week, Nov. 2007

hlee — Sat, 24 Nov 2007 13:26:40 +0000

A thought from my stay in Korea: just as few statisticians take an interest in modern astronomy even while they look for data-driven problems, few astronomers learn up-to-date statistics even while they borrow statistics for their data analysis. Astronomers cite statistical journals as rarely as statisticians introduce data-driven problems from astronomy. I wonder how other fields lowered such barriers decades ago.

No matter what, there are preprints from this week that may help to shrink the chasm.

[stat.ME:0711.3236]
Confidence intervals in regression utilizing prior information P. Kabaila and K. Giri
[stat.ME:0711.3271]
Computer model validation with functional output M. J. Bayarri, et al.
[astro-ph:0711.3266]
Umbral Fine Structures in Sunspots Observed with Hinode Solar Optical Telescope R. Kitai, et al.
[astro-ph:0711.2720]
Magnification Probability Distribution Functions of Standard Candles in a Clumpy Universe C. Yoo et al.
[astro-ph:0711.3196]
Upper Limits from HESS AGN Observations in 2005-2007 HESS Collaboration: F. Aharonian, et al.
[astro-ph:0711.2509]
Shrinkage Estimation of the Power Spectrum Covariance Matrix A. C. Pope and I. Szapudi
[astro-ph:0711.2631]
Statistical properties of extragalactic sources in the New Extragalactic WMAP Point Source (NEWPS) catalogue J. González-Nuevo, et al.

[ArXiv] 1st week, Nov. 2007

hlee — Fri, 02 Nov 2007 21:59:08 +0000

To be exact, the title of this posting should include the 5th week of October, which seems to have been the week of EGRET. In addition to astro-ph papers, I include a few statistics papers that, although not directly related to astrostatistics, may be useful for astronomical data analysis.

[astro-ph:0710.4966]
Uncertainties of the antiproton flux from Dark Matter annihilation in comparison to the EGRET excess of diffuse gamma rays by Iris Gebauer
[astro-ph:0710.5106]
The dark connection between the Canis Major dwarf, the Monoceros ring, the gas flaring, the rotation curve and the EGRET excess of diffuse Galactic Gamma Rays by W. de Boer et al.
[astro-ph:0710.5119]
Determination of the Dark Matter profile from the EGRET excess of diffuse Galactic gamma radiation by Markus Weber
[astro-ph:0710.5171]
Systematic Bias in Cosmic Shear: Beyond the Fisher Matrix by A. Amara and A. Refregier
[astro-ph:0710.5560]
Principal Component Analysis of the Time- and Position-Dependent Point Spread Function of the Advanced Camera for Surveys by M. J. Jee et al.
[astro-ph:0710.5637]
A method of open cluster membership determination by G. Javakhishvili et al.
[stat.CO:0710.5670]
An Elegant Method for Generating Multivariate Poisson Data by I. Yahav and G. Shmueli
[astro-ph:0710.5788]
Variations in Stellar Clustering with Environment: Dispersed Star Formation and the Origin of Faint Fuzzies by B. G. Elmegreen
[math.ST:0710.5749]
On the Laplace transform of some quadratic forms and the exact distribution of the sample variance from a gamma or uniform parent distribution by T. Royen
[math.ST:0710.5797]
The Distribution of Maxima of Approximately Gaussian Random Fields by Y. Nardi, D. Siegmund and B. Yakir
[astro-ph:0711.0177]
Maximum Likelihood Method for Cross Correlations with Astrophysical Sources by R. Jansson and G. R. Farrar
[stat.ME:0711.0198]
A Geometric Approach to Confidence Sets for Ratios: Fieller’s Theorem, Generalizations, and Bootstrap by U. von Luxburg and V. H. Franz