The AstroStat Slog » binomial

Poisson Likelihood [Equation of the Week]

vlk — Wed, 02 Jul 2008 17:00:32 +0000

Astrophysics, especially high-energy astrophysics, is all about counting photons. And this, it is said, naturally leads to all our data being generated by a Poisson process. True enough, but most astronomers don’t know exactly how it works out, so this derivation is for them.

Suppose N counts are randomly placed in an interval of duration τ without any preference for appearing in any particular portion of τ, i.e., the distribution is uniform. The counting rate is R = N/τ. We can now ask: what is the probability of finding k counts in an infinitesimal interval δt within τ?

First, consider the probability that one count, placed randomly, will fall inside δt,

ρ = δt/τ ≡ Rδt/N ≡ ν/N

where ν = R δt represents the expected number of counts in the interval δt. When N counts are scattered over τ, the probability that k of them will fall inside δt is described by a binomial distribution,

p(k|ρ,N) = ^NC_k ρ^k (1-ρ)^{N-k}

as the product of the probability of finding k events inside δt and the probability of finding the remaining events outside, summed over all the possible distinct ways that k events can be chosen out of N. Expanding the expression and rearranging,

= N!/{(N-k)!k!} (R δt/N)^k (1-(R δt/N))^{N-k}

= N!/{(N-k)!k!} (ν^k/N^k) (1-(ν/N))^{N-k}

= N!/{(N-k)!N^k} (ν^k/k!) (1-(ν/N))^N (1-(ν/N))^{-k}

Note that as N,τ → ∞ (while keeping R fixed),

N!/{(N-k)!N^k} , (1-(ν/N))^{-k} → 1
(1-(ν/N))^N → e^{-ν}

and the expression reduces to

p(k|ν) = (ν^k/k!) e^{-ν}

which is the familiar (in a manner of speaking) expression for the Poisson likelihood.
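For anyone who wants to see the limit numerically, here is a minimal sketch in Python. It evaluates the binomial expression with ρ = ν/N for increasing N and compares it with the Poisson form. The values ν = 3 and k = 2 are arbitrary choices for illustration and do not come from any data set.

```python
import math

def binomial_pmf(k, n, rho):
    """Probability that k of n uniformly scattered counts fall inside delta-t."""
    return math.comb(n, k) * rho**k * (1.0 - rho)**(n - k)

def poisson_pmf(k, nu):
    """Limiting form derived above: nu^k / k! * exp(-nu)."""
    return nu**k / math.factorial(k) * math.exp(-nu)

nu = 3.0  # assumed expected counts in delta-t (illustrative value)
k = 2     # assumed number of counts found in delta-t (illustrative value)

for n in (10, 100, 1000, 100000):
    b = binomial_pmf(k, n, nu / n)  # rho = nu / N, as in the derivation
    p = poisson_pmf(k, nu)
    print(f"N={n:>6}  binomial={b:.6f}  poisson={p:.6f}  |diff|={abs(b - p):.2e}")
```

The difference should shrink roughly in proportion to 1/N, which is the practical meaning of taking N,τ → ∞ at fixed R.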

[ArXiv] Swift and XMM measurement errors, Sep. 8, 2007

hlee — Tue, 11 Sep 2007 05:12:41 +0000

From arxiv/astro-ph:0708.1208v1:
The measurement errors in the Swift-UVOT and XMM-OM by N.P.M. Kuin and S.R. Rosen

The probability distribution of photon counts from the Optical Monitor on the XMM-Newton satellite (XMM-OM) and the UVOT on the Swift satellite follows a binomial distribution due to detector characteristics. The incident count rate was derived as a function of the measured count rate, which was shown to follow a binomial distribution.

A discrepancy between mapping the 1σ Poisson error on the incident counts onto the measured counts and mapping the 1σ binomial error on the measured counts onto the incident counts was illustrated with an example of large counts per frame. Although the authors comment that this discrepancy is negligible for small counts per frame, they urge the use of their binomial-based formalism to derive measurement errors.
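To get a feel for why the two error estimates part ways at large counts per frame, here is a rough sketch under a simplified model that is my assumption, not necessarily the formalism of the paper: the detector registers at most one count per readout frame, so with a Poisson incident mean of μ counts per frame the chance of a registered count is p = 1 - e^{-μ}, and the measured counts over a number of frames are binomial. The function names and the example numbers (1000 frames) are hypothetical, and for simplicity both errors are compared on the incident scale.

```python
import math

def incident_per_frame(measured, frames):
    """Invert p = 1 - exp(-mu) (assumed model): incident counts per frame."""
    if not 0 < measured < frames:
        raise ValueError("need 0 < measured < frames")
    return -math.log(1.0 - measured / frames)

def binomial_error_on_incident(measured, frames):
    """Map the 1-sigma binomial error on measured counts onto incident counts per frame."""
    p = measured / frames
    sigma_m = math.sqrt(frames * p * (1.0 - p))  # binomial standard deviation
    lo = incident_per_frame(measured - sigma_m, frames)
    hi = incident_per_frame(measured + sigma_m, frames)
    return 0.5 * (hi - lo)  # symmetrised half-width of the mapped interval

def poisson_error_on_incident(measured, frames):
    """Naive 1-sigma Poisson error on the incident counts per frame."""
    mu = incident_per_frame(measured, frames)
    return math.sqrt(mu / frames)

frames = 1000  # hypothetical number of readout frames
for measured in (50, 900):  # small and large counts per frame
    b = binomial_error_on_incident(measured, frames)
    p = poisson_error_on_incident(measured, frames)
    print(f"measured/frame={measured / frames:.2f}  binomial={b:.4f}  poisson={p:.4f}  ratio={b / p:.2f}")
```

Under this assumed model the two estimates nearly agree when few frames contain a count, and the binomial-based error becomes noticeably larger as the detector approaches one count per frame.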

Coverage issues in exponential families

hlee — Thu, 16 Aug 2007 20:36:51 +0000

I’ve heard so much about coverage problems from astrostat/phystat groups, without knowing the fundamental reasons (most likely physics). This paper might be of interest to them: Interval Estimation in Exponential Families by Brown, Cai, and DasGupta; Statistica Sinica (2003), 13, pp. 19-49

Abstract summary:
The authors investigated issues in interval estimation of the mean in the exponential family, such as binomial, Poisson, negative binomial, normal, gamma, and a sixth distribution. The poor performance of the Wald interval, with its significant negative coverage bias, is known not only for discrete cases but also for non-normal continuous cases. Their computations suggested that the equal-tailed Jeffreys interval and the likelihood ratio interval are the best alternatives to the Wald interval.

Brief summary of the paper without equations:
The objective of this paper is interval estimation of the mean in the natural exponential family (NEF) with quadratic variance functions (QVF), with particular focus given to the discrete NEF-QVF families: the binomial, negative binomial, and Poisson distributions. It is well known that the Wald interval for a binomial proportion suffers from a systematic negative bias and oscillation in its coverage probability, even for large n and p near 0.5, which seems to arise from the lattice nature and the skewness of the binomial distribution. They exemplified this systematic bias and oscillation with Poisson cases to illustrate the poor and erratic behavior of the Wald interval in lattice problems. They derived the bias expressions for the three discrete NEF-QVF distributions and added a disconcerting graphical illustration of this negative bias.
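The coverage problem described above is easy to reproduce. The following is a minimal sketch, not the authors' code, that computes the exact coverage probability of the nominal 95% Wald interval for a binomial proportion by summing binomial probabilities over every outcome whose interval contains the true p. The grid of n and p values is an arbitrary choice for illustration.

```python
import math

Z = 1.959963984540054  # 97.5% standard normal quantile (nominal 95% interval)

def wald_interval(x, n, z=Z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

def exact_coverage(p, n, interval=wald_interval):
    """Sum the binomial probabilities of all x whose interval contains the true p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1.0 - p)**(n - x)
    return total

for p in (0.05, 0.2, 0.5):
    for n in (20, 40, 100):
        print(f"p={p:.2f}  n={n:>3}  Wald coverage={exact_coverage(p, n):.4f}")
```

Scanning n in steps of one at fixed p makes the oscillation visible, since the set of covered outcomes changes discretely as n grows.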

Interested readers should check Figure 4, where the performances of the Wald, score, likelihood ratio (LR), and Jeffreys intervals are compared. Figure 5 illustrates the limits of those four intervals; the LR and Jeffreys intervals are indistinguishable. They derived the coverage probabilities of the four intervals via Edgeworth expansions, and studied the nonoscillating O(n^{-1}) terms from these expansions to compare coverage properties. Figure 6 shows that the Wald interval has serious negative bias, whereas the nonoscillating term in the score interval is positive for all three of the binomial, negative binomial, and Poisson distributions. The negative bias of the Wald interval is also found for continuous distributions such as the normal, gamma, and NEF-GHS distributions (Figure 7).
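A comparison in the spirit of those figures can be sketched for the Poisson case. The code below is my own illustration, not a reproduction of the paper's computations: it assumes a single Poisson observation x with mean μ, builds nominal 95% Wald, score, and LR intervals, and sums Poisson probabilities to get exact coverage. The Jeffreys interval is left out because it needs gamma quantiles, which the Python standard library does not provide. The helper names and the μ values are hypothetical.

```python
import math

Z = 1.959963984540054  # 97.5% standard normal quantile (nominal 95%)

def wald(x, z=Z):
    """Wald interval: x +/- z * sqrt(x)."""
    half = z * math.sqrt(x)
    return x - half, x + half

def score(x, z=Z):
    """Score interval: all mu with (x - mu)^2 <= z^2 * mu."""
    centre = x + z * z / 2.0
    half = z * math.sqrt(x + z * z / 4.0)
    return centre - half, centre + half

def deviance(x, mu):
    """Twice the log likelihood ratio for a Poisson count x against mean mu."""
    if x == 0:
        return 2.0 * mu
    return 2.0 * (x * math.log(x / mu) - x + mu)

def bisect(f, a, b, iters=100):
    """Root of f on [a, b] by bisection; assumes f(a) and f(b) differ in sign."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = f(m)
        if (fa < 0) == (fm < 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

def likelihood_ratio(x, z=Z):
    """LR interval: all mu with deviance(x, mu) <= z^2."""
    if x == 0:
        return 0.0, z * z / 2.0
    g = lambda mu: deviance(x, mu) - z * z
    upper = 2.0 * x + 10.0
    while g(upper) < 0:  # widen the bracket until it straddles the root
        upper *= 2.0
    return bisect(g, 1e-12, x), bisect(g, x, upper)

def coverage(interval, mu):
    """Exact coverage at mu > 0: sum Poisson probabilities of covering outcomes."""
    x_max = int(mu + 12.0 * math.sqrt(mu) + 30.0)  # truncation far into the tail
    total = 0.0
    for x in range(x_max + 1):
        lo, hi = interval(x)
        if lo <= mu <= hi:
            total += math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))
    return total

for mu in (1.0, 5.0, 20.0):
    print(f"mu={mu:>4}  Wald={coverage(wald, mu):.4f}  "
          f"score={coverage(score, mu):.4f}  LR={coverage(likelihood_ratio, mu):.4f}")
```

A handful of μ values says little about the overall ranking, because the coverage of every lattice-based interval jumps around as μ changes; a fine grid in μ is needed before drawing conclusions.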

In conclusion, they reconfirmed that the LR and Jeffreys intervals are the best alternatives to the Wald interval in terms of negative coverage bias and interval length. The Rao score interval has the merit of easy presentation, but its performance is inferior to the LR and Jeffreys intervals, although it is better than the Wald interval. Yet the authors leave room for users: choosing among these intervals is a personal choice.

[Addendum] I wonder if the statistical properties of Gehrels’ confidence limits have been studied since their publication. I’ll try to post findings about the statistics of Gehrels’ confidence limits shortly (hopefully).