The AstroStat Slog » background

Poisson vs Gaussian, Part 2

vlk — Fri, 10 Apr 2009 19:16:31 +0000

Probability density functions are another way of summarizing the consequences of assuming a Gaussian error distribution when the true distribution is Poisson. We can compute the posterior probability of the intensity of a source, when some number of counts are observed in a source region, and the background is estimated using counts observed in a different region. We can then compare it to the equivalent Gaussian.

The figure below (AAS 472.09) compares the pdfs for the Poisson intensity (red curves) and the Gaussian equivalent (black curves) for two cases: when the number of counts in the source region is 50 (top) and 8 (bottom) respectively. In both cases a background of 200 counts collected in an area 40x the source area is used. The hatched region represents the 68% equal-tailed interval for the Poisson case, and the solid horizontal line is the ±1σ width of the equivalent Gaussian.

The support of the Poisson intensity is bounded below at zero, but that of the Gaussian is not. For small counts, this introduces a visibly large bias in the interval coverage as well as in the normalization properties. Even at high counts, the Poisson pdf is skewed such that larger values are slightly more likely to occur by chance than in the Gaussian case. This skew can be quite critical for marginal results.

[Figure: Poisson and Gaussian probability densities]

No simple IDL code this time, but for reference, the Poisson posterior probability density curves were generated with the PINTofALE routine ppd_src().
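For readers without IDL, the comparison can be roughed out in Python. This is only a sketch and is not the ppd_src() algorithm: it assumes f=1, g=0, flat priors on both intensities, brute-force grid marginalization over the background, and a Gaussian equivalent with mean C − B/r and variance C + B/r². The function names and the grid sizes are made up for illustration.

```python
import math


def source_posterior(C, B, r, s_max, n_s=400, n_b=400):
    """Grid estimate of p(theta_S | C, B), assuming f=1, g=0 and flat priors."""
    # The background intensity (per source-region area) is constrained near B/r.
    b_hat = B / r
    b_sig = math.sqrt(B + 1.0) / r
    b_lo = max(b_hat - 8.0 * b_sig, 1e-6)
    b_hi = b_hat + 8.0 * b_sig
    db = (b_hi - b_lo) / (n_b - 1)
    ds = s_max / (n_s - 1)
    s_grid = [i * ds for i in range(n_s)]
    b_grid = [b_lo + j * db for j in range(n_b)]

    # Joint Poisson log-likelihood of both regions, constant terms dropped.
    def log_like(s, b):
        return C * math.log(s + b) - (s + b) + B * math.log(r * b) - r * b

    rows = [[log_like(s, b) for b in b_grid] for s in s_grid]
    peak = max(max(row) for row in rows)
    # Marginalize over the background intensity with a simple Riemann sum.
    post = [sum(math.exp(v - peak) for v in row) * db for row in rows]
    norm = sum(post) * ds
    return s_grid, [p / norm for p in post]


def equal_tailed_interval(grid, pdf, level=0.68):
    """Interval that leaves (1 - level)/2 of the probability in each tail."""
    ds = grid[1] - grid[0]
    tail = 0.5 * (1.0 - level)
    cdf, lo, hi = 0.0, grid[0], grid[-1]
    found_lo = False
    for x, p in zip(grid, pdf):
        cdf += p * ds
        if not found_lo and cdf >= tail:
            lo, found_lo = x, True
        if cdf >= 1.0 - tail:
            hi = x
            break
    return lo, hi


def gaussian_equivalent(C, B, r):
    """Mean and sigma of the background-subtracted Gaussian (an assumption)."""
    return C - B / r, math.sqrt(C + B / r ** 2)


# The low-counts case from the figure: 8 source-region counts, 200 background
# counts collected in an area 40x larger.
s_grid, pdf = source_posterior(C=8, B=200, r=40.0, s_max=25.0)
lo, hi = equal_tailed_interval(s_grid, pdf)
mu, sigma = gaussian_equivalent(8, 200, 40.0)
# Probability that the Gaussian assigns to unphysical negative intensities.
below_zero = 0.5 * (1.0 + math.erf(-mu / (sigma * math.sqrt(2.0))))
print("Poisson 68% interval:", lo, hi)
print("Gaussian mean, sigma, P(<0):", mu, sigma, below_zero)
```

The last number printed is the fraction of the Gaussian that lies below zero, which is one way to quantify the normalization problem described above.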

Background Subtraction, the Sequel [Eqn]

vlk — Wed, 06 Aug 2008 17:00:39 +0000

As mentioned before, background subtraction plays a big role in astrophysical analyses. For a variety of reasons, it is not a good idea to subtract out background counts from source counts, especially in the low-counts Poisson regime. What Bayesians recommend instead is to set up a model for the intensity of the source and the background and to infer these intensities given the data.

For instance, suppose as before, that C counts are observed in a region of the image that overlaps a putative source, and B counts in an adjacent, non-overlapping region that is mostly devoid of the source and which is r times larger in area and exposure than the source region. Further suppose that a fraction f of the source falls in the so-called source region (typically, f~0.9) and a fraction g falls in the background region (we strive to make g~0). Then the observed counts can be written as Poisson realizations of intensities,

C = Poisson(φ_S) ≡ Poisson(f θ_S + θ_B) , and
B = Poisson(φ_B) ≡ Poisson(g θ_S + r θ_B) ,

where the subscripts denote the model intensities in the source (S) or background (B) regions.

The joint probability distribution of the model intensities,

p(φ_S, φ_B | C, B) dφ_S dφ_B ,

can be rewritten in terms of the interesting parameters by transforming the variables,

≡ p(θ_S, θ_B | C, B) J(φ_S, φ_B ; θ_S, θ_B) dθ_S dθ_B ,

where J(φ_S, φ_B ; θ_S, θ_B) is the Jacobian of the coordinate transformation, and thus

= p(θ_S, θ_B | C, B) (r f − g) dθ_S dθ_B .
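As a check on the factor (r f − g), the Jacobian can be written out from the definitions φ_S = f θ_S + θ_B and φ_B = g θ_S + r θ_B. This worked step is added for reference and assumes r f > g, so that the determinant is positive:

```latex
J(\phi_S, \phi_B ; \theta_S, \theta_B)
  = \left| \det \begin{pmatrix}
      \partial\phi_S / \partial\theta_S & \partial\phi_S / \partial\theta_B \\
      \partial\phi_B / \partial\theta_S & \partial\phi_B / \partial\theta_B
    \end{pmatrix} \right|
  = \left| \det \begin{pmatrix} f & 1 \\ g & r \end{pmatrix} \right|
  = r f - g .
```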

The posterior probability distribution of the source intensity, θ_S, can be derived by marginalizing this over the background intensity parameter, θ_B. A number of people have done this calculation for the case f=1, g=0 (e.g., Loredo 1992, SCMA II, p275; see also van Dyk et al. 2001, ApJ 548, 224). The general case is slightly more convoluted but is still a straightforward calculation (Kashyap et al. 2008, AAS-HEAD 9, 03.02). More on that another time.

Background Subtraction [EotW]

vlk — Wed, 21 May 2008 17:00:32 +0000

There is a lesson that statisticians, especially of the Bayesian persuasion, have been hammering into our skulls for ages: do not subtract background. Nevertheless, old habits die hard, and old codes die harder. Such is the case with X-ray aperture photometry.

When C counts are observed in a region of the image that overlaps a putative source, and B counts in an adjacent, non-overlapping region that is mostly devoid of the source, the question that is asked is: what is the intensity of a source that might exist in the source region, given that there is also background? Let us say that the source has intensity s, and the background has intensity b in the first region. Further, let a fraction f of the source overlap that region, and a fraction g overlap the adjacent “background” region. Then, if the area of the background region is r times larger, we can solve for s and b and even determine the errors:
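The formulae themselves did not survive in this copy of the post. A reconstruction from the definitions above, assuming the expected counts are C = f s + b and B = g s + r b, and propagating the Poisson variances var(C) = C and var(B) = B by the method of moments, would read:

```latex
\begin{aligned}
s &= \frac{r\,C - B}{r f - g}, & \sigma_s^2 &= \frac{r^2 C + B}{(r f - g)^2}, \\
b &= \frac{f\,B - g\,C}{r f - g}, & \sigma_b^2 &= \frac{f^2 B + g^2 C}{(r f - g)^2}.
\end{aligned}
```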

Note that the regions do not have to be circular, nor does the source have to be centered in it. As long as the PSF fractions f and g can be calculated, these formulae can be applied. In practice, f is large, typically around 0.9, and the background region is chosen as an annulus centered on the source region, with g~0.
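A minimal Python sketch of this kind of aperture photometry is given below. It assumes the linear system C = f s + b, B = g s + r b with independent Poisson variances var(C) = C and var(B) = B; the function name `subtract_background` is hypothetical and does not correspond to any existing package.

```python
import math


def subtract_background(C, B, r, f=1.0, g=0.0):
    """Solve C = f*s + b, B = g*s + r*b for s and b, with propagated errors.

    C, B : counts in the source and background regions
    r    : ratio of background to source region area (and exposure)
    f, g : PSF fractions falling in the source and background regions
    """
    det = r * f - g
    if det <= 0:
        raise ValueError("need r*f > g for a meaningful solution")
    s = (r * C - B) / det
    b = (f * B - g * C) / det
    # Method-of-moments error propagation with var(C) = C and var(B) = B.
    sigma_s = math.sqrt(r * r * C + B) / det
    sigma_b = math.sqrt(f * f * B + g * g * C) / det
    return s, b, sigma_s, sigma_b


# Example: 50 counts in the source region, 200 in a background region
# 40 times larger, with 90% of the PSF enclosed and no leakage (g = 0).
s, b, sigma_s, sigma_b = subtract_background(50, 200, 40.0, f=0.9, g=0.0)
print(s, b, sigma_s, sigma_b)
```

Nothing in the function prevents s from coming out negative when C is small, which is one symptom of the low-counts problem.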

It always comes as a shock to statisticians, but this is not ancient history. We still determine maximum likelihood estimates of source intensities by subtracting out an estimated background and propagate error by the method of moments. To be sure, astronomers are well aware that these formulae are valid only in the high counts regime ( s,C,B>>1, b>0 ) and when the source is well defined ( f~1, g~0 ), though of course it doesn’t stop them from pushing the envelope. This, in fact, is the basis of many standard X-ray source detection algorithms (e.g., celldetect).

Furthermore, it might come as a surprise to many astronomers, but this is also the rationale behind the widely-used wavelet-based source detection algorithm, wavdetect. The Mexican Hat wavelet used with it has a central positive bump, surrounded by a negative annular moat, which is a dead ringer for the source and background regions used here. The difference is that the source intensity is not deduced from the wavelet correlations and the signal-to-noise ratio ( s/σ_s ) is not used to determine source significance, but rather extensive simulations are used to calibrate it.
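To make the analogy concrete, here is a short Python sketch of the standard two-dimensional Mexican Hat wavelet. The 1/(2πσ²) normalization and the function names are assumptions made for illustration and are not taken from the wavdetect code. The wavelet is positive inside r = √2 σ (the "source" core), negative outside it (the "background" moat), and integrates to zero over the plane, so a flat background contributes nothing to the correlation.

```python
import math


def mexican_hat(x, y, sigma=1.0):
    """2-D Mexican Hat wavelet: positive core, negative annular moat."""
    q = (x * x + y * y) / (sigma * sigma)
    return (2.0 - q) * math.exp(-0.5 * q) / (2.0 * math.pi * sigma * sigma)


def plane_integral(sigma=1.0, r_max=12.0, n=20000):
    """Integrate the wavelet over the plane in polar coordinates (midpoint rule)."""
    dr = r_max * sigma / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += mexican_hat(r, 0.0, sigma) * 2.0 * math.pi * r * dr
    return total


print(mexican_hat(0.0, 0.0))   # value at the centre of the core
print(mexican_hat(2.0, 0.0))   # value in the moat
print(plane_integral())        # numerically close to zero
```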

[ArXiv] 3rd week, Apr. 2008

hlee — Mon, 21 Apr 2008 01:05:55 +0000

Outliers present a dichotomy: they may be detected in order to be discarded or in order to be investigated, and a statistic may be robust enough not to be influenced by them or sensitive enough to flag an anomaly in the data distribution. Although it is unrelated to the rest, one paper about outliers made me dwell on what outliers are. This week's topics are diverse.

[astro-ph:0804.1809] H. Khiabanian, I.P. Dell’Antonio
A Multi-Resolution Weak Lensing Mass Reconstruction Method (Maximum likelihood approach; my naive eyes sensed a certain degree of relationship to the GREAT08 CHALLENGE)
[astro-ph:0804.1909] A. Leccardi and S. Molendi
Radial temperature profiles for a large sample of galaxy clusters observed with XMM-Newton
[astro-ph:0804.1964] C. Young & P. Gallagher
Multiscale Edge Detection in the Corona
[astro-ph:0804.2387] C. Destri, H. J. de Vega, N. G. Sanchez
The CMB Quadrupole depression produced by early fast-roll inflation: MCMC analysis of WMAP and SDSS data
[astro-ph:0804.2437] P. Bielewicz, A. Riazuelo
The study of topology of the universe using multipole vectors
[astro-ph:0804.2494] S. Bhattacharya, A. Kosowsky
Systematic Errors in Sunyaev-Zeldovich Surveys of Galaxy Cluster Velocities
[astro-ph:0804.2631] M. J. Mortonson, W. Hu
Reionization constraints from five-year WMAP data
[astro-ph:0804.2645] R. Stompor et al.
Maximum Likelihood algorithm for parametric component separation in CMB experiments (separate section for calibration errors)
[astro-ph:0804.2671] Peeples, Pogge, and Stanek
Outliers from the Mass–Metallicity Relation I: A Sample of Metal-Rich Dwarf Galaxies from SDSS
[astro-ph:0804.2716] H. Moradi, P.S. Cally
Time-Distance Modelling In A Simulated Sunspot Atmosphere (discusses systematic uncertainty)
[astro-ph:0804.2761] S. Iguchi, T. Okuda
The FFX Correlator
[astro-ph:0804.2742] M. Bazarghan
Automated Classification of ELODIE Stellar Spectral Library Using Probabilistic Artificial Neural Networks
[astro-ph:0804.2827] S.H. Suyu et al.
Dissecting the Gravitational Lens B1608+656: Lens Potential Reconstruction (Bayesian)