#### Significance of 5 counts

We have talked about it many times. Now I have to work with the reality. My source shows only 5 counts in a short 5 ksec Chandra exposure. Is this a detection of the source? or is this a random fluctuation? Chandra background is low and data are intrinsically Poisson, so the problem should be easy to solve. Not really! There is no tool to calculate this well, no actually it is! Tom A. and I found it by searching Google “Python gamma function” and came out with Tom Loredo’s Python functions (sp_funcs.py) that he translated from Numerical Recipes to Python. This is the working tool! We just needed to change “import Numeric” or “import Numarray” to “import numpy as N” and then it worked.

We calculated the significance of observing 5 counts given the expected background counts of 0.1 using spfunc.gammp(5,0.1) =8e-8. The detection is highly significant.

Any comments?

## hlee:

Although the given expected background counts is 0.1, the real background counts should be in integer (am I thinking right?) and I believe the law of total probability is needed: P(X>=5|Background)=\sum _{BC=0,…} P(X>=5|B,BC)P(BC). I wonder if the spfunc.gamma computes this (sorry for being phthon illiterate). I believe more fine tuning in computing this p-value is necessary but I wish to address that computing the p-value assuming Ho: background counts=0.1 vs. H1: background counts\ne 0.1 makes me uncomfortable when the observation is 5 counts, whereas the background counts are integers with very low expected background counts. I presume in either ways, 5 counts is opt to be said significant.

04-17-2008, 11:39 pm## vlk:

Ah, but Aneta, what about the variation in the background? How do you specify BACKSCAL? If you determine background by collecting 1 count in bkgarea=10*srcarea, 5 observed counts is only about 4sigma. Significant, but not that highly! (Rises to 5sigma for 10 counts in 100*srcarea, etc.)

PS: I used a PINTofALE IDL function called detect_limit() ( http://hea-www.harvard.edu/PINTofALE/doc/PoA.html#DETECT_LIMIT ) to get those numbers.

04-17-2008, 11:43 pm## aneta:

o.k. I simulated background counts given the observed 6 background counts in the background area and calculated the predicted number of counts in the source area. I did 1000 realizations assuming Poisson distribution for the background. Then I calculated the significance of 5 detected counts given each of the simulated background realizations – so I got 1000 p-values. What should I do with the p-values now? After talking to Vinay we decided that the thing to do is to calculate the mean of p-values. The mean results in a little higher than the original significance. I get 6e-7 instead of 8e-8 for the significance of 5 counts detected in my observations.

I run the simulations in Python with numpy.random.poisson() to get the simulated background counts and use the incomplete gamma function to calculate the significance for each simulation. I also plotted the distribution of my p-values…

Another trick here in addition to the background fluctuations is to consider also the prior knowledge about the source being at the location of the detected photons. How do we include

04-18-2008, 9:33 pmthis prior information in the calculated p-values?

## vlk:

On why take the mean of the p-values. For a given rescaled background (b/50), the p-value is the probability that the counts in the source region can exceed the observed counts. So if you imagine drawing N samples of the background _in the source region_, then a fraction p of them will have counts greater than 5. When you simulate different rescaled backgrounds by drawing from the Poisson distribution p(b|B=6) say NSIM times, each simulation will give a different fraction p_i of counts greater than 5. So, for an overall p-value, you want the number of times that you will see counts in the source region greater than 5, which is simply Sum(p_i*N) in NSIM simulations, which is simply the mean of {p_i*N}, and when expressed as a fraction by dividing out N, is the mean of the p_i.

04-22-2008, 6:20 pm## TomLoredo:

Hyunsook wrote:

Although the given expected background counts is 0.1, the real background counts should be in integer (am I thinking right?) and I believe the law of total probability is needed: P(X>=5|Background)=\sum _{BC=0,…} P(X>=5|B,BC)P(BC).This is along the lines of what a Bayesian calculation does. You can do it either with a known background rate, or accounting for uncertainty; either way, the marginal likelihood for the signal strength ends up being a sum over terms that condition on a certain number of b.g. counts, appropriately weighted. (This is one of the examples I present in my CASt summer school lectures.) I think the “bayes” option for the fitting statistic in Sherpa implements this (based on code I provided Peter Freeman), but I don’t know if it’s separately exposed for aiding source significance calculations.

There is some frequentist work with somewhat of the same flavor (conditioning on the b.g. counts being below the observed total # of counts); Michael Woodroofe did something along these lines.

04-24-2008, 6:15 pm