When you observed zero counts, you didn’t not observe any counts

Dong-Woo, who has been playing with BEHR, noticed that the confidence bounds quoted on the source intensities seem to be unchanged when the source counts are zero, regardless of what the background counts are set to. That is, p(s|N_S,N_B) is invariant when N_S=0, for any value of N_B. This seems a bit odd because, naively, one expects that as N_B increases, it ought to become more and more likely that s is close to 0.

Suppose you compute the posterior probability distribution of the intensity of a source, s, when the data include counts in a source region (N_S) and counts in a background region (N_B). When N_S=0, i.e., no counts are observed in the source region,

p(s|N_S=0, N_B) = (1+b)^a / Gamma(a) * s^(a-1) * e^(-s(1+b)),

where a and b are the parameters of the gamma prior on s.
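
To make this concrete, here is a minimal sketch (in Python, not BEHR itself; the prior parameters a and b below are an arbitrary illustrative choice) that evaluates this closed-form posterior. Note that N_B appears nowhere in it:

```python
# Minimal sketch: for N_S = 0 the posterior is Gamma(shape=a, rate=1+b).
# a, b are illustrative gamma-prior parameters; a=1, b=0 is a flat
# (improper) prior on s, and any a>0 gives a proper posterior.
from scipy.stats import gamma

a, b = 1.0, 0.0                              # prior p(s) proportional to s**(a-1) * exp(-b*s)

posterior = gamma(a, scale=1.0 / (1.0 + b))  # N_B enters nowhere
print(posterior.ppf(0.90))                   # 90% upper bound on s, ~2.30 here
```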

Why does N_B have no effect? Because when you have zero source counts, the entire effect of the background goes towards evaluating how good the chosen model is (it becomes a model comparison problem, not a parameter estimation one), and not towards estimating the parameter of interest, the source intensity. That is, it goes into the normalization factor of the probability distribution, p(N_S,N_B). The parts that depend on N_B cancel out when the expression for p(s|N_S,N_B) is written out, because the shape of the posterior in s is independent of N_B and the pdf must integrate to 1.
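
The cancellation is easy to verify numerically. The sketch below assumes the usual Poisson source-plus-background model, N_S ~ Pois(s+b), N_B ~ Pois(t*b), with gamma priors on s and the background intensity b (see the van Dyk et al. 2001 equations quoted in the comments); all parameter values are illustrative. It marginalizes over the background by brute-force quadrature and shows that the N_S=0 posterior is unchanged when N_B is varied:

```python
# Numerical sketch of why N_B cancels when N_S = 0, assuming the standard
# Poisson model N_S ~ Pois(s+b), N_B ~ Pois(t*b) with gamma priors on s, b.
# All parameter values here are arbitrary, for illustration only.
import numpy as np
from scipy.stats import poisson, gamma

t = 10.0                                  # background/source exposure-area ratio
s = np.linspace(1e-3, 10.0, 600)          # grid for the source intensity
b = np.linspace(1e-3, 10.0, 600)          # grid for the background intensity
ds, db = s[1] - s[0], b[1] - b[0]
prior_s = gamma.pdf(s, a=1.0)             # illustrative gamma prior on s
prior_b = gamma.pdf(b, a=1.0)             # illustrative gamma prior on b

def marginal_posterior(N_S, N_B):
    """p(s | N_S, N_B) on the s grid, background marginalized by quadrature."""
    like = (poisson.pmf(N_S, s[:, None] + b[None, :]) *
            poisson.pmf(N_B, t * b[None, :]))
    post = prior_s * (like * prior_b[None, :]).sum(axis=1) * db
    return post / (post.sum() * ds)       # normalize so it integrates to 1

# With N_S = 0, very different N_B values give the same curve (up to grid error):
print(np.max(np.abs(marginal_posterior(0, 0) - marginal_posterior(0, 50))))
```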

No doubt this is obvious, but I hadn’t noticed it before.

PS: This also shows why upper limits should not be identified with upper confidence bounds.

7 Comments
  1. hlee:

    Without knowing the physics behind choosing a gamma distribution, my question may look nonsensical, but I wonder if it’s possible to build a mixture model, r*p(s|N_s=0,N_b) + (1-r)*p(s|N_s>0,N_b). The first component would come from some other type of distribution, not a gamma. I believe astronomers know how to infer r. If it’s a single case and it’s not certain whether N_s=0 or N_s>0, you can select between the two (r=1 or 0) based on model selection criteria or a Bayes factor. Then you could find proper upper limits based on post-model-selection inference.

    [Response: There is no confusion with the value of N_S -- these are the observed data. So I can't say where that mixture model will apply.
    -vlk]

    09-26-2007, 11:01 am
  2. hlee:

    Does the response imply that the posterior distribution is improper? Does it mean the posterior distribution is misspecified? If so, then instead of a mixture, one could look for other distributions for a confidence bound. The point of the mixture is to keep the current posterior when N_s>0 but to add an additional component for N_s=0 to avoid the behavior you’ve described. r could be an indicator, I(N_s=0). I admit that I didn’t know N_s is deterministic. I thought you only observe N, which is hardly separable into two, N_s and N_b, in a deterministic way.

    [Response: No, the posterior is not improper (as long as a>0). Nor is it misspecified in any sense that I know of. You cannot mix posterior distributions obtained from different data. N_S are the counts in the source region, N_B in the "off-source", aka background, region. When you observe 0 counts in the source region, that is what you observe, there is no other N_S to mix it with.
    -vlk]

    09-26-2007, 6:12 pm
  3. hlee:

    Well, then your posterior is built on an open set. No reason for considering the boundary.

    [Response: Boundary of what?
    To clarify, s is the source intensity, and is the parameter of interest; N_S and N_B are the observed data; p(s,b|N_S,N_B) is the joint posterior for s and the background intensity b; and p(s|N_S,N_B) is the background-marginalized posterior probability distribution for s. See Section 2.1 of van Dyk et al. (2001, ApJ 548, 224) -- s is the same as lambda^S, etc. in their Eqn 5, 6.
    -vlk]

    09-27-2007, 11:54 am
  4. hlee:

    If θ is the parameter of interest, {θ: θ>0} is an open set, which specifies the parameter space. Openness and closedness also specify the data space. When I read your posting, my impression was that at N_s=0 (the boundary) the posterior behaves in an unexpected way, which led me to suggest a mixture model to resolve the strange behavior. By specifying a distribution properly, I mean that the distribution behaves properly over the data/parameter space, including N_s=0. If the boundary N_s=0 is causing trouble, then keep the current posterior for N_s>0 and add another component from a different family for N_s=0 (the boundary). [Checking the validity of this mixture model is another topic.]
    I’d rather point to a well-cited paper to indicate what I meant by misspecified:
    Maximum Likelihood Estimation of Misspecified Models by H. White (1982) in Econometrica. The references therein are quite classical.
    Mixture models and mixing are different topics, I guess. I only know a bit about mixture models. I hope mixing is not what came to your mind.

    [Response: I've already said s is the parameter of interest. There is nothing that says s cannot be 0. There is an integrable posterior probability distribution p(s|N_S,N_B) on s over [0,+\infty]. N_S are data. When N_S=0, that is your measurement, which is one number, and is fixed, unchanging, immutable, for that observation. I suspect that you may be confusing a Bayesian calculation with frequentist ideas.
    -vlk]

    09-27-2007, 7:05 pm
  5. Paul B:

    What is the actual model you are both talking about? Without knowing the details, it sounds like a similar situation to the Banff model, discussed on pg15 of:
    http://newton.hep.upenn.edu/~heinrich/birs/challenge.pdf
    Is that of interest?

    [Response: It is a subset of the Banff problem, one that does not include the 3rd equation, i.e.,
    N_S ~ Pois(s+b)
    N_B ~ Pois(t*b)
    See Eqns 5 and 6 of van Dyk et al. 2001 (linked above).

    It is the standard "background subtraction" problem in [X-ray] astronomy, where you need to infer the intensity of a source from two measurements: one of background only, and one of source+background. In the high-counts limit,
    E(s) = N_S - (N_B/t)
    (a simulation sketch of this limit appears below the comment thread).
    -vlk]

    09-29-2007, 8:29 pm
  6. vlk:

    Paul, yes, the behavior described in section 8.3 in that BIRS writeup is exactly what I wrote about above. There are two items in the Heinrich draft that I do not quite understand though — first, why the parenthetical insistence on 0 background events? and second, what does he mean by “absolute separation” between source and background?

    09-29-2007, 10:30 pm
  7. hlee:

    I’m ready to take any blame and admit my ignorance (I can only propose approaches limited to my knowledge and my readings from the slog). I wonder if there’s a Bayesian counterpart of Quantile Regression. Such a model could give the upper limit at N_s=0 from the corresponding quantiles.

    [After talking to Vinay] I misunderstood the objective of Vinay’s question. But I hope Poisson quantiles may assist low-count data analysis.

    10-03-2007, 3:37 pm
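
A quick simulation sketch of the high-counts background-subtraction limit, E(s) = N_S - (N_B/t), mentioned in comment 5 above; the intensities and the exposure ratio t are arbitrary values chosen only for illustration:

```python
# Simulation sketch of the high-counts limit E(s) ~ N_S - N_B/t.
# s_true, b_true, and t are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
s_true, b_true, t = 500.0, 200.0, 4.0              # source, background intensity, exposure ratio

N_S = rng.poisson(s_true + b_true, size=100_000)   # counts in the source region
N_B = rng.poisson(t * b_true, size=100_000)        # counts in the background region

print(np.mean(N_S - N_B / t))                      # close to s_true = 500 on average
```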