Archive for the ‘Bayesian’ Category.

A lecture note of great utility

I didn’t realize this post was sitting for a month during which I almost neglected the slog. As if great books about probability and information theory for statisticians and engineers exist, I believe there are great statistical physics books for physicists. On the other hand, relatively less exist that introduce one subject to the other kind audience. In this regard, I thought the lecture note can be useful.

[arxiv:physics.data-an:0808.0012]
Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract: Continue reading ‘A lecture note of great utility’ »

Background Subtraction, the Sequel [Eqn]

As mentioned before, background subtraction plays a big role in astrophysical analyses. For a variety of reasons, it is not a good idea to subtract out background counts from source counts, especially in the low-counts Poisson regime. What Bayesians recommend instead is to set up a model for the intensity of the source and the background and to infer these intensities given the data. Continue reading ‘Background Subtraction, the Sequel [Eqn]’ »

[ArXiv] 4th week, May 2008

Eight astro-ph papers and two statistics paper are listed this week. One statistics paper discusses detecting filaments and the other talks about maximum likelihood estimation of satellite images (clouds). Continue reading ‘[ArXiv] 4th week, May 2008’ »

[ArXiv] 3rd week, May 2008

Not many this week, but there’s a great read. Continue reading ‘[ArXiv] 3rd week, May 2008’ »

[ArXiv] 2nd week, Apr. 2008

Markov chain Monte Carlo became the most frequent and stable statistical application in astronomy. It will be useful collecting tutorials from both professions. Continue reading ‘[ArXiv] 2nd week, Apr. 2008’ »

Quote of the Date

Really, there is no point in extracting a sentence here and there, go read the whole thing:

Why I don’t like Bayesian Statistics

- Andrew Gelman

Oh, alright, here’s one:

I can’t keep track of what all those Bayesians are doing nowadays–unfortunately, all sorts of people are being seduced by the promises of automatic inference through the “magic of MCMC”–but I wish they would all just stop already and get back to doing statistics the way it should be done, back in the old days when a p-value stood for something, when a confidence interval meant what it said, and statistical bias was something to eliminate, not something to embrace.

Continue reading ‘Quote of the Date’ »

Statistics is the study of uncertainty

I began to study statistics with the notion that statistics is the study of information (retrieval) and a part of information is uncertainty which is taken for granted in our random world. Probably, it is the other way around; information is a part of uncertainty. Could this be the difference between Bayesian and frequentist?

The statistician’s task is to articulate the scientist’s uncertainties in the language of probability, and then to compute with the numbers found: cited from Continue reading ‘Statistics is the study of uncertainty’ »

[ArXiv] 1st week, Mar. 2008

Irrelevant to astrostatistics but interesting for baseball lovers.
    [stat.AP:0802.4317] Jensen, Shirley, & Wyner
    Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball

With the 5th year WMAP data release, there were many WMAP related papers and among them, most statistical papers are listed. Continue reading ‘[ArXiv] 1st week, Mar. 2008’ »

[ArXiv] A fast Bayesian object detection

This is a quite long paper that I separated from [Arvix] 4th week, Feb. 2008:
      [astro-ph:0802.3916] P. Carvalho, G. Rocha, & M.P.Hobso
      A fast Bayesian approach to discrete object detection in astronomical datasets – PowellSnakes I
As the title suggests, it describes Bayesian source detection and provides me a chance to learn the foundation of source detection in astronomy. Continue reading ‘[ArXiv] A fast Bayesian object detection’ »

Signal Processing and Bootstrap

Astronomers have developed their ways of processing signals almost independent to but sometimes collaboratively with engineers, although the fundamental of signal processing is same: extracting information. Doubtlessly, these two parallel roads of astronomers’ and engineers’ have been pointing opposite directions: one toward the sky and the other to the earth. Nevertheless, without an intensive argument, we could say that somewhat statistics has played the medium of signal processing for both scientists and engineers. This particular issue of IEEE signal processing magazine may shed lights for astronomers interested in signal processing and statistics outside the astronomical society.

IEEE Signal Processing Magazine Jul. 2007 Vol 24 Issue 4: Bootstrap methods in signal processing

This link will show the table of contents and provide links to articles; however, the access to papers requires IEEE Xplore subscription via libraries or individual IEEE memberships). Here, I’d like to attempt to introduce some articles and tutorials.
Continue reading ‘Signal Processing and Bootstrap’ »

you are biased, I have an informative prior”

Hyunsook drew attention to this paper (arXiv:0709.4531v1) by Brad Schaefer on the underdispersed measurements of the distances to LMC. He makes a compelling case that since 2002 published numbers in the literature have been hewing to an “acceptable number”, possibly in an unconscious effort to pass muster with their referees. Essentially, the distribution of the best-fit distances are much more closely clustered than you would expect from the quoted sizes of the error bars. Continue reading ‘“you are biased, I have an informative prior”’ »

Implement Bayesian inference using PHP

Not knowing much about java and java applets in a software development and its web/internet publicizing, I cannot comment what is more efficient. Nevertheless, I thought that PHP would do the similar job in a simpler fashion and the followings may provide some ideas and solutions for publicizing statistical methods through websites based on Bayesian Inference.
Continue reading ‘Implement Bayesian inference using PHP’ »

Provocative Corollary to Andrew Gelman’s Folk Theorem

This is a long comment on October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.

My thoughts:
Model , for statisticians, has two meanings. A physicist or astronomer would automatically read this as pertaining to a model of the source, or physics, or sky. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the full statistical model.

For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first — it was taken as a description of physics model mismatch. Of course what he (and others) meant was statistical model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.

This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theoreom:

When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.

One can see this (I claim) in any method that allows one to get a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one based on some kind of semi-parametric bootstrapping) for uncertainty estimates. This can be (not always!) a particular pitfall of “ad-hoc” methods, which may at first appear very speedy and/or visually compelling, but then may not have a statistics/probability structure through which to synthesize the significance of the results in an efficient way.

Quote of the Week, October 3, 2007

From the ever-quotable Andrew Gelman comes this gem, which he calls a Folk Theorem :

When things are hard to compute, often the model doesn’t fit the data. Difficulties in computation are therefore often model problems… [When the computation isn't working] we have the duty and freedom to think about models.

Continue reading ‘Quote of the Week, October 3, 2007’ »

ab posteriori ad priori

A great advantage of Bayesian analysis, they say, is the ability to propagate the posterior. That is, if we derive a posterior probability distribution function for a parameter using one dataset, we can apply that as the prior when a new dataset comes along, and thereby improve our estimates of the parameter and shrink the error bars.

But how exactly does it work? I asked this of Tom Loredo in the context of some strange behavior of sequential applications of BEHR that Ian Evans had noticed (specifically that sequential applications of BEHR, using as prior the posterior from the preceding dataset, seemed to be dependent on the order in which the datasets were considered (which, as it happens, arose from approximating the posterior distribution before passing it on as the prior distribution to the next stage — a feature that now has been corrected)), and this is what he said:

Yes, this is a simple theorem. Suppose you have two data sets, D1 and D2, hypotheses H, and background info (model, etc.) I. Considering D2 to be the new piece of info, Bayes’s theorem is:

[1]

p(H|D1,D2) = p(H|D1) p(D2|H, D1)            ||  I
             -------------------
                    p(D2|D1)

where the “|| I” on the right is the “Skilling conditional” indicating that all the probabilities share an “I” on the right of the conditioning solidus (in fact, they also share a D1).

We can instead consider D1 to be the new piece of info; BT then reads:

[2]

p(H|D1,D2) = p(H|D2) p(D1|H, D2)            ||  I
             -------------------
                    p(D1|D2)

Now go back to [1], and use BT on the p(H|D1) factor:

p(H|D1,D2) = p(H) p(D1|H) p(D2|H, D1)            ||  I
             ------------------------
                    p(D1) p(D2|D1)

           = p(H, D1, D2)
             ------------      (by the product rule)
                p(D1,D2)

Do the same to [2]: use BT on the p(H|D2) factor:

p(H|D1,D2) = p(H) p(D2|H) p(D1|H, D2)            ||  I
             ------------------------
                    p(D2) p(D1|D2)

           = p(H, D1, D2)
             ------------      (by the product rule)
                p(D1,D2)

So the results from the two orderings are the same. In fact, in the Cox-Jaynes approach, the “axioms” of probability aren’t axioms, but get derived from desiderata that guarantee this kind of internal consistency of one’s calculations. So this is a very fundamental symmetry.

Note that you have to worry about possible dependence between the data (i.e., p(D2|H, D1) appears in [1], not just p(D2|H)). In practice, separate data are often independent (conditional on H), so p(D2|H, D1) = p(D2|H) (i.e., if you consider H as specified, then D1 tells you nothing about D2 that you don’t already know from H). This is the case, e.g., for basic iid normal data, or Poisson counts. But even in these cases dependences might arise, e.g., if there are nuisance parameters that are common for the two data sets (if you try to combine the info by multiplying *marginalized* posteriors, you may get into trouble; you may need to marginalize *after* multiplying if nuisance parameters are shared, or account for dependence some other way).

what if you had 3, 4, .. N observations? Does the order in which you apply BT affect the results?

No, as long as you use BT correctly and don’t ignore any dependences that might arise.

if not, is there a prescription on what is the Right Thing [TM] to do?

Always obey the laws of probability theory! 9-)