Archive for September 2007

#### [ArXiv] 4th week, Sept. 2007

A few papers from astro-ph may draw statisticians’ attention, and a statistical paper may be helpful for astronomers keen on confidence intervals that utilize prior information.
Continue reading ‘[ArXiv] 4th week, Sept. 2007’ »

A great advantage of Bayesian analysis, they say, is the ability to propagate the posterior. That is, if we derive a posterior probability distribution function for a parameter using one dataset, we can apply that as the prior when a new dataset comes along, and thereby improve our estimates of the parameter and shrink the error bars.

But how exactly does it work? I asked this of Tom Loredo in the context of some strange behavior that Ian Evans had noticed in sequential applications of BEHR: when the posterior from one dataset was used as the prior for the next, the results seemed to depend on the order in which the datasets were considered. (As it happens, this arose from approximating the posterior distribution before passing it on as the prior distribution to the next stage, a feature that has now been corrected.) This is what he said:

Yes, this is a simple theorem. Suppose you have two data sets, D1 and D2, hypotheses H, and background info (model, etc.) I. Considering D2 to be the new piece of info, Bayes’s theorem is:

[1]

$$p(H \mid D_1, D_2) = \frac{p(H \mid D_1)\, p(D_2 \mid H, D_1)}{p(D_2 \mid D_1)} \qquad ||\; I$$

where the “|| I” on the right is the “Skilling conditional” indicating that all the probabilities share an “I” on the right of the conditioning solidus (in fact, they also share a D1).

We can instead consider D1 to be the new piece of info; BT then reads:

[2]

$$p(H \mid D_1, D_2) = \frac{p(H \mid D_2)\, p(D_1 \mid H, D_2)}{p(D_1 \mid D_2)} \qquad ||\; I$$

Now go back to [1], and use BT on the p(H|D1) factor:

$$\begin{aligned}
p(H \mid D_1, D_2) &= \frac{p(H)\, p(D_1 \mid H)\, p(D_2 \mid H, D_1)}{p(D_1)\, p(D_2 \mid D_1)} \qquad ||\; I \\
&= \frac{p(H, D_1, D_2)}{p(D_1, D_2)} \qquad \text{(by the product rule)}
\end{aligned}$$

Do the same to [2]: use BT on the p(H|D2) factor:

$$\begin{aligned}
p(H \mid D_1, D_2) &= \frac{p(H)\, p(D_2 \mid H)\, p(D_1 \mid H, D_2)}{p(D_2)\, p(D_1 \mid D_2)} \qquad ||\; I \\
&= \frac{p(H, D_1, D_2)}{p(D_1, D_2)} \qquad \text{(by the product rule)}
\end{aligned}$$

So the results from the two orderings are the same. In fact, in the Cox-Jaynes approach, the “axioms” of probability aren’t axioms, but get derived from desiderata that guarantee this kind of internal consistency of one’s calculations. So this is a very fundamental symmetry.

Note that you have to worry about possible dependence between the data (i.e., p(D2|H, D1) appears in [1], not just p(D2|H)). In practice, separate data are often independent (conditional on H), so p(D2|H, D1) = p(D2|H) (i.e., if you consider H as specified, then D1 tells you nothing about D2 that you don’t already know from H). This is the case, e.g., for basic iid normal data, or Poisson counts. But even in these cases dependences might arise, e.g., if there are nuisance parameters that are common for the two data sets (if you try to combine the info by multiplying *marginalized* posteriors, you may get into trouble; you may need to marginalize *after* multiplying if nuisance parameters are shared, or account for dependence some other way).
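To make the shared-nuisance-parameter warning concrete, here is a minimal sketch of a toy model of my own choosing (it is not from Tom's reply and has nothing to do with BEHR's internals). Two measurements x1 and x2 each equal s + b plus unit Gaussian noise, where s is the parameter of interest with a flat prior and b is a nuisance offset common to both datasets with a standard normal prior. All names and numbers (`SIGMA`, `TAU`, the grids, the data values) are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

SIGMA, TAU = 1.0, 1.0          # assumed noise sd and prior sd of the shared offset b
x1, x2 = 1.0, 3.0              # two hypothetical measurements of s + b

s_grid = [-10 + 0.05 * i for i in range(481)]   # parameter of interest, flat prior
b_grid = [-6 + 0.05 * j for j in range(241)]    # shared nuisance parameter
b_prior = [normal_pdf(b, 0.0, TAU) for b in b_grid]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def variance(grid, probs):
    mean = sum(g * p for g, p in zip(grid, probs))
    return sum((g - mean) ** 2 * p for g, p in zip(grid, probs))

def marginal_like(x, s):
    """Likelihood of one measurement with b marginalized out on its own."""
    return sum(normal_pdf(x, s + b, SIGMA) * pb for b, pb in zip(b_grid, b_prior))

# Tempting but wrong here: multiply the two separately marginalized posteriors.
naive = normalize([marginal_like(x1, s) * marginal_like(x2, s) for s in s_grid])

# Multiply first (same b in both likelihoods), then marginalize over b.
joint = normalize([
    sum(normal_pdf(x1, s + b, SIGMA) * normal_pdf(x2, s + b, SIGMA) * pb
        for b, pb in zip(b_grid, b_prior))
    for s in s_grid
])

var_naive = variance(s_grid, naive)
var_joint = variance(s_grid, joint)
print(f"posterior variance of s, naive: {var_naive:.3f}, joint: {var_joint:.3f}")
```

In this toy setup the naive product gives a posterior variance of about 1.0 for s, while marginalizing after multiplying gives about 1.5. Treating the two marginalized posteriors as independent pieces of information counts the shared offset twice and makes the error bar too small.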

What if you had 3, 4, ..., N observations? Does the order in which you apply BT affect the results?

No, as long as you use BT correctly and don’t ignore any dependences that might arise.

If not, is there a prescription for what is the Right Thing [TM] to do?

Always obey the laws of probability theory! 9-)
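As a quick numerical check of the order-independence claim, here is a minimal sketch that updates a gridded posterior with three datasets in every possible order. It assumes Poisson counts that are independent given the rate, a flat prior on a grid of rates, and made-up data; the function names and numbers are mine, not BEHR's. The posterior is passed along exactly (no approximation between stages), which is the condition under which the theorem applies.

```python
import itertools
import math

def poisson_likelihood(counts, rate):
    """Likelihood of counts that are iid Poisson given the rate."""
    return math.prod(math.exp(-rate) * rate**n / math.factorial(n) for n in counts)

def bayes_update(rates, prior, counts):
    """One application of Bayes's theorem on a grid: posterior is prior times likelihood, normalized."""
    unnorm = [p * poisson_likelihood(counts, r) for r, p in zip(rates, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

rates = [0.05 * (i + 1) for i in range(400)]      # hypotheses H: rates from 0.05 to 20
flat_prior = [1.0 / len(rates)] * len(rates)
datasets = [[3, 5, 4], [7, 2], [6]]               # hypothetical D1, D2, D3

# Feed each posterior forward as the next prior, for every ordering of the datasets.
posteriors = []
for order in itertools.permutations(datasets):
    post = flat_prior
    for counts in order:
        post = bayes_update(rates, post, counts)
    posteriors.append(post)

# Reference: analyze all the data in one go.
reference = bayes_update(rates, flat_prior, [n for d in datasets for n in d])
max_spread = max(abs(p - q) for post in posteriors for p, q in zip(post, reference))
print(f"largest difference across all orderings: {max_spread:.2e}")
```

All six orderings agree with the all-at-once analysis to within floating-point rounding. If each intermediate posterior were replaced by an approximation before being passed on, that agreement would no longer be guaranteed.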

#### P Values: What They Are and How to Use Them

After observing the recent discussion among CHASC, the following paper came to mind:
P Values: What They Are and How to Use Them by Luc Demortier.
Continue reading ‘P Values: What They Are and How to Use Them’ »

#### [ArXiv] 3rd week, Sept. 2007

In addition to Short Timescale Coronal Variability in Capella [astro-ph:0709.3093], a few statistically interesting preprints appeared during the 3rd week of Sept. Continue reading ‘[ArXiv] 3rd week, Sept. 2007’ »

#### When you observed zero counts, you didn’t not observe any counts

Dong-Woo, who has been playing with BEHR, noticed that the confidence bounds quoted on the source intensities seem to be unchanged when the source counts are zero, regardless of what the background counts are set to. That is, p(s|NS,NB) is invariant when NS=0, for any value of NB. This seems a bit odd, because [naively] one expects that as NB increases, it should/ought to get more and more likely that s gets closer to 0. Continue reading ‘When you observed zero counts, you didn’t not observe any counts’ »
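One way to see how this invariance can come about is a small grid calculation under an assumed model, which is not necessarily what BEHR does internally: source-region counts NS ~ Poisson(s + b), background-region counts NB ~ Poisson(R b) with a hypothetical area ratio `R`, and independent flat priors on s and b. When NS = 0 the source-region likelihood is exp(-(s + b)) = exp(-s) exp(-b), so the s and b parts factorize and NB can only inform b.

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability of n counts given expected counts mu."""
    return math.exp(-mu) * mu**n / math.factorial(n)

R = 10.0                                        # assumed background-to-source area ratio
s_grid = [0.1 * i for i in range(301)]          # source intensity, flat prior on [0, 30]
b_grid = [0.05 * j for j in range(301)]         # background intensity, flat prior on [0, 15]

def source_posterior(n_src, n_bkg):
    """Posterior of s on the grid, with b marginalized out."""
    bkg_like = [poisson_pmf(n_bkg, R * b) for b in b_grid]
    unnorm = [
        sum(poisson_pmf(n_src, s + b) * w for b, w in zip(b_grid, bkg_like))
        for s in s_grid
    ]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def posterior_mean(post):
    return sum(s * p for s, p in zip(s_grid, post))

zero_src = {nb: source_posterior(0, nb) for nb in (0, 5, 50)}    # NS = 0
three_src = {nb: source_posterior(3, nb) for nb in (0, 5, 50)}   # NS = 3, for contrast

for nb in (0, 5, 50):
    print(f"NB={nb:2d}  mean s | NS=0: {posterior_mean(zero_src[nb]):.3f}"
          f"   mean s | NS=3: {posterior_mean(three_src[nb]):.3f}")
```

Under these assumptions the NS = 0 column does not move as NB changes, while the NS = 3 column does: once there are counts to share between source and background, knowing more about b matters.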

#### Betraying your heritage

[arXiv:0709.3093v1] Short Timescale Coronal Variability in Capella (Kashyap & Posson-Brown)

We recently submitted that paper to AJ, and rather ironically, I did the analysis during the same time frame as this discussion was going on, about how astronomers cannot rely on repeating observations. Ironic because the result reported there hinges on the existence of a small but persistent signal that is found in repeated observations of the same source. Doubly ironic in fact, in that just as we were backing and forthing about cultural differences I seem to have gone and done something completely contrary to my heritage! Continue reading ‘Betraying your heritage’ »

#### Spurious Sources

[arXiv:0709.2358] Cleaning the USNO-B Catalog through automatic detection of optical artifacts, by Barron et al.

Statistically speaking, “false sources” are generally in the domain of Type I errors, defined by the probability of detecting a signal where there is none. But what if there is a clear signal, but it is not real? Continue reading ‘Spurious Sources’ »
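For readers less used to the terminology, here is a minimal sketch of a Type I error rate in a counting experiment. The setup is hypothetical: a cell is declared a detection when it holds at least `threshold` counts, and with no source present the counts are Poisson with a known expected `background`. Both numbers are made up for illustration and are not taken from the paper.

```python
import math

def poisson_tail(threshold, mean):
    """P(N >= threshold) for N ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(threshold))
    return 1.0 - cdf

background = 3.0   # hypothetical expected background counts in the detection cell
threshold = 8      # hypothetical detection threshold in counts

# Probability of claiming a source when only background is present.
alpha = poisson_tail(threshold, background)
print(f"Type I error rate: {alpha:.4f}")
```

With these numbers roughly one source-free cell in a hundred would be flagged. The spurious sources discussed in the paper are a different beast: the signal is really there in the data, it just is not astrophysical.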

#### VOConvert (ConVOT)

VOConvert, or ConVOT, is a small Java tool that converts files from FITS to ASCII or the other way around. It might be useful for statisticians who want to quickly convert FITS, the data format astronomers use, into ASCII for statistical analysis. Additionally, VOConvert creates an interim output for VOStat, which is designed for statistical analysis of Virtual Observatory data. The software and the list of Virtual Observatories around the world can be found at Virtual Observatory India. Please see the VOStat post (http://hea-www.harvard.edu/AstroStat/slog/2007/vostat) for more information about VOStat.

#### PHYSTAT-LHC 2007

The idea came up that useful materials related to the Chandra calibration problem, on which CHASC is working, might be found at the PHYSTAT conferences. Thanks to the technology adopted by physicists (I haven’t seen any statistics conference offer what I obtained from PHYSTAT-LHC 2007), I had a chance to go through some video files from PHYSTAT-LHC 2007. The files are recorded lectures, accompanied by lecture notes. They are available from the PHYSTAT-LHC 2007 Program.

#### How to subscribe to the arXiv email list service

Over the years, I have noticed an exponential increase in statistical applications in astronomical papers. Keeping track of them and writing summaries for the slog based on daily arXiv updates has, over the past few months, become quite an overwhelming task for a single person. Therefore, instead of offering fish, I decided to show how to catch fish.
Continue reading ‘How to subscribe to the arXiv email list service’ »

#### Visualizing Astronomy

The CXC Education & Outreach Program at the CfA hosts a series of lectures on Visualizing Astronomy, and the first of this season’s is scheduled for Sep 18 at 1:30pm at Phillips:

Date & Time: Tuesday, September 18, 1:30pm
Location: Phillips Auditorium
Speaker: Alyssa Goodman (Harvard)
Title: Amazing New Tools for Exploring Astronomical Data

#### [ArXiv] SVM and galaxy morphological classification, Sept. 10, 2007

From arxiv/astro-ph:0709.1359,
A robust morphological classification of high-redshift galaxies using support vector machines on seeing limited images. I Method description by M. Huertas-Company et al.

Machine learning and statistical learning are becoming more and more popular in astronomy. Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are hard to miss when the objective is classification on massive survey data. The authors provide a gentle tutorial on SVM for galaxy morphological classification. Their source code, GALSVM, is linked for interested readers.
Continue reading ‘[ArXiv] SVM and galaxy morphological classification, Sept. 10, 2007’ »

#### [ArXiv] Bimodal Color Distribution in GCS, Sept. 7, 2007

From arxiv/astro-ph:0709.1073v1
On the Metallicity-Color Relations and Bimodal Color Distributions in Extragalactic Globular Cluster Systems by M. Cantiello and J. P. Blakeslee

Many observations of globular cluster systems (GCS) show bimodal distributions in color and metallicity space. The authors discuss the complication of non-linear metallicity-color relations and present a careful study suggesting the optimal color(s) for revealing the presence of real bimodal GC metallicity distributions. Based on their simulation study, (V-H) and (V-K) are confirmed to be good colors for revealing unbiased bimodal metallicity distributions in GCS.
Continue reading ‘[ArXiv] Bimodal Color Distribution in GCS, Sept. 7, 2007’ »

#### [ArXiv] Recent bayesian studies from astro-ph

In the past month, I’ve noticed that papers whose titles include Bayesian or Markov Chain Monte Carlo (MCMC) have appeared relatively frequently in arxiv/astro-ph. Those papers are:

• [astro-ph:0709.1058v1] Joint Bayesian Component Separation and CMB Power Spectrum Estimation by H.K.Eriksen et al.
• [astro-ph:0709.1104v1] Monolithic or hierarchical star formation? A new statistical analysis by M. Kampakoglou, R. Trotta, and J. Silk
• [astro-ph:0411573v2] A Bayesian analysis of the primordial power spectrum by M.Bridges, A.N.Lasenby, M.P.Hobson
• [astro-ph:0709.0596v1] Bayesian inversion of Stokes profiles by A. A. Ramos, M.J.M. Gonzales, and J.A. Rubino-Martin
• [astro-ph:0709.0711v1] Bayesian posterior classification of planetary nebulae according to the Peimbert types by C. Quireza, H.J.Rocha-Pinto, and W.J. Maciel
• [astro-ph:0708.2340v1] Bayesian Galaxy Shape Measurement for Weak Lensing Surveys -I. Methodology and a Fast Fitting Algorithm by L. Miller et al.
• [astro-ph:0708.1871v1] Dark energy and cosmic curvature: Monte-Carlo Markov Chain approach by Y. Gong et al.