Archive for the ‘Jargon’ Category.

Line Emission [EotW]

Spectral lines are a ubiquitous feature of astronomical data. This week, we explore the special case of optically thin emission from low-density and high-temperature plasma, and consider the component factors that determine the line intensity. Continue reading ‘Line Emission [EotW]’ »

Equation of the Week: Confronting data with model

Starting a new feature — highlighting some equation that is widely used in astrophysics or astrostatistics. Today’s featured equation: what instruments do to incident photons. Continue reading ‘Equation of the Week: Confronting data with model’ »

[ArXiv] 1st week, Apr. 2008

I’m very curious how astronomers began to use “Monte Carlo Markov chain” instead of “Markov chain Monte Carlo.” The more popular the method becomes, the more frequently “Monte Carlo Markov chain” appears. Anyway, this week I added some non-astrostatistical papers to the list: a tutorial, the big bang, and biblical theology. Continue reading ‘[ArXiv] 1st week, Apr. 2008’ »

[ArXiv]4th week, Mar. 2008

The number of astro-ph preprints has been decreasing on average, and so have my hours of reading abstracts…. cool!!! By the way, there is a paper about the solar cycle, PCA, ICA, and the Lomb-Scargle periodogram. Continue reading ‘[ArXiv]4th week, Mar. 2008’ »

[ArXiv] 3rd week, Mar. 2008

Markov chain Monte Carlo (MCMC) never misses a week in recent astro-ph postings. A book titled MCMC in Astronomy would be a best seller. There are, in addition, some very interesting non-MCMC preprints. Continue reading ‘[ArXiv] 3rd week, Mar. 2008’ »

Eddington versus Malmquist

During the run-up to his recent talk on logN-logS, Andreas mentioned how sometimes people are confused about the variety of statistical biases that afflict surveys. They usually know what the biases are, but often tend to mislabel them, especially the Eddington and Malmquist types. Sort of like using “your” and “you’re” interchangeably, which to me is like nails on a blackboard. So here’s a brief summary: Continue reading ‘Eddington versus Malmquist’ »
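To make the distinction concrete, here is a minimal Monte Carlo sketch (mine, not from the post; the luminosity function, flux limit, and noise level are all made up for illustration). Malmquist bias: a flux-limited survey over-represents intrinsically luminous sources. Eddington bias: noise acting on steeply rising source counts scatters more faint sources up than bright sources down, inflating measured fluxes near the detection limit.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# --- Malmquist bias: a flux-limited survey over-represents luminous sources ---
L = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # toy luminosity function (made up)
d = 10.0 * rng.uniform(size=n) ** (1.0 / 3.0)    # uniform-in-volume distances out to d = 10
flux = L / (4.0 * np.pi * d**2)
detected = flux > 0.01                           # arbitrary flux limit
print(f"mean L, all sources:      {L.mean():.2f}")
print(f"mean L, detected sources: {L[detected].mean():.2f}   (higher: Malmquist)")

# --- Eddington bias: noise plus steeply rising source counts inflates fluxes ---
gamma = 2.5                                      # dN/dS ~ S^-gamma, a typical logN-logS slope
S_true = (1.0 - rng.uniform(size=n)) ** (-1.0 / (gamma - 1.0))   # power law above S = 1
S_meas = S_true + rng.normal(0.0, 0.5, size=n)   # additive Gaussian measurement noise
in_bin = (S_meas > 2.0) & (S_meas < 2.5)         # a measured-flux bin near the faint end
print(f"mean true flux in bin:     {S_true[in_bin].mean():.2f}")
print(f"mean measured flux in bin: {S_meas[in_bin].mean():.2f}   (higher: Eddington)")
```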

[ArXiv] 1st week, Mar. 2008

Irrelevant to astrostatistics, but interesting for baseball lovers:
    [stat.AP:0802.4317] Jensen, Shirley, & Wyner
    Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball

With the 5th-year WMAP data release there were many WMAP-related papers; the most statistical among them are listed. Continue reading ‘[ArXiv] 1st week, Mar. 2008’ »

Dance of the Errors

One of the big problems that has come up in recent years is how to represent the uncertainty in certain estimates. Astronomers usually present errors as ±stddev on the quantities of interest, but that presupposes that the errors are uncorrelated. But suppose you are estimating a multi-dimensional set of parameters that may have large correlations amongst themselves? One such case is that of Differential Emission Measures (DEM), where the “quantity of emission” from a plasma (loosely, how much stuff there is available to emit; it is the product of the volume and the densities of electrons and H) is estimated for different temperatures. See the plots at the PoA DEM tutorial for examples of how we are currently trying to visualize the error bars. Another example is the correlated systematic uncertainties in effective areas (Drake et al., 2005, Chandra Cal Workshop). This is not dissimilar to the problem of determining the significance of a “feature” in an image (Connors, A. & van Dyk, D.A., 2007, SCMA IV). Continue reading ‘Dance of the Errors’ »
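As a toy illustration of why the ±stddev convention can mislead (a made-up two-parameter example, not the DEM calculation itself): when parameters are strongly correlated, the marginal error bars look exactly as they would in the uncorrelated case, yet the uncertainty on any derived quantity can be wildly different.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two parameters with a strong positive correlation, as adjacent DEM
# temperature bins often have (the 0.9 correlation is made up):
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

# The marginal error bars are blind to the correlation...
print(f"std(x) = {x.std():.2f}, std(y) = {y.std():.2f}")

# ...but any derived quantity feels it strongly:
print(f"std(x - y), actual:                      {np.std(x - y):.2f}")   # ~0.45
print(f"std(x - y), assuming independent errors: {np.hypot(x.std(), y.std()):.2f}")  # ~1.41
```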

~ Avalanche(a,b)

Avalanches are a common process, occurring anywhere that a system can store stress temporarily without “snapping”. They can happen in sand dunes and in solar flares as easily as on the snow-bound Alps.

Melatos, Peralta, & Wyithe (arXiv:0710.1021) have a nice summary of avalanche processes in the context of pulsar glitches. Their primary purpose is to show that the glitches are indeed consistent with an avalanche process, and along the way they give a highly readable description of what an avalanche is and what it entails. Briefly, avalanches result in event parameters that are distributed in scale-invariant fashion (read: power laws) with exponential waiting-time distributions (i.e., Poisson).

Hence the title of this post: the “Avalanche distribution” (indulge me! I’m using stats notation to bury complications!) can be thought of as having two parameters, a and b, the indices of the power-law distributions that control the event sizes and the event durations respectively, with the event separations distributed as an exponential decay. Is there a canned statistical distribution that already describes all this? (In our work modeling stellar flares, we assumed that b=0 and found that a<-2, a power law falling off more steeply than index 2, which has all sorts of nice consequences for coronal heating processes.)
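For what it’s worth, here is a minimal sketch of what drawing from such an “Avalanche(a, b)” might look like. The parameterization (pdf ∝ x^index with index < −1, sampled by inverse CDF) and all numerical values are my assumptions for illustration, not anything canned.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_avalanche(a, b, rate, n, e_min=1.0, t_min=1.0):
    """Toy 'Avalanche(a, b)' draw: sizes with pdf(E) ~ E^a above e_min
    (needs a < -1), durations with pdf(T) ~ T^b above t_min (needs b < -1),
    and exponential (Poisson-process) waiting times between events."""
    sizes = e_min * (1.0 - rng.uniform(size=n)) ** (1.0 / (a + 1.0))     # inverse CDF
    durations = t_min * (1.0 - rng.uniform(size=n)) ** (1.0 / (b + 1.0))
    waits = rng.exponential(1.0 / rate, size=n)
    return sizes, durations, waits

# a < -2 as in the flare result; b and rate are made-up illustrative values.
sizes, durations, waits = sample_avalanche(a=-2.2, b=-1.5, rate=0.1, n=100_000)
print(f"median size: {np.median(sizes):.2f}, mean wait: {waits.mean():.1f} (expect 10.0)")
```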

model vs model

As Alanna pointed out, astronomers and statisticians mean different things when they say “model”. To complicate matters, we have also started to use another term called “data model”. Continue reading ‘model vs model’ »

ab posteriori ad priori

A great advantage of Bayesian analysis, they say, is the ability to propagate the posterior. That is, if we derive a posterior probability distribution function for a parameter using one dataset, we can apply that as the prior when a new dataset comes along, and thereby improve our estimates of the parameter and shrink the error bars.

But how exactly does it work? I asked Tom Loredo about this, in the context of some strange behavior of sequential applications of BEHR that Ian Evans had noticed: specifically, sequential applications of BEHR, each using as its prior the posterior from the preceding dataset, seemed to depend on the order in which the datasets were considered. (That, as it happens, arose from approximating the posterior distribution before passing it on as the prior to the next stage, a feature that has since been corrected.) This is what he said:

Yes, this is a simple theorem. Suppose you have two data sets, D1 and D2, hypotheses H, and background info (model, etc.) I. Considering D2 to be the new piece of info, Bayes’s theorem is:

[1]

p(H|D1,D2) = p(H|D1) p(D2|H, D1)            ||  I
             -------------------
                    p(D2|D1)

where the “|| I” on the right is the “Skilling conditional” indicating that all the probabilities share an “I” on the right of the conditioning solidus (in fact, they also share a D1).

We can instead consider D1 to be the new piece of info; BT then reads:

[2]

p(H|D1,D2) = p(H|D2) p(D1|H, D2)            ||  I
             -------------------
                    p(D1|D2)

Now go back to [1], and use BT on the p(H|D1) factor:

p(H|D1,D2) = p(H) p(D1|H) p(D2|H, D1)            ||  I
             ------------------------
                    p(D1) p(D2|D1)

           = p(H, D1, D2)
             ------------      (by the product rule)
                p(D1,D2)

Do the same to [2]: use BT on the p(H|D2) factor:

p(H|D1,D2) = p(H) p(D2|H) p(D1|H, D2)            ||  I
             ------------------------
                    p(D2) p(D1|D2)

           = p(H, D1, D2)
             ------------      (by the product rule)
                p(D1,D2)

So the results from the two orderings are the same. In fact, in the Cox-Jaynes approach, the “axioms” of probability aren’t axioms, but get derived from desiderata that guarantee this kind of internal consistency of one’s calculations. So this is a very fundamental symmetry.

Note that you have to worry about possible dependence between the data (i.e., p(D2|H, D1) appears in [1], not just p(D2|H)). In practice, separate data are often independent (conditional on H), so p(D2|H, D1) = p(D2|H) (i.e., if you consider H as specified, then D1 tells you nothing about D2 that you don’t already know from H). This is the case, e.g., for basic iid normal data, or Poisson counts. But even in these cases dependences might arise, e.g., if there are nuisance parameters that are common for the two data sets (if you try to combine the info by multiplying *marginalized* posteriors, you may get into trouble; you may need to marginalize *after* multiplying if nuisance parameters are shared, or account for dependence some other way).
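Here is a minimal numerical check of the order-invariance (the conjugate Gamma-Poisson model is my choice for illustration, not from Tom’s email). Because the sequential posteriors here are carried forward exactly rather than approximated, the order dependence Ian noticed cannot arise:

```python
# Exact sequential Bayesian updating is order-invariant.  With a conjugate
# Gamma(alpha, beta) prior on a Poisson rate, the posterior after observing
# iid counts is Gamma(alpha + sum(counts), beta + n), so the bookkeeping
# is explicit:

def update(alpha, beta, counts):
    """Exact posterior hyperparameters after iid Poisson counts."""
    return alpha + sum(counts), beta + len(counts)

D1, D2 = [3, 7, 4], [5, 6]                  # two made-up datasets
prior = (1.0, 1.0)                          # Gamma(1, 1) prior on the rate

post_12 = update(*update(*prior, D1), D2)   # D1 first, its posterior as prior for D2
post_21 = update(*update(*prior, D2), D1)   # D2 first
post_all = update(*prior, D1 + D2)          # both datasets at once

print(post_12, post_21, post_all)           # identical: (26.0, 6.0) three times
```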

What if you had 3, 4, …, N observations? Does the order in which you apply BT affect the results?

No, as long as you use BT correctly and don’t ignore any dependences that might arise.

If not, is there a prescription for the Right Thing [TM] to do?

Always obey the laws of probability theory! 9-)

P Values: What They Are and How to Use Them

After following the recent discussion among CHASC members, the following paper came to mind:
P Values: What They Are and How to Use Them by Luc Demortier.
Continue reading ‘P Values: What They Are and How to Use Them’ »

When you observed zero counts, you didn’t not observe any counts

Dong-Woo, who has been playing with BEHR, noticed that the confidence bounds quoted on the source intensities seem to be unchanged when the source counts are zero, regardless of what the background counts are set to. That is, p(s|NS,NB) is invariant when NS=0, for any value of NB. This seems a bit odd, because one naively expects that as NB increases, it should become more and more likely that s is close to 0. Continue reading ‘When you observed zero counts, you didn’t not observe any counts’ »
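One way to see why: with zero observed counts in the source region, the likelihood factorizes as exp(-s)·exp(-b)·Poisson(NB; r·b), so integrating out the background intensity b contributes only a constant in s and the posterior on s cannot depend on NB. A toy grid computation (my construction, with made-up grids and a flat prior, not BEHR’s actual algorithm) shows the invariance:

```python
import numpy as np
from scipy.stats import poisson

# Toy version of the setup: source-region counts N ~ Poisson(s + b),
# background-region counts NB ~ Poisson(r * b), on a grid with flat priors.
s = np.linspace(0.0, 10.0, 400)[:, None]    # source intensity (column)
b = np.linspace(0.01, 10.0, 400)[None, :]   # background intensity (row)
r = 1.0                                     # background/source area ratio (made up)

def posterior_s(N, NB):
    joint = poisson.pmf(N, s + b) * poisson.pmf(NB, r * b)
    marg = joint.sum(axis=1)                # integrate out the background b
    return marg / marg.sum()

# With N = 0, the b-integral is a constant in s, so p(s | N=0, NB) is the
# same curve for every NB:
for NB in (0, 10, 100):
    p = posterior_s(0, NB)
    print(f"NB = {NB:3d}: posterior mean of s = {(p * s.ravel()).sum():.3f}")
```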

[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007

From arxiv/astro-ph:0708.4030v1
Deep ACS Imaging in the Globular Cluster NGC 6397: The Cluster Color Magnitude Diagram and Luminosity Function by H.B. Richer et al.

This paper presents an observational study of the globular cluster NGC 6397 that is enhanced and more informative than previous observations, in the sense that 1) a truncation in the white dwarf cooling sequence occurs at magnitude 28, 2) the cluster main sequence seems to terminate approximately at the hydrogen-burning limit predicted by two independent stellar evolution models, and 3) luminosity functions (LFs) and mass functions (MFs) are well defined. Nothing statistical, but the idea of defining color magnitude diagrams (CMDs) and LFs described in the paper, together with the improved measurements (ACS imaging) of stars in NGC 6397, will assist in developing suitable statistics for CMD and LF fitting problems.
Continue reading ‘[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007’ »

[ArXiv] A Lecture Note, June 17, 2007

From arxiv/astro-ph:0706.1988,
Lectures on Astronomy, Astrophysics, and Cosmology looks helpful to statisticians who would like to learn astronomy, astrophysics, and cosmology. The lecture note starts by introducing the fundamentals of astronomy, UNITS!!!, and its history. It also explains astronomical measures such as distances and their units, luminosity, and temperature; the HR diagram (astronomers’ summary diagram); stellar evolution; and relevant topics in cosmology. At least a third of the article will be useful for grasping a rough idea of astronomy as a scientific subject beyond colorful pictures. Statisticians who are keen on cosmology are recommended to read further.

This is not a high-energy lecture note; statisticians interested in high-energy astrophysics are therefore encouraged to visit Astro Jargon for Statisticians and CHASC.