A History of Markov Chain Monte Carlo

I've been joking about astronomers' habit of writing Markov chain Monte Carlo (MCMC) as “Monte Carlo Markov Chain”, a form that appears frequently in astronomical journals. I was curious about the history of this new creation. Overall, I thought it would be worthwhile to learn more about the history of MCMC, and this paper was up on arXiv:

Parametric Bootstrap vs. Nonparametric Bootstrap

The following footnotes are from one of Prof. Babu’s slides, but I do not recall on which occasion he presented the content.

– In the XSPEC package, the parametric bootstrap is the command FAKEIT, which makes Monte Carlo simulations of a specified spectral model.
– XSPEC does not provide a nonparametric bootstrap capability.
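
To make the distinction concrete, here is a minimal sketch in plain Python. It is not how FAKEIT works internally; it only illustrates the two ideas on a toy data set. The Gaussian model, the function names, and the choice of the median as the statistic are all assumptions made for illustration. The parametric version simulates new data from a fitted model, while the nonparametric version resamples the observed data with replacement.

```python
import random
import statistics


def nonparametric_bootstrap(data, stat, n_boot=1000, rng=None):
    """Resample the observed data with replacement and recompute the statistic."""
    rng = rng or random.Random()
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


def parametric_bootstrap(data, stat, n_boot=1000, rng=None):
    """Fit a model, then simulate new data sets from the fitted model.

    The Gaussian model here is an assumption of this sketch; in practice
    the model would be whatever was fitted to the data (e.g. a spectral model).
    """
    rng = rng or random.Random()
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    n = len(data)
    return [stat([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(n_boot)]


# Toy data: 50 draws from a Gaussian with mean 10 and standard deviation 2.
rng = random.Random(42)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]

nonpar_medians = nonparametric_bootstrap(data, statistics.median, rng=rng)
par_medians = parametric_bootstrap(data, statistics.median, rng=rng)

# The spread of the bootstrap replicates estimates the standard error of the median.
nonpar_se = statistics.stdev(nonpar_medians)
par_se = statistics.stdev(par_medians)
print("nonparametric SE of median:", nonpar_se)
print("parametric SE of median:   ", par_se)
```

When the assumed model is close to the truth, the two standard errors should be of similar size; when the model is wrong, the parametric answer inherits that error, which is the usual argument for having a nonparametric option as well.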

Continue reading ‘Parametric Bootstrap vs. Nonparametric Bootstrap’ »

Why Gaussianity?

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincare

Continue reading ‘Why Gaussianity?’ »

A Conversation with Peter Huber

The problem with data analysis is of course that it is a performing art. It is not something you easily write a paper on; rather, it is something you do. And so it is difficult to publish.

Quoted from this conversation.

Continue reading ‘A Conversation with Peter Huber’ »

An anecdote on entropy

My greatest concern was what to call it. I thought of calling it “information”, but the word was overly used, so I decided to call it “uncertainty”. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.”

Continue reading ‘An anecdote on entropy’ »


The whole story can be found on page 8 of the IMS Bulletin, Vol. 37, Issue 7.

I Like Eq

I grew up in an environment that glamourized mathematical equations. Equations adorned a text like jewelry, set there to dazzle, and often to outshine the text that they were to illuminate. Needless to say, anything I wrote was dense, opaque, and didn't communicate what it set out to. It was not until I saw a Reference Frame essay by David Mermin on how to write equations (1989, Physics Today, 42, p9) that I realized that equations should be treated as part of the text. You should be able to read them. David Mermin set out 3 rules for writing out equations, which I've tried to follow diligently (if not always successfully) since then.

This week’s quote:

“It’s easy to get a good fit, which means that your fit doesn’t mean much…”

Ariane Lancon (from proceedings of “Starbursts: from 30 Doradus to Lyman break galaxies”, 2005)


Prof. Speed writes a column for the IMS Bulletin, and the April 2008 issue has Terence’s Stuff: PCA (p.9). Here are quotes with minor paraphrasing:

Although a quintessentially statistical notion, my impression is that PCA has always been more popular with non-statisticians. Of course we love to prove its optimality properties in our courses, and at one time the distribution theory of sample covariance matrices was heavily studied.

…but who could not feel suspicious when observing the explosive growth in the use of PCA in the biological and physical sciences and engineering, not to mention economics?…it became the analysis tool of choice of the hordes of former physicists, chemists and mathematicians who unwittingly found themselves having to be statisticians in the computer age.

My initial theory for its popularity was simply that they were in love with the prefix eigen-, and felt that anything involving it acquired the cachet of quantum mechanics, where, you will recall, everything important has that prefix.

He listed the following eigen- terms: eigengenes, eigenarrays, eigenexpression, eigenproteins, eigenprofiles, eigenpathways, eigenSNPs, eigenimages, eigenfaces, eigenpatterns, eigenresult, and even eigenGoogle.

How many miracles must one witness before becoming a convert?…Well, I’ve seen my three miracles of exploratory data analysis, examples where I found I had a problem, and could do something about it using PCA, so now I’m a believer.

Needless to say, astronomers also explore data with PCA and use eigenvalues and eigenvectors to transform raw data into more interpretable forms.
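
For readers who have not seen where the eigen- prefix enters, here is a minimal sketch of PCA in plain Python. It assumes a toy two-column data set and finds only the leading principal component by power iteration on the sample covariance matrix; the function names and the data are made up for illustration, and a real analysis would normally use a linear algebra library instead.

```python
import math
import random


def covariance(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def leading_eigenpair(matrix, rng, n_iter=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    # Random positive start vector, so it is very unlikely to be orthogonal
    # to the leading eigenvector.
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


# Toy data: the second column is roughly twice the first, plus small noise.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
rows = [[x, 2.0 * x + rng.gauss(0.0, 0.1)] for x in xs]

cov, centered = covariance(rows)
eigenvalue, axis = leading_eigenpair(cov, rng)

# Fraction of total variance carried by the first principal component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Scores: each observation projected onto the first principal axis.
scores = [sum(c * a for c, a in zip(row, axis)) for row in centered]

print("first principal axis:", axis)
print("fraction of variance explained:", explained)
```

In this toy case the first axis points along the line relating the two columns, so a single score per observation carries almost all the information in the original pair of values.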

Lomb-Scargle periodograms in bioinformatics

A statistical method developed by insightful and brilliant astronomers is used in bioinformatics:
Detecting periodic patterns in unevenly spaced gene expression time series using Lomb–Scargle periodograms
by Glynn, Chen, & Mushegian [Click for R code and relevant information] [Paper archive at Bioinformatics]

The conclusion clearly states the strengths of the Lomb-Scargle periodogram.

The Lomb-Scargle periodogram algorithm is an effective tool for finding periodic gene expression profiles in microarray data, especially when data may be collected at arbitrary time points or when a significant proportion of data is missing.
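
As a rough illustration of why uneven sampling is not a problem for this method, here is a minimal sketch of the classical Lomb-Scargle periodogram in plain Python. It is not the R code from the paper. The function name, the simulated time series, and the frequency grid are assumptions made for illustration, and the power is left unnormalized.

```python
import math
import random


def lomb_scargle(times, values, ang_freqs):
    """Classical (unnormalized) Lomb-Scargle power at each angular frequency."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]  # work with mean-subtracted data
    power = []
    for w in ang_freqs:
        # Time offset tau makes the sine and cosine terms orthogonal
        # on the actual (uneven) sampling times.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, cos_terms))
        ys = sum(a * b for a, b in zip(y, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power


# Simulated series: 60 randomly spaced times, true period 10 (frequency 0.1).
rng = random.Random(1)
times = sorted(rng.uniform(0.0, 100.0) for _ in range(60))
values = [math.sin(2.0 * math.pi * t / 10.0) + rng.gauss(0.0, 0.3) for t in times]

# Trial frequencies in cycles per unit time, converted to angular frequencies.
freqs = [0.01 + 0.002 * k for k in range(246)]
power = lomb_scargle(times, values, [2.0 * math.pi * f for f in freqs])

best_freq = freqs[power.index(max(power))]
print("frequency with the highest power:", best_freq)
```

No interpolation onto a regular grid is needed: the sums run over whatever time points happen to exist, which is why the approach suits time series with arbitrary sampling or missing observations.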

My personal wish is that data-driven statistical methods developed by hands-on scientists (and their statistical collaborators) will be used in other disciplines, because I believe data sets are likely to share the unknown truth of our one universe.

Kepler and the Art of Astrophysical Inference

I recently discovered iTunesU, and I have to confess, I find it utterly fascinating. By golly, it is everything that they promised us that the internet would be. Informative, entertaining, and educational. What are the odds?!? Anyway, while poking around the myriad lectures, courses, and talks that are now online, I came across a popular Physics lecture series at UMichigan which listed a talk by one of my favorite speakers, Owen Gingerich. He had spoken about The Four Myths of the Copernican Revolution last November. It was, how shall we say, riveting.

Owen talks in detail about how the Copernican model came to supplant the Ptolemaic model. In particular, he describes how Kepler went from Ptolemaic epicycles to elliptical orbits. Contrary to the general impression, Kepler did not fit ellipses to Tycho Brahe’s observations of Mars. The ellipticity is far too small for it to be fittable! Rather, he used logical reasoning to first offset Earth’s epicycle away from the center in order to avoid the so-called Martian Catastrophe, and then used the phenomenological constraint of the law of equal areas to infer that the path must be an ellipse.

This process, along with Galileo’s advocacy for the heliocentric system, demonstrates a telling fact about how Astrophysics is done in practice. Hyunsook once lamented that astronomers seem to be rather trigger happy with correlations and regressions, and everyone knows they don’t constitute proof of anything, so why do they do it? Owen says about 39 1/2 minutes into the lecture:

Here we have the fourth of the myths, that Galileo’s telescopic observations finally proved the motion of the earth and thereby, at last, established the truth of the Copernican system.

What I want to assure you is that, in general, science does not operate by proofs. You hear that an awful lot, about science looking for propositions that can be falsified, that proof plays this big role.. uh-uh. It is coherence of explanation, understanding things that are well-knit together; the broader the framework of knitting the things together, the more we are able to believe it.

Exactly! We build models, often with little justification in terms of experimental proof, and muddle along trying to make it fit into a coherent narrative. This is why statistics is looked upon with suspicion among astronomers, and why for centuries our mantra has been “if it takes statistics to prove it, it isn’t real!”

Quote of the Date

Really, there is no point in extracting a sentence here and there, go read the whole thing:

Why I don’t like Bayesian Statistics

- Andrew Gelman

Oh, alright, here’s one:

I can’t keep track of what all those Bayesians are doing nowadays–unfortunately, all sorts of people are being seduced by the promises of automatic inference through the “magic of MCMC”–but I wish they would all just stop already and get back to doing statistics the way it should be done, back in the old days when a p-value stood for something, when a confidence interval meant what it said, and statistical bias was something to eliminate, not something to embrace.

Continue reading ‘Quote of the Date’ »

Statistics is the study of uncertainty

I began to study statistics with the notion that statistics is the study of information (retrieval), and that a part of information is uncertainty, which is taken for granted in our random world. Probably, it is the other way around; information is a part of uncertainty. Could this be the difference between Bayesians and frequentists?

The statistician’s task is to articulate the scientist’s uncertainties in the language of probability, and then to compute with the numbers found: cited from

Continue reading ‘Statistics is the study of uncertainty’ »

A quote on data analysis

Same data, different authors, different results.

(Marco Sirianni, from a conference on starbursts).

[Quote] When all the models are wrong

From page 103 of Bayesian Model Selection and Model Averaging by L. Wasserman (2000) Journal of Mathematical Psychology, 44, pp.92-107