The AstroStat Slog » history

Guinness, Gosset, Fisher, and Small Samples

hlee — Thu, 12 Feb 2009 18:03:01 +0000

Student’s t-distribution is somewhat underrepresented in the astronomical community. An article with nice stories seems to me the best way to introduce the t distribution. This article recounts historical anecdotes about monumental statistical developments that occurred about 100 years ago.

Guinness, Gosset, Fisher, and Small Samples by Joan Fisher Box
Source: Statist. Sci. Volume 2, Number 1 (1987), 45-52.

No time to read the whole article? I hope you have a few minutes to read the following quotes, which I find quite enchanting.

[p.45] One of the first things you learn in statistics is to distinguish between the true parameter value of the standard deviation σ and the sample standard deviation s. But at the turn of the century statisticians did not. They called both σ and s the standard deviation. They always used such large samples that their estimate really did approximate the parameter value, so it did not make much difference to their results. But their methods would not do for experimental work. You cannot get samples of thousands of experimental points. …

[p.49] …, the main question was exactly how much wider should the error limits be to make allowance for the error introduced by using the estimates m and s instead of the parameters μ and σ. Pearson could not answer that question for Gosset in 1905, nor the one that followed, which was: what level of probability should be called significant?

[p.49] …, Gosset worked out the exact answer to his question about the probable error of the mean and tabulated the probability values of his criterion z=(m-μ)/s for samples of N=2,3,…,10. He tried also to calculate the distribution of the correlation coefficient by the same method but managed to get the answer only for the case when the true correlation is zero. …
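
The point made in these quotes, that error limits must be wider when m and s stand in for μ and σ, can be checked with a small simulation. What follows is only a sketch under stated assumptions: it uses the modern t statistic, with s computed using the N-1 divisor, rather than Gosset's original z, which is scaled differently. The function name `coverage` is made up for this illustration, and 3.182 is the tabulated two-sided 95% critical value of t with 3 degrees of freedom.

```python
import math
import random

def coverage(n, factor, trials=20000, seed=0):
    """Fraction of trials in which m +/- factor*s/sqrt(n) contains the true mean.

    Samples are drawn from a normal distribution with true mean 0 and
    true standard deviation 1 (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(sample) / n
        # sample standard deviation with the n-1 divisor
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        if abs(m) <= factor * s / math.sqrt(n):
            hits += 1
    return hits / trials

# N = 4 experimental points, nominal 95% limits
normal_cov = coverage(4, 1.96)   # large-sample (normal) factor
t_cov = coverage(4, 3.182)       # Student's t factor, 3 degrees of freedom

print(f"normal factor: {normal_cov:.3f}, t factor: {t_cov:.3f}")
```

With only four points, the large-sample factor 1.96 covers the true mean noticeably less often than the nominal 95%, while the wider t-based limits come close to the nominal level.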

“Thanks to Henrietta Leavitt”

vlk — Thu, 06 Nov 2008 10:00:17 +0000

[9/30/2008]

The CfA is celebrating the 100th anniversary of the discovery of the Cepheid period-luminosity relation on Nov 6, 2008. See http://www.cfa.harvard.edu/events/2008/leavitt/ for details.

[Update 10/03] For a nice introduction to the story of Henrietta Swan Leavitt, listen to this Perimeter Institute talk by George Johnson: http://pirsa.org/06050003/

[Update 11/06] The full program is now available. The symposium begins at Noon today.

Quintessential Contributions

hlee — Sat, 27 Sep 2008 03:49:34 +0000

Personally, I find the history of astronomy more interesting than the history of statistics. This may change tomorrow. The Harvard statistics department (chair Xiao-Li Meng) is organizing a symposium titled

Quintessential Contributions:
Celebrating Major Birthdays of Statistical Ideas and Their Inventors
When: Saturday, September 27, 2008, 9:45 AM – 5:00 PM
Where: Radcliffe Gymnasium, 18 Mason Street, Cambridge, MA

This symposium features four distinguished speakers who will talk about four of the most celebrated statistical works by four of the most renowned statisticians. Click here for the details.

The contents span only about 100 years, and chances are good that I will still favor the history of astronomy over the history of statistics. However, Prof. Stigler will give another presentation on Monday (Sept. 29th) at the statistics department (click here for a pdf flyer) titled The Five Most Consequential Ideas in the History of Statistics, and the last sentence of its abstract, “And, no, Bayes Theorem is not in the list.”, intrigues me and tempts me to change my mind.

I’d like to share the information about this highly anticipated symposium and colloquium with you, particularly with those who live in or near Cambridge.

A History of Markov Chain Monte Carlo

hlee — Wed, 17 Sep 2008 18:11:01 +0000

I’ve been joking about the way astronomers write Markov chain Monte Carlo (MCMC): in astronomical journals, MCMC has frequently been spelled out as Monte Carlo Markov Chain. I was curious about the history of this new creation. More generally, I thought it would be worthwhile to learn more about the history of MCMC, and this paper appeared on arXiv:

[stat.CO:0808.2902] A History of Markov Chain Monte Carlo–Subjective Recollections from Incomplete Data– by C. Robert and G. Casella
Abstract: In this note we attempt to trace the history and development of Markov chain Monte Carlo (MCMC) from its early inception in the late 1940’s through its use today. We see how the earlier stages of the Monte Carlo (MC, not MCMC) research have led to the algorithms currently in use. More importantly, we see how the development of this methodology has not only changed our solutions to problems, but has changed the way we think about problems.

Here is a year-by-year list of monumental advances in the history of MCMC:

1946: ENIAC
late 1940s: inception along with Monte Carlo methods.
1953: Metropolis algorithm published in Journal of Chemical Physics (Metropolis et al.)
1970: Hastings algorithm in Biometrika (Hastings)
1974: Gibbs sampler and Hammersley-Clifford theorem paper by Besag and its discussion by Hammersley in JRSS B
1977: EM algorithm in JRSS B (Dempster et al.)
1983: Simulated Annealing algorithm (Kirkpatrick et al.)
1984: Gibbs sampling in IEEE Trans. Pattern Anal. Mach. Intell. (Geman and Geman, this paper is responsible for the name)
1987: data augmentation in JASA (Tanner and Wong)
1980s: image analysis and spatial statistics enjoyed MCMC algorithms, not popular with others due to the lack of computing power
1990: seminal paper by Gelfand and Smith in JASA
1991: BUGS was presented at the Valencia meeting
1992: introductory paper by Casella and George
1994: influential MCMC theory paper by Tierney in Ann. Stat.
1995: reversible jump algorithm in Biometrika (Green)
mid 1990s: boom of MCMC due to particle filters, reversible jump and perfect sampling (second generation of the MCMC revolution)

and a nice quote from the conclusion:

MCMC changed our emphasis from “closed form” solutions to algorithms, expanded our impact to solving “real” applied problems, expanded our impact to improving numerical algorithms using statistical ideas, and led us into a world where “exact” now means “simulated”!

If you are considering applying MCMC methods in your data analysis, the references listed in Robert and Casella serve as a good starting point.
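
For readers who have not seen the algorithm behind the 1953 entry in the list, here is a minimal random-walk Metropolis sketch. It is an illustration under assumptions, not code from Robert and Casella: the function name `metropolis`, the Gaussian proposal, the step size, the burn-in length, and the standard normal target are all choices made here for the example.

```python
import math
import random

def metropolis(log_target, x0, n_steps, step=2.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_target returns the log of the target density up to an additive constant.
    """
    rng = random.Random(seed)
    x = x0
    logp = log_target(x)
    chain = []
    for _ in range(n_steps):
        # symmetric Gaussian proposal centered on the current state
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # accept with probability min(1, target(proposal) / target(x))
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        chain.append(x)  # a rejected proposal repeats the current state
    return chain

def log_std_normal(x):
    # log density of N(0, 1), dropping the normalizing constant
    return -0.5 * x * x

chain = metropolis(log_std_normal, x0=5.0, n_steps=50000)
kept = chain[1000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / (len(kept) - 1)

print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The sample mean and variance of the retained chain should land near 0 and 1, the moments of the target. Only ratios of the target density are used, which is why the normalizing constant can be dropped.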

A Conversation with Peter Huber

hlee — Sat, 06 Sep 2008 00:46:59 +0000

The problem with data analysis is of course that it is a performing art. It is not something you easily write a paper on; rather, it is something you do. And so it is difficult to publish.

quoted from this conversation

Statistical Science has a nice “conversations” series with renowned statisticians. This series always benefits me by 1. teaching the history of statistics through a personal life, 2. showing as many different aspects of statistics as there are statisticians interviewed, and 3. providing, in plain language, an introductory education in the statistics that the interviewees have perfected over many years. One earlier post in the slog from this series was a conversation with Leo Breiman about the two cultures in statistical modeling. Because of Prof. Huber’s diverse experiences and many contributions in various fields, this conversation may entertain astronomers and computer scientists as well as statisticians.

The dialog is available through arxiv.org: [stat.ME:0808.0777], written by Andreas Buja and Hans R. Künsch.

He became famous for his early paper in robust statistics, Robust Estimation of a Location Parameter, but I see him as a pioneer in data mining who laid a cornerstone for massive/multivariate data analysis when computers were not as capable as today’s. His book, Robust Statistics, and the paper Projection Pursuit in Annals of Statistics (Vol. 13, No. 2, pp. 435-475, 1985) are popular among his many well-known publications.

He has publications in geoscience and Babylonian astronomy. This conversation includes names like Steven Weinberg, the Nobel laureate (The First Three Minutes is a well-known general science book), and the late Carl Sagan (famous for books and a movie such as Cosmos and Contact), showing the extent of his scholarly interests and his genius beyond statistics. At the beginning, I felt as if I were learning the history of computation and data analysis apart from statistics.

Kepler and the Art of Astrophysical Inference

vlk — Wed, 16 Apr 2008 22:49:18 +0000

I recently discovered iTunesU, and I have to confess, I find it utterly fascinating. By golly, it is everything that they promised us that the internet would be. Informative, entertaining, and educational. What are the odds?!? Anyway, while poking around the myriad lectures, courses, and talks that are now online, I came across a popular Physics lecture series at UMichigan which listed a talk by one of my favorite speakers, Owen Gingerich. He had spoken about The Four Myths of the Copernican Revolution last November. It was, how shall we say, riveting.

Owen talks in detail about how the Copernican model came to supplant the Ptolemaic model. In particular, he describes how Kepler went from Ptolemaic epicycles to elliptical orbits. Contrary to the general impression, Kepler did not fit ellipses to Tycho Brahe’s observations of Mars. The ellipticity is far too small for it to be fittable! Rather, he used logical reasoning to first offset Earth’s epicycle away from the center in order to avoid the so-called Martian Catastrophe, and then used the phenomenological constraint of the law of equal areas to infer that the path must be an ellipse.

This process, along with Galileo’s advocacy for the heliocentric system, demonstrates a telling fact about how Astrophysics is done in practice. Hyunsook once lamented that astronomers seem to be rather trigger happy with correlations and regressions, and everyone knows they don’t constitute proof of anything, so why do they do it? Owen says about 39 1/2 minutes into the lecture:

Here we have the fourth of the myths, that Galileo’s telescopic observations finally proved the motion of the earth and thereby, at last, established the truth of the Copernican system.

What I want to assure you is that, in general, science does not operate by proofs. You hear that an awful lot, about science looking for propositions that can be falsified, that proof plays this big role.. uh-uh. It is coherence of explanation, understanding things that are well-knit together; the broader the framework of knitting the things together, the more we are able to believe it.

Exactly! We build models, often with little justification in terms of experimental proof, and muddle along trying to make it fit into a coherent narrative. This is why statistics is looked upon with suspicion among astronomers, and why for centuries our mantra has been “if it takes statistics to prove it, it isn’t real!”