The AstroStat Slog » error
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

Guinness, Gosset, Fisher, and Small Samples
http://hea-www.harvard.edu/AstroStat/slog/2009/guinness-gosset-fisher-and-small-samples/
Thu, 12 Feb 2009 18:03:01 +0000, by hlee

Student’s t-distribution is somewhat underrepresented in the astronomical community. An article with nice stories seems to me the best way to introduce the t distribution, and this one recounts historic anecdotes about the monumental statistical developments that occurred about 100 years ago.

Guinness, Gosset, Fisher, and Small Samples by Joan Fisher Box
Source: Statist. Sci. Volume 2, Number 1 (1987), 45-52.

No time to read the whole article? I hope you have a few minutes for the following quotes, which I find quite enchanting.

[p.45] One of the first things you learn in statistics is to distinguish between the true parameter value of the standard deviation σ and the sample standard deviation s. But at the turn of the century statisticians did not. They called both σ and s the standard deviation. They always used such large samples that their estimate really did approximate the parameter value, so it did not make much difference to their results. But their methods would not do for experimental work. You cannot get samples of thousands of experimental points. …

[p.49] …, the main question was exactly how much wider should the error limits be to make allowance for the error introduced by using the estimates m and s instead of the parameters μ and σ. Pearson could not answer that question for Gosset in 1905, nor the one that followed, which was: what level of probability should be called significant?

[p.49] …, Gosset worked out the exact answer to his question about the probable error of the mean and tabulated the probability values of his criterion z=(m-μ)/s for samples of N=2,3,…,10. He tried also to calculate the distribution of the correlation coefficient by the same method but managed to get the answer only for the case when the true correlation is zero. …
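
The point of these quotes, that error limits based on large-sample (normal) theory are too narrow when N is small, is easy to see numerically. As a minimal illustration (not from the Box article; it assumes scipy is available), this short sketch compares the two-sided 95% critical values from Student's t distribution with the normal-theory value of about 1.96 for a few small sample sizes:

    # Minimal illustration (not from the Box article): how much wider the
    # small-sample error limits on a mean are than the normal-theory ones.
    from scipy import stats

    z_crit = stats.norm.ppf(0.975)             # large-sample critical value, ~1.96
    for n in (2, 3, 5, 10, 30):
        t_crit = stats.t.ppf(0.975, df=n - 1)  # Student's t with N-1 degrees of freedom
        print(f"N={n:2d}: t={t_crit:6.2f}  normal={z_crit:4.2f}  ratio={t_crit / z_crit:5.2f}")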

Lost in Translation: Measurement Error
http://hea-www.harvard.edu/AstroStat/slog/2009/measurement-error/
Sat, 03 Jan 2009 03:24:32 +0000, by vlk

You would think that something like “measurement error” is a well-defined concept, and everyone knows what it means. Not so. I have so far counted at least 3 different interpretations of what it means.

Suppose you have measurements X={Xi, i=1..N} of a quantity whose true value is, say, X0. One can then compute the mean and standard deviation of the measurements, E(X) and σX. One can also infer the value of a parameter θ(X), derive the posterior probability density p(θ|X), and obtain confidence intervals on it.

So here are the different interpretations:

  1. Measurement error is σX, or the spread in the measurements. Astronomers tend to use the term in this manner.
  2. Measurement error is X0-E(X), or the “error made when you make the measurement”, essentially what is left over beyond mere statistical variations. This is how statisticians seem to use it; it is essentially the bias term. To quote David van Dyk:

    For us it is just English. If your measurement is different from the real value. So this is not the Poisson variability of the source for effects or ARF, RMF, etc. It would disappear if you had a perfect measuring device (e.g., telescope).

  3. Measurement error is the width of p(θ|X), i.e., the measurement error of the first type propagated through the analysis. Astronomers use this too to refer to measurement error.

Who am I to say which is right? But be aware of who you may be speaking with and be sure to clarify what you mean when you use the term!
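
For what it is worth, here is a minimal simulated sketch (not from the post; the true value, bias, noise level, and sample size are made-up numbers) that puts a number on each of the three usages for a toy data set, taking the inferred parameter θ to be the mean of a Gaussian model with a flat prior:

    # Toy illustration of the three usages of "measurement error".
    # X0, bias, sigma, and N are hypothetical numbers chosen for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    X0, bias, sigma, N = 10.0, 0.5, 2.0, 25
    X = X0 + bias + rng.normal(0.0, sigma, size=N)   # simulated measurements

    spread = X.std(ddof=1)           # sense 1: scatter of the measurements, sigma_X
    offset = X0 - X.mean()           # sense 2: X0 - E(X), the systematic offset
    width = spread / np.sqrt(N)      # sense 3: sense-1 scatter propagated into the
                                     # inferred mean, i.e. the width of p(theta|X)
                                     # for a flat-prior Gaussian model

    print(f"sense 1 (spread of X):         {spread:.3f}")
    print(f"sense 2 (X0 - E(X)):           {offset:.3f}")
    print(f"sense 3 (width of p(theta|X)): {width:.3f}")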

loess and lowess and locfit, oh my
http://hea-www.harvard.edu/AstroStat/slog/2008/question-locfit-errors/
Fri, 25 Jul 2008 17:12:42 +0000, by chasc

Diab Jerius follows up on LOESS techniques with a very nice summary update and finds LOCFIT to be very useful, but there are still questions about how it deals with measurement errors and combining observations from different experiments:

A couple of weeks ago Vinay suggested using the LOESS algorithm to create smooth curves (separately) through the SSD and FPC points. LOESS has been succeeded by LOWESS and, finally, LOCFIT, which is the 800 lb gorilla of local regression fitting.

The LOCFIT algorithm uses local regression (i.e. fits over samples of the data) to generate smooth curves. There is an enormous body of literature on this, much of it summarized in the book

Local Regression and Likelihood, by C. Loader
ISBN 0-387-98775-4

which also serves as documentation for the LOCFIT software. The techniques seem well established and accepted by the statistical community.
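
For readers who just want a feel for what a local-regression smoother does, here is a rough Python stand-in (not LOCFIT itself) using the lowess smoother from statsmodels; the data are synthetic and the smoothing fraction frac is an arbitrary choice. Each smoothed value comes from a weighted fit over roughly frac*N of the nearest neighbours of that x.

    # Rough stand-in for a LOCFIT-style smooth: lowess from statsmodels
    # applied to synthetic data. frac is the fraction of points used in
    # each local fit and is an arbitrary choice here.
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(42)
    x = np.linspace(0.0, 10.0, 200)
    y = np.sin(x) + rng.normal(0.0, 0.3, size=x.size)   # noisy synthetic curve

    smooth = lowess(y, x, frac=0.2)      # returns sorted (x, smoothed y) pairs
    x_s, y_s = smooth[:, 0], smooth[:, 1]
    print(x_s[:3], y_s[:3])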

LOCFIT looks to be a very elegant approach, but, unfortunately, I have still not been able to glean any information as to how one introduces experimental errors into the regressions. The voluminous research in this field certainly deals with experimental data, so I’m not quite sure what to make of this.

One way around this might be to take a Monte-Carlo approach: resample the data using the experimental errors, generate a new smoothing function, and generate a measure of the distribution of the fit functions.
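
A sketch of that Monte-Carlo idea, again with the statsmodels lowess smoother standing in for LOCFIT and assuming independent Gaussian experimental errors yerr on each point, might look like the following; the pointwise standard deviation of the resampled fits then gives a rough error band on the smooth curve.

    # Sketch of the Monte-Carlo approach described above: perturb the data
    # within the quoted experimental errors, re-fit the smoother each time,
    # and look at the pointwise spread of the resulting curves.
    # lowess stands in for LOCFIT; x, y, yerr are 1-d arrays of equal length.
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    def lowess_error_band(x, y, yerr, frac=0.2, n_resample=500, seed=0):
        rng = np.random.default_rng(seed)
        fits = np.empty((n_resample, len(x)))
        for i in range(n_resample):
            y_pert = y + rng.normal(0.0, yerr)      # resample within the errors
            # return_sorted=False: fitted values at the input x, original order
            fits[i] = lowess(y_pert, x, frac=frac, return_sorted=False)
        return fits.mean(axis=0), fits.std(axis=0)  # mean curve and its spread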

For those interested, I have a copy of the above book on loan.
It’s fascinating reading.

More about the actual code is available at this web site:
http://locfit.herine.net/

In addition, Ping Zhao asks: (paraphrasing) if you combine two separate sets of observations with vastly different numbers of data points in each, how do you weight them during a combined loess/lowess/locfit fit?

Comments and suggestions from statisticians are much appreciated!
