Archive for the ‘Frequentist’ Category.
Sep 26th, 2008| 11:49 pm | Posted by hlee
In my personal view, the history of astronomy is more interesting than the history of statistics. That may change tomorrow: the Harvard statistics department (chaired by Xiao-Li Meng) is organizing a symposium titled
Quintessential Contributions:
Celebrating Major Birthdays of Statistical Ideas and Their Inventors
When: Saturday, September 27, 2008, 9:45 AM – 5:00 PM
Where: Radcliffe Gymnasium, 18 Mason Street, Cambridge, MA
Continue reading ‘Quintessential Contributions’ »
Tags:
Gosset,
Harvard,
history,
S. M. Stigler,
student t,
symposium Category:
Bayesian,
Cross-Cultural,
Frequentist,
News,
Quotes,
Stat |
1 Comment
Sep 18th, 2008| 07:48 pm | Posted by hlee
Another conclusion I have drawn from reading preprints listed on arxiv/astro-ph is that astronomers tend to confuse classification with clustering and to mix up the two methodologies. They seem to assume that any algorithm from classification or clustering analysis will serve their purpose, since both kinds of algorithms, no matter what, look like a black box. By a black box I mean something like a neural network, which is a classification algorithm. Continue reading ‘Classification and Clustering’ »
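For concreteness, here is a minimal Python sketch of the distinction (not taken from any of the preprints discussed; the data, functions, and initialization are all hypothetical illustrations): classification is supervised and requires labels, while clustering is unsupervised and must discover the groups on its own.

```python
import numpy as np

# Toy 1-D data: two well-separated groups.
data = np.array([0.1, 0.2, 0.3, 5.0, 5.1, 5.2])
labels = np.array([0, 0, 0, 1, 1, 1])          # known only in the supervised case

def classify_nearest_centroid(train_x, train_y, new_x):
    """Supervised: centroids are computed from *labeled* training data."""
    centroids = {c: train_x[train_y == c].mean() for c in np.unique(train_y)}
    return min(centroids, key=lambda c: abs(centroids[c] - new_x))

def cluster_two_means(x, n_iter=10):
    """Unsupervised: group structure is discovered without any labels."""
    centers = np.array([x.min(), x.max()])     # crude initialization
    for _ in range(n_iter):
        # Assign each point to its nearest center, then update the centers.
        assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([x[assign == k].mean() for k in (0, 1)])
    return assign, centers

print(classify_nearest_centroid(data, labels, 4.8))   # -> 1
print(cluster_two_means(data)[0])                     # -> [0 0 0 1 1 1]
```

Both routines produce a partition of the data, which is why they can look interchangeable from the outside; only the first one uses the labels.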
Tags:
black box,
book,
catalog,
Classification,
clustering,
haste,
outliers,
R,
Robert Serfling,
semi-supervised learning,
survey Category:
Algorithms,
arXiv,
Astro,
Bad AstroStat,
Cross-Cultural,
Data Processing,
Frequentist,
Jargon,
Methods,
Stat |
Comment
Sep 10th, 2008| 10:46 pm | Posted by hlee
The following footnotes are from one of Prof. Babu’s slides, though I do not recall on which occasion he presented them.
– In the XSPEC package, the parametric bootstrap is the command FAKEIT, which runs a Monte Carlo simulation of a specified spectral model.
– XSPEC does not provide a nonparametric bootstrap capability.
Continue reading ‘Parametric Bootstrap vs. Nonparametric Bootstrap’ »
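Outside XSPEC, the distinction can be sketched in a few lines of Python. This is a generic illustration with a hypothetical Poisson counts sample, not a reproduction of FAKEIT: the parametric version simulates from the fitted model, while the nonparametric version resamples the observed data.

```python
import numpy as np

rng = np.random.default_rng(42)
counts = rng.poisson(lam=5.0, size=200)        # hypothetical observed counts

def parametric_bootstrap(data, n_boot=1000):
    # Simulate new datasets from the *fitted* model (Poisson with the MLE
    # rate) -- the spirit of a parametric bootstrap such as XSPEC's FAKEIT.
    lam_hat = data.mean()                       # Poisson MLE
    return np.array([rng.poisson(lam_hat, size=data.size).mean()
                     for _ in range(n_boot)])

def nonparametric_bootstrap(data, n_boot=1000):
    # Resample the observed data with replacement -- no model assumed.
    return np.array([rng.choice(data, size=data.size, replace=True).mean()
                     for _ in range(n_boot)])

par = parametric_bootstrap(counts)
npar = nonparametric_bootstrap(counts)
print(np.percentile(par, [2.5, 97.5]))   # parametric 95% interval for the mean
print(np.percentile(npar, [2.5, 97.5]))  # nonparametric 95% interval
```

When the assumed model is correct the two intervals agree closely; when it is misspecified, only the nonparametric version remains trustworthy.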
Sep 10th, 2008| 10:15 am | Posted by hlee
Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincaré
Continue reading ‘Why Gaussianity?’ »
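The quote alludes to the central limit theorem, which is the usual answer to "why Gaussianity?". A quick numerical sketch (hypothetical uniform draws, not tied to any paper discussed here) shows averages of decidedly non-Gaussian draws behaving as the CLT predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
# Averages of n uniform(0,1) draws: flat for n=1, near-Gaussian for larger n.
n, trials = 30, 100_000
means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)

# CLT prediction: mean 0.5, standard deviation sqrt(1/12)/sqrt(n).
sigma = (1 / 12) ** 0.5 / n ** 0.5
print(means.mean())                            # ~0.5
print(means.std(ddof=1))                       # ~0.053
# Fraction within one predicted sigma should be ~0.683 if Gaussian.
print(np.mean(np.abs(means - 0.5) < sigma))    # ~0.68
```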
Tags:
CLT,
Gaussianity,
Henri Poincaré,
IEEE,
normal,
signal processing,
signal processing magazine,
Why Category:
arXiv,
Cross-Cultural,
Data Processing,
Fitting,
Frequentist,
Methods,
Physics,
Quotes,
Stat,
Uncertainty |
Comment
Jul 9th, 2008| 01:00 pm | Posted by vlk
The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover, because this method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so much of our data are limited (censored), and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-’80s by Jürgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ, 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ, 306, 490). [See also Hyunsook’s primer.] It has been coded up and is available for use as part of the ASURV package. Continue reading ‘Kaplan-Meier Estimator (Equation of the Week)’ »
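For reference, here is a bare-bones sketch of the product-limit calculation in Python, using the right-censored "survival" convention and a made-up six-point sample; ASURV and the papers above handle the astronomical (upper-limit, left-censored) cases properly.

```python
import numpy as np

def kaplan_meier(times, observed):
    """Product-limit estimate of S(t) = P(T > t).
    `observed[i]` is True for a detection, False for a censored value."""
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    surv = 1.0
    steps = []
    for t in np.unique(times[observed]):            # step only at event times
        at_risk = np.sum(times >= t)                # still "alive" just before t
        deaths = np.sum((times == t) & observed)    # events exactly at t
        surv *= 1.0 - deaths / at_risk              # product-limit update
        steps.append((t, surv))
    return steps

# Hypothetical sample: six values, two of them censored.
times    = [1, 2, 2, 3, 4, 5]
observed = [True, True, False, True, False, True]
for t, s in kaplan_meier(times, observed):
    print(t, round(s, 3))
```

Note how a censored point (the second value 2, and the 4) reduces the number at risk for later steps without triggering a drop of its own; that is exactly how partial information from limits enters the estimate.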
Tags:
censored,
EotW,
Equation,
Equation of the Week,
Feigelson,
Isobe,
Kaplan-Meier,
maximum likelihood,
Nelson,
Schmitt,
survival analysis,
upper limit Category:
Frequentist,
Jargon,
Methods,
Stat |
13 Comments
Jul 1st, 2008| 10:10 pm | Posted by hlee
If obtaining the first derivative (the score function) and the second derivative (the empirical Fisher information) of a (pseudo-)likelihood function is feasible, and checking the regularity conditions is viable, the test for a global maximum of Gan and Jiang (JASA, 1999, Vol. 94, pp. 847-854) seems a useful reference for verifying a best-fit solution. Continue reading ‘A test for global maximum’ »
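The test itself is more involved, but its ingredients, the score and the observed Fisher information at a candidate solution, are easy to sketch. Here they are for a hypothetical Poisson likelihood (not an implementation of the Gan and Jiang statistic), where the MLE is the sample mean:

```python
import numpy as np

# Hypothetical Poisson(lam) counts.
data = np.array([3, 5, 4, 6, 2, 4, 5, 3])

def score(lam, x):
    """First derivative of the Poisson log-likelihood in lam."""
    return np.sum(x / lam - 1.0)

def obs_information(lam, x):
    """Observed Fisher information: minus the second derivative."""
    return np.sum(x / lam**2)

lam_hat = data.mean()                      # candidate best fit (the MLE)
print(score(lam_hat, data))                # ~0 at a stationary point
print(obs_information(lam_hat, data) > 0)  # True => a local maximum
```

A vanishing score and positive observed information only certify a local maximum; the point of the cited test is to go further and check, statistically, that the fitted point is the global one.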
May 19th, 2008| 10:42 am | Posted by hlee
There’s no particular opening remark this week, except that I am deeply curious about the jackknife tests in [astro-ph:0805.1994]. This paper, among a few others, deserves a separate discussion from a statistical point of view, to be posted later. Continue reading ‘[ArXiv] 2nd week, May 2008’ »
Tags:
bimodality,
bootstrap,
calibration uncertainty,
CF,
Classification,
CMB,
dip,
exoplanet,
Fisher matrix,
flare,
GL,
jackknife,
KS test,
marked point,
maximum likelihood,
MLE,
poisson point process,
spatial data,
XLF Category:
arXiv,
Frequentist,
Uncertainty,
X-ray |
Comment
Apr 25th, 2008| 01:48 am | Posted by hlee
One of the speakers in the Google talk series exemplified model-based clustering and mentioned the likelihood ratio test (LRT) for choosing the number of clusters. Since I have seen examples of badly practiced LRTs in astronomical journals, such as testing two clusters vs. three or some higher number of components, I could not resist pointing out that the LRT appeared to be improperly used in his illustration. In reply, he explained that the citation for the LRT differed from his plot and that the test compared one component vs. two, which closely observes the regularity conditions. I was relieved not to find yet another example of an ill-used LRT. Continue reading ‘The LRT is worthless for …’ »
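Why the regularity conditions matter can be simulated directly. The sketch below uses the simplest boundary case, testing a Gaussian mean against mu >= 0, as a hypothetical stand-in for the k vs. k+1 component problem: under the null, the LRT statistic does not follow the naive chi-square(1) reference.

```python
import numpy as np

rng = np.random.default_rng(1)

# H0: mu = 0 vs H1: mu >= 0, sigma = 1 known. The parameter sits on the
# boundary of the alternative, so the LRT statistic has a point mass at 0
# -- the same kind of regularity failure as testing k vs. k+1 components.
def lrt_stat(x):
    mu_hat = max(x.mean(), 0.0)            # MLE restricted to mu >= 0
    return x.size * mu_hat**2              # 2*(logL1 - logL0)

stats = np.array([lrt_stat(rng.normal(0, 1, 100)) for _ in range(20_000)])
print(np.mean(stats == 0.0))               # ~0.5: point mass at zero
# Upper tail: P(LRT > 2.706) is ~0.05 (half the chi2_1 tail), not the
# chi2_1 value of ~0.10 -- naive chi-square p-values are off by two.
print(np.mean(stats > 2.706))
```

In the mixture setting the correct reference distribution is harder still, which is why bootstrap calibration is usually recommended there.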
Apr 8th, 2008| 07:49 pm | Posted by hlee
The breakdown point of the mean is asymptotically zero, whereas the breakdown point of the median is 1/2. The breakdown point is a measure of an estimator’s robustness, and its value can be at most 1/2. In the presence of outliers the mean cannot be a good measure of the central location of the data distribution, whereas the median is still likely to locate the center. Because of this zero breakdown point of the mean, common plug-in estimators such as the mean and the root mean square error may not provide the best fits and uncertainties. The efficiency of the mean does not guarantee its robustness; a bit of care is therefore needed before plugging data into these estimators to get a best fit and its uncertainty. Last week there was a preprint on [arXiv] about the use of the median. Continue reading ‘[ArXiv] use of the median’ »
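A few lines of Python (with made-up numbers) illustrate the two breakdown points:

```python
import numpy as np

x = np.array([9.8, 10.1, 9.9, 10.2, 10.0])
print(np.mean(x), np.median(x))        # both ~10 on clean data

# A single corrupted value is enough to drag the mean arbitrarily far
# (breakdown point -> 0), while the median barely moves (breakdown 1/2).
x_bad = x.copy()
x_bad[0] = 1e6
print(np.mean(x_bad))                  # ruined: ~200008
print(np.median(x_bad))                # still ~10.1
```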
Mar 30th, 2008| 11:16 pm | Posted by hlee
I began to study statistics with the notion that statistics is the study of information (retrieval), and that a part of information is uncertainty, which is taken for granted in our random world. But probably it is the other way around: information is a part of uncertainty. Could this be the difference between Bayesian and frequentist?
The statistician’s task is to articulate the scientist’s uncertainties in the language of probability, and then to compute with the numbers found: cited from Continue reading ‘Statistics is the study of uncertainty’ »
Mar 5th, 2008| 04:46 pm | Posted by hlee
This is a rather long paper that I separated out from [arXiv] 4th week, Feb. 2008:
[astro-ph:0802.3916] P. Carvalho, G. Rocha, & M. P. Hobson
A fast Bayesian approach to discrete object detection in astronomical datasets – PowellSnakes I
As the title suggests, it describes Bayesian source detection, and it gave me a chance to learn the foundations of source detection in astronomy. Continue reading ‘[ArXiv] A fast Bayesian object detection’ »
Tags:
Bayesian evidence,
coloured background,
CRLB,
decision theory,
filter,
Fisher information,
likelihood,
PowellSnakes,
prior,
simulated annealing,
SNR,
source detection,
state space,
Sunyaev-Zel’dovich effect,
symmetric loss,
templates Category:
Algorithms,
arXiv,
Bayesian,
Cross-Cultural,
Data Processing,
Fitting,
Frequentist,
MCMC,
Methods,
Objects |
Comment
Feb 19th, 2008| 10:15 pm | Posted by hlee
I was reading [1]. I must say that I do not know of Bayesian methods for coping with model misspecification, tests under an unknown true model, or tests of non-nested hypotheses, except for the Bayes factor (which depends heavily on how the priors are chosen). Nonetheless, the zeal among economists for testing non-nested models might help astronomers move beyond testing nested hypotheses with the F statistic. Continue reading ‘Non-nested hypothesis tests’ »
Jan 30th, 2008| 02:33 am | Posted by hlee
Astronomers have developed their ways of processing signals almost independently of, though sometimes in collaboration with, engineers, although the fundamental aim of signal processing is the same: extracting information. Doubtless, these two parallel roads have pointed in opposite directions, one toward the sky and the other toward the earth. Nevertheless, without much argument one could say that statistics has served as the medium of signal processing for both groups of scientists and engineers. This particular issue of the IEEE Signal Processing Magazine may shed light for astronomers interested in signal processing and statistics outside the astronomical community.
IEEE Signal Processing Magazine Jul. 2007 Vol 24 Issue 4: Bootstrap methods in signal processing
This link shows the table of contents and provides links to the articles; however, access to the papers requires an IEEE Xplore subscription through a library or an individual IEEE membership. Here I’d like to introduce some of the articles and tutorials.
Continue reading ‘Signal Processing and Bootstrap’ »
Tags:
bootstrap,
compressive sensing,
confidence interval,
GLM,
IEEE,
jackknife,
machine learning,
multitaper estimate,
particle filter,
signal processing,
statistical inference,
Tutorial,
wavelet Category:
Algorithms,
arXiv,
Bayesian,
Cross-Cultural,
Fitting,
Frequentist,
MC,
MCMC,
Methods,
Misc,
Spectral,
Stat,
Uncertainty |
Comment
Oct 30th, 2007| 03:37 am | Posted by hlee
From arxiv/astro-ph:0705.4199v1
In search of an unbiased temperature estimator for statistically poor X-ray spectra
A. Leccardi and S. Molendi
There was a delay in writing about this paper, which by accident was lying under a pile of papers irrelevant to astrostatistics. (It has been quite overwhelming to keep track of papers with various statistical applications, and of papers with room left for statistical improvement, from arxiv:astro-ph.) Although there is already a posting about this paper (see Vinay’s posting), I’d like to give it a shot. I was very excited because I had not seen any astronomical papers devoted solely to unbiased estimators.
Continue reading ‘[ArXiv] An unbiased estimator, May 29, 2007’ »
Tags:
chi-square,
maximum likelihood,
mixing distribution,
mixture,
nonparametric,
robust,
subsampling,
transformation,
unbiased,
Uncertainty Category:
arXiv,
Frequentist,
Stat |
Comment
Oct 3rd, 2007| 04:08 pm | Posted by aconnors
This is a long comment on October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.
My thoughts:
“Model”, for statisticians, has two meanings. A physicist or astronomer would automatically read the word as pertaining to a model of the source, the physics, or the sky. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the full statistical model.
For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first — it was taken as a description of physics model mismatch. Of course what he (and others) meant was statistical model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.
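The mismatch is easy to reproduce numerically. The sketch below uses made-up numbers (true source intensity zero, known background rate 3) and the naive Gauss-Normal recipe of estimating the signal as n − b with error sqrt(n); roughly a fifth of the resulting "95%" intervals land wholly below zero, which is exactly the symptom described.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: true signal 0, known background rate b = 3, so the
# observed total counts are n ~ Poisson(b), repeated over many trials.
b, trials = 3.0, 100_000
n = rng.poisson(b, size=trials)

# Naive Gauss-Normal "95%" interval for the signal: (n - b) +/- 1.96*sqrt(n).
upper = (n - b) + 1.96 * np.sqrt(n)

# Fraction of intervals wholly below zero -- impossible for a physical
# (nonnegative) intensity, and a symptom of *statistical* model mismatch,
# since the Gaussian error model is wrong for low-count Poisson data.
print(np.mean(upper < 0))               # ~0.2 here
```

Treating the counts as Poisson from the start (e.g. with a Poisson-based interval construction) removes the problem, which is the "model mis-match" point being made.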
This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theorem:
When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.
One can see this (I claim) in any method that yields a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one, based on some kind of semi-parametric bootstrapping) for estimating the uncertainties. This can be (though not always!) a particular pitfall of ad hoc methods, which may at first appear very speedy and/or visually compelling, but which may lack a statistics/probability structure through which to synthesize the significance of the results in an efficient way.