Archive for the ‘Methods’ Category.

A Quote on Model

In order to understand a learning procedure statistically it is necessary to identify two important aspects: its structural model and its error model. The former is most important since it determines the function space of the approximator, thereby characterizing the class of functions or hypothesis that can be accurately approximated with it. The error model specifies the distribution of random departures of sampled data from the structural model.

Continue reading ‘A Quote on Model’ »

survey and design of experiments

People with more experience may well disagree, and wisely so, with what I’m about to discuss. This post only brings together two small cross sections, one from each of two trees: astronomy and statistics. Continue reading ‘survey and design of experiments’ »

Classification and Clustering

Another conclusion I have drawn from reading preprints listed on arxiv/astro-ph is that astronomers tend to confuse classification with clustering and to mix up their methodologies. They tend to think that any algorithm from either classification or clustering analysis will serve their purpose, since both kinds of algorithm, no matter what, look like a black box. By a black box I mean something like a neural network, which is one kind of classification algorithm. Continue reading ‘Classification and Clustering’ »

A History of Markov Chain Monte Carlo

I’ve been joking about the astronomers’ fashion of writing Markov chain Monte Carlo (MCMC): in astronomical journals, MCMC is frequently spelled out as Monte Carlo Markov Chain. I was curious about the history of this new creation. Overall, I thought it would be worth learning more about the history of MCMC, and this paper was up on arxiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »


BUGS

Astronomers tend to think in a Bayesian way, but their Bayesian implementations are very limited. OpenBUGS, WinBUGS, GeoBUGS (BUGS for geostatistics; for example, modeling spatial distributions), R2WinBUGS (R BUGS wrapper) or PyBUGS (Python BUGS wrapper) could boost their Bayesian eagerness. Oh, by the way, BUGS stands for Bayesian inference Using Gibbs Sampling. Continue reading ‘BUGS’ »

Parametric Bootstrap vs. Nonparametric Bootstrap

The following footnotes are from one of Prof. Babu’s slides, but I do not recall on which occasion he presented the content.

– In the XSPEC package, the parametric bootstrap is the command FAKEIT, which runs a Monte Carlo simulation of a specified spectral model.
– XSPEC does not provide a nonparametric bootstrap capability.
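
To make the distinction in these two footnotes concrete, here is a minimal Python sketch. It is not XSPEC code and does not reproduce what FAKEIT does internally. It assumes a toy model (counts per bin drawn from a single Poisson mean) and made-up data, and every function name is hypothetical. The parametric version simulates fake data sets from the fitted model, while the nonparametric version only resamples the observed values with replacement.

```python
import math
import random
import statistics

def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def parametric_bootstrap(counts, n_boot, rng):
    """Simulate fake data sets from the fitted Poisson model and re-estimate the mean."""
    mu_hat = statistics.fmean(counts)  # maximum-likelihood estimate of the Poisson mean
    estimates = []
    for _ in range(n_boot):
        fake = [poisson_draw(rng, mu_hat) for _ in counts]
        estimates.append(statistics.fmean(fake))
    return estimates

def nonparametric_bootstrap(counts, n_boot, rng):
    """Resample the observed data with replacement; no model is assumed."""
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(counts, k=len(counts))
        estimates.append(statistics.fmean(resample))
    return estimates

rng = random.Random(42)
counts = [3, 7, 4, 6, 5, 2, 8, 5, 4, 6]  # made-up counts per bin, for illustration only
par = parametric_bootstrap(counts, 2000, rng)
non = nonparametric_bootstrap(counts, 2000, rng)
print("parametric    std. error of the mean:", statistics.stdev(par))
print("nonparametric std. error of the mean:", statistics.stdev(non))
```

The two standard errors differ whenever the scatter in the data does not match what the assumed model predicts, which is one reason to want both kinds of bootstrap available.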

Continue reading ‘Parametric Bootstrap vs. Nonparametric Bootstrap’ »

Why Gaussianity?

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincare

Continue reading ‘Why Gaussianity?’ »

NR, the 3rd edition

Talking about limits in Numerical Recipes in my PyIMSL post, I couldn’t resist checking materials, particularly updates in the new edition of Numerical Recipes by Press, et al. (2007). Continue reading ‘NR, the 3rd edition’ »

Kaplan-Meier Estimator (Equation of the Week)

The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover because this method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so much of our data are limited and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-1980s by Jurgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ 306, 490). [See also Hyunsook's primer.] It has been coded up and is available for use as part of the ASURV package. Continue reading ‘Kaplan-Meier Estimator (Equation of the Week)’ »
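
As a rough illustration of what the estimator computes, the sketch below implements the textbook product-limit form for right-censored data, S(t) = product over event values t_i <= t of (1 - d_i/n_i), where d_i is the number of events at t_i and n_i the number of items still at risk just before t_i. This is not the ASURV code. The function name and the toy data are made up, and upper limits (left-censored data) would first have to be converted to right-censored form, for example by flipping the sign of the values.

```python
def kaplan_meier(values, observed):
    """Return [(t, S(t))] at each distinct event value for right-censored data.

    values   -- measured value, or the censoring value for a censored item
    observed -- True for a detection (event), False for a censored item
    """
    data = sorted(zip(values, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        ties = 0
        # Group all items that share the same value t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            ties += 1
            i += 1
        if events:
            # Censored items tied at t still count as at risk at t.
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= ties
    return curve

# Toy data: six items, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4, 5],
                     [True, True, False, True, False, True])
for t, s in curve:
    print(t, round(s, 4))
```

The curve only steps down at values where an event was observed; censored items contribute by shrinking the number at risk for later steps.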

A test for global maximum

If getting the first derivative (score function) and the second derivative (empirical Fisher information) of a (pseudo) likelihood function is feasible and checking regularity conditions is viable, a test for global maximum (Li and Jiang, JASA, 1999, Vol. 94, pp. 847-854) seems to be a useful reference for verifying the best fit solution. Continue reading ‘A test for global maximum’ »

[ArXiv] 1st week, June 2008

Although it contains no statistics-related discussion, a paper comparing XSPEC and ISIS, two open source spectral analysis applications, might interest high energy astrophysicists this week. Continue reading ‘[ArXiv] 1st week, June 2008’ »

Q: Lowess error bars?

It is somewhat surprising that astronomers haven’t cottoned on to Lowess curves yet. That’s probably a good thing because I think people already indulge in smoothing far too much for their own good, and Lowess makes for a very powerful hammer. But the fact that it is semi-parametric and is based on polynomial least-squares fitting does make it rather attractive.

And, of course, sometimes it is unavoidable, or so I told Brad W. When one has too many points for a regular polynomial fit, and they are too scattered for a spline, and too few to try a wavelet “denoising”, and no real theoretical expectation of any particular model function, and all one wants is “a smooth curve, damnit”, then Lowess is just the ticket.

Well, almost.

There is one major problem — how does one figure what the error bounds are on the “best-fit” Lowess curve? Clearly, each fit at each point can produce an estimate of the error, but simply collecting the separate errors is not the right thing to do because they would all be correlated. I know how to propagate Gaussian errors in boxcar smoothing a histogram, but this is a whole new level of complexity. Does anyone know if there is software that can calculate reliable error bands on the smooth curve? We will take any kind of error model — Gaussian, Poisson, even the (local) variances in the data themselves.
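
One possible approach, offered only as a sketch and not as an answer from the post, is to bootstrap the whole curve: resample the (x, y) pairs with replacement, refit the smoother each time, and read off percentiles of the refitted curves at each grid point. Because every replicate is a complete refit, the correlation between neighboring points is carried along automatically. The code below assumes independent data points and uses a simplified Lowess (tricube-weighted local linear fit, no robustness iterations). The function names, the toy data, and the 68% level are all illustrative choices, and the resulting bands are pointwise, not simultaneous.

```python
import math
import random

def local_linear(xs, ys, x0, frac=0.5):
    """Tricube-weighted local linear fit evaluated at x0 (Lowess without robustness steps)."""
    k = max(2, int(math.ceil(frac * len(xs))))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, slightly inflated so that
    # the k-th neighbor keeps a nonzero weight.
    h = dists[k - 1] * 1.001 + 1e-12
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        dx = x - x0
        u = abs(dx) / h
        if u >= 1.0:
            continue
        w = (1.0 - u ** 3) ** 3
        s0 += w
        s1 += w * dx
        s2 += w * dx * dx
        t0 += w * y
        t1 += w * dx * y
    denom = s0 * s2 - s1 * s1
    if denom <= 1e-12 * s0 * s2:      # degenerate window: fall back to the weighted mean
        return t0 / s0
    return (s2 * t0 - s1 * t1) / denom  # intercept of the fit centered on x0

def bootstrap_band(xs, ys, grid, n_boot=200, frac=0.5, level=0.68, seed=0):
    """Pointwise percentile band from refitting the smoother on resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([local_linear(bx, by, g, frac) for g in grid])
    lo_q = (1.0 - level) / 2.0
    band = []
    for j in range(len(grid)):
        col = sorted(c[j] for c in curves)
        band.append((col[int(lo_q * (n_boot - 1))],
                     col[int((1.0 - lo_q) * (n_boot - 1))]))
    return band

# Toy data: a noisy sine curve, for illustration only.
rng = random.Random(1)
xs = [i * 0.25 for i in range(40)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
grid = [1.0, 3.0, 5.0, 7.0]
fit = [local_linear(xs, ys, g, frac=0.3) for g in grid]
band = bootstrap_band(xs, ys, grid, frac=0.3)
for g, f, (lo, hi) in zip(grid, fit, band):
    print(g, round(f, 3), round(lo, 3), round(hi, 3))
```

For Poisson or known Gaussian errors, a parametric variant would redraw each y from its assumed error distribution instead of resampling pairs; the refit-and-take-percentiles step stays the same.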

[ArXiv] 3rd week, May 2008

Not many this week, but there’s a great read. Continue reading ‘[ArXiv] 3rd week, May 2008’ »

tests of fit for the Poisson distribution

In almost a year of skimming arXiv:astro-ph abstracts, I have never come across a case where the fit of the Poisson distribution is tested in different ways; instead, it is taken for granted by plugging the data and a (source) model into a (modified) χ2 function. If any doubts about the Poisson distribution arise, the following paper might be useful: Continue reading ‘tests of fit for the Poisson distribution’ »

[ArXiv] Ripley’s K-function

Because of the extensive work by Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles’ book in the cosmology literature), the 2 (or 3) point correlation function is far more dominant than any other mathematical or statistical method for understanding the structure of the universe. Unusually, this week brings an astro-ph paper, written by a statistics professor, that addresses the K-function as a way to explore the mystery of the universe.

[astro-ph:0804.3044] J.M. Loh
Estimating Third-Order Moments for an Absorber Catalog

Continue reading ‘[ArXiv] Ripley’s K-function’ »