Archive for the ‘Methods’ Category.

Oct 8th, 2008| 01:31 am | Posted by hlee

In order to understand a learning procedure statistically it is necessary to identify two important aspects: **its structural model and its error model.** The former is most important since it determines the function space of the approximator, thereby characterizing the class of functions or hypotheses that can be accurately approximated with it. The error model specifies the distribution of random departures of sampled data from the structural model.

Continue reading ‘A Quote on Model’ »
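To make the distinction concrete, here is a toy illustration of my own (not from the quoted text): a straight-line fit in which the structural model is y = a + b·x and the error model is additive i.i.d. Gaussian noise. All names and numbers below are made up for illustration.

```python
import random

# Structural model: y = a + b*x (this choice fixes the function space).
# Error model: additive Gaussian noise with constant sigma.
random.seed(1)
a_true, b_true, sigma = 2.0, 0.5, 0.3
xs = [i / 10 for i in range(50)]
ys = [a_true + b_true * x + random.gauss(0, sigma) for x in xs]

# Least-squares estimates of the structural parameters (closed form).
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
a_hat = ybar - b_hat * xbar

# The residuals estimate the scale of the error model.
resid = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
sigma_hat = (sum(r * r for r in resid) / (n - 2)) ** 0.5
```

Misjudging either aspect hurts differently: a wrong structural model biases the fit no matter how much data one has, while a wrong error model misleads the uncertainty estimates.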

Tags: error model, Friedman, Hastie, model, structural model, Tibshirani
Category: Astro, Cross-Cultural, Jargon, Methods, Quotes, Stat | 1 Comment
Oct 1st, 2008| 04:16 pm | Posted by hlee

People of experience would likely, and wisely, say something very different from what I'm going to discuss now. This post merely joins two small cross sections, one from each of two trees: astronomy and statistics. Continue reading ‘survey and design of experiments’ »

Tags: 213, AAS, Alanna Connors, catalog, census, detection, experimental design, Long Beach, special session, SPS, survey
Category: Astro, CHASC, Cross-Cultural, Data Processing, Jargon, Methods, Misc, News, Stat | 3 Comments
Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion drawn from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse **classification and clustering** and to mix up their methodologies. They seem to think that any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithms, no matter what, look like a **black box**. By a black box I mean something like a neural network, which is a classification algorithm. Continue reading ‘Classification and Clustering’ »
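To pin down the distinction, here is a toy illustration of my own (made-up data and helper names, nothing from any particular paper): classification is supervised and needs labeled training data, while clustering is unsupervised and only groups the data without naming the groups.

```python
# Classification: labeled training data -> a rule that names new points.
labeled = [(1.0, "A"), (1.2, "A"), (0.8, "A"), (5.0, "B"), (5.3, "B"), (4.7, "B")]
# Clustering: unlabeled data -> groups with indices but no names.
unlabeled = [1.1, 0.9, 5.1, 4.9, 1.3, 5.2]

def train_centroids(data):
    # learn one centroid per class from the labeled sample
    sums, counts = {}, {}
    for x, label in data:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(x, centroids):
    # assign a new point to the class with the nearest centroid
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def kmeans_1d(xs, iters=10):
    # tiny 1-D k-means with k=2; note that no labels are ever consulted
    centers = [min(xs), max(xs)]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        centers = [sum(g) / len(g) for g in groups if g]
    return centers, groups

centroids = train_centroids(labeled)
centers, groups = kmeans_1d(unlabeled)
```

The clusters found by k-means may or may not correspond to the classes a classifier was trained on; treating the two as interchangeable black boxes is exactly the confusion described above.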

Tags: black box, book, catalog, Classification, clustering, Hastie, outliers, R, Robert Serfling, semi-supervised learning, survey
Category: Algorithms, arXiv, Astro, Bad AstroStat, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Stat | Comment
Sep 17th, 2008| 02:11 pm | Posted by hlee

I’ve been joking about astronomers’ fashion of writing **Markov chain Monte Carlo (MCMC)**: in astronomical journals, **MCMC** is frequently rendered as **Monte Carlo Markov Chain**. I was curious about the history of this new coinage. Overall, I thought it would be worthwhile to learn more about the history of MCMC, and this paper was up on arxiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

Tags: BUGS, data augmentation, EM, Gibbs sampling, Hastings, history, Metropolis, reversible jump, simulated annealing
Category: Algorithms, arXiv, Bad AstroStat, Bayesian, Cross-Cultural, Data Processing, Imaging, MC, MCMC, Methods, Quotes, Stat | 2 Comments
Sep 16th, 2008| 04:34 pm | Posted by hlee

Astronomers tend to think in a Bayesian way, but their Bayesian implementations are very limited. OpenBUGS, WinBUGS, GeoBUGS (BUGS for geostatistics, e.g. modeling spatial distributions), R2WinBUGS (an R wrapper for BUGS), or PyBUGS (a Python wrapper) could boost their Bayesian eagerness. Oh, by the way, **BUGS** stands for **Bayesian inference Using Gibbs Sampling.** Continue reading ‘BUGS’ »
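Since BUGS automates Gibbs sampling, here is a bare-bones illustration of the idea itself — my own toy Python sampler, not BUGS code and not a substitute for it: drawing the mean and precision of normal data by alternating between their conditional distributions (flat prior on the mean, Jeffreys-like prior on the precision).

```python
import random

# Toy Gibbs sampler for (mu, tau) of normal data; tau is the precision 1/sigma^2.
random.seed(2)
data = [random.gauss(10.0, 2.0) for _ in range(200)]
n = len(data)
xbar = sum(data) / n

mu, tau = 0.0, 1.0  # arbitrary starting values
draws = []
for i in range(2000):
    # mu | tau, data ~ Normal(xbar, 1/(n*tau))   (flat prior on mu)
    mu = random.gauss(xbar, (1.0 / (n * tau)) ** 0.5)
    # tau | mu, data ~ Gamma(n/2, rate = sum((x-mu)^2)/2)
    rate = sum((x - mu) ** 2 for x in data) / 2.0
    tau = random.gammavariate(n / 2.0, 1.0 / rate)  # gammavariate takes a scale
    if i >= 500:  # discard burn-in
        draws.append((mu, tau))

mu_mean = sum(m for m, _ in draws) / len(draws)
sigma_mean = sum((1.0 / t) ** 0.5 for _, t in draws) / len(draws)
```

What BUGS adds on top of this two-line alternation is the automatic derivation of the conditional distributions for arbitrary hierarchical models, which is why the packages above are worth the installation effort.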

Tags: openBUGS, PyBUGS, Python, R, toolbox, winBUGS
Category: Algorithms, Bayesian, Data Processing, Languages, MCMC, Methods, News | Comment
Sep 10th, 2008| 10:46 pm | Posted by hlee

The following footnotes are from one of Prof. Babu’s slides, though I do not recall on which occasion he presented them.

– In the XSPEC package, the **parametric bootstrap** is the command FAKEIT, which generates a Monte Carlo simulation of a specified spectral model.

– XSPEC does not provide a **nonparametric bootstrap** capability.

Continue reading ‘Parametric Bootstrap vs. Nonparametric Bootstrap’ »
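As a hedged sketch of the contrast (my own toy example with an exponential sample and the sample mean as the statistic; FAKEIT applies the same parametric logic to fitted spectral models instead):

```python
import random

random.seed(3)
data = [random.expovariate(1 / 5.0) for _ in range(100)]  # "observed" sample

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

mhat = mean(data)  # fitted model parameter (exponential mean)

# Parametric bootstrap: resimulate fresh samples from the *fitted model*,
# exactly the role FAKEIT plays for a fitted spectral model.
param_means = [mean([random.expovariate(1 / mhat) for _ in range(len(data))])
               for _ in range(500)]

# Nonparametric bootstrap: resample the observed data with replacement,
# making no distributional assumption at all.
nonparam_means = [mean(random.choices(data, k=len(data))) for _ in range(500)]

# Either spread estimates the standard error of the sample mean.
```

When the model is right the two spreads agree; when it is wrong, the nonparametric version is the more honest of the two — which is why the missing capability noted above matters.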

Sep 10th, 2008| 10:15 am | Posted by hlee

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — **Henri Poincare**

Continue reading ‘Why Gaussianity?’ »
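One practical reason both camps get away with the assumption is the central limit theorem. A quick simulation of my own (not from the post) shows standardized sums of uniform draws behaving like a Gaussian:

```python
import random

# CLT illustration: standardize sums of Uniform(0,1) draws, which have
# mean 1/2 and variance 1/12, and check Gaussian-style coverage.
random.seed(4)
n_sum, n_rep = 30, 5000
sums = [sum(random.random() for _ in range(n_sum)) for _ in range(n_rep)]
z = [(s - n_sum * 0.5) / (n_sum / 12) ** 0.5 for s in sums]

# Empirical coverage of |z| < 1.96; the Gaussian value is 0.95.
coverage = sum(1 for v in z if abs(v) < 1.96) / n_rep
```

Neither a mathematical proof nor a law of physics, just accumulation of many small independent contributions — which is arguably Poincaré's point.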

Tags: CLT, Gaussianity, Henri Poincare, IEEE, normal, signal processing, signal processing magazine, Why
Category: arXiv, Cross-Cultural, Data Processing, Fitting, Frequentist, Methods, Physics, Quotes, Stat, Uncertainty | Comment
Aug 28th, 2008| 08:44 pm | Posted by hlee

Talking about limits in Numerical Recipes in my PyIMSL post, I couldn’t resist checking the material, particularly the updates in the new edition of Numerical Recipes by Press et al. (2007). Continue reading ‘NR, the 3rd edition’ »

Jul 9th, 2008| 01:00 pm | Posted by vlk

The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover because this method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so much of our data are limited and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-80s by Jurgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ 306, 490). [See also Hyunsook's primer.] It has been coded up and is available for use as part of the ASURV package. Continue reading ‘Kaplan-Meier Estimator (Equation of the Week)’ »
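The mechanics are simple enough to sketch by hand. The toy code below (my own illustration, not ASURV) handles right-censored data; astronomical upper limits are left-censored, which is usually handled by flipping the axis. Each observation is a (value, observed) pair, where observed=False marks a censored point.

```python
# Minimal Kaplan-Meier product-limit estimator for right-censored data.
data = [(1.0, True), (2.0, True), (2.0, False),
        (3.0, True), (4.0, False), (5.0, True)]

def kaplan_meier(data):
    # distinct times at which an uncensored event occurs
    events = sorted({t for t, observed in data if observed})
    curve, s = [], 1.0
    for t in events:
        at_risk = sum(1 for v, _ in data if v >= t)          # still in the sample
        deaths = sum(1 for v, obs in data if v == t and obs)  # events at time t
        s *= 1.0 - deaths / at_risk   # product-limit update
        curve.append((t, s))
    return curve

curve = kaplan_meier(data)
```

Censored points never trigger a downward step themselves, but they do shrink the at-risk count, which is exactly how the limits inform the estimated fraction below a given flux.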

Tags: censored, EotW, Equation, Equation of the Week, Feigelson, Isobe, Kaplan-Meier, maximum likelihood, Nelson, Schmitt, survival analysis, upper limit
Category: Frequentist, Jargon, Methods, Stat | 13 Comments
Jul 1st, 2008| 10:10 pm | Posted by hlee

If obtaining the first derivative (score function) and the second derivative (empirical Fisher information) of a (pseudo-)likelihood function is feasible, and checking the regularity conditions is viable, a test for a global maximum (Li and Jiang, JASA, 1999, Vol. 94, pp. 847-854) seems to be a useful reference for verifying the best-fit solution. Continue reading ‘A test for global maximum’ »
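The ingredients are easy to illustrate on a case where everything is closed-form (my own toy example with made-up counts, not the Li and Jiang test itself): for a Poisson(λ) likelihood, the score vanishes at the MLE and the observed Fisher information there is positive, the minimal sanity checks before any global-maximum test.

```python
# Score and observed Fisher information for an i.i.d. Poisson(lam) sample.
data = [3, 5, 2, 4, 6, 3, 4]
n = len(data)
total = sum(data)

def score(lam):
    # d/dlam of log L = sum(x log lam - lam) + const  ->  total/lam - n
    return total / lam - n

def observed_info(lam):
    # negative second derivative of log L: total / lam^2
    return total / lam ** 2

lam_hat = total / n  # the MLE
# At an interior maximum: score(lam_hat) ~ 0 and observed_info(lam_hat) > 0.
```

These local conditions cannot distinguish a local from a global maximum on their own; that gap is precisely what the cited test addresses.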

Jun 8th, 2008| 09:45 pm | Posted by hlee

Despite the lack of statistics-related discussion, a paper comparing XSPEC and ISIS, two open-source spectral analysis applications, might draw high-energy astrophysicists’ interest this week. Continue reading ‘[ArXiv] 1st week, June 2008’ »

Tags: black box, catalog, CMB, confidence interval, EGRET, ICA, ISIS, maximum likelihood, radio, sample size, student t, XSPEC
Category: arXiv, Data Processing, gamma-ray, High-Energy, Methods, Stat | Comment
Jun 3rd, 2008| 02:53 am | Posted by vlk

It is somewhat surprising that astronomers haven’t cottoned on to Lowess curves yet. That’s probably a good thing because I think people already indulge in smoothing far too much for their own good, and Lowess makes for a very powerful hammer. But the fact that it is semi-parametric and is based on polynomial least-squares fitting does make it rather attractive.

And, of course, sometimes it is unavoidable, or so I told Brad W. When one has too many points for a regular polynomial fit, and they are too scattered for a spline, and too few to try a wavelet “denoising”, and no real theoretical expectation of any particular model function, and all one wants is “a smooth curve, damnit”, then Lowess is just the ticket.

Well, almost.

There is one major problem — *how does one figure what the error bounds are on the “best-fit” Lowess curve?* Clearly, each fit at each point can produce an estimate of the error, but simply collecting the separate errors is not the right thing to do because they would all be correlated. I know how to propagate Gaussian errors in boxcar smoothing a histogram, but this is a whole new level of complexity. Does anyone know if there is software that can calculate reliable error bands on the smooth curve? We will take any kind of error model — Gaussian, Poisson, even the (local) variances in the data themselves.
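One common, if computationally brutish, workaround is to bootstrap the data and refit the smoother, taking pointwise quantiles of the refits as a band. The sketch below is my own and only a heuristic, not a definitive answer to the correlated-errors problem: a plain running-mean smoother stands in for Lowess to keep the code dependency-free (with statsmodels, `sm.nonparametric.lowess` would play the same role).

```python
import random

random.seed(5)
xs = [i / 10 for i in range(100)]
ys = [x ** 0.5 + random.gauss(0, 0.2) for x in xs]

def smooth(eval_xs, data_x, data_y, halfwidth=1.0):
    # stand-in smoother: local average of data within +/- halfwidth of each grid point
    out = []
    for x0 in eval_xs:
        vals = [y for x, y in zip(data_x, data_y) if abs(x - x0) <= halfwidth]
        out.append(sum(vals) / len(vals))
    return out

fit = smooth(xs, xs, ys)

# Bootstrap: resample (x, y) pairs with replacement, refit on the original grid.
fits = []
for _ in range(200):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    fits.append(smooth(xs, [xs[i] for i in idx], [ys[i] for i in idx]))

# Pointwise 68% band from the bootstrap refits.
lower, upper = [], []
for j in range(len(xs)):
    col = sorted(f[j] for f in fits)
    lower.append(col[int(0.16 * len(col))])
    upper.append(col[int(0.84 * len(col))])
```

Because each refit uses the whole resampled data set, the band inherits the correlation between neighboring points, which is exactly what collecting the separate per-point errors fails to do; whether it is *reliable* in the sense asked above is still a fair question for statisticians.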

Tags: Brad Wargelin, error bands, error bars, Fitting, least-squares, Loess, Lowess, polynomial, question for statisticians, smoothing
Category: Algorithms, Fitting, Methods, Stat, Uncertainty | 11 Comments
May 26th, 2008| 02:59 pm | Posted by hlee

Tags: clustering, high dimension, LF, maximum likelihood, multivariate, Poisson, Schechter, zero count
Category: arXiv, Bayesian, Fitting, MCMC, Methods, Stat | Comment
Apr 29th, 2008| 02:24 am | Posted by hlee

Skimming arXiv:astro-ph abstracts for almost a year has never offered me an occasion where the fit of the Poisson distribution was tested in different ways; instead, it is taken for granted by plugging data and a (source) model into a (modified) χ^{2} function. For any doubts about the Poisson assumption, the following paper might be useful: Continue reading ‘tests of fit for the Poisson distribution’ »
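To show what testing rather than assuming looks like, here is the classical dispersion (variance-to-mean) test, my own minimal example with made-up counts; the paper referred to above covers more refined tests than this one.

```python
# Dispersion test for Poissonity.
# Under H0: Poisson, D = sum((x - xbar)^2) / xbar is approximately
# chi-square with n-1 degrees of freedom.
data = [3, 7, 4, 6, 2, 5, 4, 8, 3, 5, 6, 4]
n = len(data)
xbar = sum(data) / n
D = sum((x - xbar) ** 2 for x in data) / xbar

# Rough normal approximation to the chi-square(n-1) reference distribution:
mean_chi, var_chi = n - 1, 2 * (n - 1)
z = (D - mean_chi) / var_chi ** 0.5
# |z| well beyond 2 would flag over- or under-dispersion relative to Poisson.
```

The test exploits the defining Poisson property that mean and variance coincide, so over-dispersed counts (e.g. from unmodeled source variability) push D well above n-1.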

Apr 21st, 2008| 11:56 pm | Posted by hlee

Because of the extensive work by Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles’ book in the cosmology literature), the 2- (or 3-)point correlation function is far more dominant than any other mathematical or statistical method for understanding the structure of the universe. Unusually, this week finds an astro-ph paper written by a statistics professor addressing the K-function to explore the mystery of the universe.

[astro-ph:0804.3044] J.M. Loh

**Estimating Third-Order Moments for an Absorber Catalog**

Continue reading ‘[ArXiv] Ripley’s K-function’ »
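As a minimal sketch of what a K-function measures (my own toy code, not from Loh's paper, and omitting the edge corrections that any serious analysis, including this one, requires): count pairs of points within distance r and normalize by the intensity, so that a completely random pattern gives K(r) ≈ πr².

```python
import random

# Bare-bones empirical Ripley's K for a 2-D point pattern on the unit square.
random.seed(6)
pts = [(random.random(), random.random()) for _ in range(300)]

def ripley_k(pts, r):
    n = len(pts)
    intensity = n / 1.0  # points per unit area (unit square)
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = pts[i][0] - pts[j][0]
                dy = pts[i][1] - pts[j][1]
                if dx * dx + dy * dy <= r * r:
                    count += 1
    return count / (intensity * n)

k_est = ripley_k(pts, 0.05)  # close to pi * 0.05**2 for a random pattern
```

Excess over πr² signals clustering at scale r, a deficit signals regularity; the third-order moments in the paper extend this same pair-counting idea to triples.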