The AstroStat Slog

[ArXiv] Cross Validation

Aug 12th, 2009| 06:03 pm | Posted by hlee

Statistical resampling methods are rather unfamiliar among astronomers. Bootstrapping can be an exception, but I feel it’s still underrepresented. Seeing a recent review paper on cross validation on [arXiv], which describes basic notions in theoretical statistics, I couldn’t resist mentioning it here. Cross validation has been used in various statistical fields such as classification, density estimation, model selection, and regression, to name a few.

[arXiv:math.ST:0907.4728]
A survey of cross validation procedures for model selection by Sylvain Arlot

Nonetheless, I’ll not review the paper itself, except for some quotes:

-CV is a popular strategy for model selection and algorithm selection.
-Compared to the resubstitution error, CV avoids overfitting because the training sample is independent from the validation sample.
-As noticed in the early 30s by Larson (1931), training an algorithm and evaluating its statistical performance on the same data yields an overoptimistic result.
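
To make the quotes above concrete, here is a minimal sketch (not taken from Arlot's paper) that compares the resubstitution error with a 5-fold CV error for one deliberately flexible model. The simulated data, the polynomial degree, the number of folds, and the helper names `fit_and_score`, `resubstitution_error`, and `kfold_cv_error` are all assumptions made for illustration.

```python
import numpy as np

def fit_and_score(x_train, y_train, x_eval, y_eval, degree):
    """Fit a polynomial on the training data; return squared errors on the evaluation data."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return (y_eval - np.polyval(coeffs, x_eval)) ** 2

def resubstitution_error(x, y, degree):
    """Mean squared error when the same data are used for fitting and evaluation."""
    return float(np.mean(fit_and_score(x, y, x, y, degree)))

def kfold_cv_error(x, y, degree, k=5, seed=0):
    """Mean squared error where every point is predicted by a fit that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # disjoint validation folds
    sq_errors = np.empty(len(x))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(x)), val_idx)
        sq_errors[val_idx] = fit_and_score(
            x[train_idx], y[train_idx], x[val_idx], y[val_idx], degree
        )
    return float(np.mean(sq_errors))

# Hypothetical data: a quadratic trend plus Gaussian noise (an assumption, not real data).
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=x.size)

degree = 8  # deliberately more flexible than the trend that generated the data
resub = resubstitution_error(x, y, degree)
cv = kfold_cv_error(x, y, degree)
print(f"resubstitution MSE: {resub:.4f}")
print(f"5-fold CV MSE:      {cv:.4f}")
```

For a least-squares fit like this one, the pooled held-out error cannot be smaller than the resubstitution error, which is the over-optimism the quotes describe.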

There are books on statistical resampling methods covering more general topics, not limited to model selection. Instead, I decided to do a little search on how CV is used in astronomy. These are the ADS search results: more publications than I expected.

Kernel regression for determining photometric redshifts from Sloan broad-band photometry [arXiv:0706.2704]
Wang, D.; Zhang, Y. X.; Liu, C.; Zhao, Y. H.
Monthly Notices of the Royal Astronomical Society, Volume 382, Issue 4, pp. 1601-1606 (2007)
STECKMAP: STEllar Content and Kinematics from high resolution galactic spectra via Maximum A Posteriori [arXiv:0507002]
Ocvirk, P.; Pichon, C.; Lançon, A.; Thiébaut, E.
Monthly Notices of the Royal Astronomical Society, Volume 365, Issue 1, pp. 74-84 (2006)
STECMAP: STEllar Content from high-resolution galactic spectra via Maximum A Posteriori [arXiv:0505209]
Ocvirk, P.; Pichon, C.; Lançon, A.; Thiébaut, E.
Monthly Notices of the Royal Astronomical Society, Volume 365, Issue 1, pp. 46-73 (2006)
Automated Detection of Classical Novae with Neural Networks [arXiv:0604236]
Feeney, S. M. et al.
The Astronomical Journal, Volume 130, Issue 1, pp. 84-94 (2005)
Estimation of regularization parameters in multiple-image deblurring [arXiv:0405545]
Vio, R. et al.
Astronomy and Astrophysics, v.423, p.1179-1186 (2004)
Machine learning and image analysis for morphological galaxy classification
de la Calleja, Jorge and Fuentes, Olac
Monthly Notices of the Royal Astronomical Society, Volume 349, Issue 4, pp. 87-93 (2004)
Ensembles of Classifiers for Morphological Galaxy Classification
Bazell, D.; Aha, David W.
The Astrophysical Journal, Volume 548, Issue 1, pp. 219-223 (2001)
Bayesian image reconstruction with space-variant noise suppression
Nunez, J.; Llacer, J.
Astronomy and Astrophysics Supplement, v.131, p.167-180 (1998)
Estimating the sun’s rotation from solar oscillations by regularisation
Thompson, A. M.
Astronomy and Astrophysics (ISSN 0004-6361), vol. 265, no. 1, p. 289-295. (1992)

One can easily see that many of these papers adopted CV in a machine learning context. The application of CV and bootstrapping, however, is not limited to machine learning. As Arlot’s title indicates, CV is used for model selection. When it comes to model selection in high energy astrophysics, the standard procedure is not CV but reduced chi^2 measures and eyeballing the fitted curve. Hopefully, a renovated model selection procedure via CV or another statistically robust strategy will soon challenge reduced chi^2 and eyeballing. On the other hand, I doubt that it’ll come soon. Remember, eyes are the best classifier, so it won’t be an easy task.
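
As a rough illustration of what CV-based model selection could look like next to reduced chi^2, here is a sketch under stated assumptions: simulated data with known, unequal error bars, nested polynomial models as the candidates, and a 5-fold CV score built from held-out chi^2 contributions. None of this comes from a real analysis; the function names `reduced_chi2` and `cv_chi2_per_point` and all numerical settings are hypothetical.

```python
import numpy as np

def reduced_chi2(x, y, sigma, degree):
    """Weighted least-squares fit on all data; chi^2 divided by degrees of freedom."""
    coeffs = np.polyfit(x, y, degree, w=1.0 / sigma)  # polyfit expects weights 1/sigma
    chi2 = np.sum(((y - np.polyval(coeffs, x)) / sigma) ** 2)
    return float(chi2 / (len(x) - (degree + 1)))

def cv_chi2_per_point(x, y, sigma, degree, k=5, seed=0):
    """Average chi^2 contribution of points predicted by fits that excluded them."""
    rng = np.random.default_rng(seed)  # same seed, so every candidate sees the same folds
    folds = np.array_split(rng.permutation(len(x)), k)
    total = 0.0
    for val in folds:
        train = np.setdiff1d(np.arange(len(x)), val)
        coeffs = np.polyfit(x[train], y[train], degree, w=1.0 / sigma[train])
        total += np.sum(((y[val] - np.polyval(coeffs, x[val])) / sigma[val]) ** 2)
    return float(total / len(x))

# Hypothetical measurements: quadratic trend with heteroscedastic Gaussian errors.
rng = np.random.default_rng(1)
n = 50
x = np.linspace(-1.0, 1.0, n)
sigma = rng.uniform(0.1, 0.4, size=n)
y = 0.5 - 1.0 * x + 2.0 * x**2 + sigma * rng.normal(size=n)

degrees = list(range(0, 7))  # candidate models: polynomials of degree 0 to 6
red = {d: reduced_chi2(x, y, sigma, d) for d in degrees}
cvs = {d: cv_chi2_per_point(x, y, sigma, d) for d in degrees}
best_by_cv = min(degrees, key=lambda d: cvs[d])

for d in degrees:
    print(f"degree {d}: reduced chi^2 = {red[d]:.3f}, CV chi^2 per point = {cvs[d]:.3f}")
print("degree selected by CV:", best_by_cv)
```

The design choice here is that the CV score needs no degrees-of-freedom correction: the penalty for extra parameters shows up by itself as worse predictions on the held-out points.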

Tags: ADS, cross-validation, machine learning, Model Selection, n-fold
Category: arXiv, Astro, Bayesian, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Quotes, Stat