The AstroStat Slog » MCMC

[Books] Bayesian Computations

hlee — Fri, 11 Sep 2009 20:40:23 +0000

A number of practical Bayesian data analysis books are available these days. Here, I’d like to introduce two that were relatively recently published. I like the fact that they are rather technical than theoretical. They have practical examples close to be related with astronomical data. They have R codes so that one can try algorithms on the fly instead of jamming probability theories.

Bayesian Computation with R
Author:Jim Albert
Publisher: Springer (2007)

As the title said, accompanying R package LearnBayes is available (clicking the name will direct you for downloading the package). Furthermore, the last chapter is about WinBUGS. (Please, check out resources listed in BUGS for other BUGS, Bayesian inference Using Gibbs Sampling) Overall, it is quite practical and instructional. If an young astronomer likes to enter the competition posted below because of sophisticated data requiring non traditional statistical modeling, this book can be a good starting. (Here, traditional methods include brute force Monte Carlo simulations, chi^2/weighted least square fitting, and test statistics with rigid underlying assumptions).

An interesting quote is filtered because of a comment from an astronomer, “Bayesian is robust but frequentist is not” that I couldn’t agree with at the instance.

A Bayesian analysis is said to be robust to the choice of prior if the inference is insensitive to different priors that match the user’s beliefs.

Since there’s no discussion of priors in frequentist methods, Bayesian robustness cannot be matched and compared with frequentist’s robustness. Similar to my discussion in Robust Statistics, I kept the notion that robust statistics is insensitive to outliers or iid Gaussian model assumption. Particularly, the latter is almost ways assumed in astronomical data analysis, unless other models and probability densities are explicitly stated, like Poisson counts and Pareto distribution. New Bayesian algorithms are invented to achieve robustness, not limited to the choice of prior but covering the topics from frequentists’ robust statistics.

The introduction of Bayesian computation focuses on analytical and simple parametric models and well known probability densities. These models and their Bayesian analysis produce interpretable results. Gibbs sampler, Metropolis-Hasting algorithms, and their few hybrids could handle scientific problems as long as scientific models and the uncertainties both in observations and parameters transcribed into well known probability density functions. I think astronomers like to check Chap 6 (MCMC) and Chap 9 (Regression Models). Often times, in order to prove strong correlation between two variables, astronomers adopt simple linear regression models and fit the data to these models. A priori knowledge enhances the flexibility of fitting analysis in which Bayesian computation works robustly different from straightforward chi-square methods. The book does not have sophisticated algorithms nor theories. It only offers very necessities and foundations for Bayesian computations to be accommodated into scientific needs.

The other book is

Bayesian Core: A Practical Approach to Computational Bayesian Statistics.
Author: J. Marin and C.P.Robert
Publisher: Springer (2007).

Although the book is written by statisticians, the very first real data example is CMBdata (cosmic microwave background data; instead of cosmic, the book used cosmological. I’m not sure which one is correct but I’m so used to CMB by cosmic microwave background). Surprisingly, CMB became a very easy topic in statistics in terms of testing normality and extreme values. Seeing the real astronomy data first from the book was the primary reason of introducing this book. Also, it’s a relatively small volume book (about 250 pages) compared other Bayesian textbooks with the broad coverage of topics in Bayesian computation. There are other practical real data sets to illustrate Bayesian computations in the book and these example data sets are found from the book website

The book begins with R, then normal models, regression and variable selection, generalized linear models, capture-recapture experiments, mixture models, dynamic models, and image analysis are covered.

I feel exuberant when I found the book describes the law of large numbers (LLN) that justifies the Monte Carlo methods. The LLN appears often when integration is approximated by summation, which astronomers use a lot without referring the name of this law. For more information, I rather give a wikipedia link to Law of Large Numbers.

Several MCMC algorithms can be mixed together within a single algorithm using either a circular or a random design. While this construction is often suboptimal (in that the inefficient algorithms in the mixture are still used on a regular basis), it almost always brings an improvement compared with its individual components. A special case where a mixed scenario is used is the Metropolis-within-Gibbs algorithm: When building a Gibbs sample, it may happen that it is difficult or impossible to simulate from some of the conditional distributions. In that case, a single Metropolis step associated with this conditional distribution (as its target) can be used instead.

Description in Sec. 4.2 Metropolis-Hasting Algorithms is expected to be more appreciated and comprehended by astronomers because of the historical origins of these topics, detailed balance equation and random walk.

Personal favorite is section 6 on mixture models. Astronomers handle data of multi populations (multiple epochs of star formations, single or multiple break power laws, linear or quadratic models, metalicities from merging or formation triggers, backgrounds+sources, environment dependent point spread functions, and so on) and discusses the difficulties of label switching problems (identifiability issue in codifying data into MCMC or EM algorithm)

A completely different approach to the interpretation and estimation of mixtures is the semiparametric perspective. To summarize this approach, consider that since very few phenomena obey probability laws corresponding to the most standard distributions, mixtures such as (*) can be seen as a good trade-off between fair represntation of the phenomenon and efficient estimation of the underlying distribution. If k is large enough, there is theoretical support for the argument that (*) provides a good approximation (in some functional sense) to most distributions. Hence, a mixture distribution can be perceived as a type of basis approximation of unknown distributions, in a spirit similar to wavelets and splines, but with a more intuitive flavor (for a statistician at least). This chapter mostly focuses on the “parametric” case, when the partition of the sample into subsamples with different distributions f_j does make sense form the dataset point view (even though the computational processing is the same in both cases).

We must point at this stage that mixture modeling is often used in image smoothing but not in feature recognition, which requires spatial coherence and thus more complicated models…

My patience ran out to comprehend every detail of the book but the section of reversible jump MCMC, hidden Markov model (HMM), and Markov random fields (MRF) would be very useful. Particularly, these topics appear often in image processing, which field astronomers have their own algorithms. Adaption and comparison across image analysis methods promises new directions of scientific imaging data analysis beyond subjective denoising, smoothing, and segmentation.

Readers considering more advanced Bayesian computation and rigorous treatment of MCMC methodology, I’d like to point a textbook, frequently mentioned by Marin and Robert.

Monte Carlo Statistical Methods Robert, C. and Casella, G. (2004)
Springer-Verlag, New York, 2nd Ed.

There are a few more practical and introductory Bayesian Analysis books recently published or soon to be published. Some readership would prefer these books of running ink. Perhaps, there is/will be Bayesian Computation with Python, IDL, Matlab, Java, or C/C++ for those who never intend to use R. By the way, for Mathematica users, you would like to check out Phil Gregory’s book which I introduced in [books] a boring title. My point is that applied statistics has become more friendly to non statisticians through these good introductory books and free online materials. I hope more astronomers apply statistical models in their data analysis without much trouble in executing Bayesian methods. Some might want to check BUGS, introduced [BUGS]. This posting contains resources of how to use BUGS and available packages under languages.

Robust Statistics

hlee — Mon, 18 May 2009 17:18:09 +0000

My understandings of “robustness” from the education in statistics and from communicating with astronomers are hard to find a mutual interest. Can anyone help me to build a robust bridge to get over this abyss?

First, since it’s short, let’s quote a comment from an astronomer that might reflect the notion of robust statistics in astronomy.

Bayesian is robust.

Is every Bayesian method robust and its counter part from classical statistics is not robust? I know that popular statistics in astronomy are not, generally speaking, robust and those popular statistics were borne before the notion of robustness in statistics were recognized and discussed.

I do understand why such partisan comment was produced. Astronomers always reports their data analysis results by best fits, error bars, probability, or some significance levels (they don’t say explicitly, p-values, powers, type I or type II errors, unbiased estimates, and other statistical jargon in inference problems) and those classical methods of frequent use have caused frustrations due to their lack of robustness. On the contrary, MCMC algorithms for estimating posterior distributions produce easy interpretable results to report best fit (mode) and error bar (HPD).

My understanding of robustness as a statistician does not draw a line between Bayesian and frequenstists. The following is quoted from the Katholieke Universiteit Leuven website of which mathematics department has a focus group for robust statistics.

Robust statistical methods and applications.
The goal of robust statistics is to develop data analytical methods which are resistant to outlying observations in the data, and hence which are also able to detect these outliers. Pioneering work in this area has been done by Huber (1981), Hampel et al. (1986) and Rousseeuw and Leroy (1987). In their work, estimators for location, scale, scatter and regression play a central role. They assume that the majority of the data follow a parametric model, whereas a minority (the contamination) can take arbitrary values. This approach leads to the concept of the influence function of an estimator which measures the influence of a small amount of contamination in one point. Other measures of robustness are the finite-sample and the asymptotic breakdown value of an estimator. They tell what the smallest amount of contamination is which can carry the estimates beyond all bounds.

Nowadays, robust estimators are being developed for many statistical models. Our research group is very active in investigating estimators of covariance and regression for high-dimensional data, with applications in chemometrics and bio-informatics. Recently, robust estimators have been developed for PCA (principal component analysis), PCR (principal component regression), PLS (partial least squares), classification, ICA (independent component analysis) and multi-way analysis. Also robust measures of skewness and tail weight have been introduced. We study robustness of kernel methods, and regression quantiles for censored data.

My understanding of “robustness” from statistics education is pandemic, covers both Bayesian and frequentist. Any methods and models that are insensitive or immune to outliers, are robust methods and statistics. For example, median is more robust than mean since the breakpoint of median is 1/2 and that of mean is 0, asymptotically. Both mean and median are estimable from Bayesian and frequentist methods. Instead of standard deviation, tactics like lower and upper quartiles to indicate error bars or Winsorization (or trim) to obtain a standard deviation for the error bar, are adopted regardless of Bayesian or frequenstist. Instead of the chi square goodness-of-fit tests, which assume Gaussian residuals, nonparametrics tests or distribution free tests can be utilized.

The notion that frequentist methods are not robust might have been developed from the frustration that those chi-square related methods in astronomy do not show robust characteristics. The reason is that data are prone to the breaks of the normality (Gaussianity) assumption. Also, the limited library of nonparametric methods in data analysis packages and softwares envisions that frequentist methods are not robust. An additional unattractive aspect about frequentist methods is that the description seems too mathematical, too abstract, and too difficult to be codified with full of unfriendly jargon whereas the Bayesian methods provide step by step modeling procedures with explanation why they chose such likelihood and such priors based on external knowledge from physics and observations (MCMC algorithms in the astronomical papers are adaptation of already proven algorithms from statistics and algorithm journals).

John Tukey said:

Robustness refers to the property of a procedure remaining effective even in the absence of usual assumptions such as normality and no incorrect data values. In simplest terms the idea is to improve upon the use of the simple arithmetic average in estimating the center of a distribution. As a simple case one can ask: Is it ever better to use the sample median than the samle mean, and if so, when?

I don’t think the whole set of frequentist methods is the complement set of Bayesians. Personally I feel quite embarrassed whenever I am told that frequentist methods are not robust compared to Bayesian methods. Bayesian methods become robust when a priori knowledge (subjective priors) allows the results to be unaffected by outliers with a general class of likelihood. Regardless of being frequentist or Bayesian, statistics have been developed to be less sensitive to outliers and to do optimal inferences, i.e. to achieve the goal, robustness.

Ah…there are other various accounts for robustness methods/statistics in astronomy not limited to “bayesian is robust.” As often I got confused whenever I see statistically rigorous while the method is a simple chi-square minimization in literature, I believe astronomers can educate me the notion of robustness and the definition of robust statistics in various ways. I hope for some sequels from astronomers about robustness and robust statistics to this post of limited view.

A History of Markov Chain Monte Carlo

hlee — Wed, 17 Sep 2008 18:11:01 +0000

I’ve been joking about the astronomers’ fashion in writing Markov chain Monte Carlo (MCMC). Frequently, MCMC was represented by Monte Carlo Markov Chain in astronomical journals. I was curious about the history of this new creation. Overall, I thought it would be worth to learn more about the history of MCMC and this paper was up in arxiv:

[stat.CO:0808.2902] A History of Markov Chain Monte Carlo–Subjective Recollections from Incomplete Data– by C. Robert and G. Casella
Abstract: In this note we attempt to trace the history and development of Markov chain Monte Carlo (MCMC) from its early inception in the late 1940′s through its use today. We see how the earlier stages of the Monte Carlo (MC, not MCMC) research have led to the algorithms currently in use. More importantly, we see how the development of this methodology has not only changed our solutions to problems, but has changed the way we think about problems.

Here is the year list of monumental advances in the MCMC history,

1946: ENIAC
late 1940′s: inception along with Monte Carlo methods.
1953: Metropolis algorithm published in Journal of Chemical Physics (Metropolis et al.)
1970: Hastings algorithms in Biometrika (Hastrings)
1974: Gibbs sampler and Hammersley-Clifford theorem paper by Besag and its discussion by Hammersley in JRSSS B
1977: EM algorithm in JRSSS B (Dempster et al)
1983: Simulated Annealing algorithm (Kirkpatrick et al.)
1984: Gibbs sampling in IEEE Trans. Pattern Anal. Mach. Intell. (Geman and Geman, this paper is responsible for the name)
1987: data augmentation in JASA (Tanner and Wong)
1980s: image analysis and spatial statistics enjoyed MCMC algorithms, not popular with others due to the lack of computing power
1990: seminal paper by Gelfand and Smith in JSAS
1991: BUGS was presented at the Valencia meeting
1992: introductory paper by Casella and Georgy
1994: influential MCMC theory paper by Tierney in Ann. Stat.
1995: reversible jump algorithm in Biometrika (Green)
mid 1990′s: boom of MCMC due to particle filters, reversible jump and perfect sampling (second-generation of MCMC revolution)

and a nice quote from conclusion.

MCMC changed out emphasis from “closed form” solutions to algorithms, expanded our immpact to solving “real” applied problems, expanded our impact to improving numerical algorithms using statistical ideas, and led us into a world where “exact” now means “simulated”!

If you consider applying MCMC methods in your data analysis, references listed in Robert and Casella serve as a good starting point.

BUGS

hlee — Tue, 16 Sep 2008 20:34:23 +0000

Astronomers tend to think in Bayesian way, but their Bayesian implementation is very limited. OpenBUGS, WinBUGS, GeoBUGS (BUGS for geostatistics; for example, modeling spatial distribution), R2WinBUGS (R BUGS wrapper) or PyBUGS (Python BUGS wrapper) could boost their Bayesian eagerness. Oh, by the way, BUGS stands for Bayesian inference Using Gibbs Sampling.

Disclaimer: I never did serious Bayesian computations so that information I provide here tends to be very shallow. Both statisticians and astronomers oriented by Bayesian ideals are very welcome to add advanced pieces of information.

Bayesian statistics is very much preferred in astronomy, at least here at Harvard Smithsonian Center for Astrophysics. Yet, I do not understand why astronomy data analysis packages do not include libraries, modules, or toolboxes for MCMC (porting scripts from Numerical Recipes or IMSL, or using Python does not count here since these are also used by engineers and scientists of other disciplines: my view is also restricted thanks to my limited experience in using astronomical data analysis packages like ciao, XSPEC, IDL, IRAF, and AIPS) similar to WinBUGS or OpenBUGS. Most of Bayesian analysis in astronomy has to be done from the scratch, which drives off simple minded people like me (I prefer analytic forms and estimators than posterior chains). I hope easily implementable Bayesian Data Analysis modules come along soon to current astronomical data analysis systems for any astronomers who only had a lecture about Bayes theorem and Gibbs sampling. Perhaps, BUGS can be a role model to develop such modules.

As listed, one does not need R to use BUGS. WinBUGS is both stand alone and R implementable. PyBUGS can be handy since python is popular among astronomers. I heard that MATLAB (its open source counterpart, OCTAVE) has its own tools to maneuver Bayesian Data Analysis relatively easily. There are many small MCMC modules to solve particular problems in astronomy but none of them are reported to be robust enough so as to be applied in other type data sets. Not many have the freedom of choosing models and priors.

Hopefully, well knowledged Bayesians contribute in developing modules for Bayesian data analysis in astronomy. I don’t like to see contour plots, obtained from brute-forceful and blinded χ² fitting, claimed to be bivariate probability density profiles. I’d like to project the module development like the way that BUGS is developed in astronomical data analysis packages with various Bayesian libraries. Here are some web links about BUGS:
The BUGS Project
WinBUGS
OpenBUGS
Calling WinBUGS 1.4 from other programs

[ArXiv] 3rd week, June 2008

hlee — Sun, 22 Jun 2008 03:10:21 +0000

This is my last [ArXiv] series.

[astro-ph:0806.2228] T.J. Cornwell
Multi-Scale CLEAN deconvolution of radio synthesis images
[astro-ph:0806.2575] C.A.L. Bailer-Jones
What will Gaia tell us about the Galactic disk?
[astro-ph:0806.2823] Williams et al.
Lensed Image Angles: New Statistical Evidence for Substructure (Apart from their K-S tests, personally lensing is considered to be a nice subject from a geostatistics standpoint.)
[astro-ph:0806.3074] Eriksen and Wehus
Marginal distributions for cosmic variance limited CMB polarization data
[astro-ph:0806.2969] W. Boschin et al.
Optical analysis of the poor clusters Abell 610, Abell 725, and Abell 796, containing diffuse radio sources (astronomers call gaussian mixture models by KMM)
[astro-ph:0806.3096] Miller, Shimon, & Keating
CMB Beam Systematics: Impact on Lensing Parameter Estimation (Note Monte Carlo Markov Chain in the abstract, not Markov chain Monte Carlo.)

[ArXiv] 2nd week, June 2008

hlee — Mon, 16 Jun 2008 14:47:42 +0000

As Prof. Speed said, PCA is prevalent in astronomy, particularly this week. Furthermore, a paper explicitly discusses R, a popular statistics package.

[astro-ph:0806.1140] N.Bonhomme, H.M.Courtois, R.B.Tully
Derivation of Distances with the Tully-Fisher Relation: The Antlia Cluster
(Tully Fisher relation is well known and one of many occasions statistics could help. On the contrary, astronomical biases as well as measurement errors hinder from the collaboration).
[astro-ph:0806.1222] S. Dye
Star formation histories from multi-band photometry: A new approach (Bayesian evidence)
[astro-ph:0806.1232] M. Cara and M. Lister
Avoiding spurious breaks in binned luminosity functions
(I think that binning is not always necessary and overdosed, while there are alternatives.)
[astro-ph:0806.1326] J.C. Ramirez Velez, A. Lopez Ariste and M. Semel
Strength distribution of solar magnetic fields in photospheric quiet Sun regions (PCA was utilized)
[astro-ph:0806.1487] M.D.Schneider et al.
Simulations and cosmological inference: A statistical model for power spectra means and covariances
(They used R and its package Latin hypercube samples, lhs.)
[astro-ph:0806.1558] Ivan L. Andronov et al.
Idling Magnetic White Dwarf in the Synchronizing Polar BY Cam. The Noah-2 Project (PCA is applied)
[astro-ph:0806.1880] R. G. Arendt et al.
Comparison of 3.6 – 8.0 Micron Spitzer/IRAC Galactic Center Survey Point Sources with Chandra X-Ray Point Sources in the Central 40×40 Parsecs (K-S test)

[ArXiv] 4th week, May 2008

hlee — Sun, 01 Jun 2008 03:59:15 +0000

Eight astro-ph papers and two statistics paper are listed this week. One statistics paper discusses detecting filaments and the other talks about maximum likelihood estimation of satellite images (clouds).

[astro-ph:0805.3532] Balan and Lahav
ExoFit: Orbital Parameters of Extra-solar Planets from Radial Velocities (MCMC)
[astro-ph:0805.3983] R. G. Carlberg et al.
Clustering of supernova Ia host galaxies (Jackknife method is utilized).
[astro-ph:0805.4005] Kurek, Hrycyna, & Szydlowski
From model dynamics to oscillating dark energy parametrisation (Bayes factor)
[astro-ph:0805.4136] C. Genovese et al.
Inference for the Dark Energy Equation of State Using Type Ia Supernova Data
[math.ST:0805.4141] C. Genovese et al.
On the path density of a gradient field (detecting filaments via kernel density estimation, KDE)
[astro-ph:0805.4342] C. Espaillat et al.
Wavelet Analysis of AGN X-Ray Time Series: A QPO in 3C 273?
[astro-ph:0805.4414] Tegmark and Zaldarriaga
The Fast Fourier Transform Telescope
[astro-ph:0805.4417] A. Georgakakis et al.
A new method for determining the sensitivity of X-ray imaging observations and the X-ray number counts
[stat.AP:0805.4598] E. Anderes et al.
Maximum Likelihood Estimation of Cloud Height from Multi-Angle Satellite Imagery

[ArXiv] 3rd week, May 2008

hlee — Mon, 26 May 2008 18:59:38 +0000

Not many this week, but there’s a great read.

[stat.ME:0805.2756] Fionn Murtagh
The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering
[astro-ph:0805.2945] Martin, de Jong, & Rix
A comprehensive Maximum Likelihood analysis of the structural properties of faint Milky Way satellites
[astro-ph:0805.2946] Kelly, Fan, & Vestergaard
A Flexible Method of Estimating Luminosity Functions [my subjective comment is added at the bottom]
[stat.ME:0805.3220] Bayarri, Berger, Datta
Objective Bayes testing of Poisson versus inflated Poisson models (will it be of use when one is dealing with many zero background counts, underpopulated above zero background counts, and underpopulated source counts?)

[Comment] You must read it. It can serve as a very good Bayesian tutorial for astronomers. I think there’s a typo, nothing major, plus/minus sign in the likelihood, though. Tom Loredo kindly has informed through his extensive slog comments about Schechter function and this paper made me appreciate the gamma distribution more. Schechter function and the gamma density function share the same equation although the objective of their use does not have much to be shared (Forgive my Bayesian ignorance in the extensive usage of gamma distribution except the fact it’s a conjugate of Poisson or exponential distribution).

FYI, there was another recent arxiv paper on zero-inflation [stat.ME:0805.2258] by Bhattacharya, Clarke, & Datta
A Bayesian test for excess zeros in a zero-inflated power series distribution

[ArXiv] 1st week, May 2008

hlee — Mon, 12 May 2008 02:42:54 +0000

I think I have to review spatial statistics in astronomy, focusing on tessellation (void structure), point process (expanding 2 (3) point correlation function), and marked point process (spatial distribution of hardness ratios of X-ray distant sources, different types of galaxies -not only morphological differences but other marks such as absolute magnitudes and existence of particular features). When? Someday…

In addition to Bayesian methodologies, like this week’s astro-ph, studies on characterizing empirical spatial distributions of voids and galaxies frequently appear, which I believe can be enriched further with the ideas from stochastic geometry and spatial statistics. Click for what was appeared in arXiv this week.

[astro-ph:0805.0156]R. D’Abrusco, G. Longo, N. A. Walton
Quasar candidates selection in the Virtual Observatory era
[astro-ph:0805.0201] S. Vegetti& L.V.E. Koopmans
Bayesian Strong Gravitational-Lens Modelling on Adaptive Grids: Objective Detection of Mass Substructure in Galaxies (many like to see this paper: nest sampling implemented, discusses penalty function and tessllation)
[astro-ph:0805.0238] J. A. Carter et al.
Analytic Approximations for Transit Light Curve Observables, Uncertainties, and Covariances
[astro-ph:0805.0269] S.M.Leach et al.
Component separation methods for the Planck mission
[astro-ph:0805.0276] M. Grossi et al.
The mass density field in simulated non-Gaussian scenarios
[astro-ph:0805.0790] Ceccarelli, Padilla, & Lambas
Large-scale modulation of star formation in void walls
[astro-ph:0805.0797] Ceccarelli et al.
Voids in the 2dFGRS and LCDM simulations: spatial and dynamical properties
[astro-ph:0805.0875] S. Basilakos and L. Perivolaropoulos
Testing GRBs as Standard Candles
[astro-ph:0805.0968] A. A. Stanislavsky et al.
Statistical Modeling of Solar Flare Activity from Empirical Time Series of Soft X-ray Solar Emission

[ArXiv] 5th week, Apr. 2008

hlee — Mon, 05 May 2008 07:08:42 +0000

Since I learned Hubble’s tuning fork^[1] for the first time, I wanted to do classification (semi-supervised learning seems more suitable) galaxies based on their features (colors and spectra), instead of labor intensive human eye classification. Ironically, at that time I didn’t know there is a field of computer science called machine learning nor statistics which do such studies. Upon switching to statistics with a hope of understanding statistical packages implemented in IRAF and IDL, and learning better the contents of Numerical Recipes and Bevington’s book, the ignorance was not the enemy, but the accessibility of data was.

I’m glad to see this week presented a paper that I had dreamed of many years ago in addition to other interesting papers. Nowadays, I’m more and more realizing that astronomical machine learning is not simple as what we see from machine learning and statistical computation literature, which typically adopted data sets from the data repository whose characteristics are well known over the many years (for example, the famous iris data; there are toy data sets and mock catalogs, no shortage of data sets of public characteristics). As the long list of authors indicates, machine learning on astronomical massive data sets are never meant to be a little girl’s dream. With a bit of my sentiment, I offer the list of this week:

[astro-ph:0804.4068] S. Pires et al.
FASTLens (FAst STatistics for weak Lensing) : Fast method for Weak Lensing Statistics and map making
[astro-ph:0804.4142] M.Kowalski et al.
Improved Cosmological Constraints from New, Old and Combined Supernova Datasets
[astro-ph:0804.4219] M. Bazarghan and R. Gupta
Automated Classification of Sloan Digital Sky Survey (SDSS) Stellar Spectra using Artificial Neural Networks
[gr-qc:0804.4144]E. L. Robinson, J. D. Romano, A. Vecchio
Search for a stochastic gravitational-wave signal in the second round of the Mock LISA Data challenges
[astro-ph:0804.4483]C. Lintott et al.
Galaxy Zoo : Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
[astro-ph:0804.4692] M. J. Martinez Gonzalez et al.
PCA detection and denoising of Zeeman signatures in stellar polarised spectra
[astro-ph:0805.0101] J. Ireland et al.
Multiresolution analysis of active region magnetic structure and its correlation with the Mt. Wilson classification and flaring activity

A relevant post related machine learning on galaxy morphology from the slog is found at svm and galaxy morphological classification

< Added: 3rd week May 2008>[astro-ph:0805.2612] S. P. Bamford et al.
Galaxy Zoo: the independence of morphology and colour

Wikipedia link: Hubble sequence