Bayesian Computation with R

Author: Jim Albert

Publisher: Springer (2007)

As the title suggests, an accompanying R package, LearnBayes, is available (clicking the name will direct you to download the package). Furthermore, the last chapter is about WinBUGS. (Please check out the resources listed in BUGS for other flavors of BUGS, Bayesian inference Using Gibbs Sampling.) Overall, it is quite practical and instructional. If a young astronomer would like to enter the competition posted below because of sophisticated data requiring non-traditional statistical modeling, this book can be a good starting point. (Here, traditional methods include brute-force Monte Carlo simulations, chi^2/weighted least squares fitting, and test statistics with rigid underlying assumptions.)

One quote stood out because of a comment from an astronomer, “Bayesian is robust but frequentist is not,” that I couldn’t agree with at the time.

A Bayesian analysis is said to be robust to the choice of prior if the inference is insensitive to different priors that match the user’s beliefs.

Since there’s no discussion of priors in frequentist methods, Bayesian robustness cannot be matched and compared with frequentist robustness. As in my discussion in *Robust Statistics*, I keep the notion that robust statistics is insensitive to outliers or to the iid Gaussian model assumption. The latter, in particular, is almost always assumed in astronomical data analysis, unless other models and probability densities are explicitly stated, like Poisson counts or the Pareto distribution. New Bayesian algorithms are invented to achieve robustness, not limited to the choice of prior but covering the topics of frequentists’ robust statistics.

The introduction to Bayesian computation focuses on analytical and simple parametric models and well-known probability densities. These models and their Bayesian analysis produce interpretable results. The Gibbs sampler, Metropolis-Hastings algorithms, and a few of their hybrids can handle scientific problems as long as the scientific models and the uncertainties, both in observations and parameters, are transcribed into well-known probability density functions. I think astronomers will want to check Chap 6 (MCMC) and Chap 9 (Regression Models). Oftentimes, in order to prove a strong correlation between two variables, astronomers adopt simple linear regression models and fit the data to them. A priori knowledge enhances the flexibility of the fitting analysis, in which Bayesian computation works robustly, unlike straightforward chi-square methods. The book offers neither sophisticated algorithms nor theory; it offers only the bare necessities and foundations for Bayesian computation to be accommodated into scientific needs.
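As a toy illustration of how a priori knowledge enters a fit, here is a sketch of my own in Python (the book itself uses R): estimating a normal mean under a normal prior, where the prior acts like extra pseudo-data pulling the estimate toward it. The function name and the numbers are my inventions, not from the book.

```python
import math

def posterior_normal_mean(data, sigma, mu0, tau0):
    """Posterior of a normal mean, known sigma, with a N(mu0, tau0^2) prior.

    Combines prior and data precisions; returns posterior mean and sd.
    """
    n = len(data)
    xbar = sum(data) / n
    prior_prec = 1.0 / tau0 ** 2     # precision = 1 / variance
    data_prec = n / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * mu0 + data_prec * xbar)
    return post_mean, math.sqrt(post_var)

data = [9.8, 10.2, 10.1, 9.9]
# A vague prior (large tau0) reproduces the least-squares answer ...
m_vague, _ = posterior_normal_mean(data, sigma=0.5, mu0=0.0, tau0=100.0)
# ... while an informative prior pulls the estimate toward mu0 = 8.
m_inform, _ = posterior_normal_mean(data, sigma=0.5, mu0=8.0, tau0=0.2)
```

With the informative prior the posterior mean lands between the sample mean (10.0) and the prior mean (8.0), the kind of tempering a plain chi-square fit cannot express.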

The other book is

Bayesian Core: A Practical Approach to Computational Bayesian Statistics.

Authors: J.-M. Marin and C.P. Robert

Publisher: Springer (2007).

Although the book is written by statisticians, the very first real data example is CMBdata (cosmic microwave background data; instead of __cosmic__, the book uses __cosmological__. I’m not sure which one is correct, but I’m so used to CMB standing for cosmic microwave background). Surprisingly, CMB became a very easy topic in statistics in terms of testing normality and extreme values. Seeing real astronomy data first in the book was the primary reason for introducing it here. Also, it’s a relatively small volume (about 250 pages) compared to other Bayesian textbooks, with broad coverage of topics in Bayesian computation. There are other practical real data sets illustrating Bayesian computations in the book, and these example data sets can be found on the book website.

The book begins with R, then normal models, regression and variable selection, generalized linear models, capture-recapture experiments, mixture models, dynamic models, and image analysis are covered.

I felt exuberant when I found that the book describes the law of large numbers (LLN), which justifies Monte Carlo methods. The LLN appears often when integration is approximated by summation, which astronomers use a lot without referring to the name of this law. For more information, I’d rather give a wikipedia link to Law of Large Numbers.
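A quick demonstration of this point (my own sketch, in Python for brevity): the LLN is exactly what lets a sample average stand in for an integral.

```python
import random

random.seed(1)

def mc_integrate(f, n):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]:
    average f at n uniform draws; by the LLN this converges to E[f(U)]."""
    return sum(f(random.random()) for _ in range(n)) / n

# Estimate \int_0^1 x^2 dx = 1/3 by summation over random points.
estimate = mc_integrate(lambda x: x * x, 100_000)
```

With 100,000 draws the estimate sits within a few thousandths of 1/3, and the error shrinks like 1/sqrt(n).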

Several MCMC algorithms can be mixed together within a single algorithm using either a circular or a random design. While this construction is often suboptimal (in that the inefficient algorithms in the mixture are still used on a regular basis), it almost always brings an improvement compared with its individual components. A special case where a mixed scenario is used is the

Metropolis-within-Gibbs algorithm: When building a Gibbs sampler, it may happen that it is difficult or impossible to simulate from some of the conditional distributions. In that case, a single Metropolis step associated with this conditional distribution (as its target) can be used instead.
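To make the quoted recipe concrete, here is a minimal numerical sketch of my own (in Python, not code from the book): the target is a bivariate normal with correlation 0.8; x is drawn from its exact normal conditional (the Gibbs step), while y is updated with a random-walk Metropolis step, pretending its conditional were intractable.

```python
import math
import random

random.seed(7)
RHO = 0.8                      # target: bivariate normal, unit variances
COND_VAR = 1.0 - RHO ** 2      # conditional variance of each coordinate

def log_cond(y, x):
    """Log conditional density of y given x, up to an additive constant."""
    return -(y - RHO * x) ** 2 / (2.0 * COND_VAR)

def metropolis_within_gibbs(n_iter=20_000):
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_iter):
        # Gibbs step: x | y is a known normal, so simulate it directly.
        x = random.gauss(RHO * y, math.sqrt(COND_VAR))
        # Metropolis step for y | x: symmetric random-walk proposal,
        # accepted with the usual Metropolis-Hastings ratio.
        y_prop = y + random.gauss(0.0, 1.0)
        if math.log(random.random()) < log_cond(y_prop, x) - log_cond(y, x):
            y = y_prop
        draws.append((x, y))
    return draws

draws = metropolis_within_gibbs()
mean_x = sum(x for x, _ in draws) / len(draws)   # should sit near 0
```

The sample means hover near zero and the empirical correlation near 0.8, even though only one of the two conditionals is simulated exactly.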

The description in Sec. 4.2, Metropolis-Hastings Algorithms, should be especially appreciated and comprehended by astronomers because of the historical origins of its topics, the *detailed balance equation* and the *random walk.*

My personal favorite is Chapter 6 on mixture models. Astronomers handle data from multiple populations (multiple epochs of star formation, single or multiple break power laws, linear or quadratic models, metallicities from merging or formation triggers, backgrounds+sources, environment-dependent point spread functions, and so on), and the chapter discusses the difficulties of the label switching problem (an identifiability issue when codifying data into MCMC or EM algorithms).

A completely different approach to the interpretation and estimation of mixtures is the

semiparametric perspective. To summarize this approach, consider that since very few phenomena obey probability laws corresponding to the most standard distributions, mixtures such as (*) can be seen as a good trade-off between fair representation of the phenomenon and efficient estimation of the underlying distribution. If k is large enough, there is theoretical support for the argument that (*) provides a good approximation (in some functional sense) to most distributions. Hence, a mixture distribution can be perceived as a type of basis approximation of unknown distributions, in a spirit similar to wavelets and splines, but with a more intuitive flavor (for a statistician at least). This chapter mostly focuses on the “parametric” case, when the partition of the sample into subsamples with different distributions f_j does make sense from the dataset point of view (even though the computational processing is the same in both cases).

We must point out at this stage that mixture modeling is often used in image smoothing but not in feature recognition, which requires spatial coherence and thus more complicated models…
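For concreteness, here is a tiny mixture of the form (*), a two-component Gaussian mixture, as a Python sketch of my own (the parameters are arbitrary). Note that permuting the (weight, mean) pairs leaves the density untouched, which is exactly the label-switching identifiability issue mentioned above.

```python
import math
import random

random.seed(3)

# Two-component Gaussian mixture 0.3*N(-2,1) + 0.7*N(2,1): a toy stand-in
# for two stellar populations. Swapping the component labels gives the
# identical density -- the label-switching problem in a nutshell.
WEIGHTS = (0.3, 0.7)
MEANS = (-2.0, 2.0)
SD = 1.0

def mixture_pdf(x):
    """Density of the mixture at x."""
    norm = SD * math.sqrt(2.0 * math.pi)
    return sum(w * math.exp(-(x - m) ** 2 / (2.0 * SD ** 2)) / norm
               for w, m in zip(WEIGHTS, MEANS))

def sample_mixture(n):
    """Draw n points: pick a component by its weight, then draw from it."""
    return [random.gauss(MEANS[0] if random.random() < WEIGHTS[0]
                         else MEANS[1], SD)
            for _ in range(n)]

xs = sample_mixture(50_000)
sample_mean = sum(xs) / len(xs)   # approaches 0.3*(-2) + 0.7*2 = 0.8
```

The sampling step mirrors the latent-allocation view of mixtures that Gibbs samplers and the EM algorithm both exploit.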

My patience ran out before I could comprehend every detail of the book, but the sections on reversible jump MCMC, hidden Markov models (HMM), and Markov random fields (MRF) would be very useful. These topics appear often in image processing, a field in which astronomers have their own algorithms. Adaptation and comparison across image analysis methods promise new directions for scientific imaging data analysis beyond subjective denoising, smoothing, and segmentation.

For readers considering more advanced Bayesian computation and a rigorous treatment of MCMC methodology, I’d like to point to a textbook frequently mentioned by Marin and Robert.

Monte Carlo Statistical Methods. Robert, C. and Casella, G. (2004)

Springer-Verlag, New York, 2nd Ed.

There are a few more practical and introductory Bayesian analysis books recently published or soon to be published. Some readers would prefer these books in print. Perhaps there is/will be **Bayesian Computation with Python, IDL, Matlab, Java, or C/C++** for those who never intend to use R. By the way, **Mathematica** users may like to check out Phil Gregory’s book, which I introduced in [books] a boring title. My point is that applied statistics has become friendlier to non-statisticians through these good introductory books and free online materials. I hope more astronomers apply statistical models in their data analysis without much trouble in executing Bayesian methods. Some might want to check BUGS, introduced in [BUGS]. That posting contains resources on how to use BUGS and the packages available under various languages.

First, since it’s short, let’s quote a comment from an astronomer that might reflect the notion of robust statistics in astronomy.

Bayesian is robust.

Is every Bayesian method robust, and is its counterpart from classical statistics not? I know that popular statistics in astronomy are not, generally speaking, robust, and that those popular statistics were born before the notion of robustness in statistics was recognized and discussed.

I do understand why such a partisan comment was produced. Astronomers always report their data analysis results as best fits, error bars, probabilities, or some significance levels (they don’t explicitly say p-values, powers, type I or type II errors, unbiased estimates, and other statistical jargon for inference problems), and those classical methods of frequent use have caused frustration due to their lack of robustness. On the contrary, MCMC algorithms for estimating posterior distributions produce easily interpretable results to report a best fit (the mode) and an error bar (the HPD interval).
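Reporting an HPD interval from a chain is itself a small computation; here is a hedged sketch of my own in Python, where the draws are faked as normal variates for brevity, standing in for actual MCMC output.

```python
import random

random.seed(5)
# Stand-in for posterior draws from an MCMC run.
draws = sorted(random.gauss(10.0, 2.0) for _ in range(20_000))

def hpd_interval(sorted_draws, level=0.68):
    """Shortest interval containing `level` of the draws (assumes the
    posterior is unimodal, so the shortest interval is the HPD region)."""
    n = len(sorted_draws)
    k = int(level * n)
    # Slide a window covering k draws and keep the narrowest one.
    widths = [sorted_draws[i + k] - sorted_draws[i] for i in range(n - k)]
    i_min = widths.index(min(widths))
    return sorted_draws[i_min], sorted_draws[i_min + k]

lo, hi = hpd_interval(draws)   # roughly mean +/- one sigma here
```

For a symmetric posterior this coincides with the central interval; for a skewed one, the HPD interval is the shorter and arguably the more honest error bar.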

My understanding of **robustness** as a statistician does not draw a line between Bayesians and frequentists. The following is quoted from the *Katholieke Universiteit Leuven* website, whose mathematics department has a focus group for **robust statistics**.

Robust statistical methods and applications.

The goal of robust statistics is to develop data analytical methods which are resistant to outlying observations in the data, and hence which are also able to detect these outliers. Pioneering work in this area has been done by Huber (1981), Hampel et al. (1986) and Rousseeuw and Leroy (1987). In their work, estimators for location, scale, scatter and regression play a central role. They assume that the majority of the data follow a parametric model, whereas a minority (the contamination) can take arbitrary values. This approach leads to the concept of the influence function of an estimator which measures the influence of a small amount of contamination in one point. Other measures of robustness are the finite-sample and the asymptotic breakdown value of an estimator. They tell what the smallest amount of contamination is which can carry the estimates beyond all bounds.

Nowadays, robust estimators are being developed for many statistical models. Our research group is very active in investigating estimators of covariance and regression for high-dimensional data, with applications in chemometrics and bio-informatics. Recently, robust estimators have been developed for PCA (principal component analysis), PCR (principal component regression), PLS (partial least squares), classification, ICA (independent component analysis) and multi-way analysis. Also robust measures of skewness and tail weight have been introduced. We study robustness of kernel methods, and regression quantiles for censored data.

My understanding of “robustness” from my statistics education is all-encompassing: it covers both Bayesian and frequentist methods. Any methods and models that are insensitive or immune to outliers are robust methods and statistics. For example, the median is more robust than the mean since the breakdown point of the median is 1/2 and that of the mean is 0, asymptotically. Both mean and median are estimable by Bayesian and frequentist methods. Instead of the standard deviation, tactics like lower and upper quartiles to indicate error bars, or Winsorization (or trimming) to obtain a standard deviation for the error bar, are adopted regardless of being Bayesian or frequentist. Instead of chi-square goodness-of-fit tests, which assume Gaussian residuals, nonparametric or distribution-free tests can be utilized.
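A quick numerical check of the breakdown-point claim (my own sketch in Python; the measurements are made up): one wild outlier drags the mean and the raw standard deviation, while the median and a Winsorized scale barely move.

```python
import statistics

# Ten clean measurements plus one wild outlier (a cosmic-ray hit, say).
clean = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
contaminated = clean + [100.0]

mean_bad = statistics.mean(contaminated)      # dragged toward the outlier
median_bad = statistics.median(contaminated)  # stays at 10.0

def winsorize(data, k=1):
    """Replace the k smallest / k largest values by their nearest
    remaining neighbors before computing a scale estimate."""
    s = sorted(data)
    s[:k] = [s[k]] * k
    s[-k:] = [s[-k - 1]] * k
    return s

sd_raw = statistics.pstdev(contaminated)             # blown up by the outlier
sd_win = statistics.pstdev(winsorize(contaminated))  # back to a sane scale
```

Nothing here is Bayesian or frequentist; the robustness comes from the estimator itself.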

The notion that frequentist methods are not robust might have developed from the frustration that chi-square-related methods in astronomy do not show robust characteristics, because data are prone to breaking the normality (Gaussianity) assumption. Also, the limited library of nonparametric methods in data analysis packages and software suggests that frequentist methods are not robust. An additional unattractive aspect of frequentist methods is that their descriptions seem too mathematical, too abstract, too difficult to codify, and full of unfriendly jargon, whereas Bayesian methods provide step-by-step modeling procedures with explanations of why a particular likelihood and priors were chosen, based on external knowledge from physics and observations (MCMC algorithms in astronomical papers are adaptations of already-proven algorithms from statistics and algorithm journals).

John Tukey said:

Robustness refers to the property of a procedure remaining effective even in the absence of usual assumptions such as normality and no incorrect data values. In simplest terms the idea is to improve upon the use of the simple arithmetic average in estimating the center of a distribution. As a simple case one can ask: Is it ever better to use the sample median than the sample mean, and if so, when?

I don’t think the set of frequentist methods is the complement of the Bayesian set. Personally, I feel quite embarrassed whenever I am told that frequentist methods are not robust compared to Bayesian methods. Bayesian methods become robust when a priori knowledge (subjective priors) allows the results to be unaffected by outliers under a general class of likelihoods. Regardless of being frequentist or Bayesian, statistics have been developed to be less sensitive to outliers and to do optimal inference, i.e. to achieve the goal, **robustness.**

Ah… there are various other accounts of **robust methods/statistics in astronomy**, not limited to “Bayesian is robust.” I am often confused whenever I see *statistically rigorous* while the method is a simple

[stat.CO:0808.2902]

A History of Markov Chain Monte Carlo – Subjective Recollections from Incomplete Data – by C. Robert and G. Casella

Abstract: In this note we attempt to trace the history and development of Markov chain Monte Carlo (MCMC) from its early inception in the late 1940′s through its use today. We see how the earlier stages of the Monte Carlo (MC, not MCMC) research have led to the algorithms currently in use. More importantly, we see how the development of this methodology has not only changed our solutions to problems, but has changed the way we think about problems.

Here is the year list of monumental advances in MCMC history:

- 1946: ENIAC
- late 1940′s: inception, along with Monte Carlo methods
- 1953: **Metropolis algorithm** published in *Journal of Chemical Physics* (Metropolis et al.)
- 1970: **Hastings algorithm** in *Biometrika* (Hastings)
- 1974: **Gibbs sampler** and **Hammersley-Clifford theorem** paper by Besag and its discussion by Hammersley in JRSS B
- 1977: **EM algorithm** in JRSS B (Dempster et al.)
- 1983: **Simulated Annealing algorithm** (Kirkpatrick et al.)
- 1984: **Gibbs sampling** in *IEEE Trans. Pattern Anal. Mach. Intell.* (Geman and Geman; this paper is responsible for the name)
- 1987: **data augmentation** in *JASA* (Tanner and Wong)
- 1980s: image analysis and spatial statistics enjoyed MCMC algorithms, which were not popular elsewhere due to the lack of computing power
- 1990: seminal paper by Gelfand and Smith in *JASA*
- 1991: BUGS was presented at the Valencia meeting
- 1992: introductory paper by Casella and George
- 1994: influential MCMC theory paper by Tierney in *Ann. Stat.*
- 1995: **reversible jump algorithm** in *Biometrika* (Green)
- mid 1990′s: boom of MCMC due to particle filters, reversible jump, and perfect sampling (the second generation of the MCMC revolution)

and a nice quote from the conclusion:

MCMC changed our emphasis from “closed form” solutions to algorithms, expanded our impact to solving “real” applied problems, expanded our impact to improving numerical algorithms using statistical ideas, and led us into a world where “exact” now means “simulated”!

If you consider applying MCMC methods in your data analysis, references listed in Robert and Casella serve as a good starting point.

Disclaimer: I have never done serious Bayesian computations, so the information I provide here tends to be very shallow. Both statisticians and astronomers oriented toward Bayesian ideals are very welcome to add advanced pieces of information.

Bayesian statistics is very much preferred in astronomy, at least here at the Harvard-Smithsonian Center for Astrophysics. Yet I do not understand why astronomy data analysis packages do not include libraries, modules, or toolboxes for MCMC similar to WinBUGS or OpenBUGS (porting scripts from Numerical Recipes or IMSL, or using Python, does not count here, since these are also used by engineers and scientists of other disciplines; my view is also restricted thanks to my limited experience with astronomical data analysis packages like ciao, XSPEC, IDL, IRAF, and AIPS). Most Bayesian analysis in astronomy has to be done from scratch, which drives off simple-minded people like me (I prefer analytic forms and estimators to posterior chains). I hope easily implementable Bayesian data analysis modules come along soon for current astronomical data analysis systems, for any astronomer who has only had a lecture about Bayes theorem and Gibbs sampling. Perhaps BUGS can be a role model for developing such modules.

As listed, one does not need R to use BUGS. WinBUGS is both standalone and R-implementable. PyBUGS can be handy since Python is popular among astronomers. I heard that MATLAB (and its open source counterpart, OCTAVE) has its own tools for maneuvering through Bayesian data analysis relatively easily. There are many small MCMC modules that solve particular problems in astronomy, but none of them are reported to be robust enough to be applied to other types of data sets. Not many offer the freedom of choosing models and priors.

Hopefully, knowledgeable Bayesians will contribute to developing modules for Bayesian data analysis in astronomy. I don’t like to see contour plots, obtained from brute-force, blind χ^{2} fitting, claimed to be bivariate probability density profiles. I’d like to see module development, in the way BUGS was developed, within astronomical data analysis packages with various Bayesian libraries. Here are some web links about BUGS:

The BUGS Project

WinBUGS

OpenBUGS

Calling WinBUGS 1.4 from other programs

- [astro-ph:0806.2228] T.J. Cornwell, **Multi-Scale CLEAN deconvolution of radio synthesis images**
- [astro-ph:0806.2575] C.A.L. Bailer-Jones, **What will Gaia tell us about the Galactic disk?**
- [astro-ph:0806.2823] Williams et al., **Lensed Image Angles: New Statistical Evidence for Substructure** (Apart from their K-S tests, personally lensing is considered a nice subject from a geostatistics standpoint.)
- [astro-ph:0806.3074] Eriksen and Wehus, **Marginal distributions for cosmic variance limited CMB polarization data**
- [astro-ph:0806.2969] W. Boschin et al., **Optical analysis of the poor clusters Abell 610, Abell 725, and Abell 796, containing diffuse radio sources** (astronomers call Gaussian mixture models KMM)
- [astro-ph:0806.3096] Miller, Shimon, & Keating, **CMB Beam Systematics: Impact on Lensing Parameter Estimation** (Note Monte Carlo Markov Chain in the abstract, not Markov chain Monte Carlo.)
- [astro-ph:0806.1140] N. Bonhomme, H.M. Courtois, R.B. Tully, **Derivation of Distances with the Tully-Fisher Relation: The Antlia Cluster** (The Tully-Fisher relation is well known and one of many occasions where statistics could help. On the contrary, astronomical biases as well as measurement errors hinder the collaboration.)
- [astro-ph:0806.1222] S. Dye, **Star formation histories from multi-band photometry: A new approach** (Bayesian evidence)
- [astro-ph:0806.1232] M. Cara and M. Lister, **Avoiding spurious breaks in binned luminosity functions** (I think that binning is not always necessary and is overused, while there are alternatives.)
- [astro-ph:0806.1326] J.C. Ramirez Velez, A. Lopez Ariste and M. Semel, **Strength distribution of solar magnetic fields in photospheric quiet Sun regions** (PCA was utilized)
- [astro-ph:0806.1487] M.D. Schneider et al., **Simulations and cosmological inference: A statistical model for power spectra means and covariances** (They used R and its Latin hypercube sampling package, *lhs*.)
- [astro-ph:0806.1558] Ivan L. Andronov et al., **Idling Magnetic White Dwarf in the Synchronizing Polar BY Cam. The Noah-2 Project** (PCA is applied)
- [astro-ph:0806.1880] R. G. Arendt et al., **Comparison of 3.6 – 8.0 Micron Spitzer/IRAC Galactic Center Survey Point Sources with Chandra X-Ray Point Sources in the Central 40×40 Parsecs** (K-S test)

- [astro-ph:0805.3532] Balan and Lahav, **ExoFit: Orbital Parameters of Extra-solar Planets from Radial Velocities** (MCMC)
- [astro-ph:0805.3983] R. G. Carlberg et al., **Clustering of supernova Ia host galaxies** (Jackknife method is utilized.)
- [astro-ph:0805.4005] Kurek, Hrycyna, & Szydlowski, **From model dynamics to oscillating dark energy parametrisation** (Bayes factor)
- [astro-ph:0805.4136] C. Genovese et al., **Inference for the Dark Energy Equation of State Using Type Ia Supernova Data**
- [math.ST:0805.4141] C. Genovese et al., **On the path density of a gradient field** (detecting filaments via kernel density estimation, KDE)
- [astro-ph:0805.4342] C. Espaillat et al., **Wavelet Analysis of AGN X-Ray Time Series: A QPO in 3C 273?**
- [astro-ph:0805.4414] Tegmark and Zaldarriaga, **The Fast Fourier Transform Telescope**
- [astro-ph:0805.4417] A. Georgakakis et al., **A new method for determining the sensitivity of X-ray imaging observations and the X-ray number counts**
- [stat.AP:0805.4598] E. Anderes et al., **Maximum Likelihood Estimation of Cloud Height from Multi-Angle Satellite Imagery**
- [stat.ME:0805.2756] Fionn Murtagh, **The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering**
- [astro-ph:0805.2945] Martin, de Jong, & Rix, **A comprehensive Maximum Likelihood analysis of the structural properties of faint Milky Way satellites**
- [astro-ph:0805.2946] Kelly, Fan, & Vestergaard, **A Flexible Method of Estimating Luminosity Functions** [my subjective comment is added at the bottom]
- [stat.ME:0805.3220] Bayarri, Berger, Datta, **Objective Bayes testing of Poisson versus inflated Poisson models** (will it be of use when one is dealing with many zero background counts, underpopulated above-zero background counts, and underpopulated source counts?)

[**Comment**] You must read it. It can serve as a very good Bayesian tutorial for astronomers. I think there’s a typo, nothing major, a plus/minus sign in the likelihood, though. Tom Loredo has kindly informed me, through his extensive slog comments, about the Schechter function, and this paper made me appreciate the gamma distribution more. The Schechter function and the gamma density function share the same equation, although the objectives of their use do not have much in common (forgive my Bayesian ignorance of the extensive usage of the gamma distribution beyond the fact that it’s a conjugate prior for the Poisson or exponential distribution).
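That Poisson-gamma conjugacy fits in two lines. Here is a generic sketch in Python (my own, not code from the paper; the counts are invented): with a Gamma(a, b) prior, shape a and rate b, on a Poisson rate, the posterior after n observed counts is Gamma(a + Σx, b + n).

```python
def gamma_poisson_update(a, b, counts):
    """Conjugate update: Gamma(a, b) prior (shape, rate) on a Poisson rate;
    the posterior after observing `counts` is Gamma(a + sum, b + n)."""
    return a + sum(counts), b + len(counts)

counts = [3, 0, 2, 4, 1]                 # e.g. photon counts in 5 exposures
a_post, b_post = gamma_poisson_update(1.0, 1.0, counts)
post_mean = a_post / b_post              # posterior mean of the rate: 11/6
```

No MCMC is needed for this piece, which is what makes the gamma family so convenient for count data.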

FYI, there was another recent arxiv paper on zero-inflation [stat.ME:0805.2258] by Bhattacharya, Clarke, & Datta

**A Bayesian test for excess zeros in a zero-inflated power series distribution**

In addition to Bayesian methodologies, like in this week’s astro-ph, studies characterizing empirical spatial distributions of voids and galaxies frequently appear, which I believe can be enriched further with ideas from stochastic geometry and spatial statistics. Click for what appeared in arXiv this week.

- [astro-ph:0805.0156] R. D’Abrusco, G. Longo, N. A. Walton, **Quasar candidates selection in the Virtual Observatory era**
- [astro-ph:0805.0201] S. Vegetti & L.V.E. Koopmans, **Bayesian Strong Gravitational-Lens Modelling on Adaptive Grids: Objective Detection of Mass Substructure in Galaxies** (many will like to see this paper: nested sampling implemented; discusses a penalty function and tessellation)
- [astro-ph:0805.0238] J. A. Carter et al., **Analytic Approximations for Transit Light Curve Observables, Uncertainties, and Covariances**
- [astro-ph:0805.0269] S.M. Leach et al., **Component separation methods for the Planck mission**
- [astro-ph:0805.0276] M. Grossi et al., **The mass density field in simulated non-Gaussian scenarios**
- [astro-ph:0805.0790] Ceccarelli, Padilla, & Lambas, **Large-scale modulation of star formation in void walls**
- [astro-ph:0805.0797] Ceccarelli et al., **Voids in the 2dFGRS and LCDM simulations: spatial and dynamical properties**
- [astro-ph:0805.0875] S. Basilakos and L. Perivolaropoulos, **Testing GRBs as Standard Candles**
- [astro-ph:0805.0968] A. A. Stanislavsky et al., **Statistical Modeling of Solar Flare Activity from Empirical Time Series of Soft X-ray Solar Emission**

I’m glad to see that this week presented a paper I had dreamed of many years ago, in addition to other interesting papers. Nowadays, I’m realizing more and more that astronomical machine learning is not as simple as what we see in the machine learning and statistical computation literature, which typically adopts data sets from repositories whose characteristics have been well known for many years (for example, the famous iris data; there are toy data sets and mock catalogs, no shortage of data sets of public characteristics). As the long list of authors indicates, machine learning on massive astronomical data sets was never meant to be a little girl’s dream. With a bit of my sentiment, I offer the list of this week:

- [astro-ph:0804.4068] S. Pires et al., **FASTLens (FAst STatistics for weak Lensing): Fast method for Weak Lensing Statistics and map making**
- [astro-ph:0804.4142] M. Kowalski et al., **Improved Cosmological Constraints from New, Old and Combined Supernova Datasets**
- [astro-ph:0804.4219] M. Bazarghan and R. Gupta, **Automated Classification of Sloan Digital Sky Survey (SDSS) Stellar Spectra using Artificial Neural Networks**
- [gr-qc:0804.4144] E. L. Robinson, J. D. Romano, A. Vecchio, **Search for a stochastic gravitational-wave signal in the second round of the Mock LISA Data challenges**
- [astro-ph:0804.4483] C. Lintott et al., **Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey**
- [astro-ph:0804.4692] M. J. Martinez Gonzalez et al., **PCA detection and denoising of Zeeman signatures in stellar polarised spectra**
- [astro-ph:0805.0101] J. Ireland et al., **Multiresolution analysis of active region magnetic structure and its correlation with the Mt. Wilson classification and flaring activity**

A relevant post on machine learning and galaxy morphology from the slog can be found at svm and galaxy morphological classification.

<**Added: 3rd week May 2008**> [astro-ph:0805.2612] S. P. Bamford et al., **Galaxy Zoo: the independence of morphology and colour**