Last Updated: 2012may29


Topics in Astrostatistics

Statistics 310, Harvard University
Statistics 281, University of California, Irvine

AY 2011-2012

Instructor Prof. Meng Xiao Li (HU)
  Prof. David van Dyk (ICL)
  Prof. Yu Yaming (UCI)
Schedule Tuesdays 11:30AM - 1:30PM ET
Location SciCen 706

Alex Blocker (Harvard U)
6 Sep 2011
A taste of astrostatistics: problems, opportunities, & connections
Abstract: Astrostatistics is a vibrant, tight-knit field with more open problems than statisticians to tackle them. These range from very applied, such as understanding the workings of space telescopes, to fundamental questions of statistical inference, and sophisticated computation is the order of the day. The latter is particularly true as new instruments generate huge volumes of data.
I will provide a sampling of two projects from astrostatistics: inferring the brightness of faint galaxies using the Chandra space telescope, and finding unusual events within millions of astronomical time series. These presented major inferential challenges in radically different ways. Addressing them took a combination of statistical modeling, scientific knowledge, and computational finesse.
Finally, I will share some surprising connections between astrostatistics and my work in biology. Biology appeared to lead astronomy in data analysis for many years, but the fields are now coming full circle. The newest forms of biological data share many features with modern astronomical data; there is a great potential for "methodological arbitrage" here for graduate students willing to dive into astrostatistics.
Slides [.pdf]
Astro Projects for Statistics
20 Sep 2011
Projects, problems, and demos
Doubt: How Do I Know if that is a Real Feature in My Image? (Alanna C)
Timing analysis of grating data (Vinay K)
Real time feature detection and classification (Pavlos P)
Issues in modeling the X-ray data (Aneta S)
Quasar clustering project (Brandon K)
Simplicity: Bayesian Energy Quantiles, or Quick Non-parametric way(s) to incorporate Higher Dimensional Data (Alanna C)
Source detection in 4D (Vinay K)
Physics demos: Poisson, atomic lines, dispersion spectra (Alanna C)
11-13 Oct 2011
Tuesday 11 Oct
9:30a - 10:15a: pyBLoCXS (at SciCen 706)
10:15a - 11:15a: proposal
11:30a - 12:15p: new projects
12:15p - 1:00p: Bayes Factors
2:00p - 3:00p: Full Bayes Calibration Uncertainties
3:00p - 3:30p: 2D Cal Uncertainties and SCA
Wednesday 12 Oct
10:00a - 10:30a: SolarStat (at CfA Fishbowl)
10:30a - 11:15a: Sunspot Classification
11:15a - 11:45a: Sunspot Cycles
1:00p - 2:00p: Timing analysis with grating data (at CfA M-240)
2:00p - 2:45p: Solar DEM features
2:45p - 4:00p: computing
Thursday 13 Oct
10:00a - 11:00a: pySALC (at CfA M-240)
11:00a - 2:30p: proposal
Brandon Kelly (CfA/UCSB)
25 Oct 2011
Investigating Star Formation through Hierarchical Bayesian Modeling of Emission from Astronomical Dust
Abstract: Astronomical dust plays an important role in the formation of stars and planets. Recently launched observatories, such as Herschel and Planck, are providing observations that place important constraints on the properties of astronomical dust. However, the traditional least-squares analysis used by astronomers is highly inefficient for this problem, and leads to biases and incorrect conclusions. In this talk I will discuss a hierarchical Bayesian approach to deriving the physical parameters of astronomical dust, as well as the distribution of these parameters. I will also discuss an ancillarity-sufficiency interweaving strategy for boosting the efficiency of the MCMC sampler. Finally, I will present results from our model as applied to a nearby star-forming region. The results obtained from our Bayesian approach lead to scientific conclusions opposite to those obtained from the least-squares analysis: the Bayesian results are consistent with astrophysical theories of dust formation, while the least-squares results are not.
Slides: [.pdf] | [.ppt]
Raffaele D'Abrusco (CfA)
1 Nov 2011
Knowledge Discovery workflows for exploration of complex multi-wavelengths astronomical datasets. Application to CSC+, a sample of AGNs built on the Chandra Source Catalog
Abstract: A complete understanding of astronomical sources requires a global multi-wavelength approach; at the same time, the availability of large surveys of the sky in different spectral regions has propelled the aggregation of massive and complex datasets. The traditional approach to data analysis, which involves well-informed testing of different models, cannot do justice to the richness of these new datasets or, in some sense, to the intrinsically peculiar type of knowledge they contain. Knowledge Discovery (KD) techniques, while relatively new to astronomy, have been successfully used in several other disciplines, from finance to genomics, for the determination of complex, or simple but as yet unseen, patterns in large datasets.
In this talk I shall describe CLaSPS, a method for the characterization of the multi-dimensional astronomical sources, based on KD unsupervised clustering algorithms that are used to determine the spontaneous aggregations of sources in the high-dimensional space generated by their observables. Then, a data-driven criterion is applied to pick the most interesting clusterings in terms of astronomical properties of the sample.
I will discuss the application of this method to a sample of optically selected AGNs with X-ray observations in the Chandra Source Catalog and other multi-wavelength data, which is representative of the VO-powered inhomogeneous astronomical datasets that will be more and more common in the future. The goals of this project are to test known correlations, possibly determine new patterns, and establish diagnostics for an improved classification of X-ray selected AGNs with multi-wavelength observations. As an example of previously unknown low-dimensional patterns, I will also briefly discuss a recent result on blazars which is a by-product of the application of CLaSPS to a sample of AGNs with multi-wavelength data.
Slides [.pdf]
Ed Turner (Princeton)
15 Nov 2011
A Bayesian Analysis of the Astrobiological Implications of the Rapid Emergence of Life on the Early Earth
Abstract: Life arose on Earth sometime in the first few hundred million years after the young planet had cooled to the point that it could support water-based organisms on its surface. The early emergence of life on Earth has been taken as evidence that the probability of abiogenesis is high, if starting from young-Earth-like conditions. This argument is revisited quantitatively in a Bayesian statistical framework. Using a simple model of the probability of abiogenesis, a Bayesian estimate of its posterior probability is derived based on the datum that life emerged fairly early in Earth's history and that, billions of years later, sentient creatures noted this fact and considered its implications. Given only this very limited empirical information, the choice of Bayesian prior for the abiogenesis probability parameter has a very strong influence on the computed posterior probability. In particular, although life began on the Earth quite soon after it became habitable, that fact is statistically consistent with an arbitrarily low intrinsic probability of abiogenesis for plausible uninformative priors and, therefore, with life being arbitrarily rare in the Universe. The presentation will emphasize generic statistical properties of problems of this general character, which occur in cosmology and many other areas of science, as well as in the context of abiogenesis.
Slides [.pdf]
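The prior sensitivity described in the abstract can be illustrated with a toy calculation. The sketch below is not the speaker's model: it assumes life arises as a Poisson process with an unknown rate per Gyr, conditions on life having appeared within a hypothetical window t_required, and uses hypothetical values (0.2 Gyr, 1.4 Gyr, a rate range of 10^-3 to 10^3) chosen only for illustration. It compares the posterior probability of a low rate under a uniform prior and a log-uniform prior.

```python
import math

def posterior_prob_rare(prior, t_emerge=0.2, t_required=1.4,
                        log10_lo=-3.0, log10_hi=3.0, n_grid=6001):
    """P(rate < 1 per Gyr | data), by grid integration in log10(rate).

    All numerical defaults are hypothetical illustration values.
    prior is "uniform" (flat in rate) or anything else for log-uniform.
    """
    step = (log10_hi - log10_lo) / (n_grid - 1)
    below = total = 0.0
    for i in range(n_grid):
        rate = 10.0 ** (log10_lo + i * step)
        # Probability that life appears by t_emerge, given that it
        # appeared by t_required (expm1 keeps small rates accurate).
        like = math.expm1(-rate * t_emerge) / math.expm1(-rate * t_required)
        # Prior density per unit log10(rate): constant for log-uniform,
        # proportional to rate for a prior that is flat in rate.
        weight = like * (rate if prior == "uniform" else 1.0)
        total += weight
        if rate < 1.0:
            below += weight
    return below / total

p_uniform = posterior_prob_rare("uniform")
p_log_uniform = posterior_prob_rare("log-uniform")
```

Under these assumptions the same datum leaves far more posterior mass on low rates with the log-uniform prior than with the uniform one, which is the qualitative point made in the abstract.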
29 Nov 2011
20 Questions
Wherein stats grad students ask questions of astronomers, who, if they can't answer the question, will get to ask back a question on statistics. Also, demos.
Xu Jin (UC Irvine)
7 Feb 2012
New Results of Fully Bayesian
slides [.pdf]
Tom Loredo (Cornell)
15 Feb 2012
3:15pm - 4:30pm
Pratt Conference Room at CfA
Adaptive scheduling of exoplanet observations via Bayesian adaptive exploration
Abstract: I will describe ongoing work by a collaboration of astronomers and statisticians developing a suite of Bayesian tools for analysis and adaptive scheduling of exoplanet host star reflex motion observations. In this presentation I will focus on the most distinctive aspect of our work: adaptive scheduling of observations using the principles of Bayesian experimental design in a sequential data analysis setting. The idea is to iterate an observation-inference-design cycle so as to gain information about an exoplanet system more quickly than is possible with random or ad hoc scheduling. I will introduce the core ideas---decision theory and information measures---and highlight some of the computational challenges that arise when implementing Bayesian design with nonlinear models. Specializing to parameter estimation cases (e.g., measuring the orbit of a planet known to be present), there is an important simplification that enables relatively straightforward calculation of greedy designs via maximum entropy sampling. We implement MaxEnt sampling using population-based MCMC to provide posterior samples used in a nested Monte Carlo integration algorithm. I will demonstrate the approach with a toy problem, and with a re-analysis of existing exoplanet data supplemented by simulated optimal data points.
Presentation slides [.pdf]
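A minimal sketch of the greedy design step may help fix ideas. It is not the collaboration's code. It assumes a hypothetical circular-orbit velocity model, a set of posterior samples already in hand, and Gaussian measurement noise of fixed size, in which case the variance of the model predictions across posterior samples can stand in for the entropy of the predictive distribution.

```python
import math
import statistics

def predicted_velocity(t, params):
    """Hypothetical circular-orbit reflex velocity: K * sin(2*pi*t/P + phase)."""
    amplitude, period, phase = params
    return amplitude * math.sin(2.0 * math.pi * t / period + phase)

def next_observation_time(candidate_times, posterior_samples):
    """Pick the candidate time at which posterior predictions disagree most.

    Variance across samples is used as a proxy for predictive entropy
    (an assumption that holds when the predictive is close to Gaussian).
    """
    def predictive_variance(t):
        return statistics.pvariance(
            [predicted_velocity(t, p) for p in posterior_samples]
        )
    return max(candidate_times, key=predictive_variance)

# Two posterior samples that differ only in period (illustrative values).
samples = [(1.0, 10.0, 0.0), (1.0, 20.0, 0.0)]
best = next_observation_time([0.0, 5.0, 10.0], samples)
```

In a full observation-inference-design cycle the chosen time would be observed, the posterior samples refreshed, and the selection repeated.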
16-17 Feb 2012
Solar-Statistics mini Workshop
Thursday, Feb 16 (@ Pratt)
2:00pm - 3:45pm: Stats Tutorial
4:15pm - 6:00pm: Solar Tutorial
Friday, Feb 17 (@ Phillips)
9:00am - 10:30am: Feature Recognition
11:00am - 12:30pm: Thermal Structure
2:00pm - 3:30pm: Multi-D Joint Analysis
4:00pm - 5:30pm: Massive Data Streams
Alex Blocker (Harvard)
21 Feb 2012
Discussion of Maximal Information Coefficient
Abstract: The publication of Reshef et al.'s work on the maximal information coefficient (MIC) in late 2011 created a great deal of buzz across many disciplines. Their goal of identifying novel relationships in massive datasets and low-assumption approach resonated with many researchers, and the method's publication in Science amplified its impact substantially. However, this work has been less warmly received by the statistical community, where many consider it lacking compared to existing approaches. I will summarize the theory and application of MIC as presented by Reshef et al. for scientists and statisticians, then provide a statistical review of their approach. The broader issues and lessons raised by this episode will also be discussed.
Presentation slides: [.pdf]
References and code for the talk from AB: [ab_20120221/]
Supplement to the Reshef et al paper (especially for the statisticians), linked from thoughts-on-mic-reshef-et-al-2011
Paul Baines (UC Davis)
6 Mar 2012
[via Skype]
LogN-LogS: Model Selection and Model Checking
The study of astrophysical source populations is often conducted using the cumulative distribution of the number of sources detected at a given sensitivity. The resulting log(N>S)-logS relationship can be used to compare and evaluate theoretical models for source populations and their evolution. In practice, however, inferring properties of source populations from observational data is complicated by detector-induced uncertainties, background contamination and missing data.
By investigating the connection between probabilistic and theoretical assumptions in commonly used logN-logS methods, we propose a new class of models with a more realistic physical interpretation. Our Bayesian approach leads to efficient inference for physical model parameters and the corrected log(N>S)-log(S) distribution for source populations. Our method extends existing work in allowing for both non-ignorable missing data and an unknown number of unobserved sources. In this talk we will focus on model selection issues and multivariate strategies for Bayesian model checking.
This is joint work with Andreas Zezas, Vinay Kashyap and Irina Udaltsova.
Presentation slides [.pdf]
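For readers new to logN-logS, the empirical relationship can be sketched in a few lines. This is the naive version that the abstract argues needs correcting: it assumes a complete sample with distinct, noise-free fluxes, no background and no missing sources. The function names and the single power-law form are illustrative assumptions, not the model from the talk.

```python
import math

def log_n_log_s(fluxes):
    """Return (log10 S, log10 N(>=S)) pairs for the empirical cumulative counts."""
    ordered = sorted(fluxes, reverse=True)
    # The k-th brightest source has k sources at or above its flux
    # (assuming no tied fluxes).
    return [(math.log10(s), math.log10(k)) for k, s in enumerate(ordered, start=1)]

def power_law_slope_mle(fluxes, s_min):
    """Maximum-likelihood slope alpha for N(>S) proportional to S**(-alpha).

    Uses only sources with flux >= s_min and assumes the sample is
    complete above that threshold.
    """
    above = [s for s in fluxes if s >= s_min]
    return len(above) / sum(math.log(s / s_min) for s in above)

points = log_n_log_s([10.0, 1.0, 100.0])
slope = power_law_slope_mle([math.e, math.e, 0.5], s_min=1.0)
```

N(>=S) is used rather than N(>S) so that the brightest source has a finite logarithm.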
Andreas Zezas (Crete)
20 Mar 2012
9am PDT / Noon EDT / 4pm GMT / 6pm EET
[via Skype]
Adaptive Smoothing powwow
Presentation slides [.pdf]
The goal is to derive the ideal tool for quick astronomical analysis: a statistically principled, adaptively smoothing, flux-conserving, semi-parametric tool that works in 2-D, on Poisson data, and runs reasonably quickly. Some useful papers to read up on:
-- ASMOOTH: A simple and efficient algorithm for adaptive kernel smoothing of two-dimensional imaging data, Ebeling, H., White, D.A., & Rangarajan, F.V.N., 2006, MNRAS, 368, 65 [arXiv:astro-ph/0601306]
-- csmooth, CIAO ahelp page, cxc/ciao/ahelp/csmooth
-- Multiple Testing of Local Maxima for Detection of Unimodal Peaks in 1D, Schwartzman, A., Gavrilov, Y., & Adler, R.J., 2011 [.pdf]
-- Multiple Testing of Local Maxima for Detection of Peaks in ChIP-Seq Data, Schwartzman, A., Jaffe, A., Gavrilov, Y., & Meyer, C.A., 2011, HU Biostatistics Working Paper Series, 133 [.pdf]
-- A Wavelet-Based Algorithm for the Spatial Analysis of Poisson Data, Freeman, P.E., Kashyap, V., Rosner, R., & Lamb, D.Q., 2002, ApJS, 138, 185 [.pdf]
-- Low Assumptions, High Dimensions, Wasserman, L., 2011, RMM v2, 201, in Statistical Science and Philosophy of Science [.pdf]
-- Multiscale Poisson Intensity and Density Estimation, Willett, R.M., and Nowak, R.D., 2007, IEEE Trans. on Inform. Theory, 53, 9 [.pdf]
-- Multiscale Photon-limited Spectral Image Reconstruction, Krishnamurthy, K., Raginsky, M., and Willett, B., 2009, SIIMS [.pdf]
-- Poisson Noise Reduction with Non-Local PCA, Salmon, J., Deledalle, C.A., Willett, R., and Harmany, Z., 2012, ICASSP [.pdf]
Min Shandong & Xu Jin (UCI)
03 Apr 2012
Bayes Factors (Shandong)
Presentation slides [.pdf]
Calibration (Jin)
Presentation slides [.pdf]
Omiros Papaspiliopoulos (U Pompeu Fabra)
10 Apr 2012
SMC^2: an efficient algorithm for sequential analysis of state-space models
Nicolas Chopin, Pierre E. Jacob, Omiros Papaspiliopoulos
Abstract: We consider the generic problem of performing sequential Bayesian inference in a state-space model with observation process y, state process x and fixed parameter theta. An idealized approach would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is a sequential Monte Carlo algorithm in the theta-dimension that samples values of theta, iteratively reweights these values using the likelihood increments p(y_t|y_1:t-1, theta), and rejuvenates the theta-particles through a resampling step and an MCMC update step. In state-space models these likelihood increments are intractable in most cases, but they may be unbiasedly estimated by a particle filter in the x-dimension, for any fixed theta. This motivates the SMC^2 algorithm proposed in this article: a sequential Monte Carlo algorithm, defined in the theta-dimension, which propagates and resamples many particle filters in the x-dimension. The filters in the x-dimension are an example of the random weight particle filter as in Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the theta-particles target the correct posterior distribution at each iteration t, despite the intractability of the likelihood increments. We explore the applicability of our algorithm in both sequential and non-sequential applications and consider various degrees of freedom, for example dynamically increasing the number of x-particles. We contrast our approach to various competing methods, both conceptually and empirically through a detailed simulation study, included here and in a supplement, and based on particularly challenging examples.
paper available from
Presentation slides [.pdf]
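The inner building block of the abstract, a particle filter in the x-dimension that estimates the likelihood increments p(y_t|y_1:t-1, theta) for a fixed theta, can be sketched as follows. This is only a bootstrap filter for a toy model, not the SMC^2 algorithm itself: the AR(1)-plus-Gaussian-noise model, the initial state distribution and the function name are assumptions made for illustration.

```python
import math
import random

def bootstrap_loglike(ys, theta, n_particles=500, seed=0):
    """Estimate log p(y_1:T | theta) with a bootstrap particle filter.

    Assumed toy model: x_t = phi * x_{t-1} + N(0, sigma_x^2),
    y_t = x_t + N(0, sigma_y^2), with x_0 ~ N(0, sigma_x^2).
    """
    phi, sigma_x, sigma_y = theta
    rng = random.Random(seed)
    norm = sigma_y * math.sqrt(2.0 * math.pi)
    particles = [rng.gauss(0.0, sigma_x) for _ in range(n_particles)]
    loglike = 0.0
    for y in ys:
        # Propagate every x-particle through the state equation.
        particles = [phi * x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight each particle by the observation density p(y_t | x_t).
        weights = [math.exp(-0.5 * ((y - x) / sigma_y) ** 2) / norm
                   for x in particles]
        # The mean weight estimates the increment p(y_t | y_1:t-1, theta).
        loglike += math.log(sum(weights) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglike

estimate = bootstrap_loglike([0.1, -0.2, 0.3], theta=(0.9, 1.0, 1.0))
```

The product of the mean weights is the unbiased likelihood estimate; its logarithm is returned here for convenience. In SMC^2 one such filter would be attached to each theta-particle.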
Lazhi Wang (Harvard)
15 May 2012
Luminosity Functions
Abstract: The goal of source detection is often to obtain the luminosity function, which specifies the relative number of sources at each luminosity for a population. In this talk, I will first explain a hierarchical Bayesian approach to infer the distribution of intensities (luminosities) of all the sources in a population, given the background-contaminated photon counts at the locations of the sources. The distribution of intensities is modeled as a zero-inflated gamma distribution. The zero-inflated component, which is a completely new idea in astronomical problems, models the proportion of dark sources (sources which do not emit any photons). Then, I will display some simulation results, including the joint posterior distributions of the parameters, the best fit of the zero-inflated gamma and the associated uncertainty. Finally, I will discuss different choices of priors for the hyper-parameters and the coverage percentages of the Bayesian model under different simulation studies and with different priors.
Presentation slides [.pdf]
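The generative side of such a model can be sketched as a small simulation. The zero-inflated gamma for intensities comes from the abstract; the Poisson form of the counts, a known common background rate, equal exposure for every source, and all function and parameter names are assumptions added here for illustration.

```python
import math
import random

def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(n_sources, p_dark, shape, scale, background, seed=0):
    """Draw (intensity, counts) pairs from a zero-inflated gamma population.

    With probability p_dark a source is dark (intensity exactly 0);
    otherwise its intensity is Gamma(shape, scale). Observed counts are
    Poisson with mean intensity + background (an assumed noise model).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_sources):
        dark = rng.random() < p_dark
        intensity = 0.0 if dark else rng.gammavariate(shape, scale)
        rows.append((intensity, sample_poisson(rng, intensity + background)))
    return rows

population = simulate_counts(200, p_dark=0.3, shape=2.0, scale=5.0, background=1.5)
```

Simulated populations like this are the kind of input one would use to check the coverage of posterior intervals under different priors.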
Tanmoy Laskar (CfA)
29 May 2012
Quantifying the Non-Existent - Radio, X-ray and Optical Model Fitting with Non-Detects
Abstract: Non-detects are as important as detections in hypothesis testing and model fitting. While several statistical tools have been developed in the biomedical and environmental sciences for incorporating non-detects into robust analyses, percolation of these methods into astronomy has been slow. To bridge this gap, I will discuss a project that involves simultaneous modeling of multi-wavelength light curves (in the context of gamma-ray burst afterglows), from the radio through the X-rays, and seeks to understand the best statistical method for quantifying and incorporating non-detects into the analysis.
Presentation slides: [.pdf] ; [.odp]
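One common way to fold non-detects into a fit, borrowed from survival analysis, is a censored likelihood. The sketch below is offered as an illustration and is not necessarily the method adopted in the talk. It assumes Gaussian measurement errors and treats each non-detect as an upper limit, so a detection contributes the density at its measured value while a non-detect contributes the probability of falling below its limit.

```python
import math

def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def normal_logcdf(x, mu, sigma):
    # Naive form; it fails with log(0) for limits far below the model
    # (many sigma), where a tail-safe implementation would be needed.
    return math.log(0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0)))))

def censored_loglike(model_flux, observations):
    """Log-likelihood of detections and upper limits under Gaussian errors.

    observations holds (value, sigma, detected) tuples; for a non-detect,
    value is the upper limit. model_flux holds the model prediction at
    the same epochs and bands, in the same order.
    """
    total = 0.0
    for mu, (value, sigma, detected) in zip(model_flux, observations):
        if detected:
            total += normal_logpdf(value, mu, sigma)
        else:
            total += normal_logcdf(value, mu, sigma)
    return total

obs = [(2.0, 1.0, True), (1.0, 0.5, False)]
loglike = censored_loglike([2.0, 1.0], obs)
```

A model that predicts flux well above an upper limit is penalized smoothly instead of being ignored or rejected outright.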

Fall/Winter 2004-2005
Siemiginowska, A. / Connors, A. / Kashyap, V. / Zezas, A. / Devor, J. / Drake, J. / Kolaczyk, E. / Izem, R. / Kang, H. / Yu, Y. / van Dyk, D.
Fall/Winter 2005-2006
van Dyk, D. / Ratner, M. / Jin, J. / Park, T. / CCW / Zezas, A. / Hong, J. / Siemiginowska, A. & Kashyap, V. / Meng, X.-L.
Fall/Winter 2006-2007
Lee, H. / Connors, A. / Protopapas, P. / McDowell, J. / Izem, R. / Blondin, S. / Lee, H. / Zezas, A. & Lee, H. / Liu, J.C. / van Dyk, D. / Rice, J.
Fall/Winter 2007-2008
Connors, A., & Protopapas, P. / Steiner, J. / Baines, P. / Zezas, A. / Aldcroft, T.
Fall/Winter 2008-2009
H. Lee / A. Connors, B. Kelly, & P. Protopapas / P. Baines / A. Blocker / J. Hong / H. Chernoff / Z. Li / L. Zhu (Feb) / A. Connors (Pt.1) / A. Connors (Pt.2) / L. Zhu (Mar) / E. Kolaczyk / V. Liublinska / N. Stein
Fall/Winter 2009-2010
A.Connors / B.Kelly / N.Stein, P.Baines / D.Stenning / J. Xu / A.Blocker / P.Baines, Y.Yu / V.Liublinska, J.Xu, J.Liu / Meng X.L., et al. / A. Blocker, et al. / A. Siemiginowska / D. Richard / A. Blocker / Xie X. / Xu J. / V. Liublinska / L. Jing
AcadYr 2010-2011
Astrostat Haiku / P. Protopapas / A. Zezas & V. Kashyap / A. Siemiginowska / K. Mandel / N. Stein / A. Mahabal / Hong J.S. / D. Stenning / A. Diaferio / Xu J. / B. Kelly / P. Baines & I. Udaltsova / M. Weber
AcadYr 2011-2012
A. Blocker / Astro for Stat / B. Kelly / R. D'Abrusco / E. Turner / Xu J. / T. Loredo / A. Blocker / P. Baines / A. Zezas et al. / Min S. & Xu J. / O. Papaspiliopoulos / Wang L. / T. Laskar