# Topics in Astrostatistics

## Statistics 310, Fall/Winter 2005-2006

### Harvard University

##### www.courses.fas.harvard.edu/~stat310/

Instructor: Prof. Meng Xiao-Li
Schedule: Tuesdays, 10 AM - 11:30 AM
Location: Science Center Rm 705

Presentations
Fall/Winter 2005-2006
David van Dyk (UC Irvine)
20 Sep 2005
Introduction
Michael Ratner (CfA)
Liu Jing Chen (Harvard U)
04 Oct 2005
On locating IM Peg for Gravity Probe B
Jin Jia Shun (Purdue U)
11 Oct 2005
Higher Criticism Statistic: Theory and Applications in Cosmology and Astronomy [.pdf]
Abstract [.pdf]

Of interest:
Higher Criticism for Detecting Sparse Heterogeneous Mixtures
-- Donoho, D., & Jin, J. 2004, Ann. Statist., 32(3), 962-994
Cosmological non-Gaussian Signature Detection: Comparing Performance of Different Statistical Tests
-- Jin et al. 2005, astro-ph/0503374
-- [.pdf]
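The Higher Criticism statistic compares the sorted p-values of many individual tests against the uniform spacing expected under the null. Below is a minimal sketch of the basic form defined by Donoho and Jin, $HC^* = \max_{1 \le i \le \alpha_0 n} \sqrt{n}\,(i/n - p_{(i)}) / \sqrt{p_{(i)}(1 - p_{(i)})}$, assuming independent p-values. The function name and the default $\alpha_0 = 0.5$ are illustrative choices, not taken from the talk.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Basic Higher Criticism statistic HC* (illustrative sketch).

    Assumes independent p-values; alpha0 limits the search to the
    smallest alpha0 * n sorted p-values.
    """
    p = sorted(pvalues)
    n = len(p)
    if n == 0:
        raise ValueError("need at least one p-value")
    best = -math.inf
    # i runs over 1..floor(alpha0 * n), using 1-based ranks as in the formula
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        pi = p[i - 1]
        # skip p-values of exactly 0 or 1, where the denominator vanishes
        if pi <= 0.0 or pi >= 1.0:
            continue
        hc = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
        best = max(best, hc)
    return best
```

Large values of HC* suggest that a sparse set of non-null effects is hidden among many null tests; evenly spaced p-values, as expected under the null, give small values.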
Park Tae Young (Harvard U)
25 Oct 2005
Fitting Narrow Emission Lines in X-ray Spectra [.pdf]
Abstract: Spectral emission lines are local features that represent extra emission of photons in a narrow band of energy. In a statistical model, it is often appropriate to model the emission lines with a narrow Gaussian function or a delta function. In this article, we show how to identify the location of narrow line profiles from a model-based Bayesian statistical perspective. Such Bayesian methods are ideally suited to handling the complexity of high-resolution, high-energy spectral data such as those obtained with the Chandra X-ray Observatory. van Dyk et al. (2001) show how Bayesian methods can account for these complexities of the data-generation mechanism as well as the Poisson nature of photon count data. The multimodal nature of the likelihood function poses difficulties for these methods, however, when the location and width of a spectral line are fitted simultaneously or when delta functions are used to model spectral lines. These difficulties necessitate more sophisticated, state-of-the-art statistical computation. We develop such methods and illustrate how to detect narrow spectral lines in X-ray spectra using Chandra data sets for the energy spectrum of the high-redshift quasar PG 1634+706.
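To make the delta-function line model concrete, here is a minimal sketch under strong simplifying assumptions that are not part of the talk: counts are binned, the continuum expectation in each bin is known, the line flux is known, the instrument response is ignored, and the line location has a flat prior over bins. Under these assumptions the posterior over the line's bin follows directly from the Poisson likelihood. The function name `line_location_posterior` is hypothetical, and this is not the method of van Dyk et al. (2001), which models the full data-generation mechanism.

```python
import numpy as np


def line_location_posterior(counts, continuum, line_flux):
    """Posterior over the bin holding a delta-function emission line.

    Hypothetical toy model: counts[j] ~ Poisson(continuum[j]) except in the
    line bin k, where the mean is continuum[k] + line_flux. Flat prior on k.
    """
    y = np.asarray(counts, dtype=float)
    c = np.asarray(continuum, dtype=float)
    if np.any(c <= 0):
        raise ValueError("continuum expectations must be positive")
    # Per-bin Poisson log-likelihood without and with the line
    # (the log(y!) term is dropped because it does not depend on k).
    base = y * np.log(c) - c
    with_line = y * np.log(c + line_flux) - (c + line_flux)
    # Putting the line in bin k swaps that bin's term.
    loglik = base.sum() - base + with_line
    # Normalize in a numerically stable way.
    post = np.exp(loglik - loglik.max())
    return post / post.sum()
```

In this toy version the posterior is cheap to enumerate because the line can sit in only a finite number of bins. Once the location is continuous and the width and flux are also unknown, the multimodal likelihood described in the abstract is what calls for more sophisticated computation.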
Chandra Calibration Workshop
01 Nov 2005
1:30pm-4:30pm
Special Session on Incorporating Calibration Uncertainties into Data Analysis
http://cxc.harvard.edu/ccw/
Andreas Zezas (SAO)
29 Nov 2005
X-ray data analysis techniques
Presentation [.ppt]
Hong Jae Sub (SAO)
7 Feb 2006
12 Noon - 1 PM
New spectral classification technique for faint X-ray sources: Quantile Analysis
[.ppt]
Abstract:
We describe a new spectral classification technique, called quantile analysis, for X-ray sources with limited statistics. Quantile analysis is superior to conventional approaches such as X-ray hardness ratios or X-ray color analysis. The median energy serves as an improved substitute for the conventional X-ray hardness ratio, and the quantile-based phase diagram is more evenly sensitive over various spectral shapes than conventional color-color diagrams. We demonstrate the new technique with simulations using the Chandra ACIS detector response function and with analysis results from deep observations of the Galactic Center.
astro-ph/0406463
QCCD code
ChaMPlane
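The core of quantile analysis can be sketched briefly. Instead of splitting counts into fixed energy bands, it takes the energies below which given fractions (25%, 50%, 75%) of a source's photons fall. This is a minimal sketch, assuming photon energies in keV restricted to a band [E_lo, E_hi], quantiles normalized to that band, and NumPy's default linear interpolation. The phase-diagram coordinates log10(Q50 / (1 - Q50)) and 3 Q25 / Q75 reflect our reading of astro-ph/0406463 and should be checked against the paper. The function names are hypothetical; the QCCD code listed above is the authors' own implementation.

```python
import numpy as np


def normalized_quantiles(energies, e_lo, e_hi, fractions=(0.25, 0.5, 0.75)):
    """Quantile energies E_x, normalized to the band: Q_x = (E_x - e_lo) / (e_hi - e_lo).

    Assumed convention; photons outside [e_lo, e_hi] are discarded first.
    """
    e = np.asarray(energies, dtype=float)
    e = e[(e >= e_lo) & (e <= e_hi)]
    if e.size == 0:
        raise ValueError("no photons in the chosen band")
    e_x = np.quantile(e, fractions)  # linear interpolation by default
    return (e_x - e_lo) / (e_hi - e_lo)


def quantile_phase_coords(q25, q50, q75):
    """Assumed phase-diagram coordinates: (log10(Q50/(1-Q50)), 3*Q25/Q75)."""
    return np.log10(q50 / (1.0 - q50)), 3.0 * q25 / q75
```

Because quantiles are defined for any number of photons, this works even when a source has too few counts to fill every band of a hardness ratio.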
Aneta Siemiginowska (SAO) & Vinay Kashyap (SAO)
8 Feb 2006
12:30 PM - 1:30 PM
X-ray Astrostatistics: Bayesian Methods in Data Analysis
Abstract:
We will describe the California-Harvard AstroStatistics Collaboration, CHASC. We will provide an introduction to Bayesian methods in the context of some basic X-ray astrophysics problems, such as determining the source strength in the presence of background and computing hardness ratios in the regime of (very) low counts. We will also discuss posterior predictive p-values (PPP), which are preferred alternatives to the often-abused F-tests used for model comparison.
AS's slides:
[.ppt] ; [.pdf]
VK's slides:
[.ppt] ; [.pdf]
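The source-strength problem in the abstract can be illustrated with the simplest Bayesian model: observed counts $N \sim \mathrm{Poisson}(s + b)$, with the background expectation $b$ treated as known and a flat prior on $s \ge 0$, so that $p(s \mid N) \propto (s + b)^N e^{-(s + b)}$. This is a minimal grid-based sketch under those assumptions; in practice the background is itself estimated from an off-source region and marginalized over, which this code does not do. The function name is hypothetical.

```python
import numpy as np


def source_posterior(n_counts, background, n_grid=20001):
    """Grid posterior for source intensity s given N ~ Poisson(s + b).

    Assumes a known background b > 0 and a flat prior on s >= 0.
    Returns the grid and a density normalized to integrate to 1.
    """
    if background <= 0:
        raise ValueError("background must be positive in this sketch")
    # Grid extends far beyond the bulk of the posterior (about N + 1 +- sqrt(N + 1)).
    s_max = n_counts + 10.0 * np.sqrt(n_counts + 1.0)
    s = np.linspace(0.0, s_max, n_grid)
    lam = s + background
    log_post = n_counts * np.log(lam) - lam  # log of (s+b)^N exp(-(s+b))
    p = np.exp(log_post - log_post.max())
    ds = s[1] - s[0]
    p /= p.sum() * ds  # simple Riemann-sum normalization
    return s, p
```

For N = 10 and b = 3 the posterior mode sits at s = 7, i.e. N - b. Unlike naive background subtraction, the posterior stays on s >= 0 even when N < b, which is what makes the approach useful at very low counts.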
Meng Xiao-Li (Harvard U)
25 Apr 2006
11am-Noon
A Brief Tutorial of Markov Chain Monte Carlo: A Workhorse for Modern Scientific Computation
Abstract:
The Markov chain Monte Carlo (MCMC) methods, originating in computational physics about half a century ago, have seen an enormous range of applications in the recent statistical literature, due to their ability to simulate from very complex distributions such as those needed in realistic statistical models. This talk provides an introductory tutorial on the two most frequently used MCMC algorithms: the Gibbs sampler and the Metropolis-Hastings algorithm. Using simple yet non-trivial examples, we show, step by step, how to implement these two algorithms. The examples involve a family of bivariate distributions whose full conditional distributions are all normal but whose joint densities are not only non-normal but also bimodal.
Presentation:
[.ppt] ; [.pdf]
Movies:
symmetric, Gibbs [.avi]
asymmetric, Gibbs, bad implementation [.avi]
asymmetric, Gibbs, better implementation [.avi]
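Both algorithms can be sketched on one family of bivariate densities with normal full conditionals, $f(x, y) \propto \exp\{-\tfrac{1}{2}(A x^2 y^2 + x^2 + y^2 - 2Bxy - 2C_1 x - 2C_2 y)\}$, for which $x \mid y \sim N\big((By + C_1)/(Ay^2 + 1),\ 1/(Ay^2 + 1)\big)$, and symmetrically for $y \mid x$. We assume this family and the values A = 1, B = 0, C1 = C2 = 3 for illustration; they are not necessarily the talk's examples. With these values the density has two modes, near (2.62, 0.38) and (0.38, 2.62). The sketch shows a Gibbs sampler and a random-walk Metropolis sampler; the function names and step size are hypothetical.

```python
import numpy as np

# Assumed illustrative parameters (symmetric case: C1 == C2).
A, B, C1, C2 = 1.0, 0.0, 3.0, 3.0


def log_density(x, y):
    """Unnormalized log density of the conditionally normal family."""
    return -0.5 * (A * x * x * y * y + x * x + y * y
                   - 2 * B * x * y - 2 * C1 * x - 2 * C2 * y)


def gibbs(n_iter, y0=0.0, seed=0):
    """Gibbs sampler alternating exact draws from x | y and y | x.

    x is drawn first, so only the starting value of y matters.
    """
    rng = np.random.default_rng(seed)
    y = y0
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        prec = A * y * y + 1.0                      # conditional precision of x
        x = rng.normal((B * y + C1) / prec, 1.0 / np.sqrt(prec))
        prec = A * x * x + 1.0                      # conditional precision of y
        y = rng.normal((B * x + C2) / prec, 1.0 / np.sqrt(prec))
        out[t] = x, y
    return out


def metropolis(n_iter, step=1.0, x0=0.0, y0=0.0, seed=0):
    """Random-walk Metropolis with an isotropic Gaussian proposal.

    Returns the chain and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    cur = np.array([x0, y0])
    cur_lp = log_density(*cur)
    out = np.empty((n_iter, 2))
    accepted = 0
    for t in range(n_iter):
        prop = cur + step * rng.standard_normal(2)
        prop_lp = log_density(*prop)
        # Symmetric proposal: the Hastings ratio reduces to the density ratio.
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
            accepted += 1
        out[t] = cur
    return out, accepted / n_iter
```

Because the random-walk proposal is symmetric, the Hastings correction cancels and only the density ratio enters the acceptance step. Plotting the chains is a quick way to see whether either sampler moves between the two modes.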
Hyunsook Lee (Penn State)
7 Sep 2006
A Convex Hull Peeling Depth Approach to Nonparametric Massive Multivariate Data Analysis with Applications
Abstract: We explore the convex hull peeling process to develop empirical tools for statistical inference on massive multivariate data. The convex hull and its peeling process have intuitive appeal for robust location estimation. We define the convex hull peeling depth, which makes it possible to order multivariate data. This ordering provides ways to obtain multivariate quantiles, including the median. Based on the generalized quantile process, we define a convex hull peeling central region, a convex hull level set, and a volume functional, which lead to one-dimensional mappings that describe the shapes of multivariate distributions along data depth. We define empirical skewness and kurtosis measures based on the convex hull peeling process. In addition to these empirical descriptive statistics, we present several methods to separate multivariate outliers in massive data sets. These outlier detection algorithms are (1) estimating multivariate quantiles up to the level $\alpha$, (2) detecting changes in a measure sequence of convex hull level sets, and (3) constructing a balloon to exclude outliers. The convex hull peeling depth is robust, so the presence of outliers does not affect the properties of the inner convex hull level sets. We illustrate all these characteristics and algorithms of the convex hull peeling process with bivariate synthetic data sets, and we show that these empirical procedures are applicable to real massive data sets by applying them to quasars and galaxies from the Sloan Digital Sky Survey.
Presentation [.pdf]
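The peeling depth itself is straightforward to compute in two dimensions: repeatedly remove the points on the current convex hull and record the layer at which each point is removed. This is a minimal pure-Python sketch, assuming 2-D points given as tuples and using Andrew's monotone chain algorithm for each hull. Here only hull vertices are peeled (points lying in the interior of a hull edge are left for a later layer), which is one of several possible conventions, and averaging the deepest layer to get a median is likewise an assumed convention. The function names are hypothetical.

```python
def convex_hull(points):
    """Hull vertices of 2-D points via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point, which repeats the other chain's first.
    return lower[:-1] + upper[:-1]


def peeling_depth(points):
    """Depth of each point = index (from 1) of the hull layer that removes it."""
    remaining = list(range(len(points)))
    depth = [0] * len(points)
    layer = 0
    while remaining:
        layer += 1
        hull = set(convex_hull([points[i] for i in remaining]))
        for i in remaining:
            if points[i] in hull:
                depth[i] = layer
        remaining = [i for i in remaining if points[i] not in hull]
    return depth


def peeling_median(points):
    """Average of the deepest layer (assumed convention for the median)."""
    depth = peeling_depth(points)
    deepest = max(depth)
    inner = [p for p, d in zip(points, depth) if d == deepest]
    return (sum(p[0] for p in inner) / len(inner),
            sum(p[1] for p in inner) / len(inner))
```

For a square with corners (0, 0), (2, 0), (2, 2), (0, 2) and a point at (1, 1), the corners have depth 1 and the center has depth 2, so the peeling median is (1.0, 1.0).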
