Presentations |
Meng Xiao-Li (Harvard) 3 Sep 2013 |
- A Statistician's View of Upcoming Grand Challenges
- A talk to introduce new students to concepts and challenges
in AstroStatistics. Reprise of a talk given at the American
Astronomical Society, highlighting some of the work done by
former grad students.
- Slides [.ppt]
|
Meng Xiao-Li (Harvard) Kaisey Mandel (CfA) 17 Sep 2013 |
- The Ig-Nobel 24/7 Lecture on Statistics (XLM)
- Transcript [.docx]
- Bayesian Modeling for Type Ia Supernova Data, Dust, and Distances (KM)
- Abstract:
Type Ia supernovae (SN Ia) are the most precise cosmological
distance indicators, important for measuring the
acceleration of the Universe and the properties of dark
energy. I will give an overview of two applications in which
Bayesian statistical modeling has been effective for
extracting scientific inferences from the observational
data. Supernova types are typically inferred using the
spectra or photometric light curves (time series) of the
supernova. I will describe an alternative approach for
probabilistic classification of supernovae by modeling the
correlations between supernova class and galaxy properties.
Next, I will describe a principled, hierarchical Bayesian
approach to coherently model the multiple random and
uncertain effects, such as measurement error, dimming and
reddening due to interstellar dust and intrinsic covariance,
underlying the observed SN Ia data. This strategy is applied
to the modeling of spectroscopic and color data to estimate
physical correlations and improve estimates on the effects
of dust. These applications demonstrate aspects of Bayesian
model building, statistical computation, and model checking
and evaluation.
- Slides [.pdf]
|
Aneta Siemiginowska (CfA) 1 Oct 2013 |
- Stochastic Model for Quasar Variability
- Abstract:
Quasars are highly energetic active nuclei of distant galaxies.
Because quasars are very luminous, they allow us to probe the high
redshift (young) universe. Their power comes from accretion onto a
black hole of a mass exceeding a hundred million Solar masses. Quasars
play a significant role in the formation of galaxies and large-scale
structures in the universe. Although there has been huge progress
in quasar research, astronomers still do not fully understand the physics
of this phenomenon. In particular, physical processes occurring close
to the black hole, and associated with the release of high energy
radiation and relativistic jets are not fully understood. Studies of
the high energy radiation, i.e. X-rays and gamma-rays, can give us
some clues to quasar physics. I will briefly show and describe the
high energy data collection process and show examples of typical X-ray
and gamma-ray observations. In a large part of my talk I will discuss
the quasar variability and present our stochastic model for
fluctuations in the observed light curves. The model is based on a
linear combination of stochastic processes. We define the likelihood
for the process, enabling us to estimate the parameters of the process,
including break frequencies in the power spectral density. I will
discuss the application of our model to X-ray and gamma-ray light
curves of high luminosity quasars and future directions for this
project.
- Presentation slides [.pdf]
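- Sketch: the abstract above describes a model built from a linear
combination of stochastic processes. The snippet below is an
illustrative (not the speaker's) simulation of such a light curve as
the sum of two Ornstein-Uhlenbeck (damped random walk) processes on
an irregular time grid; all parameter values are invented.
```python
import numpy as np

rng = np.random.default_rng(0)

def ou_path(t, tau, sigma, mu=0.0):
    """Exact Ornstein-Uhlenbeck simulation on a sorted time grid t."""
    x = np.empty_like(t)
    x[0] = mu + sigma * rng.standard_normal()
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        a = np.exp(-dt / tau)                    # decay factor over the gap
        sd = sigma * np.sqrt(1.0 - a**2)         # innovation standard deviation
        x[i] = mu + a * (x[i - 1] - mu) + sd * rng.standard_normal()
    return x

t = np.sort(rng.uniform(0.0, 1000.0, size=300))  # irregular sampling times
flux = ou_path(t, tau=50.0, sigma=1.0) + 0.3 * ou_path(t, tau=5.0, sigma=1.0)
```
A full analysis would evaluate the Gaussian likelihood of the combined
process at the observed epochs and estimate the timescale (break
frequency) parameters from it.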
|
Saeqa Vrtilek (CfA) & Luke Bornn (Harvard) 22 Oct 2013 |
- Adding a new dimension: Multi-variate studies of X-ray binaries
- Abstract:
X-ray binaries consisting of a normal star orbiting a compact object
owe their prominence to one of the most efficient energy release
mechanisms known: accretion onto a compact object. The diverse
behaviors displayed by X-ray binaries are well-studied, yet one of
the most fundamental physical markers of each of these systems ---
whether the central accreting object is a black hole or a neutron
star --- has been remarkably difficult to establish.
We have recently developed a model-independent means of identifying
the nature of the compact object. We have found that, given suitable
collections of data in the right variables, various types of X-ray
binaries separate into complex but geometrically distinct volumes.
We use clustering techniques to fully characterize the non-linear
geometry of different object types in a rigorous and statistically
sound manner. To exploit the vast prior literature, as well as the
existing hierarchical structure in X-ray binary systems, we will
embed physics-based models within Bayesian hierarchical models.
We note that the tools we develop serve analogous purposes in all
data-driven fields, as the fundamental problem of classification
of multivariate data with complex geometric dependencies is
field-spanning.
- Slides:
Saku [.pptx]
Saku [.pdf]
Luke [.pdf]
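- Sketch: a schematic example (not the authors' pipeline) of
clustering points in a 3-D space of the kind described above, using a
Gaussian mixture model from scikit-learn; the two synthetic "classes"
and their feature values are fabricated purely for illustration.
```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
class_a = rng.normal(loc=[0.5, 0.2, 1.0], scale=0.1, size=(200, 3))  # e.g. neutron stars
class_b = rng.normal(loc=[0.1, 0.6, 2.0], scale=0.2, size=(200, 3))  # e.g. black holes
X = np.vstack([class_a, class_b])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)
labels = gmm.predict(X)            # cluster assignment for each source
print(np.bincount(labels))         # sizes of the recovered clusters
```
In practice the geometry of the real classes is non-linear, which is
why the abstract emphasizes clustering techniques that can
characterize complex volumes rather than simple ellipsoids.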
|
Lazhi Wang (Harvard) 5 Nov 2013 |
- X-ray Dark Source Detection
- Abstract:
The goal of source detection is often to obtain the luminosity
function, which specifies the relative number of sources at each
luminosity for a population. Of particular interest is the existence
of "dark" sources in the population. In this talk, I will first
briefly review the problem and the hierarchical Bayesian model, in
which a zero-inflated gamma distribution is used to model the source
intensities. Two small extensions of the original model are given.
Second, I will review the hypothesis test for the existence of
"dark" sources and how the posterior predictive p-value (ppp) is
obtained. Third, extensive simulation results will be given
to demonstrate the robustness of the model, the non-informativeness
of the prior and the effectiveness of ppp. Finally, results for
real data analysis will be shown and discussed in detail.
- Slides [.pdf]
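- Sketch: a toy version of the data-generating side of the model
described above (not the speaker's code): source intensities follow a
zero-inflated gamma distribution, and observed counts are Poisson
given the intensity plus a background rate; every parameter value is
an invented illustration.
```python
import numpy as np

rng = np.random.default_rng(2)
n_src, p_dark = 500, 0.3            # p_dark: probability a source is "dark"
shape, scale, bkg = 2.0, 5.0, 1.0   # gamma parameters and background rate

dark = rng.random(n_src) < p_dark
intensity = np.where(dark, 0.0, rng.gamma(shape, scale, size=n_src))
counts = rng.poisson(intensity + bkg)

# A posterior predictive p-value would compare a test statistic (e.g. the
# number of sources consistent with pure background) between the observed
# counts and replicate datasets simulated from the fitted posterior.
```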
-
|
David Jones (Harvard) 12 Nov 2013 |
- Overlapping Sources
- Abstract:
Sources are often situated close enough together that they cannot
be fully resolved instrumentally and it is of interest to infer the
number of individual sources, their locations, and their respective
intensities. Convolving a number of sources with the point spread
function (PSF) results in a finite mixture model. We incorporate
spectral models, background contamination, and a latent Poisson
process for the number and positions of the sources. The resulting
multilevel model is fit with RJMCMC (Richardson and Green 1997) to
obtain posterior distributions for the number of sources and their
individual parameters. Sensitivity to the prior distribution on the
number of sources is low due to our knowledge of the PSF. A simulation
study with a range of source separations, relative source intensities
and background strengths will be presented to demonstrate performance
and the impact of including the spectral data. Lastly results for
two real datasets will be discussed. In one case the main aim is
to determine the number of sources, their positions and intensities.
Here the spectral data will be included to reduce uncertainty. In
the second case, separation of the spectral distributions themselves
will also be considered.
- Slides [.pdf]
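- Sketch: a minimal generative picture of the finite mixture described
above (illustrative only, not the speaker's implementation): each
photon comes from one of two sources, smeared by a Gaussian PSF, or
from a uniform background; positions, weights and the PSF width are
invented.
```python
import numpy as np

rng = np.random.default_rng(3)
n_photons = 2000
centers = np.array([[10.0, 10.0], [12.0, 10.5]])  # hypothetical source positions
weights = np.array([0.5, 0.3, 0.2])               # source 1, source 2, background
psf_sigma = 0.8
field = 20.0                                      # image spans [0, field] x [0, field]

component = rng.choice(3, size=n_photons, p=weights)
pos = np.empty((n_photons, 2))
for k in (0, 1):                                  # photons from the two sources
    idx = component == k
    pos[idx] = centers[k] + psf_sigma * rng.standard_normal((idx.sum(), 2))
idx = component == 2                              # background photons
pos[idx] = rng.uniform(0.0, field, size=(idx.sum(), 2))
```
The inference task is the reverse: given only `pos` (and, in the full
model, a spectrum for each photon), recover the number of sources,
their locations, and their intensities.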
|
Raymond Wong (UCD) 26 Nov 2013 |
- Joint Spectral-Temporal Analysis of High-Energy Astronomical Sources
- Abstract:
In this work we apply semi-parametric techniques to the joint
spectral-temporal modeling of high-energy astronomical data. This
includes the automatic detection of emission lines and structural
breaks in the temporal direction. We apply L1 penalties to regularize
the model fitting. The "dimension" of the best-fitting model is
chosen by a new form of the minimum description length principle
that is designed for the "large p small n" scenario.
- Slides [.pdf]
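- Sketch: a rough illustration of L1-penalized line detection in the
spirit of the abstract above (this is an assumption about the flavor
of the method, not the speaker's actual model): the spectrum is fit as
a constant continuum plus sparse emission-line spikes with a lasso
penalty, and the surviving non-zero coefficients flag candidate lines.
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n = 200
spectrum = 5.0 + rng.normal(scale=0.5, size=n)   # flat continuum plus noise
spectrum[[60, 140]] += np.array([4.0, 3.0])      # two injected "emission lines"

X = np.eye(n)                                    # one candidate spike per channel
model = Lasso(alpha=0.01, fit_intercept=True)    # intercept absorbs the continuum
model.fit(X, spectrum)
print(np.nonzero(model.coef_ > 0.5)[0])          # channels flagged as lines
```
Choosing the penalty (and hence the effective model dimension) is the
role the minimum description length criterion plays in the talk.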
-
|
Xu Jin (UCI) 3 Dec 2013 |
- Calibration Uncertainty
-
|
Group 14 Jan 2014 |
- Group Discussion on the Time Delay Challenge
- Strong Lens TDC website: http://timedelaychallenge.org
-
|
David van Dyk (Imperial) 21 Jan 2014 |
- The Assessment of Evidence in the Discovery of a Higgs Boson
- Abstract: [.pdf]
- The 2012-2013 discovery of a Higgs boson filled the last remaining
gap in the Standard Model of particle physics and was greeted with
fanfare in the scientific community and by the public at large.
Particle physicists have developed and rigorously tested a specialized
statistical tool kit that is designed for the search for new physics.
This tool kit was put to the test in a 40-year search that culminated
in the discovery of a Higgs boson. This talk reviews these statistical
methods, the controversies that surround them, and how they led to
this historic discovery. It concludes with a Bayesian critique of
the use of p-values to assess the evidence for a Higgs boson and a
discussion of the possible use, instead, of Bayesian methods that are
being developed for a related statistical problem in high-energy
astrophysics.
- Keywords: Bayes factors, detection, exclusion, hypothesis
testing, look elsewhere effect, Lindley's paradox, Poisson models, sensitivity, upper limits
- Presentation slides [.pdf]
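- Sketch: a toy version of the significance calculation the talk
critiques (all numbers invented): the p-value for an excess of Poisson
counts over a known expected background, converted to the "n-sigma"
scale quoted by particle physicists.
```python
from scipy.stats import norm, poisson

background = 100.0   # expected background counts, assumed known
observed = 160       # observed counts in the search region

p_value = poisson.sf(observed - 1, background)   # P(N >= observed | background)
z_score = norm.isf(p_value)                      # one-sided significance in sigma
print(f"p = {p_value:.2e}, significance = {z_score:.1f} sigma")
```
The look-elsewhere effect and the Bayesian alternatives discussed in
the talk concern what such a number does, and does not, tell us about
the evidence for new physics.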
-
|
Eric Feigelson 29 & 31 Jan 2014 Phillips Auditorium 60 Garden St., CfA |
- Tutorials on AstroStatistics and R
- Tutorial website
-
|
Group 4 Feb 2014 |
- Projects
-
|
Giridhar Gopalan (Harvard) (joint work with Dr. Peter Plavchan) 11 Feb 2014 |
- Removing "Systematic" Noise From Lightcurves
- Abstract:
In recent decades, the analysis of photometric time series data
(lightcurves) generated from wide-field surveys has been fruitful
for the detection of exoplanets, amongst other transients. Unfortunately,
these data are often dominated by "systematic" noise, which is caused
by factors such as seeing conditions and instrumental effects and
simultaneously affects many of the lightcurves in a wide-field
survey. We apply an implementation of the Trend Filtering Algorithm
(TFA) developed by Bakos and Kovacs to the 2MASS calibration catalog
and selected Palomar Transient Factory (PTF) photometric time series
data. In this method, a basis of lightcurves is chosen that
represents systematic noise well, and noise is considered to be the
least-squares projection onto this basis. Unsurprisingly, TFA is
successful at reducing the overall variability of lightcurves but
has a tendency to filter true signal due to over-fitting when the
number of template lightcurves is large. To rectify these issues,
we modify TFA by including measurement uncertainties in its
computation, including ancillary data correlated with noise, and
choosing a template set using unsupervised learning approaches such
as Agglomerative Hierarchical and KMeans Clustering similar in
spirit to a paper by Kim et al. These modifications seem to reduce
the variability of lightcurves without over-fitting. We conclude
by considering alternative routes to solving this problem, including
a hierarchical Bayesian approach which utilizes a wavelet basis to
explore the frequency domain, extending past work by Dr. Alex
Blocker.
This talk is based on work done under the supervision of Dr. Peter
Plavchan at IPAC a few summers ago.
- Presentation slides [.pdf]
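- Sketch: a bare-bones version of the projection step described in the
abstract (synthetic templates and target, not the TFA implementation
used in the talk): the least-squares projection of a target light
curve onto a basis of trend templates is removed.
```python
import numpy as np

rng = np.random.default_rng(5)
n_epochs, n_templates = 500, 20
T = rng.standard_normal((n_epochs, n_templates))          # template light curves (columns)
trend = T @ rng.standard_normal(n_templates) * 0.1        # shared systematic trend
signal = 0.05 * np.sin(np.linspace(0.0, 20.0, n_epochs))  # true astrophysical signal
target = signal + trend + 0.02 * rng.standard_normal(n_epochs)

coef, *_ = np.linalg.lstsq(T, target, rcond=None)         # least-squares fit to templates
filtered = target - T @ coef                              # detrended light curve
```
With many templates the projection also absorbs part of `signal`,
which is the over-fitting problem the modifications described in the
talk are designed to mitigate.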
-
|
Min Shandong (UC Irvine) 18 Feb 2014 |
- Quantifying the Sensitivity of the Bayes Factor to the Choice of Prior Distribution in High-Energy Astrophysical Analysis
- Abstract:
There is an important class of model selection problems in
astrophysics where the standard asymptotics of the likelihood ratio
test do not apply. Uncalibrated frequency based methods nonetheless
remain the standard approach among astronomers. This project will
study in detail the use of the Bayes Factor for emission line
detection in spectral analysis. We develop a method to quantify the
typically strong dependency of the Bayes Factor on the prior
distribution with the aim of identifying a tenable class of priors
under location-scale families where the Bayes Factor leads to a clear
choice among the possible models. We compare the results with those
obtained with posterior predictive p-values and the traditional
likelihood ratio test. We will also talk about the efficiency and
accuracy of the available methods to calculate Bayes Factors and give
suggestions in the context of spectral analysis.
- Presentation slides [.pdf]
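- Sketch: a toy illustration (not the project's code) of how strongly
a Bayes factor can depend on the prior: a single Poisson channel with
known continuum is compared against continuum plus an emission line
whose strength has a uniform prior on [0, s]; the counts and prior
scales are invented.
```python
from scipy.integrate import quad
from scipy.stats import poisson

continuum = 10.0
observed = 22        # counts in the channel of interest

def bayes_factor(prior_scale):
    # marginal likelihood under "continuum + line", line strength ~ U(0, prior_scale)
    marginal, _ = quad(lambda a: poisson.pmf(observed, continuum + a) / prior_scale,
                       0.0, prior_scale)
    return marginal / poisson.pmf(observed, continuum)

for s in (5.0, 20.0, 100.0):
    print(f"prior scale {s:>5}: Bayes factor = {bayes_factor(s):.1f}")
```
The Bayes factor changes with the prior scale even though the data are
fixed, which is the sensitivity the project aims to quantify.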
-
|
Randall Smith (CfA) 25 Feb 2014 |
- Modeling X-ray Spectra of Astrophysical Plasmas: Implications for Statistical Analysis
- Abstract:
Existing high-resolution astrophysical X-ray spectra have exposed the
need for high-quality atomic data of all stripes: wavelengths,
collisional and absorption cross sections, and radiative rates. We
have created such a repository with AtomDB, a database of atomic
properties relevant to high-resolution X-ray spectroscopy. However,
the Astro-H soft X-ray spectrometer (2015 launch) will vastly increase
the number and type of high-resolution X-ray spectra available and
likely expose a number of shortcomings in our models. In addition,
the demand for meaningful uncertainties on atomic data has been
increasing over the last several years from those who rely on atomic
data in modeling astrophysical plasmas. Uncertainties are not only
critical in assessing the quality of the data, but can be propagated
through modeling codes to obtain uncertainties on diagnosed
quantities. I will discuss the current status and future plans for the
AtomDB database as well as invite dialogue about how the community's
need for practical measures of uncertainty can be addressed given our
current capabilities.
- Presentation slides [.pdf]
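- Sketch: a schematic Monte Carlo propagation of atomic-data
uncertainties into a derived quantity (invented emissivities and error
bars, not AtomDB values): two line emissivities are perturbed within
fractional uncertainties and the spread of their ratio is reported.
```python
import numpy as np

rng = np.random.default_rng(9)
n_draws = 10_000
emiss_1, err_1 = 3.2e-17, 0.10   # line 1 emissivity and 10% fractional uncertainty
emiss_2, err_2 = 1.1e-17, 0.20   # line 2 emissivity and 20% fractional uncertainty

draws_1 = emiss_1 * (1.0 + err_1 * rng.standard_normal(n_draws))
draws_2 = emiss_2 * (1.0 + err_2 * rng.standard_normal(n_draws))
ratio = draws_1 / draws_2
print(f"line ratio = {ratio.mean():.2f} +/- {ratio.std():.2f}")
```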
-
|
Andreas Zezas (Crete) 25 Mar 2014 |
- Three Problems
- Abstract:
I will present three examples of common data analysis problems in
astrophysics that could greatly benefit from more sophisticated
statistical methods.
a. The first example lies in the area of model selection.
Two dimensional fits to images of galaxies are a powerful tool
for classifying them and measuring the contribution of their
different stellar components. One of the challenges in this
process is to select between different non-nested models that
provide fits of similar quality. This is further complicated by
morphological components such as spiral arms that are hard to
model.
b. The second example is in the area of source
classification. Classification of both stars and activity in
galaxies is based on empirical diagnostic schemes. I will discuss
the need for more quantitative classification schemes, and I
will describe potential methods and challenges.
c. The third problem is related to joint fits of datasets of
different sizes. One example is joint fits of spectral energy
distributions and high resolution spectra that together can
better constrain the emission mechanisms and the parameters of
the gas in galaxies. The large difference in the size of the two
datasets hampers the use of standard fitting methods.
- Presentation slides: [pdf] ; [pptx]
-
|
David van Dyk (Imperial) 28 Mar 2014 11am-1pm Pratt Conference Room 60 Garden St., CfA |
- Lecture on Markov Chain Monte Carlo
- Includes background information on MCMC methods,
description of practical challenges and advice,
and overview of recommended strategy.
It is intended as a tutorial.
- Slides [.pdf]
- Note: We tried to stream this live on YouTube, but it failed for reasons as yet unknown. Our apologies to everyone who tried to watch it online.
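- Sketch: a minimal random-walk Metropolis sampler of the kind covered
in the tutorial (a generic illustration, not material from the
slides): draws from a one-dimensional target density known only up to
a constant.
```python
import numpy as np

rng = np.random.default_rng(6)

def log_target(x):
    return -0.5 * (x - 3.0) ** 2            # unnormalized N(3, 1) log density

def metropolis(n_steps, step_size=1.0, x0=0.0):
    chain = np.empty(n_steps)
    x, logp = x0, log_target(x0)
    for i in range(n_steps):
        proposal = x + step_size * rng.standard_normal()
        logp_prop = log_target(proposal)
        if np.log(rng.random()) < logp_prop - logp:   # accept/reject step
            x, logp = proposal, logp_prop
        chain[i] = x
    return chain

draws = metropolis(10_000)
print(draws[2_000:].mean(), draws[2_000:].std())      # roughly 3 and 1 after burn-in
```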
-
|
Tak Hyungsuk (Harvard) 1 Apr 2014 |
- Parametric Bayesian Approach to Time Delay Estimation
- Abstract: Light rays from a source (e.g., a quasar)
can take different paths toward the Earth due to the
gravitational fields of intervening matter. The arrival times
of these light rays vary depending on the lengths of paths,
and these differences in arrival time are called time delays.
In this talk, I deal with the simplest case based on two light
curves, one of which lags behind the other, and suggest a full
Bayesian model based on an Ornstein-Uhlenbeck process to obtain
the posterior distribution of the time delay. Various
grid-search methods (e.g., chi-square minimization) for finding the
optimal time delay estimate have dominated this field, though they
are computationally inefficient. In this sense, this
parametric Bayesian approach is promising because it turns the
time-consuming optimization problem into a simple sampling
problem with a fast and stable Gibbs sampling scheme. Two real
data examples will be used to show these points.
- Slides [.pdf]
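- Sketch: a small generative picture of the two-light-curve setup in
the abstract (illustrative values, not the speaker's model code): a
latent Ornstein-Uhlenbeck process observed twice, the second copy
lagged by a fixed delay, with measurement noise added to both.
```python
import numpy as np

rng = np.random.default_rng(7)
delay, tau, sigma, noise, dt = 30.0, 100.0, 1.0, 0.05, 5.0

def ou(t, tau, sigma):
    """Exact Ornstein-Uhlenbeck simulation on a sorted time grid."""
    x = np.empty_like(t)
    x[0] = sigma * rng.standard_normal()
    for i in range(1, len(t)):
        a = np.exp(-(t[i] - t[i - 1]) / tau)
        x[i] = a * x[i - 1] + sigma * np.sqrt(1.0 - a**2) * rng.standard_normal()
    return x

t = np.arange(-delay, 600.0 + dt, dt)        # latent grid covering both curves
x = ou(t, tau, sigma)
shift = int(delay / dt)
n = len(t) - shift
curve_a = x[shift:] + noise * rng.standard_normal(n)    # leading light curve
curve_b = x[:-shift] + noise * rng.standard_normal(n)   # same process, lagged by `delay`
```
The Bayesian approach in the talk treats the delay (and the OU
parameters) as unknowns and samples them with a Gibbs scheme rather
than scanning a grid of candidate delays.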
-
|
Bozena Czerny (Copernicus, Warsaw) 8 Apr 2014 |
- Reverberation of high-redshift quasars and its application to trace dark energy
- Abstract:
Dark energy is one of the most puzzling issues in astronomy and
physics. Therefore, we need a number of independent methods to
establish its properties, since every method can have systematic
bias. One of the new methods is to use quasars instead of SN Ia.
In quasars there is a physical link between the timescale and the
absolute luminosity, so the key point in applying quasars is to
determine precisely the time delay between the continuum and the
emission line in the optical band. Such reverberation mapping is routinely
done for nearby objects, but it is difficult for distant quasars
due to the lower S/N ratio and the intrinsic variability amplitude on
timescales of years. In Czerny et al. (2013) we presented
simulations of the campaign currently under way with the 11-m SALT
telescope, and we tried three time delay methods (Chi2 fitting,
ICCF and ZDCF), but we now see that the time gaps are larger than
expected, so a more sophisticated method is needed. An alternative
source of data (LAMOST) will provide much better time coverage,
but the quality of a single measurement is much worse than for
dedicated spectroscopic monitoring with SALT, which also calls for
a more advanced statistical approach.
- Slides [.pdf]
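- Sketch: a simplified cross-correlation delay estimate of the kind
the Chi2/ICCF/ZDCF methods refine (synthetic, regularly sampled data
with invented parameters; the seasonal gaps in real campaigns are
exactly the complication discussed above).
```python
import numpy as np

rng = np.random.default_rng(8)
dt, true_lag = 1.0, 40.0
t = np.arange(0.0, 1000.0, dt)
continuum = np.cumsum(rng.standard_normal(t.size)) * 0.1  # random-walk continuum
shift = int(true_lag / dt)
line = np.roll(continuum, shift)            # emission-line curve lagging the continuum
line[:shift] = continuum[0]                 # overwrite the wrapped-around start

lags = np.arange(0, 100)
ccf = []
for lag in lags:
    ccf.append(np.corrcoef(continuum[: t.size - lag], line[lag:])[0, 1])
print("recovered lag:", lags[int(np.argmax(ccf))] * dt)
```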
-
|
David Jones (Harvard) 15 Apr 2014 HU Stats Graduate Talks Series 12:30pm SciCen 705 |
- Overlapping Astronomical Sources
- Abstract: Astronomical sources are often
situated close enough together that they cannot be fully
resolved instrumentally and it is of interest to infer the
number of individual sources, their locations, and their
respective intensities. Typically approximate source regions
are identified and then spatial analysis is performed; this is
later followed by a separate spectral analysis. We instead
suggest an approach which jointly infers spatial and spectral
parameters. Convolving a number of sources with the point
spread function (PSF) results in a spatial finite mixture
model. We incorporate spectral models, background
contamination, and a latent Poisson process for the number and
positions of the sources. The resulting multilevel model is
fit with reversible jump Markov chain Monte Carlo (RJMCMC) to
obtain posterior distributions for the number of sources and
their individual parameters. Sensitivity to the prior
distribution on the number of sources is low due to our
knowledge of the PSF. Results for a simulation study and two
real datasets will be discussed. The first dataset is from
Chandra and suggests that the spectral model provides some
protection against chance variation in PSF or background.
The second is from the XMM observatory and offers some hope
for separating two previously inseparable spectral
distributions.
- Slides [.pdf]
-
|
Keli Liu (Harvard) 29 Apr 2014 |
- Prior and Prejudice: An Algorithm for Weakening Prior Influence
- Abstract:
Prejudice leads to bias. A Bayesian prior favoring
select models over others can pervert the data to fit our
prior worldview; this favoritism runs against the spirit of
scientific objectivity. As Bayesian methods become more
popular, we need to protect ourselves against the often subtle
ways that a prior, chosen for convenience, can deleteriously
influence posterior inferences. We develop a formal diagnostic
to assess how prejudiced a prior is (and for which models). We
then exploit this diagnostic to remove the favoritism that a
prior exhibits. The result is our prior weakening algorithm:
an astronomer begins with a prior chosen solely for
convenience (this prior may be highly prejudiced) and
iteratively applies our algorithm until it becomes "fair".
Posterior inferences under this fair prior are driven by the
data and not by artifacts in the prior. Such inferences are
hence trustworthy. There is a large literature on prior
construction often involving ad hoc (and intractable)
derivations in any given problem; the practitioner is left with
few options if the literature has not suggested a prior for
their model (usually the case). Our prior weakening algorithm
replaces analytic derivation (manual and expensive) with
computation (automated and cheap), placing the ball in the
practitioner's court. If you're wondering whether a chosen
prior is harboring hidden biases, just plug it into the prior
weakening algorithm and turn the crank.
(Joint work with Xiao-Li Meng and Natesh Pillai)
-
|
Andreas Zezas (Crete) 6 May 2014 12:15 pm EDT |
- Three and a Half Problems
- Abstract:
This is the second part of the discussion of three problems that started with the March 25th presentation.
The focus will be on the presentation of a model selection
problem in two-dimensional fits to images of galaxies. These
fits are a powerful tool for classifying galaxies and measuring
the contribution of their different stellar components. One of
the challenges in this process is to select between different
non-nested models that provide fits of similar quality. This
is further complicated by morphological components such as
spiral arms that are hard to model.
Also, following the discussion of the classification of stars,
I will present a similar problem on the classification of
activity in galaxies based on measurements of their spectral
emission lines.
Finally, if time permits, we can continue the discussion on
the other two problems presented in the first part of this
talk, namely fitting of datasets of different sizes and
resolution, and classification of stellar spectra.
-
|