AstroStat Talks 2018-2019
Last Updated: 2019aug06


Topics in Astrostatistics

Statistics 310, Harvard University

AY 2018-2019


Schedule Tuesdays 12:07PM - 1:30PM ET
Location SciCen 706

David Stenning (Imperial)
19 Jul 2018
2pm-3pm EDT
SSXG Operations Center at CfA
Classification and Modeling of Evolving Solar Features
Abstract: Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. The goal of these observatories is to better understand and predict space weather. To analyze massive streams of solar image data, we have developed a science-driven dimension reduction methodology to extract scientifically meaningful features from images. Adopting a science-driven approach, as opposed to a solely black-box algorithmic approach, enables interpretable secondary data-driven analyses of complex phenomena, such as the evolution of magnetic active regions. The methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in active regions that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a much more informative manner than existing schemes (i.e. manual classification schemes), and (iii) is amenable to sophisticated statistical analyses.
Presentation slides [.pdf]
4 Sep 2018
Noon EDT
SciCen 706
Organizational & EBASCS
Cora Dvorkin (HU)
11 Sep 2018
Noon EDT
SciCen 706
Inverse Problems in Early Universe Cosmology
Abstract: Cosmological observations have provided us with answers to age-old questions, involving the age, geometry, and composition of the universe. However, there are profound questions that still remain unanswered. I will describe ongoing efforts to shed light on some of these questions. In this talk, I will explain how we can use measurements of the Cosmic Microwave Background and the large-scale structure of the universe to reconstruct the detailed physics of much earlier epochs, when the universe was only a tiny fraction of a second old. I will address this inverse-problem reconstruction from a Bayesian perspective.
Andrea Sottosanti (Imperial)
2 Oct 2018
Astronomical source detection and background separation via hierarchical Bayesian nonparametric mixtures
Abstract: We propose an innovative approach based on Bayesian nonparametric methods to the signal extraction of astronomical sources in gamma-ray count maps in the presence of strong background contamination. Our model simultaneously induces clustering on the photons using their spatial information and gives an estimate of the number of sources, while separating them from the irregular signal of the background component that extends over the entire map. From a statistical perspective, the signal of the sources is modeled using a Dirichlet process mixture, which makes it possible to discover and locate a possibly infinite number of clusters, while the background component is completely reconstructed using a new flexible Bayesian nonparametric model based on B-spline basis functions. The resulting model can then be thought of as a hierarchical mixture of nonparametric mixtures for flexible clustering of highly contaminated signals. We also provide a Markov chain Monte Carlo algorithm for inference on the posterior distribution of the model parameters, which does not require any tuning parameter, and a suitable post-processing algorithm to quantify the information coming from the detected clusters. Results on different datasets confirm the capacity of the model to discover and locate the sources in the analysed map, to quantify their intensities, and to estimate and account for the presence of the background contamination.
Presentation slides [.pdf]
Xixi Yu (Imperial)
23 Oct 2018
Multistage Analysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models
Abstract: Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters.
Instead of using the standard approach, a common strategy among astrophysicists that treats the uncertainty as fixed and known and obtains the best-fit values of the parameters, we propose a multistage analysis to prevent underestimation of the error bars on the model parameters and to increase the accuracy of the analysis results. Four methods for a two-stage analysis are outlined: the standard method, multiple imputation, the pragmatic Bayesian method, and the fully Bayesian method. A case study on Fe XIII is discussed in which two different priors are deployed: a discrete uniform prior and a Gaussian approximation obtained via principal component analysis.
Presentation slides [.pdf]
Yang Chen (UMich)
30 Oct 2018
A second look at cstat
Abstract: After decades of least chi-squares fitting and goodness-of-fit, the C-stat has been gaining popularity in the astrophysics community for model fitting and assessment of goodness-of-fit. In this work, we study the statistical properties of the C-stat and explore lower-resolution C-stat fitting and testing, which potentially improves statistical and computational efficiency. This is ongoing joint work with the CHASC team.
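For readers new to the statistic, here is a minimal sketch of the Cash statistic in the form commonly used for Poisson-count data, C = 2 * sum(m - n + n ln(n/m)). The function name `cstat` and the toy counts are illustrative assumptions, not taken from the talk.

```python
import math

def cstat(counts, model):
    """Cash statistic for observed counts n_i and model-predicted counts m_i > 0."""
    total = 0.0
    for n, m in zip(counts, model):
        if m <= 0:
            raise ValueError("model counts must be positive")
        # n * ln(n / m) is taken to be 0 in bins with zero observed counts.
        log_term = n * math.log(n / m) if n > 0 else 0.0
        total += m - n + log_term
    return 2.0 * total

# Toy example: made-up counts compared with a constant model at their mean.
observed = [0, 3, 5, 2, 0, 1]
mean_rate = sum(observed) / len(observed)
flat_model = [mean_rate] * len(observed)
print(cstat(observed, flat_model))
```

The statistic is zero when the model reproduces the counts exactly and grows as the model departs from the data, so smaller values indicate a better fit.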
David Jones (TAMU)
13 Nov 2018
Exoplanet detection: some statistical challenges
Abstract: The radial velocity (RV) technique is one of the two main approaches for detecting planets outside our solar system. The method works by detecting the Doppler shift resulting from the motion of a host star caused by an orbiting planet. Unfortunately, this Doppler signal is typically contaminated by various "stellar activity" phenomena, such as dark spots on the star surface. This makes it difficult to determine if a planet is really present or not.
Last time I presented a Gaussian process framework for separating planet RV signals from stellar activity. In this talk, I will review the key points of the method and discuss current statistical challenges and opportunities for generalizing and improving the approach. I will also discuss related computational challenges in exoplanet detection.
Presentation slides [.pdf]
Thomas Lee (UC Davis)
27 Nov 2018
Change Point Detection for Poisson Time Series Images with Applications to Astronomy and Astrophysics
Presentation slides [.pdf]
Hyungsook Tak (Notre Dame)
11 Dec 2018
Time Delay Lens Modeling Challenge for the Hubble Constant Estimation
Abstract: The Hubble constant is a core cosmological parameter that represents the current expansion rate of the Universe. One way to infer this quantity is to use strong gravitational lensing, i.e., an effect in which multiple images of an astronomical object (e.g., a quasar) appear in the sky. This effect occurs when the trajectories of the light (from the object to the Earth) are bent by the strong gravitational field of an intervening galaxy. Strong gravitational lensing produces two types of data: (i) multiple brightness time series of the gravitationally lensed images and (ii) pixel-wise image data of the lens and lensed object. The former is used to infer time delays between the arrival times of the multiply lensed images (arXiv 1602.01462) and the latter is used to estimate the gravitational potential that the lensed images pass through (arXiv 1801.01506). These two components are used to infer the Hubble constant via physical equations. In this talk, I explain how we infer the Hubble constant using the relationship among these three components, i.e., time delays, gravitational potential, and the Hubble constant. I will also describe the performance of this approach during the first stage of a blind competition, called the Time Delay Lens Modeling Challenge.
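The relationship among the three components can be written in the standard time-delay cosmography form shown here. The notation is an assumption, not taken from the abstract: \phi is the Fermat potential, \theta_i and \theta_j are image positions, \beta is the source position, z_d is the lens redshift, and D_d, D_s, D_{ds} are angular diameter distances to the lens, to the source, and between the two.

```latex
\Delta t_{ij} \;=\; \frac{D_{\Delta t}}{c}\,
  \bigl[\phi(\theta_i,\beta) - \phi(\theta_j,\beta)\bigr],
\qquad
D_{\Delta t} \;\equiv\; (1+z_d)\,\frac{D_d\,D_s}{D_{ds}} \;\propto\; \frac{1}{H_0}
```

A measured time delay together with a modeled potential difference therefore fixes D_{\Delta t}, which scales inversely with the Hubble constant.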
Presentation slides [.pdf]
Vinay Kashyap (CfA), Katy McKeough (HU), Luis Campos (HU), et al.
29 Jan 2019
SciCen 706
Introduction to High-energy Astronomy Data for Statisticians
Abstract: We will describe what high-energy datasets look like using the example of the Chandra X-ray Observatory. We will then highlight some of the problems our group has tackled in the past, and focus in detail on two current projects: (i) to isolate and locate extended sources in posterior draw images, and (ii) to probabilistically disentangle photons from overlapping sources using spatial, spectral, and temporal variability information.
Chandra archive: [url]
Presentation slides: Kashyap, McKeough, Campos [.pdf]
Paul Baines (
5 Feb 2019
The Colorful Stars and the Black Box: Bayesian Analysis of Stellar Populations
Abstract: Many modern statistical applications involve noisy observations of an underlying process that can best be described by a complex deterministic system. In astrophysics these systems often involve the solution of partial differential equations representing the best available understanding of the underlying physical processes. Statistical computation in such settings is hampered by the use of look-up tables or expensive `black-box' function evaluations.
The estimation of properties of stellar populations provides an example of statistical modeling with such a `look-up table' likelihood. The mapping between the physical parameters and the data-space cannot be solved analytically and is represented as a series of look-up tables. In this context, we present a flexible hierarchical model for analyzing stellar populations. By utilizing the structure of the posterior distribution we construct efficient data augmentation schemes which create a robust sampling procedure. The performance of various sampling schemes is presented, together with the results of applying our model to a well-studied dataset.
Gabriel Collin (MIT)
12 Feb 2019
SciCen 706
Simulating light in large volume detectors using Metropolis Light Transport
Abstract: In gigaton-scale neutrino detectors, such as the IceCube experiment, interaction products are detected by the Cherenkov radiation emitted by their passage through the detector medium. Simulating this propagation of light is traditionally approached through ray tracing. This is complicated by the sparsity of the detector: the vast majority of light rays are scattered and absorbed by the detector medium, with only a tiny fraction finding their way to a light-sensitive element. In this presentation, I develop an alternative method, based on the Metropolis light transport algorithm used in the CGI industry. This method poses the problem as a classical path integral, and samples only the paths of light rays that end on a light-sensitive element using Markov chain Monte Carlo. This yields a significant performance increase compared to ray tracing when simulating the timing distribution of light detected by a photo-sensitive element. The general concept behind this method can be widely applied, and I discuss some potential applications to other problem areas in physics and astronomy.
Presentation slides [.pdf]
Daniel Muthukrishna (Cambridge)
19 Feb 2019
Real-time classification of explosive transients using deep recurrent neural networks
Abstract: Astronomical transients are stellar objects that become temporarily brighter on various timescales and have led to some of the most significant discoveries in cosmology and astronomy. New and upcoming wide-field surveys such as the Zwicky Transient Facility (ZTF) and the Large Synoptic Survey Telescope (LSST) will record millions of multi-wavelength transient alerts each night. To meet this demand, we have developed a novel machine learning approach, RAPID (Real-time Automated Photometric Identification using Deep learning), that automatically classifies transients as a function of time. Using a deep recurrent neural network (RNN) with Gated Recurrent Units (GRUs), we are able to quickly classify multi-channel, sparse, time series datasets into 12 different astrophysical types. The classification accuracy improves over the lifetime of the transient as more photometric data becomes available. In this talk, I will explain the main parts of our deep neural network architecture and describe our approach's classification performance on simulated and real data streams.
Presentation slides [.pdf]
Di Zhang (UC Irvine)
5 Mar 2019
New population-based MCMC method
Abstract: We developed a general MCMC strategy to sample from difficult posterior distributions, for example multimodal distributions. As is well known, the usual MCMC methods may easily get stuck in local modes and suffer from slow convergence. To sample from a probability density with a potentially complicated shape or multiple modes, our approach is to run multiple chains from dispersed starting values and to propose a two-step jump at each iteration. The novel between-chain jump proposal tries to move to the neighborhood of the iterate in another chain. The intuition is that by running multiple chains exploring different modes and enabling between-chain jumps, each chain would essentially jump between modes efficiently. We applied the method to Bayes factor computation for Bayesian model selection, variance component estimation in mixed effect models, and analysis of spectral data. As a numerical illustration, we use this method to fit thermal models with Capella data.
Presentation slides [.pdf]
Sara Algeri (UMinnesota)
19 Mar 2019
Detecting new signals under background mismodelling
Abstract: Searches for new astrophysical phenomena often involve several sources of non-random uncertainties which can lead to highly misleading results. Among these, model uncertainty arising from background mismodelling can dramatically compromise the sensitivity of the experiment under study. Specifically, overestimating the background distribution in the signal region increases the chances of missing new physics. Conversely, underestimating the background outside the signal region leads to an artificially enhanced sensitivity and a higher likelihood of claiming false discoveries. The aim of this work is to provide a unified statistical algorithm to perform modelling, estimation, inference and signal characterization under background mismodelling. The proposed method allows one to incorporate the (partial) scientific knowledge available on the background distribution, and provides a data-updated version of it in a purely nonparametric fashion, without requiring the specification of prior distributions. If a calibration sample or control regions are available, the solution discussed does not require the specification of a model for the signal; however, if the signal distribution is known, up to some free parameters, it allows one to further improve the accuracy of the analysis and to detect additional signals of unexpected new sources.
Lucas Janson (Harvard)
9 Apr 2019
SciCen 706
High-Dimensional Variable Selection via Model-X Knockoffs
Abstract: Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-X knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in low-dimensional linear models. Model-X knockoffs can deal with arbitrary (and unknown) conditional models and any dimensions, including when the number of explanatory variables p exceeds the sample size n. Our approach requires the design matrix be random (independent and identically distributed rows) with a known distribution for the explanatory variables, although we show preliminary evidence that our procedure is robust to unknown/estimated distributions. As we require no knowledge/assumptions about the conditional distribution of the response, we effectively shift the burden of knowledge from the response to the explanatory variables, in contrast to the canonical model-based approach which assumes a parametric model for the response but very little about the explanatory variables. To our knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. We also apply our procedure to data from a case-control study of Crohn's disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.
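As a rough illustration of the selection step, the sketch here applies the data-dependent "knockoff+" threshold of the knockoff filter to a vector of feature statistics W_j, where large positive values suggest a real effect. It assumes the knockoff variables and the statistics have already been computed; the function names and the toy values of `w` are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Negative statistics stand in for an estimate of the false positives.
        false_estimate = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if false_estimate / n_selected <= q:
            return t
    return float("inf")  # no threshold meets the target; select nothing

def select_variables(w, q):
    """Indices of variables whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]

# Made-up statistics for ten variables, target false discovery rate q = 0.2.
w = [5.0, 4.2, 3.9, 3.5, -0.4, 3.1, 0.2, -0.1, 2.8, 2.5]
print(knockoff_plus_threshold(w, 0.2))
print(select_variables(w, 0.2))
```

The threshold rises until the estimated proportion of false discoveries among the selected variables falls to q or below, which is why small-magnitude statistics of either sign are discarded.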
This talk will be based on the following paper:
Presentation slides [.pdf]
Scott Ward (Imperial)
11 Jun 2019
644 Huxley, Imperial
Testing for non-homogeneity in point process on a sphere:
Functional summary statistics for Poisson processes on convex shapes in three dimensions
Scott Ward, Edward Cohen, Niall Adams
Abstract: Current methodologies in spatial point pattern analysis have been developed for planar and spatial data. Although this theory is applicable to a wide variety of practical applications, for example tree patterns in geology, many point patterns are not observed in Euclidean space. For example, in microbiology, researchers are interested in patterns arising on the cellular membranes of bacteria. In this scenario, point patterns are restricted to specific shapes such as spheres and ellipsoids. Research has only recently extended the theory to point pattern analysis on a sphere, with many other shapes left unexplored, mainly due to the lack of isometries existing for general metric spaces. Our work discusses extensions of typical functional summary statistics, such as Ripley's K-function, for Poisson processes defined on convex shapes in three dimensions. By using the invariance of Poisson processes under transformations between metric spaces (known as the Mapping Theorem), we can transform a Poisson process from any convex shape to a Poisson process on the unit sphere and take advantage of its rotational symmetries to construct functional summary statistics. This allows us to determine whether patterns exhibit complete spatial randomness or whether there exists spatial preference on the original convex space. In this talk we will present this methodology and discuss its potential application within astrophysics.
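To make the summary statistic concrete, the sketch here estimates Ripley's K-function for points on the unit sphere using great-circle distances and compares it with the value expected under complete spatial randomness, K(r) = 2*pi*(1 - cos r), the area of a spherical cap. It covers only the sphere case, not the mapping from other convex shapes described in the abstract, and it omits any edge or intensity corrections. All function names are hypothetical.

```python
import math
import random

def random_point_on_sphere(rng):
    """Uniform point on the unit sphere via a normalised Gaussian vector."""
    while True:
        x, y, z = (rng.gauss(0, 1) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0:
            return (x / norm, y / norm, z / norm)

def great_circle_distance(p, q):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))

def k_hat(points, r):
    """Estimate K(r) as 4*pi times the fraction of ordered pairs within r."""
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and great_circle_distance(points[i], points[j]) <= r
    )
    return 4.0 * math.pi * close_pairs / (n * (n - 1))

def k_csr(r):
    """K-function of a homogeneous Poisson process on the unit sphere."""
    return 2.0 * math.pi * (1.0 - math.cos(r))

rng = random.Random(0)
points = [random_point_on_sphere(rng) for _ in range(200)]
for r in (0.5, 1.0, 2.0):
    print(r, k_hat(points, r), k_csr(r))
```

Estimates well above the reference curve at small r would suggest clustering, and estimates well below it would suggest regularity.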
Abstract [.pdf]
Presentation slides [.pdf]
Zoe De Beurs (CfA)
6 Aug 2019
160 Concord
Classifying X-ray Binaries Using Three Machine Learning Methods
Abstract: Consisting of a compact object that accretes material from an orbiting secondary star, X-ray binaries have been observed for more than half a century. However, there is still no straightforward means to determine the nature of the compact object: a neutron star or a black hole. In this talk, we compare three machine learning classification methods (Bayesian Gaussian Processes, K-Nearest Neighbors, and Support Vector Machines) to develop a tool for classifying the compact objects in X-ray binary systems. Each machine learning method uses spatial patterns that exist between systems of the same type in 3D Color-Color-Intensity diagrams. We test a Bayesian Gaussian Process model that has been used to classify sources observed with the RXTE/ASM with data from the more sensitive MAXI/GSC. Using the MAXI/GSC data, we reproduce the result that the model can accurately classify well-known X-ray binaries but sometimes classifies non-pulsing neutron star systems containing "bursters" as black holes when they are close to the boundary between black holes and neutron stars. We find that K-Nearest Neighbors and Support Vector Machines on average predict the correct classification with greater probability and speed than the Bayesian Gaussian Process. Overall, all three methods have a high predictive accuracy, indicating a feasible method to classify X-ray binaries into black holes, non-pulsing neutron stars, or pulsars.
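As a rough illustration of one of the three methods, here is a minimal K-Nearest Neighbors classifier for points in a three-dimensional feature space. It is not the speakers' pipeline: the coordinates are made-up toy values standing in for (color, color, intensity), and the labels "A" and "B" are placeholders with no physical meaning.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy training set: two well-separated clusters in a 3D feature space.
train_points = [
    (0.10, 0.20, 0.90), (0.20, 0.10, 1.00), (0.15, 0.25, 0.80),
    (0.80, 0.90, 0.20), (0.90, 0.80, 0.30), (0.85, 0.95, 0.10),
]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (0.2, 0.2, 0.9)))
print(knn_predict(train_points, train_labels, (0.8, 0.8, 0.2)))
```

An odd k with two classes avoids tied votes; with more classes or real data, distance weighting and feature scaling would need separate consideration.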
Presentation slides [.pptx]

Fall/Winter 2004-2005
Siemiginowska, A. / Connors, A. / Kashyap, V. / Zezas, A. / Devor, J. / Drake, J. / Kolaczyk, E. / Izem, R. / Kang, H. / Yu, Y. / van Dyk, D.
Fall/Winter 2005-2006
van Dyk, D. / Ratner, M. / Jin, J. / Park, T. / CCW / Zezas, A. / Hong, J. / Siemiginowska, A. & Kashyap, V. / Meng, X.-L.
Fall/Winter 2006-2007
Lee, H. / Connors, A. / Protopapas, P. / McDowell, J. / Izem, R. / Blondin, S. / Lee, H. / Zezas, A., & Lee, H. / Liu, J.C. / van Dyk, D. / Rice, J.
Fall/Winter 2007-2008
Connors, A., & Protopapas, P. / Steiner, J. / Baines, P. / Zezas, A. / Aldcroft, T.
Fall/Winter 2008-2009
H. Lee / A. Connors, B. Kelly, & P. Protopapas / P. Baines / A. Blocker / J. Hong / H. Chernoff / Z. Li / L. Zhu (Feb) / A. Connors (Pt.1) / A. Connors (Pt.2) / L. Zhu (Mar) / E. Kolaczyk / V. Liublinska / N. Stein
Fall/Winter 2009-2010
A.Connors / B.Kelly / N.Stein, P.Baines / D.Stenning / J. Xu / A.Blocker / P.Baines, Y.Yu / V.Liublinska, J.Xu, J.Liu / Meng X.L., et al. / A. Blocker, et al. / A. Siemiginowska / D. Richard / A. Blocker / Xie X. / Xu J. / V. Liublinska / L. Jing
AcadYr 2010-2011
Astrostat Haiku / P. Protopapas / A. Zezas & V. Kashyap / A. Siemiginowska / K. Mandel / N. Stein / A. Mahabal / Hong J.S. / D. Stenning / A. Diaferio / Xu J. / B. Kelly / P. Baines & I. Udaltsova / M. Weber
AcadYr 2011-2012
A. Blocker / Astro for Stat / B. Kelly / R. D'Abrusco / E. Turner / Xu J. / T. Loredo / A. Blocker / P. Baines / A. Zezas et al. / Min S. & Xu J. / O. Papaspiliopoulos / Wang L. / T. Laskar
AcadYr 2012-2013
N. Stein / A. Siemiginowska / D. Cervone / R. Dawson / P. Protopapas / K. Reeves / Xu J. / J. Scargle / Min S. / Wang L. & D. Jones / J. Steiner / B. Kelly / K. McKeough
AcadYr 2013-2014
Meng X.-L. / Meng X.-L., K. Mandel / A. Siemiginowska / S. Vrtilek & L. Bornn / Lazhi W. / D. Jones / R. Wong / Xu J. / van Dyk D. / Feigelson E. / Gopalan G. / Min S. / Smith R. / Zezas A. / van Dyk D. / Hyungsuk T. / Czerny, B. / Jones D. / Liu K. / Zezas A.
AcadYr 2014-2015
Vegetabile, B. & Aldcroft, T. / H. Jae Sub / Siemiginowska, A. & Kashyap, V. / Pankratius, V. / Tak, H. / Brenneman, L. / Johnson, J. / Lynch, R.C. / Fan, M.J. / Meng, X.-L. / Gopalan, G. / Jiao, X. / Si, S. / Udaltsova, I. & Zezas, A. / Wang, L. / Tak, H. / Eadie, G. / Czekala, I. / Stenning, D. / Stampoulis, V. / Aitkin, M. / Algeri, S. / Barnacka, A.
AcadYr 2015-2016
DePasquale, J. / Tak, H. / Meng, X.-L. / Jones, D. / Huang, J. / Blanchard, P. / Chen, Y. & Wang, X. / Tak, H. / Mandel, K. / Jiao, X. / Wang, X. & Chen, Y. / IACHEC WG / Si, S. / Drake, J. / Stampoulis, V. / Algeri, S. / Stein, N. / Chunzhe, Z. / Andrews, J. / Vrtilek, S. / Udaltsova, I. & Stampoulis, V.
AcadYr 2016-2017
Wang, X. & Chen, Y. / Kashyap, V., Siemiginowska, A., & Zezas, A. / Stampoulis, V. / Portillo, S. / Zhang, K. / Mandel, K. / DiStefano, R. / Finkbeiner, D. & Meade, B. / Gong, R. / Shihao Y. / Zhirui, H. / Xufei, W. / Campos, L. / Tak, H. / Xufei, W. / Jones, D. / Algeri, S. / Speagle, J. / Czekala, I.
AcadYr 2017-2018
AstroStat Day / Speagle, J. / Collin, G. / McKeough, K. & Yang, S. / McKeough, K. & Campos, L. / M. Ntampaka / H. Marshall / D. Huppenkothen / X. Yu / R. DiStefano / J. Yee / H. Tak / A. Avelino
AcadYr 2018-2019
Stenning, D. / Dvorkin, C. / Sottosanti, A. / Yu, X. / Chen, Y. / Jones, D. / Lee, T.C.-M. / Tak, H. / Kashyap, V., McKeough, K., Campos, L., et al. / Baines, P. / Collin, G. / Muthukrishna, D. / Zhang, D. / Algeri, S. / Janson, L. / Ward, S. / de Beurs, Z.