AstroStat Talks 2020-2021
Last Updated: 20210415

International CHASC AstroStatistics Centre

Topics in Astrostatistics

Statistics 310, Harvard University

AY 2020-2021


Schedule Tuesdays Noon - 2PM Eastern Time
Location Remote

Job Announcement:
We seek to hire a postdoc to work on astrostatistics problems in high-energy astrophysics. See details and instructions at the AAS Job Register. Applications are welcome from qualified candidates.

Ana Diaz Rivero (Harvard)
Sep 1 2020
Flow-based Likelihoods for Non-Gaussian Inference
Abstract: We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly-used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.
Presentation Slides [.pdf]
Reference: arXiv:2007.05535 [arXiv]
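Illustration (not from the talk): the "optimization target" of a flow-based model is the change-of-variables log-likelihood, log p(x) = log p_z(f(x)) + log |df/dx|, where f maps data back to the base distribution. The sketch below evaluates it for the simplest possible case, a one-dimensional affine flow, and compares it with the Gaussian log-density it should reproduce. The function names and the parameters `shift` and `log_scale` are assumptions made for this example only.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_loglik(x, shift, log_scale):
    """log p(x) for the flow x = shift + exp(log_scale) * z, with z ~ N(0, 1)."""
    z = (x - shift) * math.exp(-log_scale)  # inverse transform f(x)
    log_det = -log_scale                    # log |dz/dx|
    return standard_normal_logpdf(z) + log_det


def gaussian_logpdf(x, mean, std):
    """Closed-form Gaussian log-density, used only as a cross-check."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


x, shift, log_scale = 1.3, 0.5, math.log(2.0)
flow_value = affine_flow_loglik(x, shift, log_scale)
exact_value = gaussian_logpdf(x, mean=shift, std=2.0)
print(flow_value, exact_value)
```

A practical flow stacks many nonlinear layers and learns their parameters by maximizing this quantity over training data; the affine case is shown only because its answer can be checked by hand.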
Herman Marshall (MIT) & Yang Chen (Michigan)
Sep 8 2020
Concordance: In-flight Calibration of X-ray Telescopes without Absolute References
Abstract: We describe a process for cross-calibrating the effective areas of X-ray telescopes that observe common targets. The targets are not assumed to be "standard candles" in the classic sense, in that the only prior placed on the source fluxes is that these fluxes have true but unknown values. Using a technique developed by Chen et al. (2019) that involves a statistical method called shrinkage, we determine effective area correction factors for each instrument that bring estimated fluxes into the best agreement, consistent with prior knowledge of their effective areas. We expand the technique to allow unique priors on systematic uncertainties in effective areas for each X-ray astronomy instrument and to allow correlations between effective areas in different energy bands. We demonstrate the method with several data sets from various X-ray telescopes.
Presentation slides: Herman Marshall; Yang Chen [.pdf]
Reference: Chen et al. 2019, JASA, 114:527, 1018
Video [!yt]
Katy McKeough (Harvard)
Sep 29 2020
Maximizing a High Dimensional Posterior Using a Genetic Algorithm
Abstract: Astronomers are interested in delineating boundaries of extended sources in noisy images. Analyzing the morphology of these objects is particularly challenging for X-ray images of high redshift sources where there are a limited number of high-energy photon counts. We apply a multi-phase technique in order to estimate the minimal boundary, the point at which the source is no longer distinguishable from the background noise, for complex astronomical objects. One step of this approach is to build a posterior describing an arrangement of pixel assignments, in which each pixel is assigned to either a region of interest or the background. In this case, we would like to find the global maximum in a posterior space that is discrete but large. This is difficult since the posterior evaluated at a specific pixel arrangement is very small, leading to underflow errors in calculating the posterior of any particular pixel assignment. Furthermore, it is difficult to determine which pixel arrangements to optimize over, since the space is too large to explore every possibility. Genetic algorithms offer an efficient solution to optimization in high dimensional spaces. Using genetic algorithms, we are able to explore a large part of the relevant posterior space and find a pixel assignment close to the global maximum.
Presentation slides [.pdf]
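Illustration (not from the talk): a minimal sketch of a genetic algorithm searching binary pixel assignments, working with the log posterior throughout to avoid the underflow mentioned in the abstract. The toy posterior (independent Poisson pixels with fixed rates and a flat prior), the hard-coded counts, and all names and tuning values are assumptions made for this example; the posterior used in the talk is more elaborate.

```python
import math
import random


def log_posterior(z, counts, lam_src, lam_bkg):
    """Unnormalized log posterior of a binary assignment z (1 = source,
    0 = background) for independent Poisson pixels with a flat prior."""
    total = 0.0
    for zi, c in zip(z, counts):
        lam = lam_src if zi else lam_bkg
        total += c * math.log(lam) - lam  # log(c!) is constant and dropped
    return total


def genetic_maximize(counts, lam_src, lam_bkg, pop_size=40, n_gen=150, seed=0):
    rng = random.Random(seed)
    n = len(counts)

    def score(z):
        return log_posterior(z, counts, lam_src, lam_bkg)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best log posterior at the start of each generation
    for _ in range(n_gen):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        elite = pop[: pop_size // 2]   # truncation selection
        children = [pop[0][:]]         # elitism: carry the best forward
        while len(children) < pop_size:
            a, b = rng.choice(elite), rng.choice(elite)
            # uniform crossover, then bit-flip mutation at rate 1/n
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [1 - g if rng.random() < 1.0 / n else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best), history


# Hypothetical 16-pixel "image" with a brighter patch in the middle.
counts = [0, 1, 0, 2, 5, 7, 6, 4, 1, 0, 0, 1, 3, 0, 1, 0]
LAM_SRC, LAM_BKG = 5.0, 0.5
best_z, best_lp, history = genetic_maximize(counts, LAM_SRC, LAM_BKG)

# In this toy model pixels are independent, so the global maximum is known.
opt_z = [1 if c * math.log(LAM_SRC) - LAM_SRC > c * math.log(LAM_BKG) - LAM_BKG
         else 0 for c in counts]
opt_lp = log_posterior(opt_z, counts, LAM_SRC, LAM_BKG)
print(best_lp, opt_lp)
```

Because the toy pixels are independent, the reference optimum can be computed directly and compared with what the search found. With a spatial prior coupling neighbouring pixels no such shortcut exists, which is where a stochastic search becomes useful.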
Yang Chen (Michigan)
Oct 20 2020
Machine Learning Efforts on Solar Flare Predictions
Abstract: In this talk, we present our machine learning efforts, which show great promise towards early predictions of solar flare events. (1) We present a data pre-processing pipeline that is built to extract useful data from multiple sources -- Geostationary Operational Environmental Satellites (GOES) and Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) and SDO/Atmospheric Imaging Assembly (AIA) -- to prepare inputs for machine learning algorithms. (2) For our strong/weak flare classification model, case studies show a significant increase in the prediction score around 20 hours before strong solar flare events, which implies that early precursors appear at least 20 hours prior to the peak of a flare event. (3) We develop a mixed Long Short Term Memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hour time window. (4) Our ongoing and future work will also be briefly mentioned.
Video [!yt]
Aarya Patil (UToronto)
Nov 17 2020
Likelihood-free Inference of Chemical Homogeneity in Open Clusters
Abstract: Star clusters are excellent astrophysical laboratories to study the history of star formation and chemical enrichment in our Galaxy. These are groupings of stars born out of the same gas cloud, and are theoretically expected to have similar chemical compositions. Empirically validating this chemical homogeneity is important yet difficult because the measurement of accurate and precise chemistry of stars using stellar spectroscopic data is statistically challenging. We perform high-fidelity Likelihood-free Inference of chemistry of stars using state-of-the-art Neural Density Estimation to observationally determine the level of chemical homogeneity in open clusters. We make our model computationally efficient by using Functional Principal Component Analysis that models the low-dimensional intrinsic structure embedded in the ~10,000 dimensional stellar spectroscopic space. Our constraints on chemical homogeneity will not only help understand the detailed evolution of star-forming clouds but also allow us to trace the chemical and dynamical history of our Galaxy through chemical tagging.
Presentation slides [.pdf]
Video [!yt]
Diab Jerius (CXC/CfA)
Dec 8 2020
Doing the Hokey-Pokey
Deriving Statistical errors for Measurements of the Chandra X-ray Observatory PSF
Abstract: The Chandra X-Ray Observatory's PSF is a two-dimensional wonder. It's not exactly symmetric, depends upon the astrophysical input spectrum and gets folded through instruments with various degrees of fidelity.
Still, it seems to get the job done, and some of the questions often asked are:
  • What exactly does the PSF look like for my source?
  • If I want to test some bit of astrophysics, what are the intrinsic errors in our knowledge of the PSF, so I can determine the sensitivity of my measurements?
  • How can I simulate my observation to see if I can understand what the source looks like?
Answers to these questions are based on both models of the optics and measurements of the actual PSF.
In this talk I'll give a brief(!) overview of the optical model, introduce a simple but useful parameterization of the measured PSF (the encircled energy function), describe its use and its systematic errors, relate our attempts at deriving realistic measurement errors, and, finally, plead for your assistance in helping us refine those errors so that they are meaningful.
Presentation slides [.pdf]
Video [!yt]
Xufei Wang (Harvard)
Jan 5 2021
Maximum Product of Spacings: A Simple Comparison with Maximum Likelihood
Abstract: An intriguing property of the maximum product of spacings method is that since the product of spacings is a pivotal quantity in general, obtaining confidence intervals or performing hypothesis testing is always exact (other than numerical imprecision). However, there is a price to be paid for this exactness in terms of the width of the interval or the power of the test, in comparison with the Maximum Likelihood approach, which in general is valid only asymptotically. In this talk, we compare the two methods to illustrate these issues in the context of estimating a boundary point of a uniform distribution (where exact calculations can be done for both).
Context: 2020 May 19, 2020 Jul 7.
Presentation slides [.pdf]
Video [!yt]
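Illustration (not from the talk): a minimal sketch of the two point estimators for the upper boundary θ of a Uniform(0, θ) sample. The maximum likelihood estimate is the sample maximum x_(n). The log product of spacings is, up to a constant, -(n+1) log θ + log(θ - x_(n)), which is maximized at θ = (n+1)/n · x_(n). The code checks that closed form against a brute-force grid search. Function names, the sample size, and the grid are assumptions made for this example, and the confidence intervals discussed in the abstract are not covered.

```python
import math
import random


def mle_upper(x):
    """Maximum likelihood estimate of theta for Uniform(0, theta)."""
    return max(x)


def mps_upper(x):
    """Closed-form maximum product of spacings estimate of theta."""
    n = len(x)
    return (n + 1) / n * max(x)


def log_product_of_spacings(theta, x):
    """Sum of log spacings of the fitted CDF F(x) = x / theta."""
    xs = sorted(x)
    if theta <= xs[-1]:
        return float("-inf")  # the last spacing would be non-positive
    cdf = [0.0] + [v / theta for v in xs] + [1.0]
    return sum(math.log(b - a) for a, b in zip(cdf, cdf[1:]))


rng = random.Random(0)
x = [rng.uniform(0.0, 2.0) for _ in range(20)]

theta_mle = mle_upper(x)
theta_mps = mps_upper(x)

# Brute-force check of the closed form on a fine grid above the sample maximum.
grid = [theta_mle * (1.0 + 0.5 * (k + 1) / 20000) for k in range(20000)]
theta_grid = max(grid, key=lambda t: log_product_of_spacings(t, x))
print(theta_mle, theta_mps, theta_grid)
```

The maximum product of spacings estimate always lies above the sample maximum, whereas the maximum likelihood estimate equals it and therefore can never exceed the true boundary.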
Aneta Siemiginowska (CfA)
Jan 26 2021
Discussion of New Astrostatistics Projects for Students
We plan to have a discussion about astrostatistics projects. The meeting will be geared towards new students and will provide a background introduction.
Presentation slides [.pdf]
Video [!yt]
Cong Xu (UC Davis)
Feb 9 2021
9am PST
Change point detection and image segmentation for time series of astrophysical images
Abstract: In this talk, we present a method for modeling a time series of astronomical images. The method assumes that at each time point, the corresponding multi-band image stack is an unknown 3D piecewise constant function including Poisson noise. It also assumes that all image stacks between any two adjacent change points (in time domain) share the same unknown piecewise constant function. The proposed method is designed to estimate the number and the locations of all the change points (in time domain), as well as all the unknown piecewise constant functions between any pairs of the change points. A practical algorithm is also developed to solve the corresponding complicated optimization problem. Applications to two real datasets, the XMM observation of a flaring star and an emerging solar coronal loop, illustrate the usage of the proposed method and the scientific insight gained from it.
See also:
     Xu et al. 2021 [arXiv]
     Wong et al. 2015 [Automark]
Presentation slides [.pdf]
Video [!yt]
Adrien Picquenot (Université Paris-Saclay)
Feb 22
6pm CET
Introduction and application of a new blind source separation method for extended sources in X-ray astronomy
Abstract: Some extended sources, among which we find supernova remnants, present an outstanding diversity of morphologies that the current generation of spectro-imaging telescopes can detect with an unprecedented level of detail. However, the data analysis tools currently in use in the high energy astrophysics community fail to take full advantage of these data: most of them focus only on the spectral information without using the many spatial specificities or the correlation between the spectral and spatial dimensions. For that reason, the physical parameters that are retrieved are often widely contaminated by other components. In this talk, we will explore a new blind source separation method that fully exploits both the spatial and spectral information in X-ray data, and their correlations. We will introduce the mathematical concepts on which the algorithm relies, and present some current physical applications and future studies that could benefit from it.
Presentation slides [.pdf] (20 MB)
Video [!yt]
Taylor Jacovich (CfA)
Mar 09
Noon EST
Exploring The Parameter Space of High Energy Stellar Explosions
Abstract: Understanding the outflows of stellar explosions such as supernova remnants and gamma-ray burst afterglows requires producing a physical description of the underlying parameter space. As part of this, we must first construct computational models of the outflows. We begin by discussing GRB afterglow modeling using a modified version of the scale-free hydrodynamic fitting routine boxfit. We discuss the impact our modifications had on the original boxfit model, and how these modifications change the behavior of generated lightcurves.
We then work to model the bulk properties of the remnants of core-collapse supernovae by constructing a software pipeline to follow cradle-to-grave massive star evolution beginning with the MESA stellar evolution code. We couple MESA to the Supernova Evolution Code (SNEC) to explode the star and follow the evolution of the ejecta with the cosmic ray hydrodynamics (ChN) code out to 7000 years post core-collapse. We consider the effects of intra-remnant absorption on the observed emission from young supernova remnants and discuss the challenges in producing a grid of self-consistent models capable of representing the parameter space.
We conclude by considering current and possible future methods for properly fitting broadband observations to the modeled parameter space, noting the role these fits will play in interpreting data from future un-targeted surveys. In the case of the GRBs, we consider the implications population fits will have on understanding GRBs as a class of objects. In the case of the SNR, we consider potential observables and how they may help disentangle the degenerate parameter space of progenitor objects for core-collapse remnants.
Presentation slides: [.pdf] ; [gDoc]
Video [!yt]
Alex Geringer-Sameth (Imperial)
Mar 12
1530 GMT
Source discovery with a deeper characterization of the high-energy diffuse background
Abstract: Deep, wide-area gamma-ray observations have set the stage for the potential detection of faint signals from important astrophysical targets. I will discuss a few statistical questions that arise when trying to establish the existence of a signal and identify it as the phenomenon of interest. This includes assessing significance given our limited understanding of the high-energy diffuse background, especially regarding populations of faint sources. Poisson statistics fail to describe the scene, and this severely lowers the power of conventional source detection. I will present ongoing work on an empirical characterization of the background which is used to define a new source detection framework. The method maintains high sensitivity, correctly includes background sources below catalog thresholds, and includes partial knowledge about the target signal. I will present its application to the search for dark matter signals in dwarf galaxies, which will become essential with the next generation of sky surveys. These statistical ideas can also be used to probe more deeply into the physical origin of the diffuse background itself.
Panos Toulis (University of Chicago)
Mar 23
11am CDT
Randomization Inference of Periodicity in Unequally Spaced Time Series with Application to Exoplanet Detection
Abstract: The estimation of periodicity is a fundamental task in many scientific studies. Existing methods rely on assumptions that the observation times have equal or i.i.d. spacings, and that common estimators, such as the periodogram peak, are consistent and asymptotically normal. In practice, however, these assumptions are unrealistic as observation times usually exhibit deterministic patterns (e.g., the nightly observation cycle in astronomy) that imprint nuisance periodicities in the data. These nuisance signals also affect the finite-sample distribution of estimators, which can substantially deviate from normality. Here, we propose a set identification method, fusing ideas from randomization inference and partial identification. In particular, we develop a sharp test for any periodicity value, and then invert the test to build a confidence set. This approach is appropriate because the construction of confidence sets does not rely on assumptions of regular or well-behaved asymptotics. Notably, our inference is valid in finite samples when our method is fully implemented, while it can be asymptotically valid under an approximate implementation designed to ease computation. Empirically, our method is validated in exoplanet detection using radial velocity data. In this context, our method correctly identifies the periodicity of the confirmed exoplanets in our sample. For some other, yet unconfirmed detections, we show that the statistical evidence is particularly weak, which illustrates the failure of traditional statistical techniques. Last but not least, our method offers a constructive way to resolve these identification issues via improved observation designs. In exoplanet detection, these designs suggest meaningful improvements in identifying periodicity even when a moderate amount of randomization is introduced in scheduling radial velocity measurements.
Working paper ; Supplement [url/.pdf]
Presentation slides [.pdf]
Video [!yt]
Axel Donath (Heidelberg)
Apr 7
1630 CEST
Analysis methods and challenges in TeV gamma-ray astronomy
Abstract: TeV gamma-ray astronomy is still a rather young field of research. However, in the past two decades, observation programs such as the H.E.S.S. Galactic Plane Survey (HGPS) have shown that Galactic TeV gamma-ray emission is not a rare phenomenon: with ~80 detected sources, the Milky Way shines bright in TeV gamma-rays.
The limited angular resolution and energy range of the current TeV instruments, together with the high density of sources in the Galactic plane, underlying interstellar diffuse emission and low statistics in general, impose many challenges on the analysis of the gamma-ray data and the creation of source catalogs from it. Classical TeV gamma-ray astronomy analysis methods are typically limited to spectral (1D) and image based (2D) measurements and are sufficient neither to fully exploit data from existing instruments nor data from the upcoming Cherenkov Telescope Array (CTA), which will survey the Galactic plane with ~10 times improved sensitivity.
In this talk I will briefly present some results and lessons we have learned from the analysis of the HGPS dataset as well as introduce the Gammapy project. Gammapy is an openly developed Python package for gamma-ray astronomy and a prototype for the CTA science tools. It allows for combined spectro-morphological (3D) and time dependent parametric modelling of gamma-ray data using binned Poisson maximum likelihood fitting. Using FITS based input data formats and higher level likelihood interfaces, it also allows combining data from multiple GeV and TeV instruments, thus extending the energy range and improving the statistics of gamma-ray measurements in general.
Presentation slides [.pdf] (23 MB)
Video [!yt]
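Illustration (not from the talk, and not Gammapy's API): a minimal standard-library sketch of binned Poisson maximum likelihood fitting in its simplest form. The predicted counts are assumed to be mu_i = amplitude × template_i, for which the maximum likelihood amplitude has the closed form sum(counts) / sum(template). The Cash-type fit statistic, the function names, and the numbers are assumptions made for this example.

```python
import math


def cash_statistic(counts, predicted):
    """C = 2 * sum(mu - n * log(mu)): -2 log L of the binned Poisson
    likelihood, up to a constant that does not depend on the model."""
    return 2.0 * sum(mu - n * math.log(mu) for n, mu in zip(counts, predicted))


def fit_amplitude(counts, template):
    """Closed-form ML amplitude for the model mu_i = amplitude * template_i."""
    return sum(counts) / sum(template)


# Hypothetical binned data and model template (arbitrary units).
counts = [3, 7, 12, 9, 4, 1]
template = [0.5, 1.5, 2.5, 2.0, 1.0, 0.5]

best_amp = fit_amplitude(counts, template)


def stat_at(amp):
    """Fit statistic as a function of the single free parameter."""
    return cash_statistic(counts, [amp * t for t in template])


print(best_amp, stat_at(best_amp))
```

Models with several free parameters, backgrounds, or instrument response have no such closed form, and the same statistic is then minimized numerically.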
Tulun Ergin (TUBITAK Uzay)
Apr 8
1730 EEST
Learning to Sample and Classify Supernova Remnants from Datasets Collected by Current & Future Gamma-ray Observatories
Abstract: Supernova remnants (SNRs) are key gamma-ray sources for investigating cosmic ray (CR) acceleration and escape mechanisms. So, it is crucial to study the gamma-ray production models in SNRs and to disentangle the unique interstellar medium that SNRs expand into. However, the number of detected and resolved SNRs has to increase so that we can have a more realistic classification scheme for SNRs and obtain a better theoretical insight into CR phenomena happening within and around SNRs. More powerful gamma-ray observatories, such as the Cherenkov Telescope Array, with better angular resolution and higher sensitivity in detecting and resolving gamma-ray sources are being constructed, and they will produce big volumes of data. So, we have to come up with effective methods to handle these data by implementing smart algorithms (e.g. machine learning - ML). In the first part of my talk, I will discuss the ML tools that I plan to use for sampling artificial SNRs via a customised Normalising Flow model and using these artificial samples to train a classifier for recognising SNRs in big data sets. In the second part, I will talk about the spectral modelling of these objects, which are classified as SNRs, using multi-wavelength data.
Presentation slides [.pdf] (25 MB)
Video [!yt]
Rebecca Phillipson (University of Washington)
Apr 20
9am PDT
Investigating Nonlinear and Stochastic Variability of Accreting Compact Objects via Recurrence Analysis
Abstract: We investigate the use of nonlinear time series analysis techniques for variability studies of accreting compact objects. The onset of new time domain surveys and the imminent increase in astronomical timing data expose the shortcomings in traditional time series analysis (such as power spectra analysis) in characterizing the abundantly varied, complex and stochastic time variability of accreting sources, such as X-ray binaries (XRBs) and Active Galactic Nuclei (AGN). Recent applications of alternative methods to Fourier analysis used in other disciplines have shown promise in characterizing higher modes of variability and timescales in AGN and XRBs alike. In particular, methods from nonlinear dynamics utilize the projection of one-dimensional time series into higher order phase spaces, enabling a characterization of trajectories in the phase space representative of the underlying dynamics which generate the original time series. Recurrence analysis in particular is a useful geometric and statistical tool for distinguishing between deterministic and stochastic behavior in time series, in addition to probing timescales and classes of variability.
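Illustration (not from the talk): a minimal sketch of the two steps named in the abstract, projecting a one-dimensional time series into a higher-dimensional phase space by time-delay embedding, then marking which pairs of embedded points are close (a recurrence matrix). The embedding dimension `dim`, the delay `tau`, the threshold `eps`, and the sine-wave test series are assumptions chosen for this example only.

```python
import math


def delay_embed(series, dim, tau):
    """Time-delay embedding: point i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n_points = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim))
            for i in range(n_points)]


def euclidean(p, q):
    """Euclidean distance between two embedded points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def recurrence_matrix(series, dim=3, tau=5, eps=0.3):
    """R[i][j] = 1 when embedded points i and j lie within eps of each other."""
    pts = delay_embed(series, dim, tau)
    return [[1 if euclidean(p, q) <= eps else 0 for q in pts] for p in pts]


def recurrence_rate(matrix):
    """Fraction of recurrent pairs, the simplest recurrence statistic."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)


# A strictly periodic signal with period 25 samples.
series = [math.sin(2.0 * math.pi * t / 25.0) for t in range(200)]
R = recurrence_matrix(series)
print(len(R), recurrence_rate(R))
```

For this periodic input the matrix shows unbroken diagonal lines offset by multiples of the period. Recurrence quantification uses the statistics of such line structures to help separate deterministic from stochastic variability.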
Hu Sun (University of Michigan)
Apr 27
Noon EDT Zoom
Video imputation and prediction models in context of space weather monitoring

Fall/Winter 2004-2005
Siemiginowska, A. / Connors, A. / Kashyap, V. / Zezas, A. / Devor, J. / Drake, J. / Kolaczyk, E. / Izem, R. / Kang, H. / Yu, Y. / van Dyk, D.
Fall/Winter 2005-2006
van Dyk, D. / Ratner, M. / Jin, J. / Park, T. / CCW / Zezas, A. / Hong, J. / Siemiginowska, A. & Kashyap, V. / Meng, X.-L.
Fall/Winter 2006-2007
Lee, H. / Connors, A. / Protopapas, P. / McDowell, J. / Izem, R. / Blondin, S. / Lee, H. / Zezas, A., & Lee, H. / Liu, J.C. / van Dyk, D. / Rice, J.
Fall/Winter 2007-2008
Connors, A., & Protopapas, P. / Steiner, J. / Baines, P. / Zezas, A. / Aldcroft, T.
Fall/Winter 2008-2009
H. Lee / A. Connors, B. Kelly, & P. Protopapas / P. Baines / A. Blocker / J. Hong / H. Chernoff / Z. Li / L. Zhu (Feb) / A. Connors (Pt.1) / A. Connors (Pt.2) / L. Zhu (Mar) / E. Kolaczyk / V. Liublinska / N. Stein
Fall/Winter 2009-2010
A.Connors / B.Kelly / N.Stein, P.Baines / D.Stenning / J. Xu / A.Blocker / P.Baines, Y.Yu / V.Liublinska, J.Xu, J.Liu / Meng X.L., et al. / A. Blocker, et al. / A. Siemiginowska / D. Richard / A. Blocker / Xie X. / Xu J. / V. Liublinska / L. Jing
AcadYr 2010-2011
Astrostat Haiku / P. Protopapas / A. Zezas & V. Kashyap / A. Siemiginowska / K. Mandel / N. Stein / A. Mahabal / Hong J.S. / D. Stenning / A. Diaferio / Xu J. / B. Kelly / P. Baines & I. Udaltsova / M. Weber
AcadYr 2011-2012
A. Blocker / Astro for Stat / B. Kelly / R. D'Abrusco / E. Turner / Xu J. / T. Loredo / A. Blocker / P. Baines / A. Zezas et al. / Min S. & Xu J. / O. Papaspiliopoulos / Wang L. / T. Laskar
AcadYr 2012-2013
N. Stein / A. Siemiginowska / D. Cervone / R. Dawson / P. Protopapas / K. Reeves / Xu J. / J. Scargle / Min S. / Wang L. & D. Jones / J. Steiner / B. Kelly / K. McKeough
AcadYr 2013-2014
Meng X.-L. / Meng X.-L., K. Mandel / A. Siemiginowska / S. Vrtilek & L. Bornn / Lazhi W. / D. Jones / R. Wong / Xu J. / van Dyk D. / Feigelson E. / Gopalan G. / Min S. / Smith R. / Zezas A. / van Dyk D. / Hyungsuk T. / Czerny, B. / Jones D. / Liu K. / Zezas A.
AcadYr 2014-2015
Vegetabile, B. & Aldcroft, T. / H. Jae Sub / Siemiginowska, A. & Kashyap, V. / Pankratius, V. / Tak, H. / Brenneman, L. / Johnson, J. / Lynch, R.C. / Fan, M.J. / Meng, X.-L. / Gopalan, G. / Jiao, X. / Si, S. / Udaltsova, I. & Zezas, A. / Wang, L. / Tak, H. / Eadie, G. / Czekala, I. / Stenning, D. / Stampoulis, V. / Aitkin, M. / Algeri, S. / Barnacka, A.
AcadYr 2015-2016
DePasquale, J. / Tak, H. / Meng, X.-L. / Jones, D. / Huang, J. / Blanchard, P. / Chen, Y. & Wang, X. / Tak, H. / Mandel, K. / Jiao, X. / Wang, X. & Chen, Y. / IACHEC WG / Si, S. / Drake, J. / Stampoulis, V. / Algeri, S. / Stein, N. / Chunzhe, Z. / Andrews, J. / Vrtilek, S. / Udaltsova, I. & Stampoulis, V.
AcadYr 2016-2017
Wang, X. & Chen, Y. / Kashyap, V., Siemiginowska, A., & Zezas, A. / Stampoulis, V. / Portillo, S. / Zhang, K. / Mandel, K. / DiStefano, R. / Finkbeiner, D. & Meade, B. / Gong, R. / Shihao Y. / Zhirui, H. / Xufei, W. / Campos, L. / Tak, H. / Xufei, W. / Jones, D. / Algeri, S. / Speagle, J. / Czekala, I.
AcadYr 2017-2018
AstroStat Day / Speagle, J. / Collin, G. / McKeough, K. & Yang, S. / McKeough, K. & Campos, L. / M. Ntampaka / H. Marshall / D. Huppenkothen / X. Yu / R. DiStefano / J. Yee / H. Tak / A. Avelino
AcadYr 2018-2019
Stenning, D. / Dvorkin, C. / Sottosanti, A. / Yu, X. / Chen, Y. / Jones, D. / Lee, T.C.-M. / Tak, H. / Kashyap, V., McKeough, K., Campos, L., et al. / Baines, P. / Collin, G. / Muthukrishna, D. / Zhang, D. / Algeri, S. / Janson, L. / Ward, S. / de Beurs, Z.
AcadYr 2019-2020
McKeough, K. / Astudillo, J. & Protopapas, P. / Zezas, A. / Speagle, J. / Meng, X.-L., Siemiginowska, A., & Kashyap, V. / Bonfini, P. / Liu, C. / Guenther, H. / Castrillon, J. / McKeough, K. / Broekgaarden, F. / Autenrieth, M. / Motta, G. / Zucker, C. / Tak, H. / Kashyap, V. & Wang, X. / Wang, J. / Wang, X. & Ingram, J.
AcadYr 2020-2021
Diaz Rivero, A. / Marshall, H. & Chen, Y. / McKeough, K. / Chen, Y. / Patil, A. / Jerius, D. / Wang, X. / Siemiginowska, A. / Xu, C. / Picquenot, A. / Jacovich, T. / Geringer-Sameth, A. / Toulis, P. / Donath, A. / Ergin, T. / Phillipson, R. / Sun, H.