AstroStat Talks 2020-2021
Last Updated: 20210105

International CHASC AstroStatistics Centre

Topics in Astrostatistics

Statistics 310, Harvard University

AY 2020-2021


Schedule: Tuesdays, Noon - 2PM Eastern Time
Location: Remote

Ana Diaz Rivero (Harvard)
Sep 1 2020
Flow-based Likelihoods for Non-Gaussian Inference
Abstract: We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly-used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.
Presentation Slides [.pdf]
Reference: arXiv:2007.05535 [arXiv]
Herman Marshall (MIT) & Yang Chen (Michigan)
Sep 8 2020
Concordance: In-flight Calibration of X-ray Telescopes without Absolute References
Abstract: We describe a process for cross-calibrating the effective areas of X-ray telescopes that observe common targets. The targets are not assumed to be "standard candles" in the classic sense, in that the only prior placed on the source fluxes is that these fluxes have true but unknown values. Using a technique developed by Chen et al. (2019) that involves a statistical method called shrinkage, we determine effective area correction factors for each instrument that bring estimated fluxes into the best agreement, consistent with prior knowledge of their effective areas. We expand the technique to allow unique priors on systematic uncertainties in effective areas for each X-ray astronomy instrument and to allow correlations between effective areas in different energy bands. We demonstrate the method with several data sets from various X-ray telescopes.
Presentation slides: Herman Marshall; Yang Chen [.pdf]
Reference: Chen et al. 2019, JASA, 114:527, 1018
Video [YouTube]
Katy McKeough (Harvard)
Sep 29 2020
Maximizing a High Dimensional Posterior Using a Genetic Algorithm
Abstract: Astronomers are interested in delineating boundaries of extended sources in noisy images. Analyzing the morphology of these objects is particularly challenging for X-ray images of high redshift sources where there are a limited number of high-energy photon counts. We apply a multi-phase technique in order to estimate the minimal boundary, the point at which the source is no longer distinguishable from the background noise, for complex astronomical objects. One step of this approach is to build a posterior over pixel arrangements, each of which assigns every pixel to either a region of interest or the background. In this case, we would like to find the global maximum in a posterior space that is discrete but large. This is difficult since the posterior evaluated at a specific pixel arrangement is very small, leading to underflow errors in calculating the posterior of any particular pixel assignment. Furthermore, it is difficult to determine which pixel arrangements to optimize over, since the space is too large to explore every possibility. Genetic algorithms offer an efficient solution to optimization in high dimensional spaces. Using genetic algorithms we are able to explore a large amount of the relevant posterior space and find a pixel assignment close to the global maximum.
Presentation slides [.pdf]
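Illustration: the following is a minimal sketch of a genetic algorithm maximizing a posterior over binary pixel assignments, included only to make the idea in the abstract concrete. It is not the method presented in the talk. The toy posterior (independent Poisson pixels with fixed, hypothetical rates `src_rate` and `bkg_rate`), the function names, and all tuning parameters are assumptions. The score is accumulated as a sum of logs, which is one common way to avoid the underflow the abstract mentions.

```python
import math
import random


def log_posterior(assignment, counts, src_rate=5.0, bkg_rate=1.0):
    """Toy log posterior, up to an additive constant (an assumption).

    assignment[i] is 1 if pixel i is in the region of interest, else 0.
    Each pixel is an independent Poisson count with a rate set by its label.
    """
    total = 0.0
    for label, count in zip(assignment, counts):
        rate = src_rate if label else bkg_rate
        # Poisson log-likelihood without the log(count!) term, which does
        # not depend on the assignment.
        total += count * math.log(rate) - rate
    return total


def genetic_maximize(counts, pop_size=40, generations=60,
                     mutation_rate=0.02, seed=0):
    """Search for a high-scoring assignment; needs len(counts) >= 2."""
    rng = random.Random(seed)
    n = len(counts)

    def score(individual):
        return log_posterior(individual, counts)

    # Start from uniformly random binary assignments.
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged, so the
        # best score never decreases from one generation to the next.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    toy_counts = [0, 1, 0, 2, 6, 7, 5, 8, 1, 0, 4, 9]
    best, best_score = genetic_maximize(toy_counts)
    print(best, best_score)
```

Because the toy pixels are independent, the true maximum can be found pixel by pixel, which makes the sketch easy to check. A realistic posterior would couple neighboring pixels, and that coupling is what makes a population-based search worthwhile.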
Yang Chen (Michigan)
Oct 20 2020
Machine Learning Efforts on Solar Flare Predictions
Abstract: In this talk, we present our machine learning efforts, which show great promise towards early predictions of solar flare events. (1) We present a data pre-processing pipeline that is built to extract useful data from multiple sources -- Geostationary Operational Environmental Satellites (GOES) and Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) and SDO/Atmospheric Imaging Assembly (AIA) -- to prepare inputs for machine learning algorithms. (2) For our strong/weak flare classification model, case studies show a significant increase in the prediction score around 20 hours before strong solar flare events, which implies that early precursors appear at least 20 hours prior to the peak of a flare event. (3) We develop a mixed Long Short Term Memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hour time window. (4) Our ongoing and future work will also be briefly mentioned.
Video [YouTube]
Aarya Patil (UToronto)
Nov 17 2020
Likelihood-free Inference of Chemical Homogeneity in Open Clusters
Abstract: Star clusters are excellent astrophysical laboratories to study the history of star formation and chemical enrichment in our Galaxy. These are groupings of stars born out of the same gas cloud, and are theoretically expected to have similar chemical compositions. Empirically validating this chemical homogeneity is important yet difficult because the measurement of accurate and precise chemistry of stars using stellar spectroscopic data is statistically challenging. We perform high-fidelity Likelihood-free Inference of chemistry of stars using state-of-the-art Neural Density Estimation to observationally determine the level of chemical homogeneity in open clusters. We make our model computationally efficient by using Functional Principal Component Analysis that models the low-dimensional intrinsic structure embedded in the ~10,000 dimensional stellar spectroscopic space. Our constraints on chemical homogeneity will not only help understand the detailed evolution of star-forming clouds but also allow us to trace the chemical and dynamical history of our Galaxy through chemical tagging.
Presentation slides [.pdf]
Video [YouTube]
Diab Jerius (CXC/CfA)
Dec 8 2020
Doing the Hokey-Pokey: Deriving Statistical Errors for Measurements of the Chandra X-ray Observatory PSF
Abstract: The Chandra X-Ray Observatory's PSF is a two-dimensional wonder. It's not exactly symmetric, depends upon the astrophysical input spectrum and gets folded through instruments with various degrees of fidelity.
Still, it seems to get the job done, and some of the questions often asked are:
  • What exactly does the PSF look like for my source?
  • If I want to test some bit of astrophysics, what are the intrinsic errors in our knowledge of the PSF, so I can determine the sensitivity of my measurements?
  • How can I simulate my observation to see if I can understand what the source looks like?
Answers to these questions are based on both models of the optics and measurements of the actual PSF.
In this talk I'll give a brief(!) overview of the optical model, introduce a simple but useful parameterization of the measured PSF (the encircled energy function), describe its use and its systematic errors, relate our attempts at deriving realistic measurement errors, and, finally, plead for your assistance in helping us refine those errors so that they are meaningful.
Presentation slides [.pdf]
Video [YouTube]
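Illustration: the encircled energy function mentioned in the abstract can be sketched as the fraction of detected events falling within a given radius of the source position. The snippet below is a simplified sketch under stated assumptions, not the Chandra calibration procedure: it assumes every event in the list belongs to the source (no background), normalizes by the total number of events supplied, and attaches a naive binomial standard error. The function name and arguments are hypothetical.

```python
import bisect
import math


def encircled_energy(events, center, radii):
    """Return (fractions, errors) of events within each radius of center.

    events: iterable of (x, y) positions; center: (x, y); radii: iterable.
    errors are naive binomial standard errors sqrt(p * (1 - p) / n), an
    assumption made for illustration only.
    """
    cx, cy = center
    # Sorted radial distances let each radius be handled by binary search.
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in events)
    n = len(dists)
    fractions, errors = [], []
    for r in radii:
        inside = bisect.bisect_right(dists, r)  # events with distance <= r
        p = inside / n
        fractions.append(p)
        errors.append(math.sqrt(p * (1.0 - p) / n))
    return fractions, errors


if __name__ == "__main__":
    toy_events = [(0, 0), (1, 0), (0, 2), (3, 4)]
    print(encircled_energy(toy_events, (0, 0), [0.5, 1, 2, 5]))
```

The binomial error goes to zero as the fraction approaches 0 or 1 and ignores systematic effects, which is one reason a simple estimate like this one falls short of a realistic measurement error.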
Xufei Wang (Harvard)
Jan 5 2021
Maximum Product of Spacings: A Simple Comparison with Maximum Likelihood
Abstract: An intriguing property of the maximum product of spacings method is that since the product of spacings is a pivotal quantity in general, obtaining confidence intervals or performing hypothesis testing is always exact (other than numerical imprecision). However, there is a price to be paid for this exactness in terms of the width of the interval or the power of the test, in comparison with the Maximum Likelihood approach, which in general is valid only asymptotically. In this talk, we compare the two methods to illustrate these issues in the context of estimating a boundary point of a uniform distribution (where exact calculations can be done for both).
Context: 2020 May 19, 2020 Jul 7.
Presentation slides [.pdf]
Video [YouTube]
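Illustration: for a concrete view of the two point estimators, the sketch below assumes a sample from Uniform(0, theta) with the lower boundary known to be 0. This is a simplifying assumption, since the abstract does not specify the setup. Under that assumption the maximum likelihood estimate is the sample maximum, while maximizing the product of the n + 1 spacings of the fitted CDF gives (n + 1) / n times the sample maximum. The function names are hypothetical, and the sample is assumed to contain distinct positive values so that every spacing is positive.

```python
import math


def mle_uniform_upper(sample):
    """Maximum likelihood estimate of theta for Uniform(0, theta)."""
    return max(sample)


def log_spacing_product(theta, sample):
    """Log of the product of spacings of the Uniform(0, theta) CDF.

    Spacings are differences of consecutive CDF values at the ordered
    sample, padded with 0 on the left and 1 on the right.
    """
    xs = sorted(sample)
    if theta <= xs[-1]:
        return float("-inf")  # the last spacing would be zero or negative
    cdf = [0.0] + [x / theta for x in xs] + [1.0]
    return sum(math.log(b - a) for a, b in zip(cdf, cdf[1:]))


def mps_uniform_upper(sample):
    """Closed-form maximizer of log_spacing_product over theta."""
    n = len(sample)
    return (n + 1) / n * max(sample)


if __name__ == "__main__":
    toy_sample = [0.9, 2.1, 3.7, 4.4, 6.0]
    print(mle_uniform_upper(toy_sample), mps_uniform_upper(toy_sample))
```

The closed form follows because the product of spacings is proportional to theta^-(n+1) * (theta - x_max) in this model. Setting the derivative of its logarithm to zero gives theta = (n + 1) * x_max / n.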

Fall/Winter 2004-2005
Siemiginowska, A. / Connors, A. / Kashyap, V. / Zezas, A. / Devor, J. / Drake, J. / Kolaczyk, E. / Izem, R. / Kang, H. / Yu, Y. / van Dyk, D.
Fall/Winter 2005-2006
van Dyk, D. / Ratner, M. / Jin, J. / Park, T. / CCW / Zezas, A. / Hong, J. / Siemiginowska, A. & Kashyap, V. / Meng, X.-L.
Fall/Winter 2006-2007
Lee, H. / Connors, A. / Protopapas, P. / McDowell, J. / Izem, R. / Blondin, S. / Lee, H. / Zezas, A., & Lee, H. / Liu, J.C. / van Dyk, D. / Rice, J.
Fall/Winter 2007-2008
Connors, A., & Protopapas, P. / Steiner, J. / Baines, P. / Zezas, A. / Aldcroft, T.
Fall/Winter 2008-2009
H. Lee / A. Connors, B. Kelly, & P. Protopapas / P. Baines / A. Blocker / J. Hong / H. Chernoff / Z. Li / L. Zhu (Feb) / A. Connors (Pt.1) / A. Connors (Pt.2) / L. Zhu (Mar) / E. Kolaczyk / V. Liublinska / N. Stein
Fall/Winter 2009-2010
A.Connors / B.Kelly / N.Stein, P.Baines / D.Stenning / J. Xu / A.Blocker / P.Baines, Y.Yu / V.Liublinska, J.Xu, J.Liu / Meng X.L., et al. / A. Blocker, et al. / A. Siemiginowska / D. Richard / A. Blocker / Xie X. / Xu J. / V. Liublinska / L. Jing
AcadYr 2010-2011
Astrostat Haiku / P. Protopapas / A. Zezas & V. Kashyap / A. Siemiginowska / K. Mandel / N. Stein / A. Mahabal / Hong J.S. / D. Stenning / A. Diaferio / Xu J. / B. Kelly / P. Baines & I. Udaltsova / M. Weber
AcadYr 2011-2012
A. Blocker / Astro for Stat / B. Kelly / R. D'Abrusco / E. Turner / Xu J. / T. Loredo / A. Blocker / P. Baines / A. Zezas et al. / Min S. & Xu J. / O. Papaspiliopoulos / Wang L. / T. Laskar
AcadYr 2012-2013
N. Stein / A. Siemiginowska / D. Cervone / R. Dawson / P. Protopapas / K. Reeves / Xu J. / J. Scargle / Min S. / Wang L. & D. Jones / J. Steiner / B. Kelly / K. McKeough
AcadYr 2013-2014
Meng X.-L. / Meng X.-L., K. Mandel / A. Siemiginowska / S. Vrtilek & L. Bornn / Lazhi W. / D. Jones / R. Wong / Xu J. / van Dyk D. / Feigelson E. / Gopalan G. / Min S. / Smith R. / Zezas A. / van Dyk D. / Hyungsuk T. / Czerny, B. / Jones D. / Liu K. / Zezas A.
AcadYr 2014-2015
Vegetabile, B. & Aldcroft, T. / H. Jae Sub / Siemiginowska, A. & Kashyap, V. / Pankratius, V. / Tak, H. / Brenneman, L. / Johnson, J. / Lynch, R.C. / Fan, M.J. / Meng, X.-L. / Gopalan, G. / Jiao, X. / Si, S. / Udaltsova, I. & Zezas, A. / Wang, L. / Tak, H. / Eadie, G. / Czekala, I. / Stenning, D. / Stampoulis, V. / Aitkin, M. / Algeri, S. / Barnacka, A.
AcadYr 2015-2016
DePasquale, J. / Tak, H. / Meng, X.-L. / Jones, D. / Huang, J. / Blanchard, P. / Chen, Y. & Wang, X. / Tak, H. / Mandel, K. / Jiao, X. / Wang, X. & Chen, Y. / IACHEC WG / Si, S. / Drake, J. / Stampoulis, V. / Algeri, S. / Stein, N. / Chunzhe, Z. / Andrews, J. / Vrtilek, S. / Udaltsova, I. & Stampoulis, V.
AcadYr 2016-2017
Wang, X. & Chen, Y. / Kashyap, V., Siemiginowska, A., & Zezas, A. / Stampoulis, V. / Portillo, S. / Zhang, K. / Mandel, K. / DiStefano, R. / Finkbeiner, D. & Meade, B. / Gong, R. / Shihao Y. / Zhirui, H. / Xufei, W. / Campos, L. / Tak, H. / Xufei, W. / Jones, D. / Algeri, S. / Speagle, J. / Czekala, I.
AcadYr 2017-2018
AstroStat Day / Speagle, J. / Collin, G. / McKeough, K. & Yang, S. / McKeough, K. & Campos, L. / M. Ntampaka / H. Marshall / D. Huppenkothen / X. Yu / R. DiStefano / J. Yee / H. Tak / A. Avelino
AcadYr 2018-2019
Stenning, D. / Dvorkin, C. / Sottosanti, A. / Yu, X. / Chen, Y. / Jones, D. / Lee, T.C.-M. / Tak, H. / Kashyap, V., McKeough, K., Campos, L., et al. / Baines, P. / Collin, G. / Muthukrishna, D. / Zhang, D. / Algeri, S. / Janson, L. / Ward, S. / de Beurs, Z.
AcadYr 2019-2020
McKeough, K. / Astudillo, J. & Protopapas, P. / Zezas, A. / Speagle, J. / Meng, X.-L., Siemiginowska, A., & Kashyap, V. / Bonfini, P. / Liu, C. / Guenther, H. / Castrillon, J. / McKeough, K. / Broekgaarden, F. / Autenrieth, M. / Motta, G. / Zucker, C. / Tak, H. / Kashyap, V. & Wang, X. / Wang, J. / Wang, X. & Ingram, J.
AcadYr 2020-2021
Diaz Rivero, A. / Marshall, H. & Chen, Y. / McKeough, K. / Chen, Y. / Patil, A. / Jerius, D. / Wang, X.