Presentations 

Ana Diaz Rivero (Harvard) Sep 1 2020 Zoom 
 Flow-based Likelihoods for Non-Gaussian Inference
 Abstract: We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of the sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.
 Presentation slides [.pdf]
 Reference: arXiv:2007.05535 [arXiv]
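 As a toy illustration of the mechanism behind FBLs (my own sketch, not the paper's architecture), the snippet below evaluates a log-likelihood through the change-of-variables identity log p_X(x) = log p_Z(f(x)) + log|det df/dx|, using a single affine layer and a standard-normal base distribution; a real flow stacks many nonlinear layers with learned parameters.

```python
# Toy flow-based likelihood: one affine layer, standard-normal base.
# All names here are illustrative; real FBLs stack learned nonlinear layers.
import numpy as np

def affine_flow_logpdf(x, shift, log_scale):
    """log p(x) via change of variables for z = (x - shift) * exp(-log_scale)."""
    z = (x - shift) * np.exp(-log_scale)           # forward transform f(x)
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))   # log N(z; 0, 1), per dim
    log_det = -log_scale                           # log|det df/dx|, per dim
    return np.sum(log_base + log_det, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=(5, 4))              # mock data, 4 dimensions
print(affine_flow_logpdf(x, shift=2.0, log_scale=np.log(3.0)))
```
 Training an FBL amounts to maximizing this quantity over the flow's parameters, so the optimization target is itself the learned likelihood.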


Herman Marshall (MIT) & Yang Chen (Michigan) Sep 8 2020 Zoom 
 Concordance: In-flight Calibration of X-ray Telescopes without Absolute References
 Abstract:
We describe a process for cross-calibrating the effective areas of X-ray telescopes that observe common targets. The targets are not assumed to be "standard candles" in the classic sense, in that the only prior placed on the source fluxes is that these fluxes have true but unknown values. Using a technique developed by Chen et al. (2019) that involves a statistical method called shrinkage, we determine effective area correction factors for each instrument that bring the estimated fluxes into the best agreement, consistent with prior knowledge of their effective areas. We expand the technique to allow unique priors on systematic uncertainties in effective areas for each X-ray astronomy instrument and to allow correlations between effective areas in different energy bands. We demonstrate the method with several data sets from various X-ray telescopes.
 Presentation slides: Herman Marshall; Yang Chen [.pdf]
 Reference: Chen et al. 2019, JASA, 114:527, 1018
 Video [YouTube]
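 As a rough sketch of the shrinkage idea, and not the hierarchical model of Chen et al. (2019) itself, the toy code below estimates multiplicative effective-area corrections from log-flux measurements of shared targets; the Gaussian prior on the log-corrections encodes prior trust in the nominal effective areas and shrinks the estimated factors toward 1. All names and the noise model are assumptions.

```python
# Toy shrinkage cross-calibration: instrument i sees source j as
# y_ij = log c_i + log F_j + noise, with prior log c_i ~ N(0, tau2).
import numpy as np

def shrinkage_calibration(y, tau2, sigma2, n_iter=200):
    """y: (n_inst, n_src) log fluxes. Returns (log corrections, log fluxes)."""
    n_inst, n_src = y.shape
    log_c = np.zeros(n_inst)
    for _ in range(n_iter):                        # coordinate ascent
        log_F = (y - log_c[:, None]).mean(axis=0)  # flat-prior flux update
        resid = (y - log_F[None, :]).mean(axis=1)  # mean residual per instrument
        weight = (n_src / sigma2) / (n_src / sigma2 + 1.0 / tau2)
        log_c = weight * resid                     # shrink toward factor 1
    return log_c, log_F

rng = np.random.default_rng(1)
true_c = np.array([0.00, 0.10, -0.05])             # true log corrections
y = true_c[:, None] + rng.normal(0, 1, 20)[None, :] + rng.normal(0, 0.05, (3, 20))
log_c, _ = shrinkage_calibration(y, tau2=0.01, sigma2=0.05**2)
print(np.exp(log_c))                               # relative corrections
```
 Only relative corrections are identifiable from the data alone; the prior sets the overall level, which is the role prior knowledge of the effective areas plays in the full method.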


Katy McKeough (Harvard) Sep 29 2020 Zoom 
 Maximizing a High-Dimensional Posterior Using a Genetic Algorithm
 Abstract: Astronomers are interested in delineating the boundaries of extended sources in noisy images. Analyzing the morphology of these objects is particularly challenging for X-ray images of high-redshift sources, where there are a limited number of high-energy photon counts. We apply a multi-phase technique to estimate the minimal boundary, the point at which the source is no longer distinguishable from the background noise, for complex astronomical objects. One step of this approach is to build a posterior over pixel assignments, where each pixel is assigned to either a region of interest or the background. In the case of interest, we would like to find the global maximum in a posterior space that is discrete but large. This is difficult since the posterior evaluated at any specific pixel arrangement is very small, leading to underflow errors. Furthermore, it is difficult to determine which pixel arrangements to optimize over, since the space is too large to explore every possibility. Genetic algorithms offer an efficient solution to optimization in high-dimensional spaces. Using genetic algorithms, we are able to explore a large amount of the relevant posterior space and find a pixel assignment close to the global maximum.
 Presentation slides [.pdf]
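 A schematic of the genetic-algorithm step, under assumptions of my own: the log_posterior below is a placeholder fitness function standing in for the multi-phase image model, and working in log space sidesteps the underflow issue the abstract describes.

```python
# Genetic algorithm over binary pixel assignments (1 = source, 0 = background).
# Placeholder fitness; a real run would plug in the image model's log-posterior.
import numpy as np

rng = np.random.default_rng(2)

def log_posterior(mask):                     # stand-in: favors a fixed pattern
    target = np.arange(mask.size) < mask.size // 2
    return -np.sum((mask - target) ** 2)

def genetic_maximize(n_pix, pop=50, gens=200, p_mut=0.01):
    population = rng.integers(0, 2, size=(pop, n_pix))
    for _ in range(gens):
        fitness = np.array([log_posterior(m) for m in population])
        # tournament selection: keep the fitter of random pairs
        i, j = rng.integers(0, pop, size=(2, pop))
        parents = np.where((fitness[i] > fitness[j])[:, None],
                           population[i], population[j])
        # single-point crossover between consecutive parents
        children = parents.copy()
        for k in range(0, pop - 1, 2):
            c = rng.integers(1, n_pix)
            children[k, c:], children[k + 1, c:] = parents[k + 1, c:], parents[k, c:]
        # mutation: flip each pixel with small probability
        flips = rng.random(children.shape) < p_mut
        population = np.where(flips, 1 - children, children)
    best = max(population, key=log_posterior)
    return best, log_posterior(best)

best, lp = genetic_maximize(n_pix=100)
print(lp)                                     # near 0, the global maximum
```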


Yang Chen (Michigan) Oct 20 2020 Zoom 
 Machine Learning Efforts on Solar Flare Predictions
 Abstract: In this talk, we present our machine learning efforts, which show great promise towards early prediction of solar flare events. (1) We present a data preprocessing pipeline built to extract useful data from multiple sources, the Geostationary Operational Environmental Satellites (GOES) and the Solar Dynamics Observatory's Helioseismic and Magnetic Imager (SDO/HMI) and Atmospheric Imaging Assembly (SDO/AIA), to prepare inputs for machine learning algorithms. (2) For our strong/weak flare classification model, case studies show a significant increase in the prediction score around 20 hours before strong solar flare events, which implies that early precursors appear at least 20 hours prior to the peak of a flare event. (3) We develop a mixed Long Short-Term Memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hour time window. (4) Our ongoing and future work will also be briefly mentioned.
 Video [YouTube]
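 For a concrete picture of component (3), here is a minimal LSTM regression skeleton (assuming PyTorch; all names are hypothetical, and the talk's "mixed" model is more elaborate): a sequence of active-region features goes in, and a single number, the predicted maximum flare intensity over the next 24 hours, comes out.

```python
# Minimal LSTM regression sketch (PyTorch); the real model is more elaborate.
import torch
import torch.nn as nn

class FlareLSTM(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)    # regression: peak intensity

    def forward(self, x):                   # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])     # predict from the final time step

model = FlareLSTM(n_features=20)
x = torch.randn(8, 60, 20)                  # 8 active regions, 60 time steps
print(model(x).shape)                       # torch.Size([8, 1])
```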


Aarya Patil (UToronto) Nov 17 2020 Zoom 
 Likelihood-free Inference of Chemical Homogeneity in Open Clusters
 Abstract:
Star clusters are excellent astrophysical laboratories to study the history of star formation and chemical enrichment in our Galaxy. These are groupings of stars born out of the same gas cloud, and are theoretically expected to have similar chemical compositions. Empirically validating this chemical homogeneity is important yet difficult because the measurement of accurate and precise chemistry of stars using stellar spectroscopic data is statistically challenging. We perform high-fidelity Likelihood-free Inference of the chemistry of stars using state-of-the-art Neural Density Estimation to observationally determine the level of chemical homogeneity in open clusters. We make our model computationally efficient by using Functional Principal Component Analysis, which models the low-dimensional intrinsic structure embedded in the ~10,000-dimensional stellar spectroscopic space. Our constraints on chemical homogeneity will not only help understand the detailed evolution of star-forming clouds but also allow us to trace the chemical and dynamical history of our Galaxy through chemical tagging.
 Presentation slides [.pdf]
 Video [YouTube]
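 A minimal sketch of the compression step (ordinary PCA via SVD standing in for the smoothed functional version; all names are mine): the ~10,000-pixel spectra are reduced to a handful of coefficients, which can then feed the neural density estimator in place of the full spectra.

```python
# PCA compression of mock spectra; plain PCA stands in for functional PCA.
import numpy as np

def pca_compress(spectra, n_components=10):
    """spectra: (n_stars, n_pixels). Returns (scores, basis, mean)."""
    mean = spectra.mean(axis=0)
    _, _, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
    basis = Vt[:n_components]               # top principal components
    scores = (spectra - mean) @ basis.T     # low-dimensional summaries
    return scores, basis, mean

rng = np.random.default_rng(3)
spectra = rng.normal(1.0, 0.05, size=(100, 10000))   # mock normalized spectra
scores, basis, mean = pca_compress(spectra)
print(scores.shape)                          # (100, 10): 10 numbers per star
```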


Diab Jerius (CXC/CfA) Dec 8 2020 Zoom 
 Doing the Hokey-Pokey
or
Deriving Statistical Errors for Measurements of the Chandra X-ray Observatory PSF
 Abstract: The Chandra X-ray Observatory's PSF is a two-dimensional wonder. It's not exactly symmetric, depends upon the astrophysical input spectrum, and gets folded through instruments with various degrees of fidelity.
Still, it seems to get the job done, and some of the questions often asked are:
 - What exactly does the PSF look like for my source?
 - If I want to test some bit of astrophysics, what are the intrinsic errors in our knowledge of the PSF, so I can determine the sensitivity of my measurements?
 - How can I simulate my observation to see if I can understand what the source looks like?
Answers to these questions are based on both models of the optics and measurements of the actual PSF.
In this talk I'll give a brief(!) overview of the optical model, introduce a simple but useful parameterization of the measured PSF (the encircled energy function), describe its use and its systematic errors, relate our attempts at deriving realistic measurement errors, and, finally, plead for your assistance in helping us refine those errors so that they are meaningful.
 Presentation slides [.pdf]
 Video [YouTube]
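 To make the encircled energy function concrete (a sketch of my own, not the calibration pipeline): the EEF at radius r is the fraction of source counts within r of the centroid, and the naive binomial error below captures only the counting part of the uncertainty, the piece the talk argues must be supplemented with systematic errors.

```python
# Encircled energy function from event positions, with naive binomial errors.
import numpy as np

def encircled_energy(x, y, radii):
    """x, y: photon event coordinates. Returns (EEF, naive standard error)."""
    r = np.hypot(x - x.mean(), y - y.mean())   # distance from centroid
    eef = np.array([(r <= R).mean() for R in radii])
    se = np.sqrt(eef * (1 - eef) / r.size)     # counting error only; real
    return eef, se                             # errors also need systematics

rng = np.random.default_rng(4)
x, y = rng.normal(0.0, 1.0, size=(2, 5000))    # mock (Gaussian) PSF events
print(encircled_energy(x, y, radii=[0.5, 1.0, 2.0]))
```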


Xufei Wang (Harvard) Jan 5 2021 Zoom 
 Maximum Product of Spacings: A Simple Comparison with Maximum Likelihood
 Abstract: An intriguing property of the maximum product of spacings method is that, because the product of spacings is a pivotal quantity in general, obtaining confidence intervals or performing hypothesis testing is always exact (other than numerical imprecision). However, there is a price to be paid for this exactness in terms of the width of the interval or the power of the test, in comparison with the Maximum Likelihood approach, which in general is valid only asymptotically. In this talk, we compare the two methods to illustrate these issues in the context of estimating a boundary point of a uniform distribution (where exact calculations can be done for both).
 Context: 2020 May 19, 2020 Jul 7.
 Presentation slides [.pdf]
 Video [YouTube]
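 A small numerical companion to the talk's uniform-boundary example (my own construction): for x_1, ..., x_n ~ Uniform(0, theta), the MLE of the boundary is max(x), while maximizing the product of spacings, the gaps between consecutive ordered values of F(x; theta) with 0 and 1 appended, has the closed-form solution (n+1)/n * max(x).

```python
# MLE vs. maximum product of spacings (MPS) for the boundary of Uniform(0, theta).
import numpy as np

def log_product_of_spacings(t, x):
    """MPS objective: log-spacings of (0, x_(1), ..., x_(n), t), scaled by 1/t."""
    pts = np.concatenate(([0.0], np.sort(x), [t])) / t   # CDF values F(x; t)
    return np.sum(np.log(np.diff(pts)))

rng = np.random.default_rng(5)
theta = 2.0
x = rng.uniform(0, theta, size=50)

mle = x.max()                           # MLE: always sits below theta
mps = (len(x) + 1) / len(x) * x.max()   # closed-form MPS maximizer
print(mle, mps, log_product_of_spacings(mps, x))
```
 The MLE necessarily equals the largest observation, while MPS pushes the estimate past it; the exactness claim in the abstract rests on the distribution of the spacings objective not depending on theta.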

