Presentations 

Ana Diaz Rivero (Harvard) Sep 1 2020 Zoom 
 Flow-based Likelihoods for Non-Gaussian Inference
 Abstract: We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.
 Presentation Slides [.pdf]
 Reference: arXiv:2007.05535 [arXiv]
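The core of a flow-based likelihood is the change-of-variables formula: the data log-density equals the base log-density of the inverse-transformed point plus the log-Jacobian of the inverse map. The following toy sketch (not the speakers' code) illustrates this with a single invertible affine layer; real FBLs stack many nonlinear layers.

```python
import numpy as np

# Toy illustration of the change-of-variables idea behind flow-based
# likelihoods: one invertible affine layer x = mu + sigma * z maps a
# standard-normal base z to the data, so
#   log p(x) = log p_base((x - mu) / sigma) - log(sigma).
# Real flows stack many nonlinear layers; this sketch is illustrative only.

def flow_log_likelihood(x, mu, sigma):
    """log p(x) for the flow x = mu + sigma * z with z ~ N(0, 1)."""
    z = (x - mu) / sigma                           # inverse transform
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))   # N(0,1) log-density
    log_det = -np.log(sigma)                       # log |dz/dx|
    return log_base + log_det

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=10_000)

# Maximizing the flow likelihood has a closed form for this one-layer flow.
mu_hat, sigma_hat = x.mean(), x.std()
ll = flow_log_likelihood(x, mu_hat, sigma_hat).mean()
```

Training a deep flow replaces the closed-form fit with gradient ascent on the same average log-likelihood objective.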


Herman Marshall (MIT) & Yang Chen (Michigan) Sep 8 2020 Zoom 
 Concordance: In-flight Calibration of X-ray Telescopes without Absolute References
 Abstract:
We describe a process for cross-calibrating the effective areas of X-ray telescopes that observe common targets. The targets are not assumed to be "standard candles" in the classic sense, in that the only prior placed on the source fluxes is that these fluxes have true but unknown values. Using a technique developed by Chen et al. (2019) that involves a statistical method called shrinkage, we determine effective area correction factors for each instrument that bring estimated fluxes into the best agreement, consistent with prior knowledge of their effective areas. We expand the technique to allow unique priors on systematic uncertainties in effective areas for each X-ray astronomy instrument and to allow correlations between effective areas in different energy bands. We demonstrate the method with several data sets from various X-ray telescopes.
 Presentation slides: Herman Marshall; Yang Chen [.pdf]
 Reference: Chen et al. 2019, JASA, 114:527, 1018
 Video [!yt]
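The shrinkage idea can be sketched in a few lines. In this hedged toy (variable names, the additive log-flux model, and the prior are illustrative assumptions, not the Concordance implementation), each instrument's mean log-flux offset from the cross-instrument average is shrunk toward zero by the usual normal-normal posterior-mean factor.

```python
import numpy as np

# Illustrative shrinkage sketch (NOT the Concordance code): instrument i
# reports log-fluxes y[i, j] = B[i] + G[j] + noise for shared sources j,
# where B[i] is a log effective-area bias and G[j] an unknown log flux.
# With a N(0, tau2) prior on B[i], the posterior mean shrinks each raw
# per-instrument offset toward zero.

def shrink_biases(y, sigma2, tau2):
    """y: (n_inst, n_src) log-fluxes; returns shrunken bias estimates."""
    n_inst, n_src = y.shape
    g_hat = y.mean(axis=0)                     # naive per-source log flux
    resid = (y - g_hat).mean(axis=1)           # raw per-instrument offset
    shrink = tau2 / (tau2 + sigma2 / n_src)    # posterior-mean factor in [0, 1)
    return shrink * resid

rng = np.random.default_rng(1)
true_bias = np.array([0.1, -0.1, 0.0])         # calibration offsets (dex)
g = rng.normal(0.0, 1.0, size=50)              # 50 shared-source log fluxes
y = true_bias[:, None] + g[None, :] + rng.normal(0, 0.05, size=(3, 50))

b_hat = shrink_biases(y, sigma2=0.05**2, tau2=0.05**2)
```

Because the offsets are measured relative to the cross-instrument mean, the estimated biases sum to zero by construction; only relative calibration is identified, which is exactly the "no absolute references" setting.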


Katy McKeough (Harvard) Sep 29 2020 Zoom 
 Maximizing a High Dimensional Posterior Using a Genetic Algorithm
 Abstract: Astronomers are interested in delineating boundaries of extended sources in noisy images. Analyzing the morphology of these objects is particularly challenging for X-ray images of high-redshift sources, where there are a limited number of high-energy photon counts. We apply a multiphase technique to estimate the minimal boundary, the point at which the source is no longer distinguishable from the background noise, for complex astronomical objects. One step of this approach is to build a posterior over pixel assignments, where each pixel is assigned to either a region of interest or the background. We would like to find the global maximum of this posterior over a space that is discrete but very large. This is difficult because the posterior evaluated at any specific pixel arrangement is very small, leading to underflow errors, and because the space is far too large to explore every possible arrangement. Genetic algorithms offer an efficient solution to optimization in high-dimensional spaces. Using genetic algorithms, we are able to explore a large portion of the relevant posterior space and find a pixel assignment close to the global maximum.
 Presentation slides [.pdf]
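A minimal genetic-algorithm sketch of the idea follows. Working in log space sidesteps the underflow problem the abstract describes; the toy `log_post`, which simply rewards agreement with a hidden mask, is an illustrative stand-in for the real image-based posterior.

```python
import numpy as np

# Genetic algorithm over binary pixel assignments, maximizing a
# log-posterior (log space avoids the underflow described in the talk).
# The toy objective below is illustrative, not the actual image posterior.

rng = np.random.default_rng(42)
n_pix = 30
target = rng.integers(0, 2, n_pix)          # hidden "true" assignment

def log_post(mask):
    return -float(np.sum(mask != target))   # maximized (at 0) by the target

def ga_maximize(n_gen=200, pop_size=40, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_pix))
    best, best_score = pop[0], -np.inf
    for _ in range(n_gen):
        scores = np.array([log_post(ind) for ind in pop])
        if scores.max() > best_score:                 # track best-ever
            best_score = scores.max()
            best = pop[int(scores.argmax())].copy()
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]         # truncation selection
        mom = parents[rng.integers(0, len(parents), pop_size)]
        dad = parents[rng.integers(0, len(parents), pop_size)]
        cut = rng.integers(1, n_pix, size=(pop_size, 1))
        children = np.where(np.arange(n_pix) < cut, mom, dad)  # crossover
        flip = rng.random(children.shape) < p_mut     # bit-flip mutation
        pop = np.where(flip, 1 - children, children)
    return best, best_score

best, best_score = ga_maximize()
```

Selection, crossover, and mutation let the population explore many pixel arrangements without ever enumerating the 2^n_pix possibilities.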


Yang Chen (Michigan) Oct 20 2020 Zoom 
 Machine Learning Efforts on Solar Flare Predictions
 Abstract: In this talk, we present our machine learning efforts, which show great promise towards early predictions of solar flare events. (1) We present a data preprocessing pipeline that is built to extract useful data from multiple sources, namely the Geostationary Operational Environmental Satellites (GOES), the Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI), and SDO/Atmospheric Imaging Assembly (AIA), to prepare inputs for machine learning algorithms. (2) For our strong/weak flare classification model, case studies show a significant increase in the prediction score around 20 hours before strong solar flare events, which implies that early precursors appear at least 20 hours prior to the peak of a flare event. (3) We develop a mixed Long Short Term Memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hour time window. (4) Our ongoing and future work will also be briefly mentioned.
 Video [!yt]


Aarya Patil (UToronto) Nov 17 2020 Zoom 
 Likelihood-free Inference of Chemical Homogeneity in Open Clusters
 Abstract:
Star clusters are excellent astrophysical laboratories to study the history of star formation and chemical enrichment in our Galaxy. These are groupings of stars born out of the same gas cloud, and are theoretically expected to have similar chemical compositions. Empirically validating this chemical homogeneity is important yet difficult because the measurement of accurate and precise chemistry of stars using stellar spectroscopic data is statistically challenging. We perform high-fidelity likelihood-free inference of the chemistry of stars using state-of-the-art Neural Density Estimation to observationally determine the level of chemical homogeneity in open clusters. We make our model computationally efficient by using Functional Principal Component Analysis, which models the low-dimensional intrinsic structure embedded in the ~10,000-dimensional stellar spectroscopic space. Our constraints on chemical homogeneity will not only help understand the detailed evolution of star-forming clouds but also allow us to trace the chemical and dynamical history of our Galaxy through chemical tagging.
 Presentation slides [.pdf]
 Video [!yt]
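The dimensionality-reduction step can be illustrated with plain PCA on synthetic spectra (functional PCA adds smoothness in wavelength; this stand-in, with made-up sizes and variable names, only shows how a few scores can summarize a high-dimensional spectrum).

```python
import numpy as np

# Illustrative PCA compression of synthetic "spectra": 200 spectra of 1000
# pixels are built from 3 latent components plus small noise, then reduced
# to 3 scores per spectrum via the SVD with negligible reconstruction error.
# Functional PCA (as in the talk) would additionally enforce smoothness.

rng = np.random.default_rng(2)
n_stars, n_pix, k = 200, 1000, 3
basis = rng.normal(size=(k, n_pix))            # latent spectral components
scores = rng.normal(size=(n_stars, k))
spectra = scores @ basis + rng.normal(0, 0.01, (n_stars, n_pix))

mean = spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
explained = S**2 / np.sum(S**2)                # variance per component

low_dim = (spectra - mean) @ Vt[:k].T          # 3 numbers per star
recon = low_dim @ Vt[:k] + mean
rms_err = np.sqrt(np.mean((recon - spectra) ** 2))
```

Inference then runs on the low-dimensional scores instead of the ~10,000-pixel spectra, which is what makes likelihood-free methods tractable here.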


Diab Jerius (CXC/CfA) Dec 8 2020 Zoom 
 Doing the Hokey-Pokey
or
Deriving Statistical errors for Measurements of the Chandra X-ray Observatory PSF
 Abstract: The Chandra X-ray Observatory's PSF is a two-dimensional wonder. It's not exactly symmetric, depends upon the astrophysical input spectrum and gets folded through instruments with various degrees of fidelity.
Still, it seems to get the job done, and some of the questions often asked are:
 What exactly does the PSF look like for my source?
 If I want to test some bit of astrophysics, what are the intrinsic errors in our knowledge of the PSF, so I can determine the sensitivity of my measurements?
 How can I simulate my observation to see if I can understand what the source looks like?
Answers to these questions are based on both models of the optics and measurements of the actual PSF.
In this talk I'll give a brief(!) overview of the optical model, introduce a simple but useful parameterization of the measured PSF (the encircled energy function), describe its use and its systematic errors, relate our attempts at deriving realistic measurement errors, and, finally, plead for your assistance in helping us refine those errors so that they are meaningful.
 Presentation slides [.pdf]
 Video [!yt]
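The encircled energy function mentioned in the abstract is simply the fraction of PSF counts falling within a given radius. A hedged toy (the Gaussian PSF and sample size are assumptions for illustration; the real Chandra PSF is far more complex):

```python
import numpy as np

# Encircled-energy sketch: EE(r) is the fraction of PSF counts inside
# radius r. For a toy circular Gaussian PSF of width s, the analytic form
# is EE(r) = 1 - exp(-r^2 / (2 s^2)), which the empirical estimate matches.

rng = np.random.default_rng(8)
s = 1.0
xy = rng.normal(0, s, size=(100_000, 2))       # simulated photon positions
r = np.hypot(xy[:, 0], xy[:, 1])               # radial offsets

def encircled_energy(radius):
    return np.mean(r <= radius)

ee1 = encircled_energy(1.0)                    # analytic: 1 - exp(-0.5)
```

Statistical errors on EE at a given radius follow from the binomial counting uncertainty of this fraction, which is one ingredient of the error budget discussed in the talk.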


Xufei Wang (Harvard) Jan 5 2021 Zoom 
 Maximum Product of Spacings: A Simple Comparison with Maximum Likelihood
 Abstract: An intriguing property of the maximum product of spacings method is that since the product of spacings is a pivotal quantity in general, obtaining confidence intervals or performing hypothesis testing is always exact (other than numerical imprecision). However, there is a price to be paid for this exactness in terms of the width of the interval or the power of the test, in comparison with the Maximum Likelihood approach, which in general is valid only asymptotically. In this talk, we compare the two methods to illustrate these issues in the context of estimating a boundary point of a uniform distribution (where exact calculations can be done for both).
 Context: 2020 May 19, 2020 Jul 7.
 Presentation slides [.pdf]
 Video [!yt]
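For the Uniform(0, theta) boundary example, both estimators have closed forms: the MLE is the sample maximum, while maximizing the product of spacings gives theta_mps = x_max * (n + 1) / n, which corrects the MLE's downward bias. The sketch below checks the closed form against a numerical maximization (illustrative, not the speaker's code).

```python
import numpy as np

# MPS vs MLE for Uniform(0, theta): the (n+1) spacings are the gaps between
# consecutive order statistics of x/theta, padded with 0 and 1. Maximizing
# their log product gives theta_mps = x_max * (n + 1) / n in closed form.

def log_product_of_spacings(theta, x):
    xs = np.sort(x)
    spacings = np.diff(np.concatenate(([0.0], xs, [theta]))) / theta
    if np.any(spacings <= 0):
        return -np.inf
    return np.sum(np.log(spacings))

rng = np.random.default_rng(7)
theta_true = 5.0
x = rng.uniform(0, theta_true, size=100)

theta_mle = x.max()                              # always biased low
theta_mps = x.max() * (len(x) + 1) / len(x)      # closed-form MPS estimate

# Numerical check of the closed form via a fine grid above x.max().
grid = np.linspace(x.max() + 1e-9, x.max() * 1.2, 2000)
vals = [log_product_of_spacings(t, x) for t in grid]
theta_grid = grid[int(np.argmax(vals))]
```

The pivotal property comes from the fact that, at the true theta, the spacings are distributed as uniform spacings regardless of the parameter value.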


Aneta Siemiginowska (CfA) Jan 26 2021 Zoom 
 Discussion of New Astrostatistics Projects for Students
 We plan to have a discussion about astrostatistics projects. The meeting will be geared towards new students and will provide introductory background.
 Presentation slides [.pdf]
 Video [!yt]


Cong Xu (UC Davis) Feb 9 2021 9am PST Zoom 
 Change point detection and image segmentation for time series of astrophysical images
 Abstract: In this talk, we present a method for modeling a time series of astronomical images. The method assumes that at each time point the corresponding multi-band image stack is an unknown 3D piecewise-constant function observed with Poisson noise, and that all image stacks between any two adjacent change points (in the time domain) share the same piecewise-constant function. The proposed method estimates the number and locations of all the change points (in the time domain), as well as the unknown piecewise-constant functions between each pair of change points. A practical algorithm is developed to solve the corresponding complicated optimization problem. Applications to two real datasets, an XMM observation of a flaring star and an emerging solar coronal loop, illustrate the usage of the proposed method and the scientific insight gained from it.
 See also: Xu et al. 2021 [arXiv]; github.com/kevinxucong/4D_Automark; Wong et al. 2015 [Automark]
 Presentation slides [.pdf]
 Video [!yt]
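The time-domain core of such methods is a change-point scan under a Poisson likelihood. A minimal 1D sketch (the real method handles multiple change points and 3D piecewise-constant image stacks; this toy shows only the single-change-point scan):

```python
import numpy as np

# Single change point in a Poisson count series: scan candidate split
# times and pick the one maximizing the two-segment Poisson log-likelihood
# (each segment's rate estimated by its mean). Illustrative toy only.

def poisson_seg_loglike(y):
    lam = y.mean()
    if lam == 0:
        return 0.0
    return np.sum(y * np.log(lam) - lam)    # log(y!) constant dropped

def best_change_point(y):
    scores = [poisson_seg_loglike(y[:t]) + poisson_seg_loglike(y[t:])
              for t in range(1, len(y))]
    return 1 + int(np.argmax(scores))

rng = np.random.default_rng(3)
y = np.concatenate([rng.poisson(2.0, 60), rng.poisson(8.0, 40)])  # jump at 60
t_hat = best_change_point(y)
```

Extending to multiple change points turns this scan into the harder combinatorial optimization the abstract refers to.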


Adrien Picquenot (Université Paris-Saclay) Feb 22 2021 6pm CET Zoom 
 Introduction and application of a new blind source separation method for extended sources in X-ray astronomy
 Abstract:
Some extended sources, among them supernova remnants, present an outstanding diversity of morphologies that the current generation of spectro-imaging telescopes can detect with an unprecedented level of detail. However, the data analysis tools currently in use in the high-energy astrophysics community fail to take full advantage of these data: most of them focus only on the spectral information, without using the wealth of spatial information or the correlation between the spectral and spatial dimensions. As a result, the physical parameters that are retrieved are often heavily contaminated by other components. In this talk, we will explore a new blind source separation method that fully exploits both the spatial and spectral information in X-ray data, and their correlations. We will introduce the mathematical concepts on which the algorithm relies, and present some current physical applications and future studies that could benefit from it.
 Presentation slides [.pdf] (20 MB)
 Video [!yt]
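Blind source separation of spectro-imaging data can be framed as a matrix factorization: flatten the cube to X (pixels by energy channels) and seek X ~ A S with A the component maps and S the component spectra. The sketch below uses simple Lee-Seung multiplicative NMF updates as a generic stand-in; it is not the wavelet-sparsity algorithm presented in the talk.

```python
import numpy as np

# Generic blind-source-separation sketch: factorize X ~ A @ S with A the
# per-pixel mixing maps and S the component spectra. Multiplicative NMF
# updates keep both factors nonnegative. Illustrative stand-in only, not
# the method discussed in the talk.

rng = np.random.default_rng(9)
n_pix, n_chan, k = 300, 50, 2
A_true = rng.random((n_pix, k))
S_true = rng.random((k, n_chan))
X = A_true @ S_true                          # noiseless rank-2 toy cube

A = rng.random((n_pix, k)) + 0.1             # positive random init
S = rng.random((k, n_chan)) + 0.1
for _ in range(500):
    S *= (A.T @ X) / (A.T @ A @ S + 1e-12)   # Lee-Seung updates: each step
    A *= (X @ S.T) / (A @ S @ S.T + 1e-12)   # cannot increase ||X - A S||

rel_err = np.linalg.norm(X - A @ S) / np.linalg.norm(X)
```

The "blind" aspect is that neither the maps nor the spectra are known in advance; only the nonnegativity (or, in the talk, sparsity) constraint disambiguates the factorization.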


Taylor Jacovich (CfA) Mar 9 2021 Noon EST Zoom 
 Exploring The Parameter Space of High Energy Stellar Explosions
 Abstract:
Understanding the outflows of stellar explosions such as supernova remnants and gamma-ray burst afterglows requires producing a physical description of the underlying parameter space. As part of this, we must first construct computational models of the outflows. We begin by discussing GRB afterglow modeling using a modified version of the scale-free hydrodynamic fitting routine boxfit. We discuss the impact our modifications had on the original boxfit model, and how these modifications change the behavior of generated light curves.
We then work to model the bulk properties of the remnants of core-collapse supernovae by constructing a software pipeline to follow cradle-to-grave massive star evolution beginning with the MESA stellar evolution code. We couple MESA to the Supernova Evolution Code (SNEC) to explode the star and follow the evolution of the ejecta with the cosmic ray hydrodynamics (ChN) code out to 7000 years post core-collapse. We consider the effects of intra-remnant absorption on the observed emission from young supernova remnants and discuss the challenges in producing a grid of self-consistent models capable of representing the parameter space.
We conclude by considering current and possible future methods for properly fitting broadband observations to the modeled parameter space, noting the role these fits will play in interpreting data from future untargeted surveys. In the case of the GRBs, we consider the implications population fits will have on understanding GRBs as a class of objects. In the case of the SNRs, we consider potential observables and how they may help disentangle the degenerate parameter space of progenitor objects for core-collapse remnants.
 Presentation slides: [.pdf] ; [gDoc]
 Video [!yt]


Alex Geringer-Sameth (Imperial) Mar 12 2021 1530 GMT Zoom 
 Source discovery with a deeper characterization of the high-energy diffuse background
 Abstract:
Deep, wide-area gamma-ray observations have set the stage for the potential detection of faint signals from important astrophysical targets. I will discuss a few statistical questions that arise when trying to establish the existence of a signal and identify it as the phenomenon of interest. This includes assessing significance given our limited understanding of the high-energy diffuse background, especially regarding populations of faint sources. Poisson statistics fail to describe the scene, and this severely lowers the power of conventional source detection. I will present ongoing work on an empirical characterization of the background which is used to define a new source detection framework. The method maintains high sensitivity, correctly includes background sources below catalog thresholds, and includes partial knowledge about the target signal. I will present its application to the search for dark matter signals in dwarf galaxies, which will become essential with the next generation of sky surveys. These statistical ideas can also be used to probe more deeply into the physical origin of the diffuse background itself.
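Why Poisson statistics "fail to describe the scene" can be shown with a toy mixture background (the contamination fraction and rates below are made-up illustrative numbers): unresolved faint sources overdisperse the counts, so a detection threshold derived from a pure-Poisson fit to the mean underestimates the tail.

```python
import numpy as np

# Toy overdispersed background: 10% of pixels are contaminated by a faint
# unresolved source (rate 12) on top of diffuse emission (rate 3). A
# threshold set from Poisson counts at the matched mean sits well below the
# empirical 99.9th percentile, i.e. Poisson-based detection over-triggers.
# All numbers are illustrative assumptions.

rng = np.random.default_rng(4)
n = 200_000
lam = np.where(rng.random(n) < 0.1, 12.0, 3.0)   # faint-source contamination
counts = rng.poisson(lam)

poisson_ref = rng.poisson(counts.mean(), n)      # matched-mean Poisson model
thresh_poisson = np.quantile(poisson_ref, 0.999)
thresh_empirical = np.quantile(counts, 0.999)
```

An empirical characterization of the background, as in the talk, amounts to using the `counts` distribution itself (rather than the Poisson reference) to set detection thresholds.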


Panos Toulis (University of Chicago) Mar 23 11am CDT Zoom 
 Randomization Inference of Periodicity in Unequally Spaced Time Series with Application to Exoplanet Detection
 Abstract:
The estimation of periodicity is a fundamental task in many scientific studies. Existing methods rely on assumptions that the observation times have equal or i.i.d. spacings, and that common estimators, such as the periodogram peak, are consistent and asymptotically normal. In practice, however, these assumptions are unrealistic, as observation times usually exhibit deterministic patterns, e.g., the nightly observation cycle in astronomy, that imprint nuisance periodicities in the data. These nuisance signals also affect the finite-sample distribution of estimators, which can substantially deviate from normality. Here, we propose a set identification method, fusing ideas from randomization inference and partial identification. In particular, we develop a sharp test for any periodicity value, and then invert the test to build a confidence set. This approach is appropriate because the construction of confidence sets does not rely on assumptions of regular or well-behaved asymptotics. Notably, our inference is valid in finite samples when our method is fully implemented, while it can be asymptotically valid under an approximate implementation designed to ease computation. Empirically, our method is validated in exoplanet detection using radial velocity data. In this context, our method correctly identifies the periodicity of the confirmed exoplanets in our sample. For some other, yet unconfirmed detections, we show that the statistical evidence is particularly weak, which illustrates the failure of traditional statistical techniques. Last but not least, our method offers a constructive way to resolve these identification issues via improved observation designs. In exoplanet detection, these designs suggest meaningful improvements in identifying periodicity even when a moderate amount of randomization is introduced in scheduling radial velocity measurements.
 Working paper ; Supplement [url/.pdf]
 Presentation slides [.pdf]
 Video [!yt]
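The test-inversion idea can be sketched as follows (hedged heavily: the sinusoid fit, the lag-1 autocorrelation statistic, and the permutation null are illustrative simplifications, not the paper's exact construction): for each candidate period, fit a sinusoid at that period and test whether the residuals look exchangeable, as they should if the candidate is correct; candidates that survive form a confidence set.

```python
import numpy as np

# Test inversion for periodicity on unequally spaced times: for candidate
# period p, remove a fitted sinusoid at p, then compare the residuals'
# lag-1 autocorrelation against a permutation (exchangeability) null.
# Wrong candidates leave the true signal in the residuals and are rejected.

rng = np.random.default_rng(11)
t = np.sort(rng.uniform(0, 30, 300))          # unequally spaced times
y = np.sin(2 * np.pi * t / 3.7) + rng.normal(0, 0.3, t.size)

def residuals(period):
    X = np.column_stack([np.ones_like(t),
                         np.sin(2 * np.pi * t / period),
                         np.cos(2 * np.pi * t / period)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def lag1(r):
    return abs(np.corrcoef(r[:-1], r[1:])[0, 1])

def confidence_set(candidates, n_perm=300, alpha=0.05):
    keep = []
    for p in candidates:
        r = residuals(p)
        obs = lag1(r)
        null = np.array([lag1(rng.permutation(r)) for _ in range(n_perm)])
        pval = (1 + np.sum(null >= obs)) / (n_perm + 1)
        if pval > alpha:
            keep.append(p)     # candidate period survives the test
    return keep

cset = confidence_set([2.0, 3.7, 5.0])
```

With this toy setup the wrong candidates are decisively rejected, while the true period (3.7) is expected, though not guaranteed at the 5% level, to remain in the set.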


Axel Donath (Heidelberg) Apr 7 2021 1630 CEST Zoom 
 Analysis methods and challenges in TeV gamma-ray astronomy
 Abstract: TeV gamma-ray astronomy is still a rather young field of research. However, in the past two decades, observation programs such as the H.E.S.S. Galactic Plane Survey (HGPS) have shown that Galactic TeV gamma-ray emission is not a rare phenomenon: with ~80 detected sources, the Milky Way shines bright in TeV gamma-rays.
The limited angular resolution and energy range of current TeV instruments, together with the high density of sources in the Galactic plane, underlying interstellar diffuse emission, and low statistics in general, impose many challenges on the analysis of gamma-ray data and the creation of source catalogs from it. Classical TeV gamma-ray analysis methods are typically limited to spectral (1D) and image-based (2D) measurements, and are sufficient neither to fully exploit data from existing instruments nor data from the upcoming Cherenkov Telescope Array (CTA), which will survey the Galactic plane with ~10 times improved sensitivity.
In this talk I will briefly present some results and lessons learned from the analysis of the HGPS dataset, as well as introduce the Gammapy project. Gammapy is an openly developed Python package for gamma-ray astronomy and a prototype for the CTA science tools. It allows for combined spectro-morphological (3D) and time-dependent parametric modelling of gamma-ray data using binned Poisson maximum likelihood fitting. Using FITS-based input data formats and high-level likelihood interfaces, it also makes it possible to combine data from multiple GeV and TeV instruments, extending the energy range and improving the statistics of gamma-ray measurements in general.
 Presentation slides [.pdf] (23 MB)
 Video [!yt]
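The binned Poisson maximum-likelihood fitting mentioned above can be sketched with the Cash statistic, C = 2 * sum(mu - n * log mu), minimized over model parameters. This toy (not Gammapy code; the power-law model, bin edges, and 1D amplitude scan are illustrative assumptions) recovers the amplitude of a power-law spectrum from Poisson counts:

```python
import numpy as np

# Binned Poisson ML fit via the Cash statistic: simulate power-law counts
# in log-spaced energy bins, then recover the amplitude by minimizing
# C(amplitude) = 2 * sum(mu - n_obs * log(mu)) on a 1D grid.
# Toy sketch only; not Gammapy code.

rng = np.random.default_rng(6)
edges = np.geomspace(1.0, 10.0, 21)            # 20 energy bins (arb. units)
centers = np.sqrt(edges[:-1] * edges[1:])
widths = np.diff(edges)

def expected(amplitude, index=2.0):
    """Predicted counts per bin for a power law with the given amplitude."""
    return amplitude * centers ** (-index) * widths

n_obs = rng.poisson(expected(100.0))           # true amplitude = 100

def cash(amplitude):
    mu = expected(amplitude)
    return 2.0 * np.sum(mu - n_obs * np.log(mu))

grid = np.linspace(50, 150, 1001)
amp_hat = grid[int(np.argmin([cash(a) for a in grid]))]
```

In a full 3D fit the same statistic is minimized jointly over spectral and morphological parameters rather than a single amplitude.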


Tulun Ergin (TUBITAK Uzay) Apr 8 2021 1730 EEST Zoom 
 Learning to Sample and Classify Supernova Remnants from Datasets Collected by Current & Future Gamma-ray Observatories
 Abstract: Supernova remnants (SNRs) are key gamma-ray sources for investigating cosmic ray (CR) acceleration and escape mechanisms. It is therefore crucial to study the gamma-ray production models in SNRs and to disentangle the unique interstellar medium into which SNRs expand. However, the number of detected and resolved SNRs has to increase so that we can build a more realistic classification scheme for SNRs and obtain better theoretical insight into CR phenomena happening within and around them. More powerful gamma-ray observatories, such as the Cherenkov Telescope Array, with better angular resolution and higher sensitivity in detecting and resolving gamma-ray sources, are being constructed, and they will produce large volumes of data. We therefore have to develop effective methods to handle these data by implementing smart algorithms (e.g. machine learning, ML). In the first part of my talk, I will discuss the ML tools that I plan to use for sampling artificial SNRs via a customised Normalising Flow model and for using these artificial samples to train a classifier that recognises SNRs in big data sets. In the second part, I will talk about the spectral modelling of the objects classified as SNRs, using multiwavelength data.
 Presentation slides [.pdf] (25 MB)
 Video [!yt]


Rebecca Phillipson (University of Washington) Apr 20 2021 9am PDT Zoom 
 Investigating Nonlinear and Stochastic Variability of Accreting Compact Objects via Recurrence Analysis
 Abstract:
We investigate the use of nonlinear time series analysis techniques for variability studies of accreting compact objects. The onset of new time-domain surveys and the imminent increase in astronomical timing data expose the shortcomings of traditional time series analysis (such as power spectral analysis) in characterizing the abundantly varied, complex, and stochastic time variability of accreting sources such as X-ray binaries (XRBs) and Active Galactic Nuclei (AGN). Recent applications of methods alternative to Fourier analysis, drawn from other disciplines, have shown promise in characterizing higher modes of variability and timescales in AGN and XRBs alike. In particular, methods from nonlinear dynamics project one-dimensional time series into higher-dimensional phase spaces, enabling a characterization of trajectories representative of the underlying dynamics that generate the original time series. Recurrence analysis in particular is a useful geometric and statistical tool for distinguishing between deterministic and stochastic behavior in time series, as well as for probing timescales and classes of variability.


Hu Sun (University of Michigan) Apr 27 2021 Noon EDT Zoom 
 Video imputation and prediction models in the context of space weather monitoring





