Presentations 

Ana Diaz Rivero (Harvard) Sep 1 2020 Zoom 
 Flow-based Likelihoods for Non-Gaussian Inference
 Abstract: We investigate the use of data-driven
likelihoods to bypass a key assumption made in many scientific
analyses, which is that the true likelihood of the data is
Gaussian. In particular, we suggest using the optimization
targets of flow-based generative models, a class of models
that can capture complex distributions by transforming a
simple base distribution through layers of nonlinearities. We
call these flow-based likelihoods (FBL). We analyze the
accuracy and precision of the reconstructed likelihoods on
mock Gaussian data, and show that simply gauging the quality
of samples drawn from the trained model is not a sufficient
indicator that the true likelihood has been learned. We
nevertheless demonstrate that the likelihood can be
reconstructed to a precision equal to that of sampling error
due to a finite sample size. We then apply FBLs to mock weak
lensing convergence power spectra, a cosmological observable
that is significantly non-Gaussian (NG). We find that the FBL
captures the NG signatures in the data extremely well, while
other commonly used data-driven likelihoods, such as Gaussian
mixture models and independent component analysis, fail to do
so. This suggests that works that have found small posterior
shifts in NG data with data-driven likelihoods such as these
could be underestimating the impact of non-Gaussianity in
parameter constraints. By introducing a suite of tests that
can capture different levels of NG in the data, we show that
the success or failure of traditional datadriven likelihoods
can be tied back to the structure of the NG in the data.
Unlike other methods, the flexibility of the FBL makes it
successful at tackling different types of NG simultaneously.
Because of this, and their likely applicability
across datasets and domains, we encourage their use for
inference when sufficient mock data are available for
training.
 Presentation Slides [.pdf]
 Reference: arXiv:2007.05535 [arXiv]
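The change-of-variables construction behind flow-based likelihoods can be sketched in one dimension, with a single affine layer standing in for a deep flow (a minimal illustration only, not the paper's model; all names and values here are invented):

```python
import math

def base_logpdf(z):
    # Log density of the standard-normal base distribution.
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def flow_loglike(x, shift, log_scale):
    # A flow evaluates log p(x) by mapping x to the base space through an
    # invertible transform and adding the log |det Jacobian| of that map.
    z = (x - shift) * math.exp(-log_scale)   # inverse of x = shift + scale * z
    log_det_jacobian = -log_scale            # dz/dx = 1 / scale
    return base_logpdf(z) + log_det_jacobian

# With one affine layer this reproduces the N(shift, scale^2) log density;
# stacking nonlinear layers lets the same formula express non-Gaussian laws.
print(flow_loglike(2.0, 2.0, math.log(3.0)))
```

In an FBL, `shift` and `log_scale` would be replaced by the parameters of many stacked nonlinear layers trained on mock data, and the training objective (the log-likelihood above) is then reused directly for inference.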


Herman Marshall (MIT) & Yang Chen (Michigan) Sep 8 2020 Zoom 
 Concordance: In-flight Calibration of X-ray Telescopes without Absolute References
 Abstract:
We describe a process for cross-calibrating the effective
areas of X-ray telescopes that observe common targets. The
targets are not assumed to be "standard candles" in the
classic sense, in that the only prior placed on the source
fluxes is that these fluxes have true but unknown values.
Using a technique developed by Chen et al. (2019) that
involves a statistical method called shrinkage, we determine
effective area correction factors for each instrument that
bring estimated fluxes into the best agreement, consistent
with prior knowledge of their effective areas. We expand the
technique to allow unique priors on systematic uncertainties
in effective areas for each X-ray astronomy instrument and to
allow correlations between effective areas in different energy
bands. We demonstrate the method with several data sets from
various X-ray telescopes.
 Presentation slides: Herman Marshall; Yang Chen [.pdf]
 Reference: Chen et al. 2019, JASA, 114:527, 1018
 Video [!yt]
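As a toy illustration of the shrinkage idea (not the Chen et al. 2019 model itself), per-instrument flux offsets can be pulled toward a common prior mean in proportion to how noisy they are; the offsets and variances below are invented:

```python
def shrink_offsets(raw_offsets, sampling_var, prior_var):
    # Posterior-mean ("shrinkage") estimate of each instrument's log
    # effective-area correction: noisy offsets are pulled toward the prior
    # mean of zero, more strongly the larger the sampling variance.
    weight = prior_var / (prior_var + sampling_var)
    return [weight * b for b in raw_offsets]

# hypothetical log-flux offsets of three instruments from a common mean
raw = [0.10, -0.20, 0.05]
shrunk = shrink_offsets(raw, sampling_var=0.01, prior_var=0.01)
print(shrunk)  # each offset is halved: sampling and prior variances are equal
```

The full method couples such offsets across instruments, energy bands, and sources, but the pull-toward-the-prior mechanism is the same.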


Katy McKeough (Harvard) Sep 29 2020 Zoom 
 Maximizing a High Dimensional Posterior Using a Genetic Algorithm
 Abstract: Astronomers are interested in delineating
boundaries of extended sources in noisy images. Analyzing the
morphology of these objects is particularly challenging for X-ray
images of high-redshift sources where there are a limited number
of high-energy photon counts. We apply a multi-phase technique
in order to estimate the minimal boundary, the point at which the
source is no longer distinguishable from the background noise, for
complex astronomical objects. One step of this approach is to build
a posterior over pixel assignments, with each pixel assigned to
either a region of interest or the background. In this setting, we
would like to find the global maximum in a posterior space that is
discrete but large. This is difficult since the posterior evaluated
at a specific pixel arrangement is very small, leading to underflow
errors in calculating the posterior of any particular pixel
assignment. Furthermore, it is difficult to determine which pixel
arrangements to optimize over, since the space is too large to
explore every possibility. Genetic algorithms offer an efficient solution to
optimization in high dimensional spaces. Using genetic algorithms we
are able to explore a large amount of the relevant posterior space
and find a pixel assignment close to the global maximum.
 Presentation slides [.pdf]
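The two key moves, working in log space to avoid underflow and evolving a population of binary pixel assignments, can be sketched with a toy genetic algorithm (the per-pixel log odds and all tuning constants are invented for illustration):

```python
import random

random.seed(0)

def log_posterior(assignment, signal):
    # Toy log posterior over a binary pixel assignment. Working with log
    # probabilities avoids the underflow that direct products of tiny
    # per-pixel probabilities would cause. `signal` holds hypothetical
    # per-pixel log odds of belonging to the source.
    return sum(s if a else -s for a, s in zip(assignment, signal))

def genetic_maximize(signal, pop_size=40, generations=60, mut_rate=0.02):
    n = len(signal)
    pop = [[random.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: log_posterior(a, signal), reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            pa, pb = random.sample(parents, 2)
            cut = random.randrange(1, n)        # one-point crossover
            child = pa[:cut] + pb[cut:]
            child = [not g if random.random() < mut_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda a: log_posterior(a, signal))

signal = [2.0, 1.5, -1.0, -2.0, 3.0, -0.5]  # positive = likely source pixel
best = genetic_maximize(signal)
# the optimum assigns True exactly where the log odds are positive
print(log_posterior(best, signal))
```

Real images have thousands of pixels rather than six, which is exactly the regime where exhaustive search fails and this kind of population search pays off.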


Yang Chen (Michigan) Oct 20 2020 Zoom 
 Machine Learning Efforts on Solar Flare Predictions
 Abstract: In this talk, we present our machine
learning efforts, which show great promise towards early
predictions of solar flare events. (1) We present a data
preprocessing pipeline that is built to extract useful data
from multiple sources, namely the Geostationary Operational
Environmental Satellites (GOES), the Solar Dynamics Observatory
(SDO)/Helioseismic and Magnetic Imager (HMI), and
SDO/Atmospheric Imaging Assembly (AIA), to prepare inputs
for machine learning algorithms. (2) For our strong/weak flare
classification model, case studies show a significant increase
in the prediction score around 20 hours before strong solar
flare events, which implies that early precursors appear at
least 20 hours prior to the peak of a flare event. (3) We
develop a mixed Long Short Term Memory (LSTM) regression model
to predict the maximum solar flare intensity within a 24-hour
time window. (4) Our ongoing and future work will also be
briefly mentioned.
 Video [!yt]


Aarya Patil (UToronto) Nov 17 2020 Zoom 
 Likelihood-free Inference of Chemical Homogeneity in Open Clusters
 Abstract:
Star clusters are excellent astrophysical laboratories to
study the history of star formation and chemical enrichment in
our Galaxy. These are groupings of stars born out of the same
gas cloud, and are theoretically expected to have similar
chemical compositions. Empirically validating this chemical
homogeneity is important yet difficult because the measurement
of accurate and precise chemistry of stars using stellar
spectroscopic data is statistically challenging. We perform
high-fidelity Likelihood-free Inference of chemistry of stars
using state-of-the-art Neural Density Estimation to
observationally determine the level of chemical homogeneity in
open clusters. We make our model computationally efficient by
using Functional Principal Component Analysis that models the
low-dimensional intrinsic structure embedded in the
~10,000-dimensional stellar spectroscopic space. Our constraints on
chemical homogeneity will not only help understand the
detailed evolution of star-forming clouds but also allow us to
trace the chemical and dynamical history of our Galaxy through
chemical tagging.
 Presentation slides [.pdf]
 Video [!yt]
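The dimensionality-reduction step can be illustrated with ordinary PCA on synthetic "spectra" that lie near a low-dimensional subspace, a stand-in for the functional PCA applied to ~10,000-pixel spectra (all sizes and the noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

def principal_components(spectra, n_comp):
    # PCA via SVD of the mean-centred data matrix: spectra lying near a
    # low-dimensional subspace are compressed into a few scores per star.
    mean = spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    return mean, Vt[:n_comp], scores

# toy "spectra": 50 stars, 200 pixels, generated from 2 latent factors + noise
basis = rng.normal(size=(2, 200))
latent = rng.normal(size=(50, 2))
spectra = latent @ basis + 0.01 * rng.normal(size=(50, 200))

mean, comps, scores = principal_components(spectra, n_comp=2)
recon = mean + scores @ comps
# two components recover the 200-pixel spectra almost perfectly
```

Functional PCA additionally treats each spectrum as a smooth function of wavelength, but the compression-before-inference logic is the same.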


Diab Jerius (CXC/CfA) Dec 8 2020 Zoom 
 Doing the Hokey-Pokey
or
Deriving Statistical Errors for Measurements of the Chandra X-ray Observatory PSF
 Abstract: The Chandra X-ray Observatory's PSF is a two-dimensional wonder. It's not exactly symmetric, depends upon the astrophysical input spectrum, and gets folded through instruments with various degrees of fidelity.
Still, it seems to get the job done, and some of the questions often asked are:
 What exactly does the PSF look like for my source?
 If I want to test some bit of astrophysics, what are the intrinsic errors in our knowledge of the PSF, so I can determine the sensitivity of my measurements?
 How can I simulate my observation to see if I can understand what the source looks like?
Answers to these questions are based on both models of the optics and measurements of the actual PSF.
In this talk I'll give a brief(!) overview of the optical model, introduce a simple but useful parameterization of the measured PSF (the encircled energy function), describe its use and its systematic errors, relate our attempts at deriving realistic measurement errors, and, finally, plead for your assistance in helping us refine those errors so that they are meaningful.
 Presentation slides [.pdf]
 Video [!yt]


Xufei Wang (Harvard) Jan 5 2021 Zoom 
 Maximum Product of Spacings: A Simple Comparison with Maximum Likelihood
 Abstract: An intriguing property of the maximum
product of spacings method is that since the product of spacings
is a pivotal quantity in general, obtaining confidence
intervals or performing hypothesis testing is always exact
(other than numerical imprecision). However, there is a price
to be paid for this exactness in terms of the width of the
interval or the power of the test, in comparison with the
Maximum Likelihood approach, which in general is valid only
asymptotically. In this talk, we compare the two methods to
illustrate these issues in the context of estimating a
boundary point of a uniform distribution (where exact
calculations can be done for both).
 Context: 2020 May 19, 2020 Jul 7.
 Presentation slides [.pdf]
 Video [!yt]
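For the uniform boundary example, both estimators have closed forms, which makes the comparison concrete (a minimal sketch of the setting discussed in the talk; the sample is invented):

```python
def mle_uniform(sample):
    # Maximum likelihood estimate of the upper boundary of Uniform(0, theta):
    # the sample maximum, which is always biased low.
    return max(sample)

def mps_uniform(sample):
    # Maximum product of spacings estimate. Writing the n+1 spacings of the
    # order statistics under Uniform(0, theta) and maximizing their product
    # in theta gives the closed form (n + 1) / n * max(sample).
    n = len(sample)
    return (n + 1) / n * max(sample)

data = [0.9, 2.4, 1.1, 3.0, 0.2]
print(mle_uniform(data))  # 3.0
print(mps_uniform(data))  # 6/5 * 3.0 = 3.6, partially correcting the bias
```

The MPS estimate always exceeds the MLE here, which is the kind of finite-sample difference the talk examines alongside interval width and test power.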


Aneta Siemiginowska (CfA) Jan 26 2021 Zoom 
 Discussion of New Astrostatistics Projects for Students
 We plan to have a discussion about astrostatistics projects. The meeting will be geared toward new students and will provide introductory background.
 Presentation slides [.pdf]
 Video [!yt]


Cong Xu (UC Davis) Feb 9 2021 9am PST Zoom 
 Change point detection and image segmentation for time series of astrophysical images
 Abstract: In this talk, we present a method for
modeling a time series of astronomical images. The method
assumes that at each time point, the corresponding multi-band
image stack is an unknown 3D piecewise constant function
observed with Poisson noise. It also assumes that all image stacks
between any two adjacent change points (in time domain) share
the same unknown piecewise constant function. The proposed
method is designed to estimate the number and the locations of
all the change points (in time domain), as well as all the
unknown piecewise constant functions between any pairs of the
change points. A practical algorithm is also developed to
solve the corresponding complicated optimization problem.
Applications to two real datasets, the XMM observation of a
flaring star and an emerging solar coronal loop, illustrate
the usage of the proposed method and the scientific insight
gained from it.
 See also: Xu et al. 2021 [arXiv]; code at github.com/kevinxucong/4D_Automark; Wong et al. 2015 [Automark]
 Presentation slides [.pdf]
 Video [!yt]
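In one dimension, the time-domain part of the idea reduces to a scan over split points of a Poisson profile log likelihood (a toy single-change-point version; the counts are invented):

```python
import math

def poisson_loglike(counts):
    # Profile log likelihood of a constant-rate Poisson segment: plug in the
    # MLE rate (the segment mean); the log k! terms cancel when comparing
    # split locations, so they are dropped.
    total, n = sum(counts), len(counts)
    if total == 0:
        return 0.0
    rate = total / n
    return total * math.log(rate) - n * rate

def best_change_point(counts):
    # Scan all split points and keep the one maximizing the summed
    # segment log likelihoods (a 1-D toy version of the image problem).
    best_k, best_ll = None, float("-inf")
    for k in range(1, len(counts)):
        ll = poisson_loglike(counts[:k]) + poisson_loglike(counts[k:])
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

counts = [2, 3, 1, 2, 9, 11, 10, 12]   # rate jumps after the 4th bin
print(best_change_point(counts))        # -> 4
```

The proposed method replaces the constant rate per segment with an unknown piecewise constant image and estimates the number of change points as well, which is what makes the optimization hard.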


Adrien Picquenot (Université Paris-Saclay) Feb 22 6pm CET Zoom 
 Introduction and application of a new blind source separation method for extended sources in Xray astronomy
 Abstract:
Some extended sources, among them supernova remnants, present
an outstanding diversity of morphologies that the current
generation of spectro-imaging telescopes can detect with an
unprecedented level of detail. However, the data analysis
tools currently in use in the high-energy astrophysics
community fail to take full advantage of these data: most of
them focus only on the spectral information without using the
many spatial specificities or the correlation between the
spectral and spatial dimensions. For that reason, the physical
parameters that are retrieved are often heavily contaminated
by other components. In this talk, we will explore a new blind
source separation method that fully exploits both the spatial
and spectral information in X-ray data, and their correlations.
We will introduce the mathematical concepts on which the
algorithm relies, and present some current physical
applications and future studies that could benefit from it.
 Presentation slides [.pdf] (20 MB)
 Video [!yt]


Taylor Jacovich (CfA) Mar 09 Noon EST Zoom 
 Exploring The Parameter Space of High Energy Stellar Explosions
 Abstract:
Understanding the outflows of stellar explosions such as
supernova remnants and gamma-ray burst afterglows requires
producing a physical description of the underlying parameter
space. As part of this, we must first construct computational
models of the outflows. We begin by discussing GRB afterglow
modeling using a modified version of the scale-free
hydrodynamic fitting routine boxfit. We discuss the impact our
modifications had on the original boxfit model, and how these
modifications change the behavior of generated lightcurves.
We then work to model the bulk properties of the remnants of
core-collapse supernovae by constructing a software pipeline
to follow cradle-to-grave massive star evolution beginning
with the MESA stellar evolution code. We couple MESA to the
Supernova Evolution Code (SNEC) to explode the star and follow
the evolution of the ejecta with the cosmic ray hydrodynamics
(ChN) code out to 7000 years post core collapse. We consider
the effects of intra-remnant absorption on the observed
emission from young supernova remnants and discuss the
challenges in producing a grid of self-consistent models
capable of representing the parameter space.
We conclude by considering current and possible future methods
for properly fitting broadband observations to the modeled
parameter space, noting the role these fits will play in
interpreting data from future untargeted surveys. In the case
of the GRBs, we consider the implications population fits will
have on understanding GRBs as a class of objects. In the case
of the SNRs, we consider potential observables and how they may
help disentangle the degenerate parameter space of progenitor
objects for core-collapse remnants.
 Presentation slides: [.pdf] ; [gDoc]
 Video [!yt]


Alex Geringer-Sameth (Imperial) Mar 12 1530 GMT Zoom 
 Source discovery with a deeper characterization of the high-energy diffuse background
 Abstract:
Deep, widearea gammaray observations have set the stage for
the potential detection of faint signals from important
astrophysical targets. I will discuss a few statistical
questions that arise when trying to establish the existence of
a signal and identify it as the phenomenon of interest. This
includes assessing significance given our limited
understanding of the high-energy diffuse background,
especially regarding populations of faint sources. Poisson
statistics fail to describe the scene, and this severely
lowers the power of conventional source detection. I will
present ongoing work on an empirical characterization of the
background which is used to define a new source detection
framework. The method maintains high sensitivity, correctly
includes background sources below catalog thresholds, and
includes partial knowledge about the target signal. I will
present its application to the search for dark matter signals
in dwarf galaxies, which will become essential with the next
generation of sky surveys. These statistical ideas can also be
used to probe more deeply into the physical origin of the
diffuse background itself.


Panos Toulis (University of Chicago) Mar 23 11am CDT Zoom 
 Randomization Inference of Periodicity in Unequally Spaced Time Series with Application to Exoplanet Detection
 Abstract:
The estimation of periodicity is a fundamental task in many
scientific studies. Existing methods rely on assumptions that
the observation times have equal or i.i.d. spacings, and that
common estimators, such as the periodogram peak, are
consistent and asymptotically normal. In practice, however,
these assumptions are unrealistic, as observation times usually
exhibit deterministic patterns (e.g., the nightly
observation cycle in astronomy) that imprint nuisance
periodicities in the data. These nuisance signals also affect
the finitesample distribution of estimators, which can
substantially deviate from normality. Here, we propose a set
identification method, fusing ideas from randomization
inference and partial identification. In particular, we
develop a sharp test for any periodicity value, and then
invert the test to build a confidence set. This approach is
appropriate because the construction of confidence sets does
not rely on assumptions of regular or wellbehaved
asymptotics. Notably, our inference is valid in finite samples
when our method is fully implemented, while it can be
asymptotically valid under an approximate implementation
designed to ease computation. Empirically, our method is
validated in exoplanet detection using radial velocity data.
In this context, our method correctly identifies the
periodicity of the confirmed exoplanets in our sample. For
some other, yet unconfirmed detections, we show that the
statistical evidence is particularly weak, which illustrates
the failure of traditional statistical techniques. Last but
not least, our method offers a constructive way to resolve
these identification issues via improved observation designs.
In exoplanet detection, these designs suggest meaningful
improvements in identifying periodicity even when a moderate
amount of randomization is introduced in scheduling radial
velocity measurements.
 Working paper ; Supplement [url/.pdf]
 Presentation slides [.pdf]
 Video [!yt]
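A stripped-down cousin of this approach is a permutation test that keeps the irregular observation times fixed and shuffles only the measured values, so that nuisance structure in the sampling pattern is preserved under the null (a simplified sketch only, not the authors' sharp randomization test; the times, statistic, and counts are all invented):

```python
import math, random

random.seed(1)

def phase_dispersion(times, values, period, nbins=4):
    # Trial statistic: spread of phase-folded bin means; large values
    # indicate coherent structure at the trial period.
    bins = [[] for _ in range(nbins)]
    for t, v in zip(times, values):
        bins[int((t % period) / period * nbins)].append(v)
    means = [sum(b) / len(b) for b in bins if b]
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means)

def randomization_pvalue(times, values, period, n_perm=500):
    # Null distribution: permute values over the fixed, irregular
    # observation times, leaving the observing pattern untouched.
    observed = phase_dispersion(times, values, period)
    hits = sum(
        phase_dispersion(times, random.sample(values, len(values)), period)
        >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

times = [0.0, 0.3, 1.1, 2.2, 2.9, 3.6, 4.4, 5.3, 6.1, 7.0, 8.2, 9.5]
values = [math.sin(math.pi * t) for t in times]      # true period is 2.0
print(randomization_pvalue(times, values, 2.0))      # small at the true period
```

The paper goes further: it constructs a sharp test for each candidate period and inverts the family of tests into a finite-sample confidence set, rather than testing a single period as above.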


Axel Donath (Heidelberg) Apr 7 1630 CEST Zoom 
 Analysis methods and challenges in TeV gamma-ray astronomy
 Abstract: TeV gamma-ray astronomy is still a
rather young field of research. However, in the past two
decades, observation programs such as the H.E.S.S. Galactic
Plane Survey (HGPS) have shown that Galactic TeV gamma-ray
emission is not a rare phenomenon, but with ~80 detected
sources, the Milky Way shines bright in TeV gamma-rays.
The limited angular resolution and energy range of the current
TeV instruments, together with the high density of sources in
the Galactic plane, underlying interstellar diffuse emission,
and low statistics in general, impose many challenges on the
analysis of the gamma-ray data and the creation of source
catalogs from it. Classical TeV gamma-ray astronomy analysis
methods are typically limited to spectral (1D) and image-based
(2D) measurements and are neither sufficient to fully exploit
data from existing instruments, nor data of the upcoming
Cherenkov Telescope Array (CTA), which will survey the Galactic
plane with ~10 times improved sensitivity.
In this talk I will briefly present some results and lessons
we have learned from the analysis of the HGPS dataset as well
as introduce the Gammapy project. Gammapy is an openly
developed Python package for gammaray astronomy and a
prototype for the CTA science tools. It allows for combined
spectro-morphological (3D) and time-dependent parametric
modelling of gamma-ray data using binned Poisson maximum
likelihood fitting. Using FITS-based input data formats and
higher-level likelihood interfaces, it also makes it possible
to combine data from multiple GeV and TeV instruments and thus
extend the energy range and improve the statistics of
gamma-ray measurements in general.
 Presentation slides [.pdf] (23 MB)
 Video [!yt]
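The binned Poisson maximum likelihood at the core of such fits is usually expressed through the Cash statistic; a minimal version, with invented counts and dropping the model-independent log n! terms, looks like:

```python
import math

def cash_statistic(counts, model):
    # Binned Poisson maximum-likelihood fit statistic (Cash statistic):
    # C = 2 * sum(mu - n * ln(mu)), up to a model-independent constant.
    # Minimizing C over model parameters maximizes the Poisson likelihood.
    return 2.0 * sum(mu - n * math.log(mu) for n, mu in zip(counts, model))

counts = [5, 9, 4, 1]
flat = [4.75] * 4                       # constant-rate model at the mean
peaked = [5.0, 9.0, 4.0, 1.0]          # model matching the counts exactly
# the better-fitting model yields the lower Cash statistic
print(cash_statistic(counts, peaked) < cash_statistic(counts, flat))
```

In a 3D analysis the `model` array becomes predicted counts per energy-position bin from a folded spectro-morphological model, but the fit statistic is this same sum over bins.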


Tulun Ergin (TUBITAK Uzay) Apr 8 1730 EEST Zoom 
 Learning to Sample and Classify Supernova Remnants from Datasets Collected by Current & Future Gamma-ray Observatories
 Abstract: Supernova remnants (SNRs) are key
gamma-ray sources to investigate the cosmic ray (CR)
acceleration and escape mechanisms. So, it is crucial to study
the gamma-ray production models in SNRs and to disentangle the
unique interstellar medium that SNRs expand into. However, the
number of detected and resolved SNRs has to increase so that
we can have a more realistic classification scheme for SNRs
and obtain better theoretical insight into CR phenomena
happening within and around SNRs. More powerful gamma-ray
observatories, such as the Cherenkov Telescope Array, with
better angular resolution and higher sensitivity in detecting
and resolving gamma-ray sources are being constructed, and they
will produce big volumes of data. So, we have to come up with
effective methods to handle these data by implementing smart
algorithms (e.g., machine learning, ML). In the first part of
my talk, I will discuss the ML tools that I plan to use for
sampling artificial SNRs via a customised Normalising Flow
model and using these artificial samples to train a classifier
for recognising SNRs in big data sets. In the second part, I
will talk about the spectral modelling of these objects, which
are classified as SNRs, using multiwavelength data.
 Presentation slides [.pdf] (25 MB)
 Video [!yt]


Rebecca Phillipson (University of Washington) Apr 20 9am PDT Zoom 
 Investigating Nonlinear and Stochastic Variability of Accreting Compact Objects via Recurrence Analysis
 Abstract:
We investigate the use of nonlinear time series analysis
techniques for variability studies of accreting compact
objects. The onset of new time domain surveys and the imminent
increase in astronomical timing data expose the shortcomings
of traditional time series analysis (such as power spectral
analysis) in characterizing the abundantly varied, complex, and
stochastic time variability of accreting sources, such as
X-ray binaries (XRBs) and Active Galactic Nuclei (AGN). Recent
applications of alternative methods to Fourier analysis used
in other disciplines have shown promise in characterizing
higher modes of variability and timescales in AGN and XRBs
alike. In particular, methods from nonlinear dynamics utilize
the projection of one-dimensional time series into
higher-order phase spaces, enabling a characterization of
trajectories in the phase space representative of the
underlying dynamics which generate the original time series.
Recurrence analysis in particular is a useful geometric and
statistical tool for distinguishing between deterministic and
stochastic behavior in time series, in addition to probing
timescales and classes of variability.
 Presentation slides [.pdf]
 Video [!yt]
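The basic machinery, time-delay embedding followed by a thresholded distance matrix, fits in a few lines (a toy sketch; the periodic signal, embedding parameters, and threshold are invented):

```python
def embed(series, dim, delay):
    # Time-delay embedding: project a 1-D series into dim-dimensional
    # phase-space vectors, as in Takens-style reconstruction.
    m = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(m)]

def recurrence_matrix(series, dim=2, delay=1, eps=0.2):
    # Mark pairs of phase-space points closer than eps (Euclidean distance).
    pts = embed(series, dim, delay)
    return [
        [
            1 if sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 < eps else 0
            for q in pts
        ]
        for p in pts
    ]

def recurrence_rate(rm):
    # Fraction of recurrent pairs: one of the simplest recurrence
    # quantification measures; periodic signals show diagonal structure.
    n = len(rm)
    return sum(sum(row) for row in rm) / (n * n)

series = [0.0, 1.0, 0.0, -1.0] * 5 + [0.0]   # strictly periodic toy signal
rm = recurrence_matrix(series, dim=2, delay=1, eps=0.2)
print(recurrence_rate(rm))                    # -> 0.25 for this period-4 signal
```

For real XRB or AGN light curves, diagonal-line and vertical-line statistics of this matrix (rather than the bare recurrence rate) are what separate deterministic from stochastic variability.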


Hu Sun (University of Michigan) Apr 27 Noon EDT Zoom 
 Video imputation and prediction models in the context of space weather monitoring
 Abstract:
Total electron content (TEC) maps can be used to estimate
the signal delay of GPS due to the ionospheric electron
content between a receiver and a satellite. This delay can
result in GPS positioning error, thus making it important to
monitor the TEC maps. The observed TEC maps have big patches
of missingness in the ocean and scattered small areas of
missingness on the land, leading to ~75% of the data missing
globally. With the purpose of filling in "reasonable" values
into both the big missing patches and small scattered areas,
we extend the traditional matrix completion and matrix
factorization algorithm and propose a novel method called
Video Imputation with SoftImpute, Temporal smoothing and
Auxiliary data (VISTA). The proposed method accounts for both
spatial and temporal smoothness when doing the imputation,
enabling the resulting map to preserve both globalscale and
mesoscale structures of the TEC map. Using both real TEC data
and simulated data, we show that our proposed method achieves
better reconstructed TEC maps as compared to existing methods
in the literature. Future research can easily benefit from: 1)
our imputation method to impute other types of astrophysics
images; 2) our TEC map data product for prediction tasks.
 See also: Sun et al. 2020, Matrix Completion Methods for the Total Electron Content Video Reconstruction [arXiv:2012.01618]
 Presentation slides [.pdf]
 Video [!yt]
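The matrix-completion core, classic SoftImpute alternating between filling missing entries with the current guess and soft-thresholding singular values, can be sketched as follows; the temporal-smoothing and auxiliary-data terms that distinguish VISTA are omitted, and all sizes and penalties are invented:

```python
import numpy as np

def soft_impute(X, mask, rank_penalty=1.0, n_iter=100):
    # One classic SoftImpute loop: repeatedly fill missing entries with the
    # current low-rank guess, then soft-threshold the singular values
    # (nuclear-norm shrinkage), which drives the fill toward low rank.
    Z = np.where(mask, X, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(mask, X, Z), full_matrices=False)
        s = np.maximum(s - rank_penalty, 0.0)
        Z = (U * s) @ Vt
    return Z

rng = np.random.default_rng(0)
true = np.outer(rng.normal(size=8), rng.normal(size=6))  # rank-1 toy "TEC map"
mask = rng.random(true.shape) > 0.3                      # ~30% missing
filled = soft_impute(true, mask, rank_penalty=0.1)
miss_err = float(np.linalg.norm((filled - true)[~mask])
                 / np.linalg.norm(true[~mask]))
print(round(miss_err, 3))  # relative error on the missing entries
```

VISTA augments this loop with a temporal penalty across video frames and auxiliary data, which is what lets it handle the big ocean patches where plain matrix completion struggles.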


Max Autenrieth (Imperial) June 15 5:00pm BST Zoom 
 Stratified Learning: A general-purpose method for learning under covariate shift with applications to observational cosmology
 Abstract: Supervised machine learning will be central in the analysis of upcoming large-scale sky surveys. However, selection bias for astronomical objects yields labelled training data that is not representative of the unlabelled target data distribution. This degrades predictive performance and makes target predictions unreliable.
We propose a novel, statistically principled and theoretically justified method to improve learning under such covariate shift conditions, based on propensity score stratification, a well-established methodology in causal inference. We show that the effects of covariate shift can be reduced or altogether eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners on subgroups ("strata") constructed by partitioning the data based on the estimated propensity scores, leading to balanced covariates and much-improved target prediction.
We demonstrate the effectiveness of our general-purpose method on contemporary research questions in observational cosmology, and on additional benchmark examples, matching or outperforming state-of-the-art importance weighting methods, widely studied in the covariate shift literature. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge" and improve upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data.
 Presentation slides [.pdf]
 Video [!yt]
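The core recipe, score the selection mechanism, cut the data into strata, and learn within each stratum, can be shown in a toy simulation (pure illustration: the selection function is assumed known rather than fit with a classifier, and the per-stratum learner is just a mean):

```python
import random

random.seed(2)

def stratified_estimate(labelled, target_x, propensity, n_strata=4):
    # Partition labelled data into strata by propensity score; within a
    # stratum the covariate shift is roughly removed, so the labelled mean
    # of y is a fair predictor for target points in the same stratum.
    scores = sorted(propensity(x) for x, _ in labelled)
    cuts = [scores[len(scores) * k // n_strata] for k in range(1, n_strata)]

    def stratum(p):
        return sum(p >= c for c in cuts)

    sums, counts = [0.0] * n_strata, [0] * n_strata
    for x, y in labelled:
        s = stratum(propensity(x))
        sums[s] += y
        counts[s] += 1
    means = [sums[s] / counts[s] if counts[s] else 0.0 for s in range(n_strata)]
    # average the per-stratum predictions over the target covariates
    return sum(means[stratum(propensity(x))] for x in target_x) / len(target_x)

# Selection favours large x, so the labelled sample over-represents large y.
population = [random.random() for _ in range(4000)]
labelled = [(x, x) for x in population if random.random() < x]  # y = x here
target_x = [random.random() for _ in range(2000)]                # unbiased

naive = sum(y for _, y in labelled) / len(labelled)              # biased high
strat = stratified_estimate(labelled, target_x, propensity=lambda x: x)
# `strat` lands closer than `naive` to the true target mean of 0.5
```

More strata shrink the residual within-stratum bias at the cost of noisier fits; the paper's method replaces the per-stratum mean with a full learner and the known selection function with a propensity score estimated by a classifier separating labelled from target data.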





