Presentations 

David Stenning (Imperial) 19 Jul 2018 2pm-3pm EDT SSXG Operations Center at CfA 
 Classification and Modeling of Evolving Solar Features
 Abstract:
Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. The goal of these observatories is to better understand and predict space weather. To analyze massive streams of solar image data, we have developed a science-driven dimension reduction methodology to extract scientifically meaningful features from images. Adopting a science-driven approach, as opposed to a solely black-box algorithmic approach, enables interpretable secondary data-driven analyses of complex phenomena, such as the evolution of magnetic active regions. The methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in active regions that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a much more informative manner than existing schemes (i.e., manual classification schemes), and (iii) is amenable to sophisticated statistical analyses.
 Presentation slides [.pdf]


Group 4 Sep 2018 Noon EDT SciCen 706 
 Organizational & EBASCS


Cora Dvorkin (HU) 11 Sep 2018 Noon EDT SciCen 706 
 Inverse Problems in Early Universe Cosmology
 Abstract: Cosmological observations have provided us with answers to age-old questions, involving the age, geometry, and composition of the universe. However, there are profound questions that still remain unanswered. I will describe ongoing efforts to shed light on some of these questions. In this talk, I will explain how we can use measurements of the Cosmic Microwave Background and the large-scale structure of the universe to reconstruct the detailed physics of much earlier epochs, when the universe was only a tiny fraction of a second old. I will address this inverse-problem reconstruction from a Bayesian perspective.


Andrea Sottosanti (Imperial) 2 Oct 2018 Imperial 
 Astronomical source detection and background separation via hierarchical Bayesian nonparametric mixtures
 Abstract:
We propose an innovative approach based on Bayesian nonparametric methods to the signal extraction of astronomical sources in gamma-ray count maps in the presence of strong background contamination. Our model simultaneously induces clustering on the photons using their spatial information and gives an estimate of the number of sources, while separating them from the irregular signal of the background component that extends over the entire map. From a statistical perspective, the signal of the sources is modeled using a Dirichlet Process mixture, which allows us to discover and locate a possibly infinite number of clusters, while the background component is completely reconstructed using a new flexible Bayesian nonparametric model based on B-spline basis functions. The resulting model can then be thought of as a hierarchical mixture of nonparametric mixtures for flexible clustering of highly contaminated signals. We also provide a Markov chain Monte Carlo algorithm for inference on the posterior distribution of the model parameters, which does not require any tuning parameter, and a suitable post-processing algorithm to quantify the information coming from the detected clusters. Results on different datasets confirm the capacity of the model to discover and locate the sources in the analysed map, to quantify their intensities, and to estimate and account for the presence of the background contamination.
 Presentation slides [.pdf]


Xixi Yu (Imperial) 23 Oct 2018 Imperial 
 Multistage Analysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models
 Abstract: Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters.
Instead of using the standard approach, a common strategy among astrophysicists that treats the uncertainty as fixed and known and obtains the best-fit values of the parameters, we propose a multistage analysis to prevent underestimation of the error bars on the model parameters and increase the accuracy of the analysis results. Four methods for a two-stage analysis are outlined: the standard method, multiple imputation, and the pragmatic and fully Bayesian methods. A case study on Fe XIII is discussed in which two different priors are deployed: a discrete uniform prior and a Gaussian approximation via principal component analysis.
 Presentation slides [.pdf]


Yang Chen (UMich) 30 Oct 2018 UMich 
 A second look at cstat
 Abstract: After decades of least chi-squares fitting and goodness-of-fit, the C-stat has been gaining popularity in the astrophysics community for model fitting and assessment of goodness-of-fit. In this work, we study the statistical properties of the C-stat and explore lower-resolution C-stat fitting and testing, which potentially improves statistical and computational efficiency. This is ongoing joint work with the CHASC team.
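 For readers unfamiliar with the statistic, the following is a minimal sketch, not taken from the talk. It assumes the commonly used Poisson form C = 2 * sum(m_i - d_i + d_i * ln(d_i / m_i)) for observed counts d_i and model counts m_i, and it assumes that "lower-resolution" can be illustrated by summing adjacent bins. The function names cash_statistic and rebin are hypothetical.

```python
import math


def cash_statistic(counts, model):
    """Poisson fit statistic, assumed form:
    C = 2 * sum(m_i - d_i + d_i * ln(d_i / m_i)).
    The log term is taken as zero for bins with zero observed counts."""
    if len(counts) != len(model):
        raise ValueError("counts and model must have the same length")
    total = 0.0
    for d, m in zip(counts, model):
        if m <= 0:
            raise ValueError("model counts must be positive")
        total += m - d
        if d > 0:
            total += d * math.log(d / m)
    return 2.0 * total


def rebin(values, factor):
    """Sum adjacent bins in groups of `factor`. This is an assumed
    illustration of a lower-resolution binning, not the speaker's scheme."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


# Hypothetical data: observed counts and model-predicted counts per bin.
observed = [3, 0, 5, 2]
predicted = [2.5, 0.8, 4.0, 2.7]
c_full = cash_statistic(observed, predicted)
c_coarse = cash_statistic(rebin(observed, 2), rebin(predicted, 2))
```

 Comparing c_full with c_coarse on the same data shows how the value of the statistic depends on the chosen binning.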


David Jones (TAMU) 13 Nov 2018 TAMU 
 Exoplanet detection: some statistical challenges
 Abstract: The radial velocity (RV) technique is one of the
two main approaches for detecting planets outside our solar system. The
method works by detecting the Doppler shift resulting from the motion of
a host star caused by an orbiting planet. Unfortunately, this Doppler
signal is typically contaminated by various "stellar activity"
phenomena, such as dark spots on the star's surface. This makes it
difficult to determine whether a planet is really present.
Last time I presented a Gaussian process framework for separating planet
RV signals from stellar activity. In this talk, I will review the key
points of the method and discuss current statistical challenges and
opportunities for generalizing and improving the approach. I will also
discuss related computational challenges in exoplanet detection.
 Presentation slides [.pdf]


Thomas Lee (UC Davis) 27 Nov 2018 UCD 
 Change Point Detection for Poisson Time Series Images with Applications to Astronomy and Astrophysics
 Presentation slides [.pdf]


Hyungsook Tak (Notre Dame) 11 Dec 2018 ND 
 Time Delay Lens Modeling Challenge for the Hubble Constant Estimation
 Abstract: The Hubble constant is a core cosmological parameter that represents the current expansion rate of the Universe. One way to infer this quantity is to use strong gravitational lensing, i.e., an effect in which multiple images of an astronomical object (e.g., a quasar) appear in the sky. This effect occurs when the trajectories of the light (from the object to the Earth) are bent by the strong gravitational field of an intervening galaxy. Strong gravitational lensing produces two types of data: (i) multiple brightness time series data of the gravitationally lensed images and (ii) pixel-wise image data of the lens and lensed object. The former is used to infer time delays between the arrival times of the multiply lensed images (arXiv 1602.01462) and the latter is used to estimate the gravitational potential that the lensed images pass through (arXiv 1801.01506). These two components are used to infer the Hubble constant via physical equations. In this talk, I explain how we infer the Hubble constant using the relationship among these three components, i.e., time delays, gravitational potential, and the Hubble constant. I will also describe the performance of this approach during the first stage of a blind competition, called the Time Delay Lens Modeling Challenge.
 Presentation slides [.pdf]

Vinay Kashyap (CfA), Katy McKeough (HU), Luis Campos (HU), et al. 29 Jan 2019 SciCen 706 
 Introduction to High-energy Astronomy Data for Statisticians
 We will describe what high-energy datasets look like using the example of the Chandra X-ray Observatory. We will then highlight some of the problems our group has tackled in the past, and focus in detail on two current projects: (i) to isolate and locate extended sources in posterior draw images, and (ii) to probabilistically disentangle photons from overlapping sources using spatial, spectral, and temporal variability information.
 Chandra archive: cda.harvard.edu/chaser [url]
 Presentation slides: Kashyap, McKeough, Campos [.pdf]


Paul Baines (Wise.io) 5 Feb 2019 Berkeley 
 The Colorful Stars and the Black Box: Bayesian Analysis of Stellar Populations
 Abstract:
Many modern statistical applications involve noisy observations of an underlying process that can best be described by a complex deterministic system. In astrophysics these systems often involve the solution of partial differential equations representing the best available understanding of the underlying physical processes. Statistical computation in such settings is hampered by the use of lookup tables or expensive `black-box' function evaluations.
The estimation of properties of stellar populations provides an example of statistical modeling with such a `lookup table' likelihood. The mapping between the physical parameters and the data space cannot be solved analytically and is represented as a series of lookup tables. In this context, we present a flexible hierarchical model for analyzing stellar populations. By utilizing the structure of the posterior distribution we construct efficient data augmentation schemes which create a robust sampling procedure. The performance of various sampling schemes is presented, together with the results of applying our model to a well-studied dataset.


Gabriel Collin (MIT) 12 Feb 2019 SciCen 706 
 Simulating light in large volume detectors using Metropolis Light Transport
 Abstract: In gigaton-scale neutrino detectors, such as the IceCube experiment, interaction products are detected by the Cherenkov radiation emitted by their passage through the detector medium. Simulating this propagation of light is traditionally approached through ray tracing. This is complicated by the sparsity of the detector: the vast majority of light rays are scattered and absorbed by the detector medium, with only a tiny fraction finding their way to a light-sensitive element. In this presentation, I develop an alternative method, based on the Metropolis light transport algorithm used in the CGI industry. This method poses the problem as a classical path integral, and samples only the paths of light rays that end on a light-sensitive element using Markov chain Monte Carlo. This yields a significant performance increase compared to ray tracing when simulating the timing distribution of light detected by a photosensitive element. The general concept behind this method can be widely applied, and I discuss some potential applications to other problem areas in physics and astronomy.
 Presentation slides [.pdf]


Daniel Muthukrishna (Cambridge) 19 Feb 2019 Cambridge 
 Real-time classification of explosive transients using deep recurrent neural networks
 Abstract:
Astronomical transients are stellar objects that become temporarily brighter on various timescales and have led to some of the most significant discoveries in cosmology and astronomy. New and upcoming wide-field surveys such as the Zwicky Transient Facility (ZTF) and the Large Synoptic Survey Telescope (LSST) will record millions of multi-wavelength transient alerts each night. To meet this demand, we have developed a novel machine learning approach, RAPID (Real-time Automated Photometric Identification using Deep learning), that automatically classifies transients as a function of time. Using a deep recurrent neural network (RNN) with Gated Recurrent Units (GRUs), we are able to quickly classify multichannel, sparse, time series datasets into 12 different astrophysical types. The classification accuracy improves over the lifetime of the transient as more photometric data becomes available. In this talk, I will explain the main parts of our deep neural network architecture and describe our approach's classification performance on simulated and real data streams.
 Presentation slides [.pdf]


Di Zhang (UC Irvine) 5 Mar 2019 UCI 
 New population-based MCMC method
 Abstract:
We developed a general MCMC strategy to sample from difficult posterior distributions, for example multimodal distributions. As is well known, standard MCMC methods may easily get stuck in local modes and suffer from slow convergence. To sample from a probability density with a potentially complicated shape or multiple modes, our approach is to run multiple chains from dispersed starting values and to propose a two-step jump at each iteration. The novel between-chain jump proposal tries to move to the neighborhood of the iterate in another chain. The intuition is that by running multiple chains exploring different modes and enabling between-chain jumps, essentially each chain would jump between modes efficiently. We applied the method in Bayes factor computation for Bayesian model selection, variance component estimation in mixed-effects models, and analysis of spectral data. As a numerical illustration, we use this method to fit thermal models with Capella data.
 Presentation slides [.pdf]


Sara Algeri (UMinnesota) 19 Mar 2019 UMinn 
 Detecting new signals under background mismodelling
 Abstract:
Searches for new astrophysical phenomena often involve several sources of nonrandom uncertainties which can lead to highly misleading results. Among these, model uncertainty arising from background mismodelling can dramatically compromise the sensitivity of the experiment under study. Specifically, overestimating the background distribution in the signal region increases the chances of missing new physics. Conversely, underestimating the background outside the signal region leads to an artificially enhanced sensitivity and a higher likelihood of claiming false discoveries. The aim of this work is to provide a unified statistical algorithm to perform modelling, estimation, inference and signal characterization under background mismodelling. The proposed method allows one to incorporate the (partial) scientific knowledge available on the background distribution, and provides a data-updated version of it in a purely nonparametric fashion, without requiring the specification of prior distributions. If a calibration sample or control regions are available, the solution discussed does not require the specification of a model for the signal; however, if the signal distribution is known, up to some free parameters, it allows one to further improve the accuracy of the analysis and to detect additional signals of unexpected new sources.


Lucas Janson (Harvard) 9 Apr 2019 SciCen 706 
 High-Dimensional Variable Selection via Model-X Knockoffs
 Abstract: Many contemporary large-scale applications, from
genomics to advertising, involve linking a response of interest to a
large set of potential explanatory variables in a nonlinear fashion,
such as when the response is binary. Although this modeling problem has
been extensively studied, it remains unclear how to effectively select
important variables while controlling the fraction of false discoveries,
even in high-dimensional logistic regression, not to mention general
high-dimensional nonlinear models. To address such a practical problem,
we propose a new framework of model-X knockoffs, which reads from a
different perspective the knockoff procedure (Barber and Candès, 2015)
originally designed for controlling the false discovery rate in
low-dimensional linear models. Model-X knockoffs can deal with arbitrary
(and unknown) conditional models and any dimensions, including when the
number of explanatory variables p exceeds the sample size n. Our
approach requires the design matrix be random (independent and
identically distributed rows) with a known distribution for the
explanatory variables, although we show preliminary evidence that our
procedure is robust to unknown/estimated distributions. As we require no
knowledge/assumptions about the conditional distribution of the
response, we effectively shift the burden of knowledge from the response
to the explanatory variables, in contrast to the canonical model-based
approach which assumes a parametric model for the response but very
little about the explanatory variables. To our knowledge, no other
procedure solves the controlled variable selection problem in such
generality, but in the restricted settings where competitors exist, we
demonstrate the superior power of knockoffs through simulations. We also
apply our procedure to data from a case-control study of Crohn's disease
in the United Kingdom, making twice as many discoveries as the original
analysis of the same data. This talk will be based on the following
paper: http://onlinelibrary.wiley.com/doi/10.1111/rssb.12265/full
 Presentation slides [.pdf]


Arturo Avelino (CfA) postponed/TBR 
 TBD


David Stenning (Imperial) postponed/TBR Imperial 
 TBD


Vinay Kashyap, Mark Weber, & Aneta Siemiginowska (CfA) postponed/TBR SciCen 706 
 The Feigelson List





