Presentations 

Meng Xiao-Li (Harvard) 3 Sep 2013
 A Statistician's View of Upcoming Grand Challenges
 A talk to introduce new students to concepts and challenges
in AstroStatistics. Reprise of a talk given at the American
Astronomical Society, highlighting some of the work done by
former grad students.
 Slides [.ppt]

Meng Xiao-Li (Harvard) & Kaisey Mandel (CfA) 17 Sep 2013
 The IgNobel 24/7 Lecture on Statistics (XLM)
 transcript [.docx]
 Bayesian Modeling for Type Ia Supernova Data, Dust, and Distances (KM)
 Abstract:
Type Ia supernovae (SN Ia) are the most precise cosmological
distance indicators, important for measuring the
acceleration of the Universe and the properties of dark
energy. I will give an overview of two applications in which
Bayesian statistical modeling has been effective for
extracting scientific inferences from the observational
data. Supernova types are typically inferred using the
spectra or photometric light curves (time series) of the
supernova. I will describe an alternative approach for
probabilistic classification of supernovae by modeling the
correlations between supernova class and galaxy properties.
Next, I will describe a principled, hierarchical Bayesian
approach to coherently model the multiple random and
uncertain effects, such as measurement error, dimming and
reddening due to interstellar dust and intrinsic covariance,
underlying the observed SN Ia data. This strategy is applied
to the modeling of spectroscopic and color data to estimate
physical correlations and improve estimates of the effects
of dust. These applications demonstrate aspects of Bayesian
model building, statistical computation, and model checking
and evaluation.
 Slides [.pdf]

Aneta Siemiginowska (CfA) 1 Oct 2013 
 Stochastic Model for Quasar Variability
 Abstract:
Quasars are highly energetic active nuclei of distant galaxies.
Because quasars are very luminous they allow us to probe the high
redshift (young) universe. Their power comes from accretion onto a
black hole with a mass exceeding a hundred million solar masses. Quasars
play a significant role in the formation of galaxies and large-scale
structures in the universe. Although there has been huge progress
in quasar research, astronomers still do not fully understand the physics
of this phenomenon. In particular, physical processes occurring close
to the black hole, and associated with the release of high-energy
radiation and relativistic jets are not fully understood. Studies of
the high-energy radiation, i.e. X-rays and gamma-rays, can give us
some clues to quasar physics. I will briefly show and describe the
high-energy data collection process and show examples of typical X-ray
and gamma-ray observations. In a large part of my talk I will discuss
the quasar variability and present our stochastic model for
fluctuations in the observed light curves. The model is based on a
linear combination of stochastic processes. We define the likelihood
for the process, enabling us to estimate the parameters of the process,
including break frequencies in the power spectral density. I will
discuss the application of our model to X-ray and gamma-ray light
curves of high luminosity quasars and future directions for this
project.
 Presentation slides [.pdf]
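The stochastic model in this abstract is a linear combination of processes; its simplest building block is a single Ornstein-Uhlenbeck (damped random walk) process. The sketch below assumes one such component, Gaussian measurement errors, and hypothetical parameter names (`mu`, `tau`, `s2`), and evaluates the exact likelihood of an irregularly sampled light curve with a Kalman filter. A single component has a Lorentzian power spectral density with one break at f = 1/(2πτ); adding independent components would add further bends.

```python
import math
import random


def simulate_ou(times, mu, tau, s2, seed=0):
    """Draw an exact Ornstein-Uhlenbeck path at sorted, possibly irregular times.

    mu: long-term mean, tau: damping timescale, s2: stationary variance
    (hypothetical parameter names for this sketch).
    """
    rng = random.Random(seed)
    x = rng.gauss(mu, math.sqrt(s2))  # start in the stationary distribution
    path = [x]
    for t_prev, t in zip(times, times[1:]):
        a = math.exp(-(t - t_prev) / tau)  # correlation across the gap
        x = mu + a * (x - mu) + rng.gauss(0.0, math.sqrt(s2 * (1.0 - a * a)))
        path.append(x)
    return path


def ou_loglik(times, y, err, mu, tau, s2):
    """Exact Gaussian log-likelihood of noisy OU data via a scalar Kalman filter."""
    m, P = mu, s2  # mean and variance of the latent process at times[0]
    ll = 0.0
    for i, t in enumerate(times):
        if i > 0:  # propagate the latent state across the time gap
            a = math.exp(-(t - times[i - 1]) / tau)
            m = mu + a * (m - mu)
            P = a * a * P + s2 * (1.0 - a * a)
        S = P + err[i] ** 2  # predictive variance of y[i]
        v = y[i] - m         # innovation
        ll -= 0.5 * (math.log(2.0 * math.pi * S) + v * v / S)
        K = P / S            # Kalman gain; condition the state on y[i]
        m += K * v
        P *= 1.0 - K
    return ll


def break_frequency(tau):
    """The OU power spectrum ~ 1 / (1 + (2 pi f tau)^2) bends at f = 1 / (2 pi tau)."""
    return 1.0 / (2.0 * math.pi * tau)
```

Maximising `ou_loglik` over `tau` (on a grid or with an optimizer) yields an estimate of the break frequency; the multi-component model of the talk is not reproduced here.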

Saeqa Vrtilek (CfA) & Luke Bornn (Harvard) 22 Oct 2013 
 Adding a new dimension: Multivariate studies of X-ray binaries
 Abstract:
X-ray binaries consisting of a normal star orbiting a compact object
owe their prominence to one of the most efficient energy release
mechanisms known: accretion onto a compact object. The diverse
behaviors displayed by X-ray binaries are well studied, yet one of
the most fundamental physical markers of each of these systems,
whether the central accreting object is a black hole or a neutron
star, has been remarkably difficult to establish.
We have recently developed a model-independent means of identifying
the nature of the compact object. We have found that, given suitable
collections of data in the right variables, various types of X-ray
binaries separate into complex but geometrically distinct volumes.
We use clustering techniques to fully characterize the nonlinear
geometry of different object types in a rigorous and statistically
sound manner. To exploit the vast prior literature, as well as the
existing hierarchical structure in X-ray binary systems, we will
embed physics-based models within Bayesian hierarchical models.
We note that the tools we develop serve analogous purposes in all
data-driven fields, as the fundamental problem of classifying
multivariate data with complex geometric dependencies is
field-spanning.
 Slides:
Saku [.pptx]
Saku [.pdf]
Luke [.pdf]

Lazhi Wang (Harvard) 5 Nov 2013 
 X-ray Dark Sources Detection
 Abstract:
The goal of source detection is often to obtain the luminosity
function, which specifies the relative number of sources at each
luminosity for a population. Of particular interest is the existence
of "dark" sources in the population. In this talk, I will first
briefly review the problem and the hierarchical Bayesian model, in
which a zero-inflated gamma distribution is used to model the source
intensities. Two small extensions of the original models are given.
Second, I will review the hypothesis test for the existence of
"dark" sources and how the posterior predictive p-value (ppp) is
obtained. Thirdly, some extensive simulation results will be given
to demonstrate the robustness of the model, the non-informativeness
of the prior and the effectiveness of ppp. Finally, results for
real data analysis will be shown and discussed in detail.
 Slides [.pdf]
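To make the zero-inflated gamma concrete, here is a minimal sketch under simplifying assumptions not stated in the abstract: each source's counts are Poisson with mean `exposure * lambda + bkg` for a known background `bkg`, and the intensity `lambda` is exactly zero ("dark") with probability `pi0`, otherwise gamma-distributed with shape `alpha` and rate `beta`. Integrating out `lambda` turns the source counts into a negative binomial, which is then convolved with the Poisson background. All names are illustrative; the hierarchical fit and the posterior predictive p-value are not implemented here.

```python
import math
import random


def sample_intensities(n, pi0, alpha, beta, seed=0):
    """Zero-inflated gamma: lambda = 0 ("dark") w.p. pi0, else Gamma(shape alpha, rate beta)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    return [0.0 if rng.random() < pi0 else rng.gammavariate(alpha, 1.0 / beta)
            for _ in range(n)]


def poisson_logpmf(k, mu):
    if mu == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negbin_logpmf(k, alpha, p):
    """Poisson-gamma mixture: negative binomial with size alpha, success probability p."""
    return (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
            + alpha * math.log(p) + k * math.log1p(-p))


def count_pmf(y, pi0, alpha, beta, exposure, bkg):
    """P(Y = y) for Y ~ Poisson(exposure * lambda + bkg), lambda zero-inflated gamma.

    Bright sources: negative binomial source counts convolved with the
    independent Poisson background. Dark sources: background only.
    """
    dark = math.exp(poisson_logpmf(y, bkg))
    p = beta / (beta + exposure)
    bright = sum(math.exp(negbin_logpmf(k, alpha, p) + poisson_logpmf(y - k, bkg))
                 for k in range(y + 1))
    return pi0 * dark + (1.0 - pi0) * bright
```

The mixture weight `pi0` is the quantity whose being nonzero the "dark source" hypothesis test is about.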


David Jones (Harvard) 12 Nov 2013 
 Overlapping Sources
 Abstract:
Sources are often situated close enough together that they cannot
be fully resolved instrumentally and it is of interest to infer the
number of individual sources, their locations, and their respective
intensities. Convolving a number of sources with the point spread
function (PSF) results in a finite mixture model. We incorporate
spectral models, background contamination, and a latent Poisson
process for the number and positions of the sources. The resulting
multilevel model is fit with RJMCMC (Richardson and Green 1997) to
obtain posterior distributions for the number of sources and their
individual parameters. Sensitivity to the prior distribution on the
number of sources is low due to our knowledge of the PSF. A simulation
study with a range of source separations, relative source intensities
and background strengths will be presented to demonstrate performance
and the impact of including the spectral data. Lastly, results for
two real datasets will be discussed. In one case the main aim is
to determine the number of sources, their positions and intensities.
Here the spectral data will be included to reduce uncertainty. In
the second case, separation of the spectral distributions themselves
will also be considered.
 Slides [.pdf]
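The statement that convolving sources with the PSF yields a finite mixture can be illustrated with a small sketch. Its assumptions are not from the talk: a circular Gaussian PSF of known width `sigma`, a flat background over a square field, and fixed, known source positions and weights. The spectral model, the Poisson prior on the number of sources and the RJMCMC moves are all omitted, and the function names are hypothetical.

```python
import math
import random


def gaussian_psf(dx, dy, sigma):
    """Circular Gaussian PSF (a stand-in; real telescope PSFs are not Gaussian)."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def simulate_photons(n, centers, weights, bkg_weight, sigma, half_width, seed=0):
    """Photon positions from PSF-blurred point sources plus a flat background."""
    rng = random.Random(seed)
    components = [None] + list(centers)  # None marks the background component
    probs = [bkg_weight] + list(weights)
    photons = []
    for comp in rng.choices(components, weights=probs, k=n):
        if comp is None:
            photons.append((rng.uniform(-half_width, half_width),
                            rng.uniform(-half_width, half_width)))
        else:
            photons.append((rng.gauss(comp[0], sigma), rng.gauss(comp[1], sigma)))
    return photons


def mixture_loglik(photons, centers, weights, bkg_weight, sigma, half_width):
    """Log-likelihood of photon positions under the finite mixture.

    weights plus bkg_weight should sum to 1. The PSF is not truncated to the
    field, a small approximation when sources sit well inside it.
    """
    area = (2.0 * half_width) ** 2
    ll = 0.0
    for x, y in photons:
        dens = bkg_weight / area
        for (cx, cy), w in zip(centers, weights):
            dens += w * gaussian_psf(x - cx, y - cy, sigma)
        ll += math.log(dens)
    return ll
```

Comparing `mixture_loglik` for one source against two nearby sources shows how the mixture encodes the number of components; RJMCMC moves between such models while also weighing the prior.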

Raymond Wong (UCD) 26 Nov 2013 
 Joint Spectral-Temporal Analysis of High-Energy Astronomical Sources
 Abstract:
In this work we apply semiparametric techniques to the joint
spectral-temporal modeling of high-energy astronomical data. This
includes the automatic detection of emission lines and structural
breaks in the temporal direction. We apply L1 penalties to regularize
the model fitting. The "dimension" of the best-fitting model is
chosen by a new form of the minimum description length principle
that is designed for the "large p small n" scenario.
 Slides [.pdf]
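The effect of an L1 penalty can be illustrated with plain lasso regression solved by cyclic coordinate descent. In this sketch (an assumption, not the authors' code) each column of `X` would be one candidate component, for example a narrow line profile at one energy bin, and coefficients set exactly to zero are components the penalty removes. The penalty weight `lam` is taken as given; the talk's minimum description length criterion for choosing the model dimension is not reproduced.

```python
import math


def soft_threshold(z, lam):
    """Minimiser of 0.5 * (b - z)**2 + lam * |b| over b."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(X, y, lam, n_sweeps=500, tol=1e-10):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (n x p). Coefficients driven exactly to zero mark
    candidate components that the penalty rejects.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual that excludes b[j]
            rho = sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b
```

Larger `lam` zeroes out more candidates, which is why the penalty weight (or an equivalent model-dimension criterion such as MDL) has to be chosen carefully.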


Xu Jin (UCI) 3 Dec 2013 
 Calibration Uncertainty


Group 14 Jan 2014 
 Group Discussion on the Time Delay Challenge
 Strong Lens TDC website: http://timedelaychallenge.org


David van Dyk (Imperial) 21 Jan 2014 
 The Assessment of Evidence in the Discovery of a Higgs Boson
 Abstract: [.pdf]
 The 2012-2013 discovery of a Higgs boson filled the last remaining
gap in the Standard Model of particle physics and was greeted with
fanfare in the scientific community and by the public at large.
Particle physicists have developed and rigorously tested a specialized
statistical tool kit that is designed for the search for new physics.
This tool kit was put to the test in a 40-year search that culminated
in the discovery of a Higgs boson. This talk reviews these statistical
methods, the controversies that surround them, and how they led to
this historic discovery. It concludes with a Bayesian critique of
the use of p-values to assess the evidence for a Higgs boson and a
discussion of the possible use instead of Bayesian methods that are
being developed for a related statistical problem in high-energy
astrophysics.
 Keywords: Bayes factors, detection, exclusion, hypothesis
testing, look-elsewhere effect, Lindley's paradox, Poisson models, sensitivity, upper limits
 Presentation slides [.pdf]
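The p-values critiqued in this talk are conventionally reported as a Gaussian-equivalent significance, with 5σ as the discovery threshold. A minimal sketch for an idealized single-bin counting experiment with a known background `b > 0` (ignoring systematic uncertainties and the look-elsewhere effect, which real analyses must handle): compute the one-sided Poisson tail probability, then convert it with the inverse normal CDF. Function names are illustrative.

```python
import math
from statistics import NormalDist


def poisson_upper_tail(n_obs, b):
    """One-sided p-value P(N >= n_obs) for N ~ Poisson(b), summed directly for accuracy."""
    if n_obs <= 0:
        return 1.0
    total, k = 0.0, n_obs
    term = math.exp(n_obs * math.log(b) - b - math.lgamma(n_obs + 1))
    while True:
        total += term
        k += 1
        term *= b / k  # pmf(k) = pmf(k - 1) * b / k
        # past the mode the terms shrink geometrically; stop once negligible
        if k > b and term <= total * 1e-17:
            return total


def z_from_p(p):
    """Gaussian-equivalent one-sided significance Z, defined by p = 1 - Phi(Z)."""
    return -NormalDist().inv_cdf(p)
```

For example, `z_from_p(poisson_upper_tail(n_obs, b))` turns an observed count into a significance; a one-sided p-value of about 2.9e-7 corresponds to 5σ.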


Eric Feigelson 29, 31 Jan 2014 Phillips Auditorium 60 Garden St., CfA
 Tutorials on AstroStatistics and R
 Tutorial website


Group 4 Feb 2014 
 Projects


Giridhar Gopalan (Harvard) (joint work with Dr. Peter Plavchan) 11 Feb 2014 
 Removing "Systematic" Noise From Lightcurves
 Abstract:
In recent decades, the analysis of photometric time series data
(lightcurves) generated from wide-field surveys has been fruitful
for the detection of exoplanets, amongst other transients. Unfortunately,
these data are often dominated by "systematic" noise, which is caused
by factors such as seeing conditions and instrumental effects and
simultaneously affects many of the lightcurves in a wide-field
survey. We apply an implementation of the Trend Filtering Algorithm
(TFA) developed by Bakos and Kovacs to the 2MASS calibration catalog
and selected Palomar Transient Factory (PTF) photometric time series
data. In this method, a basis of lightcurves that represents
systematic noise well is chosen, and the noise is taken to be the
least-squares projection onto this basis. Unsurprisingly, TFA is
successful at reducing the overall variability of lightcurves but
has a tendency to filter true signal due to overfitting when the
number of template lightcurves is large. To rectify these issues,
we modify TFA by including measurement uncertainties in its
computation, including ancillary data correlated with noise, and
choosing a template set using unsupervised learning approaches such
as Agglomerative Hierarchical and K-Means Clustering, similar in
spirit to a paper by Kim et al. These modifications seem to reduce
the variability of lightcurves without overfitting. We conclude
by considering alternative routes to solving this problem, including
a hierarchical Bayesian approach which utilizes a wavelet basis to
explore the frequency domain, extending past work by Dr. Alex
Blocker.
This talk is based on work done under the supervision of Dr. Peter
Plavchan at IPAC a few summers ago.
 Presentation slides [.pdf]
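The core TFA step described above, projecting a target lightcurve onto template lightcurves by least squares and subtracting the fit, can be sketched in plain Python. Passing per-point uncertainties turns it into weighted least squares, which is one plausible reading (an assumption) of "including measurement uncertainties"; template selection by clustering and ancillary regressors are not included, and the function names are hypothetical.

```python
def _solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def tfa_detrend(target, templates, sigma=None):
    """Subtract the (weighted) least-squares projection of target onto templates.

    templates: template lightcurves on the same time grid as target.
    sigma: optional per-point uncertainties; weights are 1 / sigma**2.
    A constant column absorbs the mean level, so the mean is not subtracted.
    """
    n = len(target)
    w = [1.0] * n if sigma is None else [1.0 / s ** 2 for s in sigma]
    cols = [list(t) for t in templates] + [[1.0] * n]
    k = len(cols)
    # normal equations: (C^T W C) c = C^T W y
    M = [[sum(w[i] * cols[a][i] * cols[b][i] for i in range(n)) for b in range(k)]
         for a in range(k)]
    v = [sum(w[i] * cols[a][i] * target[i] for i in range(n)) for a in range(k)]
    coef = _solve(M, v)
    trend = [sum(coef[j] * templates[j][i] for j in range(len(templates)))
             for i in range(n)]
    return [target[i] - trend[i] for i in range(n)]
```

With many templates the fit can absorb genuine signal, which is the overfitting problem the abstract describes; nothing in this sketch guards against it.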


Min Shandong (UC Irvine) 18 Feb 2014 
 Quantifying The Sensitivity of The Bayes Factor on The Choice of Prior Distribution in High-Energy Astrophysical Analysis
 Abstract:
There is an important class of model selection problems in
astrophysics where the standard asymptotics of the likelihood ratio
test do not apply. Uncalibrated frequency-based methods nonetheless
remain the standard approach among astronomers. This project will
study in detail the use of the Bayes Factor for emission line
detection in spectral analysis. We develop a method to quantify the
typically strong dependency of the Bayes Factor on the prior
distribution with the aim of identifying a tenable class of priors
under location-scale families where the Bayes Factor leads to a clear
choice among the possible models. We compare the results with those
obtained with posterior predictive p-values and the traditional
likelihood ratio test. We will also talk about the efficiency and
accuracy of the available methods to calculate Bayes Factors and give
suggestions in the context of spectral analysis.
 Presentation slides [.pdf]
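A toy version of the prior-sensitivity question: a single-bin count `n` with known background `b`, comparing "no line" (λ = 0) against "line present" with a Gamma(shape `a`, scale `s`) prior on the line intensity λ. This setup is an assumption chosen because the Bayes factor then has a closed form (expand (b + λ)^n binomially and integrate term by term); the project itself concerns full spectra and location-scale prior families.

```python
import math


def log_bf10(n, b, a, s):
    """log Bayes factor for a line (lambda > 0) versus no line (lambda = 0).

    Toy model: n ~ Poisson(b + lambda) with known background b > 0;
    prior under H1: lambda ~ Gamma(shape a, scale s). Then
        BF10 = sum_k C(n, k) * Gamma(k + a) / Gamma(a) * (s / b)^k * (1 + s)^-(k + a).
    """
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + math.lgamma(k + a) - math.lgamma(a)
            + k * math.log(s / b) - (k + a) * math.log1p(s)
            for k in range(n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))  # log-sum-exp
```

Varying `s` shows the prior dependence directly: as `s` grows without bound every term decays like s^(-a), so an increasingly diffuse prior drives BF10 toward zero whatever the data, the behavior behind Lindley's paradox.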


Randall Smith (CfA) 25 Feb 2014 
 Modeling X-ray Spectra of Astrophysical Plasmas: Implications for Statistical Analysis
 Abstract:
Existing high-resolution astrophysical X-ray spectra have exposed the
need for high-quality atomic data of all stripes: wavelengths,
collisional and absorption cross sections, and radiative rates. We
have created such a repository with AtomDB, a database of atomic
properties relevant to high-resolution X-ray spectroscopy. However,
the Astro-H soft X-ray spectrometer (2015 launch) will vastly increase
the number and type of high-resolution X-ray spectra available and
likely expose a number of shortcomings in our models. In addition,
the demand for meaningful uncertainties on atomic data has been
increasing over the last several years from those who rely on atomic
data in modeling astrophysical plasmas. Uncertainties are not only
critical in assessing the quality of the data, but can be propagated
through modeling codes to obtain uncertainties on diagnosed
quantities. I will discuss the current status and future plans for the
AtomDB database as well as invite dialogue about how the community's
need for practical measures of uncertainty can be addressed given our
current capabilities.
 Presentation slides [.pdf]


Andreas Zezas (Crete) 25 Mar 2014 
 Three Problems
 Abstract:
I will present three examples of common data-analysis problems in
astrophysics that could greatly benefit from more sophisticated
statistical methods.
a. The first example lies in the area of model selection.
Two-dimensional fits to images of galaxies are a powerful tool
for classifying them and measuring the contribution of their
different stellar components. One of the challenges in this
process is to select between different non-nested models that
provide fits of similar quality. This is further complicated by
morphological components such as spiral arms that are hard to
model.
b. The second example is in the area of source
classification. Classification of both stars and activity in
galaxies is based on empirical diagnostic schemes. I will discuss
the need for more quantitative classification schemes, and I
will describe potential methods and challenges.
c. The third problem is related to joint fits of datasets of
different sizes. One example is joint fits of spectral energy
distributions and high resolution spectra that together can
better constrain the emission mechanisms and the parameters of
the gas in galaxies. The large difference in the size of the two
datasets hampers the use of standard fitting methods.
 Presentation slides: [pdf] ; [pptx]


David van Dyk (Imperial) 28 Mar 2014 11am-1pm Pratt Conference Room 60 Garden St., CfA
 Lecture on Markov Chain Monte Carlo
 Includes background information on MCMC methods,
description of practical challenges and advice,
and overview of recommended strategy.
It is intended as a tutorial.
 Slides [.pdf]
 Note: We tried to stream this live on YouTube, but it failed for reasons as yet unknown. Our apologies to everyone who tried to watch it online.
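As a companion to the tutorial, here is a minimal random-walk Metropolis sampler with burn-in, plus the Gelman-Rubin R-hat diagnostic computed from several chains started at dispersed points. This is a generic textbook sketch rather than the lecture's recommended strategy, the function names are illustrative, and `x0` is assumed to have positive target density.

```python
import math
import random
import statistics


def metropolis(log_target, x0, step, n_iter, burn_in=0, seed=0):
    """Random-walk Metropolis with Gaussian proposals; returns (draws, acceptance rate)."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    draws, accepted = [], 0
    for it in range(n_iter):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_target(prop)
        # accept with probability min(1, target(prop) / target(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
            accepted += 1
        if it >= burn_in:
            draws.append(x)
    return draws, accepted / n_iter


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain
    B = n * statistics.variance(means)                             # between-chain
    var_hat = (n - 1) / n * W + B / n
    return math.sqrt(var_hat / W)
```

Values of R-hat close to 1 (a common rule of thumb is below 1.1) are necessary but not sufficient evidence of convergence.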


Tak Hyungsuk (Harvard) 1 Apr 2014 
 Parametric Bayesian Approach to Time Delay Estimation
 Abstract: Light rays from a source (e.g., a quasar)
can take different paths toward the Earth due to the
gravitational fields of intervening matter. The arrival times
of these light rays vary depending on the lengths of the paths,
and these differences in arrival time are called time delays.
In this talk, I deal with the simplest case based on two light
curves, one of which lags behind the other, and suggest a full
Bayesian model based on the Ornstein-Uhlenbeck process to obtain
the posterior distribution of the time delay. Various
grid-search methods (e.g., chi-square minimization) to find the
optimal time delay estimate have dominated this field,
though they are computationally inefficient. In this sense, this
parametric Bayesian approach is promising because it turns the
time-consuming optimization problem into a simple sampling
problem with a fast and stable Gibbs sampling scheme. Two real
data examples will be used to show these points.
 Slides [.pdf]
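A minimal sketch of the time-delay likelihood under the model described above, with assumed details: light curve B is light curve A's latent Ornstein-Uhlenbeck process shifted by `delta` and offset by a magnitude difference `beta0`. Shifting B's times by `-delta` and removing the offset merges both curves onto one irregularly sampled OU process, whose exact likelihood follows from a Kalman filter. All names are hypothetical. The talk samples the delay within a Gibbs scheme; here the log-likelihood is only evaluated at a given `delta`, so it could feed either a sampler or a grid.

```python
import math
import random


def simulate_ou(times, mu, tau, s2, seed=0):
    """Exact OU path at sorted times (mean mu, timescale tau, stationary variance s2)."""
    rng = random.Random(seed)
    x = rng.gauss(mu, math.sqrt(s2))
    path = [x]
    for t_prev, t in zip(times, times[1:]):
        a = math.exp(-(t - t_prev) / tau)
        x = mu + a * (x - mu) + rng.gauss(0.0, math.sqrt(s2 * (1.0 - a * a)))
        path.append(x)
    return path


def ou_loglik(times, y, err, mu, tau, s2):
    """Exact log-likelihood of noisy OU observations (scalar Kalman filter)."""
    m, P, ll = mu, s2, 0.0
    for i, t in enumerate(times):
        if i > 0:  # propagate the latent state across the gap
            a = math.exp(-(t - times[i - 1]) / tau)
            m, P = mu + a * (m - mu), a * a * P + s2 * (1.0 - a * a)
        S = P + err[i] ** 2
        v = y[i] - m
        ll -= 0.5 * (math.log(2.0 * math.pi * S) + v * v / S)
        K = P / S
        m, P = m + K * v, P * (1.0 - K)
    return ll


def td_loglik(tA, xA, eA, tB, xB, eB, delta, beta0, mu, tau, s2):
    """Log-likelihood of two lensed light curves sharing one latent OU process.

    Assumed model: xA(t) = X(t) + noise, xB(t) = X(t - delta) + beta0 + noise.
    """
    merged = list(zip(tA, xA, eA))
    merged += [(t - delta, x - beta0, e) for t, x, e in zip(tB, xB, eB)]
    merged.sort(key=lambda row: row[0])
    times, y, err = (list(col) for col in zip(*merged))
    return ou_loglik(times, y, err, mu, tau, s2)
```

Because only the merged, time-sorted curve enters the likelihood, each evaluation costs a single pass of the filter, which is part of why sampling the delay is cheap.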


Bozena Czerny (Copernicus, Warsaw) 8 Apr 2014 
 Reverberation of high-redshift quasars and its application to trace the dark energy
 Abstract:
Dark energy is one of the most puzzling issues in astronomy and
physics. Therefore, we need a number of independent methods to
establish its properties, since every method can have systematic
bias. One of the new methods is to use quasars instead of SN Ia.
In quasars there is a physical link between the timescale and the
absolute luminosity, so the key point in quasar application is to
determine precisely the time delay between the continuum and the
emission line in the optical band. Such reverberation measurements are routinely
done for nearby objects, but they are difficult for distant quasars
due to the lower S/N ratio and the intrinsic variability amplitude on
timescales of years. In Czerny et al. (2013) we showed
simulations of the campaign currently being carried out with the 11-m SALT
telescope, and we tried three time delay methods (chi-square fitting,
ICCF and ZDCF), but we now see that the time gaps are larger than
expected, so a more sophisticated method is needed. An alternative
source of data (LAMOST) will provide much better time coverage,
but the quality of a single measurement is much worse than for
dedicated spectroscopic monitoring with SALT, which also calls for
a more advanced statistical approach.
 Slides [.pdf]
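For reference, the interpolated cross-correlation function (ICCF), one of the three methods mentioned, can be sketched as follows. For each trial lag, one curve is linearly interpolated onto the other's shifted times, the Pearson correlation is computed over the overlap, and the two interpolation directions are averaged. Parameter names are illustrative and error estimation (for example by resampling) is omitted. Linear interpolation across long seasonal gaps is exactly where this approach becomes unreliable, consistent with the concern about time gaps raised above.

```python
import bisect
import math


def _interp(t, ts, vs):
    """Linear interpolation of (ts, vs) at t, assuming ts[0] <= t <= ts[-1]."""
    i = bisect.bisect_left(ts, t)
    if ts[i] == t:
        return vs[i]
    w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    return vs[i - 1] + w * (vs[i] - vs[i - 1])


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def _one_way(t1, f1, t2, f2, lag):
    """Correlate f1(t) with f2 interpolated at t + lag, over the overlap only."""
    pairs = [(v, _interp(t + lag, t2, f2)) for t, v in zip(t1, f1)
             if t2[0] <= t + lag <= t2[-1]]
    if len(pairs) < 3:
        return None
    a, b = zip(*pairs)
    return _pearson(a, b)


def iccf(t_cont, cont, t_line, line, lags):
    """ICCF per trial lag (positive lag: line lags continuum), both directions averaged."""
    out = []
    for lag in lags:
        r1 = _one_way(t_cont, cont, t_line, line, lag)   # line interpolated
        r2 = _one_way(t_line, line, t_cont, cont, -lag)  # continuum interpolated
        rs = [r for r in (r1, r2) if r is not None]
        out.append(sum(rs) / len(rs) if rs else math.nan)
    return out
```

The lag estimate is then usually taken from the peak or the centroid of the returned curve.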


David Jones (Harvard) 15 Apr 2014 HU Stats Graduate Talks Series 12:30pm SciCen 705 
 Overlapping Astronomical Sources
 Abstract: Astronomical sources are often
situated close enough together that they cannot be fully
resolved instrumentally and it is of interest to infer the
number of individual sources, their locations, and their
respective intensities. Typically, approximate source regions
are identified and a spatial analysis is performed, which is
later followed by a separate spectral analysis. We instead
suggest an approach which jointly infers spatial and spectral
parameters. Convolving a number of sources with the point
spread function (PSF) results in a spatial finite mixture
model. We incorporate spectral models, background
contamination, and a latent Poisson process for the number and
positions of the sources. The resulting multilevel model is
fit with reversible jump Markov chain Monte Carlo (RJMCMC) to
obtain posterior distributions for the number of sources and
their individual parameters. Sensitivity to the prior
distribution on the number of sources is low due to our
knowledge of the PSF. Results for a simulation study and two
real datasets will be discussed. The first dataset is from
Chandra and suggests that the spectral model provides some
protection against chance variation in PSF or background.
The second is from the XMM observatory and offers some hope
for separating two previously inseparable spectral
distributions.
 Slides [.pdf]


Keli Liu (Harvard) 29 Apr 2014 
 Prior and Prejudice: An Algorithm for Weakening Prior Influence
 Abstract:
Prejudice leads to bias. A Bayesian prior favoring
select models over others can pervert the data to fit our
prior worldview; this favoritism runs against the spirit of
scientific objectivity. As Bayesian methods become more
popular, we need to protect ourselves against the often subtle
ways that a prior, chosen for convenience, can deleteriously
influence posterior inferences. We develop a formal diagnostic
to assess how prejudiced a prior is (and for which models). We
then exploit this diagnostic to remove the favoritism that a
prior exhibits. The result is our prior weakening algorithm:
an astronomer begins with a prior chosen solely for
convenience (this prior may be highly prejudiced) and
iteratively applies our algorithm until it becomes "fair".
Posterior inferences under this fair prior are driven by the
data and not by artifacts in the prior. Such inferences are
hence trustworthy. There is a large literature on prior
construction often involving ad hoc (and intractable)
derivations in any given problem; the practitioner is left with
few options if the literature has not suggested a prior for
their model (usually the case). Our prior weakening algorithm
replaces analytic derivation (manual and expensive) with
computation (automated and cheap), placing the ball in the
practitioner's court. If you're wondering whether a chosen
prior is harboring hidden biases, just plug it into the prior
weakening algorithm and turn the crank.
(Joint work with Xiao-Li Meng and Natesh Pillai)


Andreas Zezas (Crete) 6 May 2014 12:15 pm EDT 
 Three and a Half Problems
 Abstract:
This is the second part of the discussion of three problems that began with the March 25th presentation.
The focus will be on the presentation of a model selection
problem in two-dimensional fits to images of galaxies. These
fits are a powerful tool for classifying them and measuring
the contribution of their different stellar components. One of
the challenges in this process is to select between different
non-nested models that provide fits of similar quality. This
is further complicated by morphological components such as
spiral arms that are hard to model.
Also, following the discussion of the classification of stars,
I will present a similar problem on the classification of
activity in galaxies based on measurements of their spectral
emission lines.
Finally, if time permits, we can continue the discussion on
the other two problems presented in the first part of this
talk, namely fitting of datasets of different sizes and
resolution, and classification of stellar spectra.




