Presentations 

Alanna Connors (with Aneta and Vinay) 08 Sep 2009 
 Introduction to Astronomy for Statisticians
 [.pdf]

 Movies:
 Full Sun rotating (Hinode/XRT) [.mov]
 flaring loops (Hinode/XRT) [.mov]
 transit of Mercury (Hinode/XRT) [.mov]
 Gamma-ray sky (Fermi) [.m4v]
 Black Hole at center of Milky Way (ESO/VLT) [.m4v]
 Discovery of Kuiper Belt object (APOD) [.gif]


Brandon Kelly (CfA) 06 Oct 2009 
 Hierarchical modeling of astronomical images and uncertainty in truncated data sets
 Abstract:
I will discuss two astronomical problems that I am working on, and
the statistical issues surrounding them. The first involves the
analysis of brightness images of astronomical objects with the goal
of recovering an 'image' of the physical properties of these objects.
Here, the primary goal is to infer from the brightness images how
the physical properties of the objects are correlated and spatially
distributed. The analysis is complicated by a two-level error
structure, having both additive and multiplicative errors, making
some of the model parameters nearly degenerate. The second problem
involves density estimation of a truncated data set, where the
truncation arises due to a data selection efficiency that varies
with an astronomical object's brightness (e.g., fainter things are
more difficult to detect). When the selection efficiency is known,
the analysis is straightforward. However, when the selection efficiency
has some uncertainty in it, the likelihood function or posterior
distribution can be unstable. Currently, there do not appear to
be methods for accounting for uncertainty in the data selection
efficiency.
 Presentation [.pdf]
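
 Illustration (not from the talk): the "known selection efficiency" case
 in the abstract above can be sketched with a small simulation. Everything
 here is an assumption made for illustration: a Gaussian population, a
 hypothetical logistic detection probability `selection`, and a grid search
 over the population mean. Each detected object contributes
 f(x) S(x) / P(detect) to the likelihood, where P(detect) is the integral
 of f(x) S(x) over x.

```python
import numpy as np

def selection(x):
    """Hypothetical detection probability (assumed known): logistic in brightness x."""
    return 1.0 / (1.0 + np.exp(-(x - 1.0) / 0.5))

def truncated_loglike(mu, sigma, x_obs, grid):
    """Log-likelihood of the detected objects for a Gaussian population N(mu, sigma^2)."""
    def log_gauss(x):
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    # Probability that a randomly drawn object is detected (Riemann sum on the grid).
    dx = grid[1] - grid[0]
    p_detect = np.sum(np.exp(log_gauss(grid)) * selection(grid)) * dx
    # Each detected object contributes f(x) * S(x) / p_detect.
    return np.sum(log_gauss(x_obs) + np.log(selection(x_obs))) - len(x_obs) * np.log(p_detect)

# Simulate a population with true mean 0 and keep only the "detected" objects.
rng = np.random.default_rng(0)
x_all = rng.normal(0.0, 1.0, size=20000)
x_obs = x_all[rng.random(x_all.size) < selection(x_all)]

# Grid search for the population mean, with sigma fixed at its true value for simplicity.
grid = np.linspace(-8.0, 8.0, 4001)
mu_grid = np.linspace(-1.0, 2.0, 301)
loglikes = np.array([truncated_loglike(m, 1.0, x_obs, grid) for m in mu_grid])
mu_hat = mu_grid[np.argmax(loglikes)]
naive_mean = x_obs.mean()  # ignores the selection effect, so it is biased toward bright objects

print(f"naive mean of detected objects: {naive_mean:.2f}")
print(f"selection-corrected estimate:   {mu_hat:.2f}")
```

 The sketch does not address the harder problem raised in the abstract,
 where the selection efficiency itself is uncertain.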


Nathan Stein, Paul Baines 20 Oct 2009 
 Markov Chain Monte Carlo Methods for Fitting Computer Models for Stellar Evolution (in three parts)
 Nathan Stein (Dept. of Statistics, Harvard U)
 Abstract:
Bayesian analysis of the evolution of star clusters presents
several computational challenges. Because physics-based models of
stellar evolution are implemented as computer models and are not
available in closed form, none of the conditional posterior
distributions are traditional named distributions. Moreover, the
posterior distributions of interest are high dimensional, strongly
correlated, and often multimodal. Markov chain Monte Carlo algorithms
can generate samples from these posterior distributions, but creating
reasonably efficient sampling algorithms requires advanced techniques.
 slides [.pdf]
 Paul Baines (Dept. of Statistics, Harvard U)
 Abstract:
The analysis of photometric data for stellar clusters provides an
example of both the statistical and computational challenges present
in many Astronomy applications. Typically, properties of stellar
clusters are estimated using Color-Magnitude Diagrams, whereby the
observed data are often simply compared to what one would expect under
a theoretical mapping from the model parameters to the observed data.
This mapping is determined by a set of isochrone tables, listing the
expected photometric measurements for a given set of input parameters.
To address many of the substantive questions in a coherent statistical
manner, we present a flexible hierarchical Bayesian model for the
analysis of stellar populations. The computation for the model is done
via Markov Chain Monte Carlo (MCMC), the standard tool of choice for
Bayesian computation. Both the complex dependence structure and the
peculiar nature of the isochrone mapping, however, present a
formidable challenge to standard statistical computation methods. In
the spirit of the Ancillary Sufficient Interweaving Scheme (ASIS) of
Yu & Meng (presented in a separate talk by Yu), we show how competing
parameterizations can be constructed and combined to help overcome the
weaknesses of individual schemes, and drastically improve the
efficiency and reliability of the computation.
 slides [.pdf]



David Stenning / Jin Xu / Alex Blocker 3 Nov 2009 
 David Stenning (UCI)
 Automatic Classification of Sunspot Groups Using SOHO/MDI Magnetogram and White-Light Images

Abstract:
Sunspot groups are classified into four types: alpha, beta, beta-gamma,
and beta-gamma-delta. Currently, most sunspot group classification is
done manually by experts. This is a lengthy, labor-intensive, and somewhat
subjective process, creating the need for an automatic and accurate
procedure. We intend to use SOHO/MDI magnetogram and white-light images
to detect and classify sunspots into the appropriate group. The first
step in this process involves the extraction of white-light data that
corresponds to the magnetogram images we have available. I will discuss
the progress I have made so far and address questions regarding the
automation of the extraction routine.
 presentation slides [.pdf]

 Jin Xu (UCI)
 Solar DEMs
 Abstract:
The wavelength distribution of light emitted from different regions of
the sun contains clues as to how the composition and temperature of the
sun varies across its surface. Decoding this information requires sophisticated
statistical techniques and detailed quantum physical calculations. Data
consists of images of the sun that record its intensity in each of a number
of wavelength bands. This "talk" will consist of a conversation about how
best to formulate the model to leverage both the data and quantum physics
to best understand composition and temperature images of the sun.
 presentation slides [.pdf]

 Alex Blocker (HU)
 Event Detection in Time Series Databases with Robust Wavelet Model
 presentation slides [.pdf]



Paul Baines, Yaming Yu 17 Nov 2009 
 Markov Chain Monte Carlo Methods for Fitting Computer Models for Stellar Evolution (part two)
 Paul Baines (Dept. of Statistics, Harvard U)
 contd.

 Yaming Yu (Dept. of Statistics, UCI)
 Abstract:
The importance of a good parameterization for efficient MCMC
implementation has been repeatedly emphasized in the literature. For
a broad class of multilevel models, there exist two well-known
competing parameterizations: the centered parameterization and the
non-centered parameterization. We describe a surprisingly general and
powerful strategy for boosting MCMC efficiency by simply interweaving
(but not alternating) the two parameterizations. A Poisson time
series model for detecting changes in source intensity of photon
counts is used to illustrate the effectiveness of this strategy.
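
 Illustration (not from the talk): the interweaving idea can be sketched
 on a toy normal model rather than the Poisson time series model named in
 the abstract. The model is an assumption chosen for simplicity:
 y | theta ~ N(theta, 1), theta | mu ~ N(mu, V), with a flat prior on mu,
 so the target posterior is mu | y ~ N(y, 1 + V). The function names are
 hypothetical. Within one sweep, mu is drawn once under the centered
 parameterization and redrawn under the non-centered one
 (eta = theta - mu).

```python
import numpy as np

def centered_chain(y, V, n_iter, rng, mu=0.0):
    """Plain Gibbs sampler using only the centered parameterization (theta, mu)."""
    draws = np.empty(n_iter)
    for i in range(n_iter):
        theta = rng.normal((V * y + mu) / (V + 1.0), np.sqrt(V / (V + 1.0)))  # theta | mu, y
        mu = rng.normal(theta, np.sqrt(V))                                     # mu | theta
        draws[i] = mu
    return draws

def interwoven_chain(y, V, n_iter, rng, mu=0.0):
    """Interweaving sketch: update mu under both parameterizations within each sweep."""
    draws = np.empty(n_iter)
    for i in range(n_iter):
        theta = rng.normal((V * y + mu) / (V + 1.0), np.sqrt(V / (V + 1.0)))  # theta | mu, y
        mu = rng.normal(theta, np.sqrt(V))   # centered draw: mu | theta
        eta = theta - mu                     # switch to the non-centered variable
        mu = rng.normal(y - eta, 1.0)        # non-centered draw: mu | eta, y
        draws[i] = mu
    return draws

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a chain."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

y_obs, V = 2.0, 1.0
centered = centered_chain(y_obs, V, 20000, np.random.default_rng(1))
interwoven = interwoven_chain(y_obs, V, 20000, np.random.default_rng(2))

print(f"lag-1 autocorrelation, centered only: {lag1_autocorr(centered):.3f}")
print(f"lag-1 autocorrelation, interwoven:    {lag1_autocorr(interwoven):.3f}")
print(f"interwoven mean and variance: {interwoven.mean():.2f}, {interwoven.var():.2f}")
```

 In this toy model the interwoven draws of mu are essentially
 uncorrelated, while the centered-only chain has lag-1 autocorrelation
 near 1/(1 + V).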


Victoria Liublinska (HU), Jin Xu
(UCI), Jing Liu (UCI) 1 Dec 2009 
 Accounting for Missing Lines in Atomic Emissivity Databases Using DEM
Analysis with High-resolution X-ray Spectra (VL)
 Abstract:
Access to a substantial amount of data in the high-energy range gives us an
opportunity to extend our knowledge of stellar coronal composition and
temperature structure by analyzing the entire spectrum as a whole. Moreover,
data from detectors with high spectral resolution will provide additional
constraints on atomic data measurements being conducted in laboratories on
the ground. In particular, the best atomic emissivity databases created by
physicists still have missing, misplaced or poorly estimated lines and the
goal of our analysis is to provide ways of identifying lines that were
omitted and improve our estimates of stellar Differential Emission Measure
and plasma abundance by incorporating the information about them.
 Presentation [.pdf]

 In addition Jin Xu will discuss Solar DEM reconstruction from
photometric images and Jing Liu will present an update on X-ray Image
Analysis of Quasar Jets.


Meng Xiaoli (Harvard)
Tulun Ergin (CfA)
26 Jan 2010 
 [XLM] A Statistician's View of Upcoming Grand Challenges in Astronomy
A reimputation of an imputed talk given at the January Meeting of the American Astronomical Society in Washington DC
 Abstract:
There is a broad spectrum of astrostatistical challenges, in this
age of huge, complex, and computer-intensive models, data,
instruments, and questions. These challenges bridge astronomy at
many wavelengths; basic physics; machine learning; and statistics.
At one end of our spectrum, we think of 'compressing' the data with
nonparametric methods. This raises the question of creating
'pseudo-replicas' of the data for uncertainty estimates. What would
be involved in, e.g. bootstrap and related methods? Somewhere in
the middle are these nonparametric methods for encapsulating the
uncertainty information. At the far end, we find more model-based
approaches, with the physics model embedded in the likelihood and
analysis. The other distinctive problem is really the 'black-box'
problem, where one has a complicated e.g. fundamental physics-based
computer code, or 'black box', and one needs to know how changing
the parameters at input, due to uncertainties of any kind, will
map to changing the output. All of these connect to challenges in
complexity of data and computation speed. Dr. Meng will highlight
ways to 'cut corners' with advanced computational techniques, such
as Parallel Tempering and Equal Energy methods. As well, there are
cautionary tales of running automated analysis with real data,
where "30 sigma" outliers due to data artifacts can be more common
than the astrophysical event of interest.
 AAS Presentation [.ppt]

 Extended Sources in TeV and GeV Energies
 Presentation: [pdf]


Alex Blocker
David Stenning
Jin Xu
09 Feb 2010 
 [DS]  sunspot classification
 [pdf]

 [JX]  solar DEM
 [pdf]

 [AB] Doing Right By Massive Data: How To Bring Probability Modeling
To The Analysis Of Huge Datasets Without Taking Over The Datacenter
 Abstract:
The analysis of extremely large-scale complex datasets is
becoming an increasingly important task in the analysis of scientific
data. This trend is especially prevalent in astronomy, as large-scale
surveys such as SDSS, Pan-STARRS, and the LSST deliver (or promise to
deliver) unprecedented amounts of data. While both the statistics and
machine-learning communities have offered approaches to these
problems, neither has produced a satisfactory approach. Statistical
solutions are typically rigorous and well-motivated but do not scale
well to massive datasets, whereas machine learning solutions typically
lack statistical rigor and fail to account for the nuances of the
scientific problem at hand. I will discuss an approach for combining
much of the power of probability modeling with the scalability of more
ad hoc machine learning approaches in the context of an event
detection problem for massive collections of time series. I will also
provide comments on the assessment of uncertainty in this context and
some general remarks on "using all of your tools, but in the right
order," as a much pithier writer once said.
 Presentation [.pdf]


Aneta Siemiginowska 23 Feb 2010 
 Testing Radiation Models of Young Radio Sources
 Abstract: Models of young radio sources predict that a significant fraction of
their energy should be radiated in X-rays and gamma-rays. Recent
Chandra and Fermi/LAT observations can be used to constrain the
theoretical models, determine the energetics of young sources and their
contribution to the background radiation. In my talk I review the
current data and describe challenges in the statistical analysis of
the Fermi data. The main goal of future studies is to develop a
full statistical model to evaluate the gamma-ray flux of young radio
sources, verify theoretical models predicting high-energy emission, and
determine the distribution of the young radio source population in gamma-rays.
 Presentation [.pdf]

 [Jin Xu] Solar DEM
 [.pdf]


Don Richard (Penn State) 23 Mar 2010 
 Maximum Likelihood Estimation and the Bayesian Information Criterion
 Abstract:
The talk will introduce the method of maximum likelihood in the
context of problems in astrophysics and present several applications.
We examine in detail the problem of fitting competing statistical
models to the luminosity functions of globular clusters. We shall
see that the Bayesian Information Criterion leads to a conclusion
that the Gaussian model is to be preferred over the t-distribution
model for GCLF in the Milky Way. Finally, we will discuss some
open research problems and opportunities in the area.
 [pdf]

 Statistical Inference with Monotone Incomplete Multivariate Normal Data
 Abstract:
We consider problems in statistical inference with two-step, monotone
incomplete data drawn from a multivariate normal population. We
derive stochastic representations for the exact distributions of
the maximum likelihood estimators of the population mean vector and
covariance matrix and deduce a wide collection of results for
inference on the mean vector, including: lower bounds on the level
of confidence associated with ellipsoidal confidence regions for
the mean, confidence regions for linear combinations of the components
of the mean, and unbiasedness results for several testing problems
on the mean vector and covariance matrix. With regard to problems
of shrinkage estimation for the mean, we extend to the case of
monotone incomplete samples a wide class of classical results on
the reduced risk of estimators of James-Stein type. In testing for
multivariate normality of monotone incomplete data, we construct
Mardia-type statistics for testing kurtosis and skewness, and derive
their asymptotic distributions. If time permits, we will provide
an application to a well-known cholesterol data set featured in the
Minitab Handbook.
 [pdf]


Alex Blocker 06 Apr 2010 
 Doing Right By Massive Data: Using Probability Modeling To Advance The
Analysis Of Huge Astronomical Datasets
 Abstract:
The analysis of extremely large, complex datasets is becoming an
increasingly important task in the analysis of scientific data.
This trend is especially prevalent in astronomy, as large-scale
surveys such as SDSS, EROS, Pan-STARRS, and the LSST deliver (or
promise to deliver) terabytes of data per night. While both the
statistics and machine-learning communities have offered approaches
to these problems, neither has produced a completely satisfactory
approach. Working in the context of event detection for the MACHO
LMC data, I will present an approach that combines much of the power
of Bayesian probability modeling with the efficiency and
scalability typically associated with more ad hoc machine learning
approaches. This provides both rigorous assessments of uncertainty
and improved statistical efficiency on a dataset containing
approximately 20 million sources and 40 million individual time
series. I will also discuss how this framework could be extended
to related problems.
 rehearsal for NESS: [pdf]


Xie Xianchao 20 Apr 2010 
 Dust Temperature and Spectral Index Correlation?
 Abstract:
Recent advances in infrared and submillimeter technologies have
allowed observations to be made of the dust emission in a variety
of environments. Applying spectral energy distribution fitting
to the flux measurements, several independent research groups have
concluded that there exists an inverse correlation between dust
temperature and dust emissivity spectral index. However, it is also
suspected that the empirical correlation might have been caused by
the noise in the measurements, as illustrated in Shetty et al. (2009).
This talk discusses how Bayesian models might be employed
to address such an issue. MCMC methods, especially Gibbs samplers,
are used to conduct the posterior inference. Specific challenges in
designing a fast-converging Gibbs chain are also noted.
 [pdf]


Xu Jin 25 May 2010 
 Solar DEM reconstruction
 [.pdf]


Victoria Liublinska 15 Jun 2010 
 Reconstructing stellar DEM and metallicity using high-resolution X-ray Spectra
 [.pdf]


Jing 22 Jun 2010 
 Deconvolution of Quasar X-ray Images Using the EM Method






Fall/Winter 2004-2005
Siemiginowska, A. / Connors, A. / Kashyap, V. / Zezas, A. / Devor, J. / Drake, J. / Kolaczyk, E. / Izem, R. / Kang, H. / Yu, Y. / van Dyk, D. 
Fall/Winter 2005-2006
van Dyk, D. / Ratner, M. / Jin, J. / Park, T. / CCW / Zezas, A. / Hong, J. / Siemiginowska, A. & Kashyap, V. / Meng, X.L. 
Fall/Winter 2006-2007
Lee, H. / Connors, A. / Protopapas, P. / McDowell, J. / Izem, R. / Blondin, S. / Lee, H. / Zezas, A. & Lee, H. / Liu, J.C. / van Dyk, D. / Rice, J.

Fall/Winter 2007-2008
Connors, A., & Protopapas, P. / Steiner, J. / Baines, P. / Zezas, A. / Aldcroft, T.

Fall/Winter 2008-2009
H. Lee /
A. Connors, B. Kelly, & P. Protopapas /
P. Baines /
A. Blocker /
J. Hong /
H. Chernoff /
Z. Li /
L. Zhu (Feb) /
A. Connors (Pt.1) /
A. Connors (Pt.2) /
L. Zhu (Mar) /
E. Kolaczyk /
V. Liublinska /
N. Stein

Fall/Winter 2009-2010
A.Connors /
B.Kelly /
N.Stein, P.Baines /
D.Stenning / J. Xu / A.Blocker /
P.Baines, Y.Yu /
V.Liublinska, J.Xu, J.Liu /
X.L. Meng, et al. /
A. Blocker, et al. /
A. Siemiginowska /
D. Richard /
A. Blocker /
X. Xie /
X. Jin /
V. Liublinska /
L. Jing
