Phillips Auditorium, CfA,

60 Garden St., Cambridge, MA 02138

URL: http://hea-www.harvard.edu/AstroStat/CAS2010

The California-Boston-Smithsonian Astrostatistics Collaboration plans to host a mini-workshop on Computational Astro-statistics. With the advent of new missions like the Solar Dynamics Observatory (SDO), the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), and the Large Synoptic Survey Telescope (LSST), astronomical data collection is fast outpacing our capacity to analyze it. Astrostatistical effort has generally focused on principled analysis of individual observations, on one or a few sources at a time. But the new era of data-intensive observational astronomy forces us to consider combining multiple datasets and inferring parameters that are common to entire populations. Many astronomers want to use every data point, and even non-detections, but this becomes problematic for many statistical techniques.

The goal of the Workshop is to explore new problems in astronomical data analysis that arise from data complexity. Our focus is on problems that have generally been considered intractable due to insufficient computational power or inefficient algorithms, but are now becoming tractable. Examples of such problems include: accounting for uncertainties in instrument calibration; classification, regression, and density estimation for massive data sets that may be truncated and contaminated with measurement errors and outliers; and designing statistical emulators to efficiently approximate the output of complex astrophysical computer models and simulations, thus making statistical inference on them tractable. We aim to present some issues to the statisticians and clarify difficulties with currently used methodologies, e.g., MCMC methods. The Workshop will consist of review talks on current statistical methods by statisticians, descriptions of data analysis issues by astronomers, and open discussions between astronomers and statisticians. We hope to define a path for the development of new algorithms that target specific issues, designed to help with applications to SDO, Pan-STARRS, LSST, and other survey data.

We hope you will be able to attend the workshop and present a brief talk on the scope of the data analysis problem that you confront in your project. The workshop will have presentations in the morning sessions, followed by a discussion session in the afternoons of both days.

If you set aside the **time** predictor, it is hard to tell that this is a light curve, i.e., time-dependent data. At a glance, this data set looks like a simple block design for a **one-way ANOVA**. ANOVA stands for **Analysis of Variance**, a nomenclature not familiar to astronomers.

Consider a very long strip of land that has experienced FIVE different geological phenomena, and suppose you want to show that the crop productivity of each piece of land differs. So you lay out FIVE beds and plant the same kind of seeds, measuring the location of each seed from a common origin. Each bed holds a few dozen seeds planted close together, though at slightly different distances. The beds themselves, on the other hand, are far enough apart that plants in bed A cannot affect plants in bed B. In other words, A and B are independent, which suits a statistical inference procedure based on the F-test. All you need to do, after a few months, is measure the total crop yield of each plant (with measurement errors).

Now, let’s look at the plot above. If you replace distance with time and weight with flux, the pattern of data collection and its statistical inference procedure match those of the one-way ANOVA. It is hard to say this data set was designed for time series analysis, quite apart from the complication in statistical inference due to measurement errors. How to design a statistical study with measurement errors, huge gaps in time, and unequal time intervals is complex and largely unexplored. It depends highly on the choice of inference method, the assumptions on the errors (i.e., the likelihood function construction), prior selection, and the properties of the distribution family.

Speaking of ANOVA, using the F-test means that we assume the residuals are Gaussian, from which one can comfortably extend the model with additive measurement errors. Here I assume there is no correlation between the measurement errors and the plant beds. How the measurement errors are parameterized into the model depends on such assumptions, as does how the sampling distribution and test statistics are assessed.
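To make that concrete, here is a sketch of the one-way ANOVA model extended with additive measurement errors (my own notation, not taken from any particular reference):

$$ y_{ij} = \mu + \alpha_i + \epsilon_{ij} + \delta_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^2), \qquad \delta_{ij} \sim N(0, \sigma_{ij}^2), $$

where $y_{ij}$ is the $j$-th yield (flux) in bed (time block) $i$, $\alpha_i$ is the block effect being tested, and the $\sigma_{ij}$ are the known per-observation error scales. If the $\delta_{ij}$ are independent of the $\epsilon_{ij}$, they simply inflate the residual variance, which is why the uncorrelated Gaussian case is the comfortable one.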

Although I know this Crab nebula data set was not meant for a one-way ANOVA, the pattern in the scatter plot drove me to test it. The output says to reject the null hypothesis of statistically equivalent flux across the FIVE time blocks. The following is the R output, without measurement errors.

```
             Df Sum Sq Mean Sq F value    Pr(>F)
factor        4 4041.8  1010.4  143.53 < 2.2e-16 ***
Residuals   329 2316.2     7.0
```
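For readers curious where such numbers come from, the one-way ANOVA F statistic can be computed by hand. Below is a minimal pure-Python sketch on toy numbers (not the actual Crab fluxes; `one_way_anova` is my own naming):

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic for a list of sample groups.

    Returns (F, df_between, df_within), matching the Df and F value
    columns of an R aov() summary table.
    """
    k = len(groups)                                 # number of blocks/beds
    n = sum(len(g) for g in groups)                 # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group (factor) sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # within-group (residual) sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# toy example with two "time blocks"
f, df1, df2 = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(f, df1, df2)  # F = 13.5 on (1, 4) degrees of freedom
```

Measurement errors are not handled here; as noted above, folding in known per-point error bars would change the residual variance and hence the F statistic.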

If the gaps were minor, I would next consider a time series model with missing data. However, the missing-data pattern does not agree with my knowledge of missing data analysis. I wonder how astronomers handle such big gaps in time series data: what assumptions they make to get a best fit and its error bar, how the measurement errors are incorporated into the statistical model, what the objective of statistical inference is, how physical meanings are related to statistically significant parameter estimates, how the choice of model is assessed as proper, and more. When the contest is over, if available, I’d like to check out any statistical treatments that answer these questions. I hope there are scientists considering similar statistical issues in these data sets from the INTEGRAL team.

First there was Edwin Hubble (S.B. 1910, Ph.D. 1917).

Then came Arthur Compton (the “MetLab”).

Followed by Subrahmanyan Chandrasekhar (Morton D. Hull Distinguished Service Professor of Theoretical Astrophysics).

And now, Enrico Fermi.

Invitation to SLAC Summer Institute 2008

Dear Colleague,

We are writing to you about the 36th SLAC Summer Institute, to be held Aug 4-15 this year on “Cosmic Accelerators”. This school was planned largely in anticipation of the GLAST mission, a highly successful collaboration involving astronomers and physicists from all around the world. (The proposal for its primary instrument, the Large Area Telescope or LAT, came from an international team led by Stanford; LAT integration and initial testing took place at SLAC; and LAT data are received at the Instrument Science Operations Center at SLAC for processing.) As you may have heard, GLAST was launched on June 11 and has been operating very well. The first-light information is planned for release while the Institute is in session, and we anticipate the first important new science results over the next year, so this year’s Institute is very well timed. With its large leap in capabilities, GLAST will make breakthrough observations of many classes of high-energy cosmic sources and has a very large discovery window for signals of new phenomena, including indirect detection of dark matter.

As two of the co-Directors of the Institute, we believe that this will be an unusually timely opportunity for students, postdocs, and seasoned researchers who wish to expand their research area (no background in astrophysics is required) and learn about the exciting science of GLAST as well as recent advances in X-ray and TeV astronomy and cosmic-ray physics. Accordingly, we have extended the deadline for early registration until July 31.

Please pass on information about the Institute, which can be found at

http://www-conf.slac.stanford.edu/ssi/2008/default.asp

Sincerely,

Roger Blandford

Tune Kamae

- [astro-ph:0806.0650] Kimball and Ivezić, **A Unified Catalog of Radio Objects Detected by NVSS, FIRST, WENSS, GB6, and SDSS** (The catalog is available HERE. I’m always fascinated by the possibilities in catalog data sets that machine learning and statistics can explore, and I do hope that the measurement error columns get recognition from non-astronomers.)
- [astro-ph:0806.0820] Landau and Simeone, **A statistical analysis of the data of \Delta\alpha/\alpha from quasar absorption systems** (It discusses Student t-tests, from which confidence intervals for unknown variances and sample sizes based on Type I and II errors are obtained.)
- [stat.ML:0806.0729] R. Girard, **High dimensional gaussian classification** (Model-based classification with a Gaussian mixture approach, although it is often called clustering in astronomy, is very popular for multi-dimensional astronomical data.)
- [astro-ph:0806.0520] Vio and Andreani, **A Statistical Analysis of the “Internal Linear Combination” Method in Problems of Signal Separation as in CMB Observations** (Independent component analysis, ICA, is discussed.)
- [astro-ph:0806.0560] Noble and Nowak, **Beyond XSPEC: Towards Highly Configurable Analysis** (The flow of spectral analysis with XSPEC and Sherpa has never come to me smoothly; instead, it has been a personal struggle. The paper seems to treat XSPEC as a black box, with which I completely agree. The main objective of the paper is comparing XSPEC and ISIS.)
- [astro-ph:0806.0113] Casandjian and Grenier, **A revised catalogue of EGRET gamma-ray sources** (The maximum likelihood detection method, which I have never encountered in the statistical literature, is utilized.)

The goal of this workshop is to bring together an international group of scientists interested in the future of gamma-ray astronomy to define the direction of future instruments and discuss R&D projects for the next generation of observatories such as CTA, HAWC, and AGIS. … While some of the meeting will be devoted to summarising the scientific motivations for such instruments, drawing on the White Paper for the Division of Astrophysics of the American Physical Society, particular emphasis in the discussion will be devoted to the technical challenges, design parameters, and projected sensitivities of such future observatories.

The title sounds very interesting, although the significance of albedo spectra is not recognized by a statistician. This study was performed to utilize GLAST and PAMELA via Monte Carlo simulations (the toolkit for MC was GEANT 8.2) with EGRET data.

Without an optical afterglow, a galaxy within the 2 arc second error region of a GRB x-ray afterglow is identified as the host galaxy; however, confusion can arise due to the facts that 1. the edge of a galaxy is diffuse, 2. multiple sources could exist within the 2 arc second error region, 3. the distance between the galaxy and the x-ray afterglow is measured in projection, and 4. lensing causes an increase in brightness and shifts in position. In this paper, the authors “investigated the fields of 72 GRBs in order to examine the general issue of associations between GRBs and host galaxies.”

The authors raised some statistical issues with this matching of GRBs to host galaxies, but current knowledge and techniques seem to fall short of tackling the problem. Still, to prevent false discoveries, the authors proposed strategic studies of the following:

- Gamma-ray luminosity indicators
- Detection (or non-detection) of SNe (supernovae) for long-duration bursts
- Classification of the associated galaxy: long-duration and short-duration bursts are associated with late-type and early-type galaxies, respectively
- Optical afterglow spectral absorption features
- Visual detection of true host galaxy as happened with GRB 060912a
- X-ray afterglow spectral emission lines, and
- Strong lensing of x-ray afterglows

As multi-wavelength studies become popular nowadays, this source matching issue across bands arises continually, and statistics can contribute to the validity of source matching methods. So far, those methods are incomprehensible to statisticians.
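To make the geometric side of the matching concrete (the 2 arc second error circle only, not the harder validity question), here is a small Python sketch; the function names and the atan2-style separation formula are my choices, not from the paper:

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation, in arcseconds, of two sky positions
    given in decimal degrees (atan2 form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    num = math.hypot(
        math.cos(dec2) * math.sin(dra),
        math.cos(dec1) * math.sin(dec2)
        - math.sin(dec1) * math.cos(dec2) * math.cos(dra),
    )
    den = (math.sin(dec1) * math.sin(dec2)
           + math.cos(dec1) * math.cos(dec2) * math.cos(dra))
    return math.degrees(math.atan2(num, den)) * 3600.0

def candidates_in_error_region(afterglow, galaxies, radius_arcsec=2.0):
    """Galaxies, as (ra, dec) pairs, inside the afterglow error circle."""
    return [g for g in galaxies
            if angular_separation(afterglow[0], afterglow[1],
                                  g[0], g[1]) <= radius_arcsec]
```

When this returns more than one galaxy, the confusion described above begins; and a small projected separation still does not guarantee physical association.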

Recently I went to JSM 2007 and tried to attend talks about (Bayesian) change point problems, which frequently appear in time series models, often found in economics. With ARCH (autoregressive conditional heteroskedasticity) or GARCH (generalized ARCH) models, and by adding a parameter that indicates a change point, I thought Bayesian modeling could handle astronomical light curves.
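As a toy version of that change-point parameter, here is a brute-force, least-squares search for a single change point in a piecewise-constant light curve (a frequentist sketch of the idea, not the Bayesian ARCH/GARCH model itself; the names are mine):

```python
import math

def find_change_point(flux):
    """Index tau (1 <= tau < len(flux)) splitting the series into two
    constant-mean segments with minimum total squared error."""
    best_tau, best_sse = None, math.inf
    for tau in range(1, len(flux)):
        left, right = flux[:tau], flux[tau:]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((x - m_left) ** 2 for x in left)
               + sum((x - m_right) ** 2 for x in right))
        if sse < best_sse:
            best_tau, best_sse = tau, sse
    return best_tau

# a flat light curve that jumps after the fourth point
print(find_change_point([1.0, 1.1, 0.9, 1.0, 5.2, 4.9, 5.1]))  # 4
```

In a Bayesian treatment one would instead put a prior on tau and on the segment parameters and sample the posterior, e.g. by MCMC, rather than minimizing.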

Developing algorithms based on statistical theory, writing the algorithms down heuristically, making the code public, and finding and processing proper example data from huge astronomical archives should all happen simultaneously, and these multiple steps make proposing new statistics to the astronomical community difficult. I’m glad to know that there are individuals devoting themselves to making these steps happen. Unfortunately, they are loners.

In general, gamma-ray bursts (GRBs) are classified into two groups: long (>2 sec) and short (<2 sec) duration bursts. Nonetheless, there have been some studies, including arxiv/astro-ph:0705.4020v2, that statistically supported the existence of 3 clusters as optimal. The pioneering work on GRB clustering was based on hierarchical clustering methods by Mukherjee et al. (Three Types of Gamma-Ray Bursts).

The new feature of this article is that Chattopadhyay et al. applied the k-means and the Dirichlet process model based clustering methods to confirm three classes of GRBs. In addition, they investigated classes among 21 GRBs with known redshifts.
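As a reminder of what the simplest of those methods does, here is a plain 1-D k-means sketch, e.g. for log burst durations (toy values; not the authors’ code or data):

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain 1-D k-means (Lloyd's algorithm); xs might be log10(T90)
    durations, k the number of burst classes to try."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # move each center to its cluster mean (keep it if cluster is empty)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# two well-separated duration groups
print(kmeans_1d([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], k=2))  # ~[0.1, 5.1]
```

Choosing k itself (two vs. three GRB classes) is the interesting part; the Dirichlet process approach sidesteps fixing k in advance.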
