Archive for the ‘Stat’ Category.

Quote of the Week, Aug 23, 2007

These are from two lively CHASC discussions on classification, or cluster analysis. The first was on Feb 7, 2006; the continuation on Dec 12, 2006, at the Harvard Statistics Department, as part of Stat 310.

David van Dyk:

Don’t demand too much of the classes. You’re not going to say that all events can be well-classified…. It’s more descriptive. It gives you places to look. Then you look at your classes.

Xiao Li Meng:

Then you’re saying the cluster analysis is more like -

David van Dyk:

It’s really like you have a proposal for classes. You then investigate the physical processes more thoroughly. You may have classes that divide it [up].

……

David van Dyk:

But it can make a difference, where you see the clusters, depending on your [parameter] transformation. You can squish the white spaces, and stretch out the crowded spaces; so it can change where you think the clusters are.

Aneta Siemiginowska:

But that is interesting.

Andreas Zezas:

Yes, that is very interesting.

These are particularly in honor of Hyunsook Lee’s recent posting of Chattopadhyay et al.’s new work on possible intrinsic classes of gamma-ray bursts. Are they really physical classes, or do they only appear to be distinct clusters because we view them through the “squished” lens (parameter spaces) of our imperfect instruments?
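Van Dyk’s point about transformations shows up even in a toy calculation. The sketch below is only an illustration under my own assumptions: synthetic “durations” drawn from two lognormal populations (the locations, widths, and 25/75 split are made up, not taken from any GRB catalog), split into two groups by plain 2-means clustering, once on the raw scale and once on the log10 scale. The helper names are hypothetical.

```python
import numpy as np

def two_means_1d(x, n_iter=100):
    """Lloyd's algorithm with k = 2 in one dimension, started at min and max."""
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        # assign each point to its nearest center
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([x[labels == j].mean() for j in range(2)])
    return centers

def boundary_seconds(durations, use_log):
    """Duration (s) at which the two clusters split, for the chosen scale."""
    x = np.log10(durations) if use_log else durations
    lo, hi = np.sort(two_means_1d(x))
    mid = 0.5 * (lo + hi)  # in 1-D the k-means boundary is the midpoint of the centers
    return 10.0 ** mid if use_log else mid

# Hypothetical synthetic sample: 250 "short" and 750 "long" events.
rng = np.random.default_rng(42)
durations = np.concatenate([
    rng.lognormal(mean=np.log(0.3), sigma=0.5, size=250),
    rng.lognormal(mean=np.log(30.0), sigma=1.0, size=750),
])
print("split on raw scale:   %.1f s" % boundary_seconds(durations, use_log=False))
print("split on log10 scale: %.1f s" % boundary_seconds(durations, use_log=True))
```

On the raw scale the split lands far out in the long-duration tail (tens of seconds), because the stretched-out tail dominates the squared distances; on the log scale it falls between the two populations, at a few seconds. Same data, different “clusters”.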

[ArXiv] Isochrone database, Aug. 20, 2007

From arxiv/astro-ph:0708.1204v3
An Isochrone Database and a Rapid Model for Stellar Population Synthesis by Li and Han

This paper emphasizes binary populations: CMD fitting with the binary stellar population synthesis model outperformed fitting with the single population model. They used the Hurley code (Hurley, Tout, and Pols (2002), Evolution of binary stars and the effect of tides on binary populations, MNRAS, 329(4), pp. 897-928). They mentioned that two color-color grids can disentangle the age-metallicity degeneracy via binary stellar populations. They fitted their isochrone database to M67 and NGC 1868 with the gT-grid and concluded that the database of binary stellar populations fitted the color-magnitude diagrams better.

Cross-validation for model selection

One of the most frequently cited papers in model selection is An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion by M. Stone, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1 (1977), pp. 44-47.
(Akaike’s 1974 paper, which introduced the Akaike Information Criterion (AIC), is the most often cited paper on model selection.)
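Stone’s result says that, for maximum-likelihood fits, choosing a model by leave-one-out cross-validation (scored with the held-out log-likelihood) is asymptotically equivalent to choosing by AIC. Below is a minimal numerical sketch, assuming Gaussian polynomial regression on simulated data; the models, the data, and the function names are my own illustrative choices. Both criteria are put on the same −2 log-likelihood scale so they can be compared side by side.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares (= Gaussian maximum-likelihood) polynomial fit."""
    X = np.vander(x, degree + 1)            # columns x^degree, ..., x^0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)   # ML estimate of the noise variance
    return beta, sigma2

def gauss_logpdf(resid, sigma2):
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * resid ** 2 / sigma2

def aic(x, y, degree):
    beta, sigma2 = fit_poly(x, y, degree)
    loglik = np.sum(gauss_logpdf(y - np.polyval(beta, x), sigma2))
    k = degree + 2                          # polynomial coefficients plus the variance
    return 2 * k - 2 * loglik

def loo_cv(x, y, degree):
    """-2 x (sum of held-out log predictive densities), i.e. AIC's scale."""
    total = 0.0
    for i in range(x.size):
        keep = np.arange(x.size) != i       # refit without point i
        beta, sigma2 = fit_poly(x[keep], y[keep], degree)
        total += gauss_logpdf(y[i] - np.polyval(beta, x[i]), sigma2)
    return -2 * total

# Simulated data from a quadratic (an assumption for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1 + 2 * x - 3 * x ** 2 + 0.1 * rng.standard_normal(x.size)
for d in range(6):
    print(d, round(aic(x, y, d), 1), round(loo_cv(x, y, d), 1))
```

The two columns track each other closely for the adequate models and both reject the underfitting degrees 0 and 1 by a wide margin, which is the practical content of the asymptotic equivalence.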

An alternative to MCMC?

I think of Markov chain Monte Carlo (MCMC) as a kind of directed staggering about, a random walk with a goal. (Sort of like driving in Boston.) It is conceptually simple to grasp as a way to explore the posterior probability distribution of the parameters of interest by sampling mostly where it is worth sampling. This is a major saving over brute-force Monte Carlo, and far more robust than downhill fitting programs. It also gives you the error bar on the parameter for free. What could be better?
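The “random walk with a goal” fits in a few lines. The sketch below is a generic random-walk Metropolis sampler, not any particular package’s implementation; the target (the posterior of a Poisson rate under a flat prior, given simulated counts), the step size, the burn-in, and the chain length are all illustrative assumptions. The spread of the retained samples is the “free” error bar.

```python
import numpy as np

def metropolis(log_post, x0, step, n_samples, rng):
    """Random-walk Metropolis with Gaussian proposals (1-D)."""
    x, lp = x0, log_post(x0)
    chain = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()
        lp_prop = log_post(proposal)
        # accept with probability min(1, post(proposal) / post(x))
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            n_accept += 1
        chain[i] = x                       # a rejected move repeats the current state
    return chain, n_accept / n_samples

def make_log_post(counts):
    """Unnormalised log posterior of a Poisson rate, flat prior on lam > 0."""
    total, n = counts.sum(), counts.size
    def log_post(lam):
        return total * np.log(lam) - n * lam if lam > 0 else -np.inf
    return log_post

rng = np.random.default_rng(1)
counts = rng.poisson(5.0, size=50)         # hypothetical data
chain, acc = metropolis(make_log_post(counts), x0=1.0, step=0.5,
                        n_samples=20000, rng=rng)
kept = chain[2000:]                        # discard burn-in
print(f"rate = {kept.mean():.2f} +/- {kept.std():.2f}, acceptance = {acc:.2f}")
```

This toy posterior is actually a Gamma distribution, so the sampled mean and spread can be checked against (Σk + 1)/n and √(Σk + 1)/n; in real problems no such check is available, which is why convergence diagnostics matter.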

[ArXiv] Data-Driven Goodness-of-Fit Tests, Aug. 1, 2007

From arxiv/math.st:0708.0169v1
Data-Driven Goodness-of-Fit Tests by M. Langovoy

Goodness-of-fit tests have been essential in astronomy for validating a chosen physical model against observed data, but their limitations have not always been considered carefully when the same data are also used to estimate the model parameters. I therefore thought this paper would help in thinking about the different points of view between astronomers’ practice of goodness-of-fit tests and statisticians’ construction of tests. (Warning: the paper is abstract and theoretical.)
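One concrete instance of that limitation is the Kolmogorov-Smirnov test applied after fitting: if the mean and standard deviation are estimated from the same data, the standard KS critical values no longer apply and the test becomes very conservative (the issue Lilliefors addressed). The simulation below is my own sketch with made-up settings (normal data, n = 100, 2000 replications, the asymptotic 5% critical value 1.358/√n); it is not the data-driven method of the paper.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(z):
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])

def ks_stat(x, mu, sigma):
    """Two-sided KS distance between the sample and the N(mu, sigma^2) CDF."""
    n = x.size
    F = normal_cdf((np.sort(x) - mu) / sigma)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))

def rejection_rates(n=100, n_sims=2000, seed=0):
    rng = np.random.default_rng(seed)
    crit = 1.358 / sqrt(n)                 # asymptotic 5% critical value
    known = fitted = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, 1.0, size=n)   # the null hypothesis is true
        known += ks_stat(x, 0.0, 1.0) > crit                     # parameters fixed a priori
        fitted += ks_stat(x, x.mean(), x.std(ddof=1)) > crit     # parameters fitted to x
    return known / n_sims, fitted / n_sims

print(rejection_rates())
```

With the parameters fixed in advance the rejection rate sits near the nominal 5%; with the parameters fitted to the same data it drops far below that, so the naive test almost never rejects and a “good fit” says little.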

[ArXiv] Poisson Mixture, Aug. 16, 2007

From arxiv/math.st:0708.2153v1
Estimating the number of classes by Mao and Lindsay

This study could be linked to identifying the number of lines from Poisson-distributed X-ray count data, one of the key interests of astronomers. However, as pointed out by the authors, estimating the number of classes is a difficult statistical problem. I. J. Good[1] said:

I don’t believe it is usually possible to estimate the number of species, but only an appropriate lower bound to that number. This is because there is nearly always a good chance that there are a very large number of extremely rare species.
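For readers who want to experiment, the sketch below fits a finite Poisson mixture by EM and compares numbers of components with BIC. This is a generic textbook approach chosen for illustration, not Mao and Lindsay’s method; the simulated rates, the quantile initialization, and the function name are my own assumptions. Good’s warning applies: BIC only says how many components the data support, and a very rare class may simply be invisible, so the selected number is best read as a lower bound.

```python
import numpy as np
from math import lgamma

def fit_poisson_mixture(counts, n_comp, n_iter=500):
    """EM for a finite Poisson mixture; returns weights, rates, log-likelihood, BIC."""
    k = np.asarray(counts, dtype=float)
    log_fact = np.array([lgamma(v + 1.0) for v in k])
    # deterministic start: rates at evenly spaced sample quantiles
    lam = np.quantile(k, (np.arange(n_comp) + 0.5) / n_comp) + 0.5
    w = np.full(n_comp, 1.0 / n_comp)

    def log_joint(w, lam):
        # log[w_j * Poisson(k_i | lam_j)] for every observation i and component j
        return np.log(w) + k[:, None] * np.log(lam) - lam - log_fact[:, None]

    for _ in range(n_iter):
        lp = log_joint(w, lam)
        resp = np.exp(lp - np.logaddexp.reduce(lp, axis=1)[:, None])  # E-step
        nk = resp.sum(axis=0) + 1e-12                                  # M-step
        w = nk / nk.sum()
        lam = np.maximum((resp * k[:, None]).sum(axis=0) / nk, 1e-10)

    loglik = np.logaddexp.reduce(log_joint(w, lam), axis=1).sum()
    n_params = 2 * n_comp - 1                # rates plus free weights
    bic = -2.0 * loglik + n_params * np.log(k.size)
    return w, lam, loglik, bic

# Hypothetical data: two well-separated Poisson classes.
rng = np.random.default_rng(1)
counts = np.concatenate([rng.poisson(2.0, 500), rng.poisson(20.0, 500)])
for m in (1, 2, 3):
    w, lam, ll, bic = fit_poisson_mixture(counts, m)
    print(m, np.round(lam, 2), round(bic, 1))
```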


  1. Courtesy of the paper Estimating the number of species: A review, by Bunge and Fitzpatrick (1993), JASA, 88, 364-373.

[ArXiv] Gamma-ray albedo of the moon, Aug. 15, 2007

From arxiv/astro-ph:0705.3856
Gamma-ray albedo of the moon by Moskalenko and Porter

The title sounds very interesting, although a statistician may not recognize the significance of albedo spectra. The study used Monte Carlo simulations (with the GEANT4 8.2 toolkit) together with EGRET data, with a view to observations by GLAST and PAMELA.

Coverage issues in exponential families

I have heard a lot about coverage problems from the astrostat/phystat groups, without knowing the fundamental reasons (most likely physics). This paper might be of interest to them: Interval Estimation in Exponential Families by Brown, Cai, and DasGupta; Statistica Sinica (2003), 13, pp. 19-49.

Abstract summary:
The authors investigated interval estimation of the mean in natural exponential families with quadratic variance functions: the binomial, Poisson, negative binomial, normal, gamma, and a sixth distribution. The Wald interval is known to perform poorly not only in the discrete cases but also in nonnormal continuous cases, where it shows significant negative bias. Their computations suggested that the equal-tailed Jeffreys interval and the likelihood ratio interval are the best alternatives to the Wald interval.
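The coverage problem is easy to see numerically in the Poisson case. The sketch below is my own illustration, assuming a single Poisson count X with mean λ and a nominal 95% level. It computes the exact coverage probability of the Wald interval X ± 1.96√X and of the equal-tailed Jeffreys interval, whose endpoints are quantiles of a Gamma(X + 1/2, 1) distribution, with the lower endpoint set to 0 when X = 0. It relies on scipy.stats.

```python
import numpy as np
from scipy import stats

Z = stats.norm.ppf(0.975)

def wald(x):
    half = Z * np.sqrt(x)
    return x - half, x + half

def jeffreys(x, alpha=0.05):
    lower = stats.gamma.ppf(alpha / 2, x + 0.5)
    upper = stats.gamma.ppf(1 - alpha / 2, x + 0.5)
    return np.where(x == 0, 0.0, lower), upper   # lower bound 0 when X = 0

def coverage(interval, lam):
    """Exact P(lower <= lam <= upper) when X ~ Poisson(lam)."""
    x = np.arange(0, int(lam + 20 * np.sqrt(lam) + 30))   # covers essentially all mass
    lower, upper = interval(x)
    inside = (lower <= lam) & (lam <= upper)
    return float(np.sum(stats.poisson.pmf(x, lam) * inside))

for lam in (0.5, 2.0, 10.0, 50.0):
    print(f"lambda={lam:5.1f}  Wald={coverage(wald, lam):.3f}  "
          f"Jeffreys={coverage(jeffreys, lam):.3f}")
```

At small λ the Wald interval misses badly (for λ = 0.5 it fails whenever X = 0, which happens about 61% of the time), while the Jeffreys interval stays near or above the nominal level.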

Astrostatistics: Goodness-of-Fit and All That!

During the International X-ray Summer School, as a project presentation, I tried to explain the inadequate practice of χ² statistics in astronomy. If your best fit is biased (any misidentification of the model easily causes such bias), do not use χ² statistics to get a 1σ error that is supposed to have a 68% chance of capturing the true parameter.
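A small simulation makes the point. Under my own illustrative assumptions (known Gaussian errors, a true line y = a + 2x, and a fitted model y = bx with no intercept), the Δχ² = 1 interval around the best-fit slope covers the true slope about 68% of the time when the model is right (a = 0), but almost never when the omitted intercept biases the fit (a = 2). The function name and all settings are hypothetical.

```python
import numpy as np

def delta_chi2_coverage(intercept, b_true=2.0, sigma=1.0, n_sims=4000, seed=0):
    """Fraction of simulations with |b_hat - b_true| <= the Delta-chi^2 = 1 half-width."""
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 10.0, 20)
    sxx = np.sum(x * x)
    # chi^2(b) = sum((y - b x)^2) / sigma^2, so chi^2(b) - chi^2_min = (b - b_hat)^2 sxx / sigma^2
    # and Delta chi^2 = 1 gives the half-width sigma / sqrt(sxx).
    half_width = sigma / np.sqrt(sxx)
    covered = 0
    for _ in range(n_sims):
        y = intercept + b_true * x + sigma * rng.standard_normal(x.size)
        b_hat = np.sum(x * y) / sxx        # chi^2 minimiser for the model y = b x
        covered += abs(b_hat - b_true) <= half_width
    return covered / n_sims

print("correct model:", delta_chi2_coverage(intercept=0.0))
print("biased fit:   ", delta_chi2_coverage(intercept=2.0))
```

The error bar itself is computed identically in both cases; only the bias differs, and that alone destroys the 68% interpretation.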

Later, I decided to investigate the subject further, and this paper came along: Astrostatistics: Goodness-of-Fit and All That! by Babu and Feigelson.

[Quote] Model Skeptics

From IMS Bulletin Vol. 36(3), p.11, Terence’s Stuff: Model skeptics

[I once quoted an article by Prof. Terry Speed in the IMS Bulletin, Data-Doctors. Reading his columns in the IMS Bulletin gives me an opportunity to reflect on who I am as a statistician, as well as some guidance for treating data. Although his ideas do not come from astronomy or astronomical data analysis, I often find that his thoughts and words can be shared with astronomers.]

Change Point Problem

The X-ray summer school is ongoing. Numerous interesting topics have been presented, but not much about statistics (the only advice so far: “use the statistics implemented in X-ray data reduction/analysis tools” and “it’s just a tool”). Nevertheless, I happened to talk extensively with two students about their research topics, which involve finding features in light curves. One approach was very empirical, comparing gamma-ray burst trigger times to 24 kHz observations; the other was statistical and algorithmic, using Bayesian Blocks. Sadly, I could not give them answers, but the latter caught my attention.
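For readers unfamiliar with Bayesian Blocks, the core idea is an exact dynamic-programming search over all ways of cutting a light curve into constant-rate segments, with a penalty for each change point. The sketch below is my own minimal version for binned Poisson counts, following the optimal-partitioning recursion behind Scargle’s Bayesian Blocks; the function name, the fixed penalty `ncp_prior = 8`, and the simulated light curve are assumptions for illustration, not the students’ code or a calibrated setting.

```python
import numpy as np

def bayesian_blocks_binned(counts, bin_width=1.0, ncp_prior=8.0):
    """Optimal partition of binned Poisson counts into constant-rate blocks.

    Returns the start index of each block. ncp_prior is the penalty paid
    for every block (a log prior on the number of change points).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    best = np.zeros(n)              # best[r]: optimal total fitness for bins 0..r
    last = np.zeros(n, dtype=int)   # last[r]: start of the final block in that optimum
    for r in range(n):
        # N[k] = counts in bins k..r, T[k] = duration of bins k..r
        N = np.cumsum(counts[: r + 1][::-1])[::-1]
        T = bin_width * (r + 1 - np.arange(r + 1))
        # Poisson block fitness N log(N / T): the maximised log-likelihood up to a constant
        with np.errstate(divide="ignore", invalid="ignore"):
            fitness = np.where(N > 0, N * np.log(N / T), 0.0)
        total = fitness - ncp_prior
        total[1:] += best[:r]       # add the best partition of bins 0..k-1
        last[r] = int(np.argmax(total))
        best[r] = total[last[r]]
    # walk back through the stored block starts
    starts, r = [], n
    while r > 0:
        starts.append(int(last[r - 1]))
        r = starts[-1]
    return starts[::-1]

# Hypothetical light curve: the rate jumps from 2 to 10 counts/bin at bin 100.
rng = np.random.default_rng(3)
light_curve = np.concatenate([rng.poisson(2.0, 100), rng.poisson(10.0, 100)])
print(bayesian_blocks_binned(light_curve))  # block start indices
```

The search is O(N²) in the number of bins, and the penalty controls the false-positive rate, so in practice it should be calibrated rather than fixed by hand; `astropy.stats.bayesian_blocks` provides a maintained implementation.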

“They let you in now?”

Much to everybody’s surprise, they let some astronomers into the recently concluded Joint Statistical Meetings in Salt Lake City, UT. There were three astrostat sessions: #45 on Probing the Universe with Nonparametric Methods, #367 on Bayesian Applications in Astronomy and Physics (chaired by David van Dyk), and #411 on Image Analysis in Solar- and Astro-physics (chaired by Yaming Yu and Thomas Lee). Both of the latter sessions were dominated by presentations from CHASC collaborators.

[ArXiv] Geneva-Copenhagen Survey, July 13, 2007

From arxiv/astro-ph:0707.1891v1
The Geneva-Copenhagen Survey of the Solar neighborhood II. New uvby calibrations and rediscussion of stellar ages, the G dwarf problem, age-metallicity diagram, and heating mechanisms of the disk by Holmberg, Nordström, and Andersen

Researchers, including scientists from CHASC, working on color-magnitude diagrams to infer the ages, metallicities, temperatures, and other physical quantities of stars and stellar clusters may find this paper useful.

Quote of the Week, July 26, 2007

Peter Bickel:

“Bayesian” methods have, I think, rightly gained favor in astronomy as they have in other fields of statistical application. I put “Bayesian” in quotation marks because I do not believe this marks a revival in the sciences in the belief in personal probability. To me it rather means that all information on hand should be used in model construction, coupled with the view of Box [1979 etc], who considers himself a Bayesian:

Models, of course, are never true but fortunately it is only necessary that they be useful.

The Bayesian paradigm permits one to construct models and hence statistical methods which reflect such information in an, at least in principle, marvellously simple way. A frequentist such as myself feels as at home with these uses of Bayes principle as any Bayesian.

From Bickel, P. J. “An Overview of SCMA II”, in Statistical Challenges in Modern Astronomy II, editors G. Jogesh Babu and Eric D. Feigelson, 1997, Springer-Verlag, New York, p. 360.

[Box 1979] Box, G. E. P., 1979, “Some Problems of statistics and everyday life”. J. Amer. Statist. Assoc., 74, 1-4.

Peter Bickel had so many interesting perspectives in his comments at these SCMA conferences that it was hard to choose just one set.

[ArXiv] Three Classes of GRBs, July 21, 2007

From arxiv/astro-ph:0705.4020v2
Statistical Evidence for Three classes of Gamma-ray Bursts by T. Chattopadhyay et. al.

In general, gamma-ray bursts (GRBs) are classified into two groups: long (>2 sec) and short (<2 sec) duration bursts. Nonetheless, some studies, including arxiv/astro-ph:0705.4020v2, have found statistical evidence that three clusters fit the data best. The pioneering work on GRB clustering, based on hierarchical clustering methods, was by Mukherjee et al. (Three Types of Gamma-Ray Bursts).
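To make “statistical evidence for three clusters” concrete, one common approach (not necessarily the one used by Chattopadhyay et al.) is to fit Gaussian mixtures with different numbers of components to log10 durations and compare BIC. The sketch below does this with a hand-written EM on simulated data; the two-population sample, the quantile initialization, and the variance floor are my own assumptions, so it only shows the mechanics.

```python
import numpy as np

def fit_gmm_1d(x, n_comp, n_iter=300, var_floor=1e-4):
    """EM for a 1-D Gaussian mixture; returns weights, means, variances, BIC."""
    mu = np.quantile(x, (np.arange(n_comp) + 0.5) / n_comp)  # spread-out starting means
    var = np.full(n_comp, x.var())
    w = np.full(n_comp, 1.0 / n_comp)

    def log_joint(w, mu, var):
        # log[w_j * Normal(x_i | mu_j, var_j)] for every point i and component j
        return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)

    for _ in range(n_iter):
        lp = log_joint(w, mu, var)
        resp = np.exp(lp - np.logaddexp.reduce(lp, axis=1)[:, None])  # E-step
        nk = resp.sum(axis=0) + 1e-12                                  # M-step
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, var_floor)

    loglik = np.logaddexp.reduce(log_joint(w, mu, var), axis=1).sum()
    bic = -2.0 * loglik + (3 * n_comp - 1) * np.log(x.size)
    return w, mu, var, bic

# Hypothetical log10(T90) sample with two populations.
rng = np.random.default_rng(7)
log_t90 = np.concatenate([rng.normal(-0.5, 0.4, 250), rng.normal(1.5, 0.45, 750)])
for m in (1, 2, 3):
    print(m, round(fit_gmm_1d(log_t90, m)[3], 1))
```

A lower BIC for more components is evidence about the shape of the distribution on the chosen scale, not proof of distinct physical classes; the Gaussianity of each class in log duration is itself an assumption.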