Archive for October 2007

[ArXiv] An unbiased estimator, May 29, 2007

From arxiv/astro-ph:0705.4199v1
In search of an unbiased temperature estimator for statistically poor X-ray spectra
A. Leccardi and S. Molendi

There was a delay in writing about this paper, which by accident was lying under a pile of papers irrelevant to astrostatistics. (It has been quite overwhelming to track papers with various statistical applications, and papers with room left for statistical improvement, from arxiv/astro-ph.) Although there is already a posting about this paper (see Vinay’s posting), I’d like to give it a shot. I was very excited because I haven’t seen any astronomical papers devoted solely to discussing unbiased estimators.
Continue reading ‘[ArXiv] An unbiased estimator, May 29, 2007’ »

[ArXiv] 4th week, Oct. 2007

I hope a paper or two from arXiv this week draws your attention and stimulates your thoughts on astrostatistics.
Continue reading ‘[ArXiv] 4th week, Oct. 2007’ »

compressed sensing and a blog

My friend’s blog led me to Terence Tao’s blog, where a mathematician writes about topics in applied mathematics and beyond. A glance tells me that all the postings are well written. In particular, compressed sensing and single-pixel cameras drew my attention, because the topic stimulates the thoughts of astronomers working on the virtual observatory[1] and image processing[2] (it is not an exaggeration to say that observational astronomy, in a broad sense, starts with taking pictures), of statisticians working on multidimensional applications, and, not least, of engineers in signal and image processing. Continue reading ‘compressed sensing and a blog’ »

  1. see the slog posting “Virtual Observatory”
  2. see the slog posting “The power of wavdetect”

The power of wavdetect

wavdetect is a wavelet-based source detection algorithm in wide use in X-ray data analysis, in particular to find sources in Chandra images. It came out of the Chicago “Beta Site” of the AXAF Science Center (what the CXC used to be called before launch). Despite the fancy name, the complicated mathematics, and the devilish details, it is really not much more than a generalization of earlier local-cell detection, where a local background is estimated around a putative source and the question is asked: is whatever signal being seen in this pixel significantly higher than expected? However, unlike previous methods that used a flux measurement as the criterion for detection (e.g., using the signal-to-noise ratio as a proxy for a significance threshold), it tests the hypothesis that the observed signal could be obtained as a fluctuation of the background. Continue reading ‘The power of wavdetect’ »
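To make the local-cell idea concrete, here is a minimal sketch in Python of that kind of test. This is my own toy construction, not wavdetect’s actual machinery; the function name and the counts are made up for illustration.

```python
# Toy illustration of local-cell detection (not wavdetect itself):
# given counts in a candidate source cell and a background estimate from a
# surrounding region, ask how likely a background-only fluctuation is to
# reach the observed level.
from scipy.stats import poisson

def cell_significance(src_counts, bkg_counts, bkg_area, src_area):
    """Poisson tail probability that a background-only cell yields
    at least src_counts counts."""
    # expected background counts in the source cell, scaled by area
    mu_bkg = bkg_counts * (src_area / bkg_area)
    # P(N >= src_counts | background only); small values flag a detection
    return poisson.sf(src_counts - 1, mu_bkg)

# Example: 25 counts in a 9-pixel cell, 40 counts in a 100-pixel background region
p = cell_significance(25, 40, bkg_area=100.0, src_area=9.0)
print(f"chance probability of a background fluctuation: {p:.2e}")
```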

Clay Public Lecture: Technology-driven Statistics

I found the following announcement on the Harvard Statistics department website.

Clay Public Lecture
Technology-driven Statistics
Terry Speed, UC Berkeley and WEHI in Melbourne, Australia
Tuesday, October 30, 2007, at 7:00 PM
Harvard University Science Center — Hall B

Continue reading ‘Clay Public Lecture: Technology-driven Statistics’ »

[ArXiv] 3rd week, Oct. 2007

Quite a few interesting papers were up at arXiv this week, including a theoretical statistics paper that no astronomer should miss. To find that paper and others, please click below.
Continue reading ‘[ArXiv] 3rd week, Oct. 2007’ »

~ Avalanche(a,b)

Avalanches are a common process, occurring wherever a system can store stress temporarily without “snapping”. They can happen in sand dunes and solar flares as easily as in the snowbound Alps.

Melatos, Peralta, & Wyithe (arXiv:0710.1021) have a nice summary of avalanche processes in the context of pulsar glitches. Their primary purpose is to show that the glitches are indeed consistent with an avalanche, and along the way they give a highly readable description of what an avalanche is and what it entails. Briefly, avalanches result in event parameters that are distributed in a scale-invariant fashion (read: power laws) with exponential waiting-time distributions (i.e., Poisson arrivals).

Hence the title of this post: the “Avalanche distribution” (indulge me! I’m using stats notation to bury complications!) can be thought of as having two parameters, both describing the indices of power-law distributions: one controlling the event sizes, a, and the other the event durations, b, with the event separations distributed as an exponential decay. Is there a canned statistical distribution that describes all this already? (In our work modeling stellar flares, we assumed that b=0 and found that a<-2, which has all sorts of nice consequences for coronal heating processes.)
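For simulation purposes, here is a minimal sketch in Python of drawing samples from such an “Avalanche(a,b)” process: power-law event sizes with index a, power-law durations with index b, and exponentially distributed waiting times. The sampler and all numerical values are my own illustrative choices, not anything from the paper.

```python
# Sketch of sampling from a hypothetical "Avalanche(a, b)" process:
# power-law event sizes (index a), power-law durations (index b),
# exponential waiting times between events (Poisson arrivals).
import numpy as np

rng = np.random.default_rng(42)

def sample_power_law(index, xmin, size, rng):
    """Inverse-transform sampling from p(x) ~ x**index for x >= xmin (requires index < -1)."""
    u = rng.uniform(size=size)
    return xmin * (1.0 - u) ** (1.0 / (index + 1.0))

n_events = 10_000
a, b = -2.5, -1.8                       # illustrative indices only
sizes     = sample_power_law(a, xmin=1.0, size=n_events, rng=rng)
durations = sample_power_law(b, xmin=1.0, size=n_events, rng=rng)
waits     = rng.exponential(scale=100.0, size=n_events)

print("median size:", np.median(sizes),
      " median duration:", np.median(durations),
      " median wait:", np.median(waits))
```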

Quote of the Week, Oct 12, 2007

This is an unusual Quote-of-the-week, in that I point you to [ABSTRACT] and a [VIDEO] of the recent talk at the Institute for Innovative Computing. See what you think! Continue reading ‘Quote of the Week, Oct 12, 2007’ »

[ArXiv] 2nd week, Oct. 2007

Frankly, there was no astrostatistically interesting paper on astro-ph this week, but some worthwhile papers from the statistics side were posted. For the list, click Continue reading ‘[ArXiv] 2nd week, Oct. 2007’ »

“you are biased, I have an informative prior”

Hyunsook drew attention to this paper (arXiv:0709.4531v1) by Brad Schaefer on the underdispersed measurements of the distance to the LMC. He makes a compelling case that since 2002 the published numbers in the literature have been hewing to an “acceptable number”, possibly in an unconscious effort to pass muster with referees. Essentially, the distribution of the best-fit distances is much more closely clustered than you would expect from the quoted sizes of the error bars. Continue reading ‘“you are biased, I have an informative prior”’ »
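One simple way to quantify “more closely clustered than the error bars allow” is a chi-square consistency test about the weighted mean. The sketch below uses made-up numbers, not Schaefer’s actual compilation of LMC distance moduli.

```python
# Illustrative under-dispersion check: compare the scatter of N measurements
# about their weighted mean with what their quoted errors predict.
import numpy as np
from scipy.stats import chi2

mu    = np.array([18.50, 18.51, 18.49, 18.50, 18.52, 18.50])  # distance moduli (made up)
sigma = np.array([0.10, 0.08, 0.12, 0.09, 0.10, 0.11])        # quoted 1-sigma errors (made up)

w = 1.0 / sigma**2
mean = np.sum(w * mu) / np.sum(w)
chisq = np.sum(((mu - mean) / sigma) ** 2)
dof = len(mu) - 1

# Honest error bars give chisq ~ dof; a left-tail probability near zero
# signals under-dispersion (the points huddle tighter than they should).
print(f"chi2 = {chisq:.2f} for {dof} dof; P(chi2 < observed) = {chi2.cdf(chisq, dof):.3f}")
```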

[ArXiv] 1st week, Oct. 2007

This week, instead of only filtering astrostatistics-related papers from arXiv, I chose additional arXiv/astro-ph papers related to CHASC folks’ astrophysical projects. Some of the papers you see this week do not have sophisticated statistical analysis but contain data from specific satellites and possibly relevant information for CHASC projects. Due to CHASC’s long history (we are celebrating its 10th birthday this year) and my being a newbie to CHASC, I may not pick up all papers related to the projects of current, former, and future CHASC members and dedicated slog readers. To create a satisfying posting every week, your input is welcome to improve my adaptive filter. For this week’s list, click the following.
Continue reading ‘[ArXiv] 1st week, Oct. 2007’ »

Implement Bayesian inference using PHP

Not knowing much about Java and Java applets for software development and web publishing, I cannot comment on which is more efficient. Nevertheless, I thought that PHP could do a similar job in a simpler fashion, and the following may provide some ideas and solutions for publicizing statistical methods based on Bayesian inference through websites.
Continue reading ‘Implement Bayesian inference using PHP’ »
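The arithmetic underneath is language-agnostic, so as a placeholder here is a minimal sketch, in Python, of the discrete Bayes-rule update that such a web page would perform; one could translate it line for line into PHP. The hypotheses and numbers are invented for illustration.

```python
# Discrete Bayes-rule update: posterior is proportional to prior * likelihood.
def bayes_update(prior, likelihood):
    """prior and likelihood are dicts keyed by hypothesis; returns the normalized posterior."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Example: two hypotheses about a source, updated with the likelihood of an observation.
prior      = {"flaring": 0.1, "quiescent": 0.9}
likelihood = {"flaring": 0.60, "quiescent": 0.05}
print(bayes_update(prior, likelihood))   # posterior shifts strongly toward "flaring"
```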

model vs model

As Alanna pointed out, astronomers and statisticians mean different things when they say “model”. To complicate matters, we have also started to use another term, the “data model”. Continue reading ‘model vs model’ »

Polish AstroStatistics

I am visiting the Copernicus Astronomical Center in Warsaw this week, which is the reason for the Polish connection! I learned about two papers that might interest our group. They are authored by Alex Schwarzenberg-Czerny.

1. Accuracy of period determination (1991, MNRAS, 253, 198)

Periods of oscillation are frequently found using one of two methods: least-squares (LSQ) fit or power spectrum. Their errors are estimated using the LSQ correlation matrix or the Rayleigh resolution criterion, respectively. In this paper, it is demonstrated that both estimates are statistically incorrect. On the one hand, the LSQ covariance matrix does not account for correlation of residuals from the fit. Neglect of the correlations may cause large underestimation of the variance. On the other hand, the Rayleigh resolution criterion is insensitive to signal-to-noise ratio and thus does not reflect quality of observations. The correct variance estimates are derived for the two methods.
Continue reading ‘Polish AstroStatistics’ »
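For context (my summary of the standard textbook scalings, not the paper’s derivation), the two error estimates behave very differently:

```latex
% Rayleigh resolution: depends only on the total time span T, not on data quality
\delta\nu_{\mathrm{Rayleigh}} \simeq \frac{1}{T},
\qquad
% formal least-squares frequency error for a sinusoid of amplitude A,
% sampled at N points with noise sigma (proportionality only)
\sigma_{\nu,\mathrm{LSQ}} \propto \frac{1}{T}\,\frac{\sigma}{A\sqrt{N}} .
```

The first contains no noise term at all, which is why it cannot reflect the quality of the observations; the second does, but its naive evaluation from the LSQ covariance matrix ignores correlated residuals.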

Provocative Corollary to Andrew Gelman’s Folk Theorem

This is a long comment on October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.

My thoughts:
“Model”, here, has two meanings. A physicist or astronomer would automatically read this as pertaining to a model of the source, or the physics, or the sky. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the full statistical model.

For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first — it was taken as a description of physics model mismatch. Of course what he (and others) meant was statistical model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.
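As a toy illustration of that statistical model mismatch (my own construction, with invented numbers, not the PhyStat analyses themselves): apply a Gauss-Normal ±1.96σ interval to background-subtracted counts when the data are really low-count Poisson, and count how often the “confidence interval” sits entirely below zero.

```python
# Gaussian +/-1.96-sigma intervals on background-subtracted low-count Poisson
# data: a sizeable fraction land wholly below zero even when the true source
# intensity is exactly zero.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000
mu_bkg, mu_src = 3.0, 0.0                 # known background, no real source

n_obs = rng.poisson(mu_bkg + mu_src, size=n_trials)
est   = n_obs - mu_bkg                    # naive background-subtracted estimate
sigma = np.sqrt(np.maximum(n_obs, 1))     # Gauss-Normal error guess, sqrt(N)
upper = est + 1.96 * sigma                # upper end of the naive interval

frac_negative = np.mean(upper < 0)        # interval entirely below zero
print(f"fraction of wholly negative intervals: {frac_negative:.3f}")
```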

This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theorem:

When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.

One can see this (I claim) in any method that allows one to get a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one based on some kind of semi-parametric bootstrapping) for uncertainty estimates. This can be (not always!) a particular pitfall of “ad-hoc” methods, which may at first appear very speedy and/or visually compelling, but then may not have a statistics/probability structure through which to synthesize the significance of the results in an efficient way.