Archive for the ‘Data Processing’ Category.

Reduced and Processed Data

Hyunsook recently said that she wished that there were “some astronomical data depositories where no data reduction is required but one can apply various statistical analyses to the data in the depository to learn and compare statistical methods”. With the caveat that there really is no such thing (every dataset will require case-specific reduction; standard processing and reduction are inadequate in all but the simplest of cases), here is a brief list: Continue reading ‘Reduced and Processed Data’ »

[ArXiv] 1st week, June 2008

Although it contains no statistics-related discussion, a paper comparing XSPEC and ISIS, two open-source spectral analysis applications, might draw the interest of high-energy astrophysicists this week. Continue reading ‘[ArXiv] 1st week, June 2008’ »

Did they, or didn’t they?

Earlier this year, Peter Edmonds showed me a press release that the Chandra folks were, at the time, considering putting out, describing the possible identification of a Type Ia Supernova progenitor. What appeared to be an accreting white dwarf binary system could be discerned in 4-year-old observations, coincident with the location of a supernova that went off in November 2007 (SN2007on). An amazing discovery, but there is a hitch.

And it is a statistical hitch, involving two otherwise highly reliable and oft-used methods giving contradictory answers at nearly the same significance level! Does this mean that the chances are actually 50-50? Really, we need a bona fide statistician to take a look and point out the errors of our ways. Continue reading ‘Did they, or didn’t they?’ »

Astrometry.net

Astrometry.net, a cool website I heard about in Harvard Astronomy Professor Doug Finkbeiner’s class (Principles of Astronomical Measurements), does the complex job of matching your images of unknown location or coordinates to sources in catalogs. If you provide your images in any of various formats, it returns astrometric calibration meta-data and lists of known objects falling inside the field of view. Continue reading ‘Astrometry.net’ »

[ArXiv] A fast Bayesian object detection

This is a quite long paper that I separated from [ArXiv] 4th week, Feb. 2008:
      [astro-ph:0802.3916] P. Carvalho, G. Rocha, & M. P. Hobson
      A fast Bayesian approach to discrete object detection in astronomical datasets – PowellSnakes I
As the title suggests, it describes Bayesian source detection and gave me a chance to learn the foundations of source detection in astronomy. Continue reading ‘[ArXiv] A fast Bayesian object detection’ »

The GREAT08 Challenge

Grand statistical challenges seem to be all the rage nowadays. Following on the heels of the Banff Challenge (which dealt with figuring out how to set the bounds for the signal intensity that would result from the Higgs boson) comes the GREAT08 Challenge (arxiv/0802.1214) to deal with one of the major issues in observational cosmology, the effect of dark matter. As Douglas Applegate puts it: Continue reading ‘The GREAT08 Challenge’ »

Books – a boring title

I have been noticing certain misconceptions about statistics and the evolution of statistical nomenclature in astronomy, which I believe can be attributed to the lack of references within the astronomical community. There are some textbooks designed for junior/senior science and engineering students, which are likely unknown to astronomers; as examples for astronomers, though, these books are not suitable, to my knowledge. Although I never expect astronomers to work through standard graduate (mathematical) statistics textbooks, I do wish astronomers would go beyond Numerical Recipes (W. H. Press, S. A. Teukolsky, W. T. Vetterling, & B. P. Flannery) and Data Reduction and Error Analysis for the Physical Sciences (P. R. Bevington & D. K. Robinson). Here are some good ones written by astronomers, engineers, and statisticians: Continue reading ‘Books – a boring title’ »

Dance of the Errors

One of the big problems that has come up in recent years is how to represent the uncertainty in certain estimates. Astronomers usually present errors as ±stddev on the quantities of interest, but that presupposes that the errors are uncorrelated. But suppose you are estimating a multi-dimensional set of parameters that may have large correlations amongst themselves? One such case is that of Differential Emission Measures (DEM), where the “quantity of emission” from a plasma (loosely, how much stuff there is available to emit; it is the product of the volume and the densities of electrons and H) is estimated for different temperatures. See the plots at the PoA DEM tutorial for examples of how we are currently trying to visualize the error bars. Another example is the correlated systematic uncertainties in effective areas (Drake et al., 2005, Chandra Cal Workshop). This is not dissimilar to the problem of determining the significance of a “feature” in an image (Connors, A. & van Dyk, D.A., 2007, SCMA IV). Continue reading ‘Dance of the Errors’ »
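
As a toy illustration of why independent ±stddev bars can mislead when parameters are correlated, here is a minimal sketch; the two-parameter Gaussian “posterior” and its covariance numbers are invented purely for illustration, not taken from any DEM analysis:

    import numpy as np

    # Toy two-parameter "posterior": a strongly correlated Gaussian.
    # Mean and covariance are invented purely for illustration.
    mean = np.array([1.0, 2.0])
    cov = np.array([[1.0, 0.9],
                    [0.9, 1.0]])

    rng = np.random.default_rng(42)
    samples = rng.multivariate_normal(mean, cov, size=100_000)

    # Naive summary: independent +/- 1-stddev bars on each parameter.
    sigma = np.sqrt(np.diag(cov))
    in_box = np.all(np.abs(samples - mean) < sigma, axis=1).mean()

    # Correlation-aware summary: the Mahalanobis distance defines the
    # error ellipse; d2 < 2.30 encloses ~68% for 2 degrees of freedom.
    resid = samples - mean
    d2 = np.einsum('ij,jk,ik->i', resid, np.linalg.inv(cov), resid)
    in_ellipse = (d2 < 2.30).mean()

    print(f"fraction inside naive 1-sigma box: {in_box:.2f}")
    print(f"fraction inside 68% error ellipse: {in_ellipse:.2f}")

With a correlation of 0.9, the naive box and the 68% ellipse enclose noticeably different fractions of the samples, which is exactly the kind of mismatch the DEM plots are trying to convey.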

[ArXiv] Spatial Correlation in the Scan Statistic

Accounting for Spatial Correlation in the Scan Statistic by Loh & Zhu [stat.AP:0712.1458] provides a picture that helps us understand excessive false alarms in source detection when the image data are modeled as a Poisson point process. Having no experience in source detection analysis, I cannot speak empirically to the detection statistics or the p-values of particular detection methods. However, acknowledging that the count data are over-dispersed relative to the Poisson and spatially correlated in ways unknown prior to the detection analysis, we could guess that false discoveries of sources occur more often than we expect.
Continue reading ‘[ArXiv] Spatial Correlation in the Scan Statistic’ »
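
To see the effect in miniature, here is a toy simulation of my own construction (not the paper’s method): a source-free image whose Poisson intensity carries spatial correlation is thresholded as if its pixels were independent Poisson counts, and the observed false-alarm rate comes out well above the nominal one.

    import numpy as np
    from scipy.ndimage import uniform_filter
    from scipy.stats import poisson

    rng = np.random.default_rng(0)
    n, mu, alpha = 256, 5.0, 1e-3      # image size, background rate, nominal per-pixel FAR
    thresh = poisson.isf(alpha, mu)    # threshold assuming independent Poisson(mu) pixels

    # Source-free image A: truly independent Poisson pixels.
    flat = rng.poisson(mu, size=(n, n))

    # Source-free image B: spatially correlated, over-dispersed counts;
    # a smoothed Gaussian field modulates the Poisson intensity
    # (one convenient way to inject correlation, chosen for illustration).
    field = uniform_filter(rng.normal(size=(n, n)), size=8)
    field = 0.5 * field / field.std()
    corr = rng.poisson(mu * np.exp(field - 0.125))   # -var/2 keeps the mean near mu

    print("nominal false-alarm rate        :", alpha)
    print("observed FAR, independent pixels:", (flat > thresh).mean())
    print("observed FAR, correlated pixels :", (corr > thresh).mean())

The smoothed log-Gaussian field is just one way to inject correlation and over-dispersion; any positively correlated intensity fluctuation produces the same qualitative inflation of false alarms.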

compressed sensing and a blog

My friend’s blog led me to Terence Tao’s blog, where a mathematician writes on topics in applied mathematics and beyond. A glance tells me that all the postings are well written. The posts on compressed sensing and single-pixel cameras drew my attention in particular, because the topic stimulates the thoughts of astronomers working on the virtual observatory[1] and on image processing[2] (it is not an exaggeration to say that observational astronomy starts, in a broad sense, with taking pictures), of statisticians working on multidimensional applications, and, not least, of engineers in signal and image processing. Continue reading ‘compressed sensing and a blog’ »

  1. see the slog posting “Virtual Observatory”
  2. see the slog posting “The power of wavedetect”

Implement Bayesian inference using PHP

Not knowing much about Java and Java applets, either for software development or for publishing on the web, I cannot comment on which is more efficient. Nevertheless, I thought PHP could do a similar job in a simpler fashion, and the following may provide some ideas and solutions for publishing Bayesian-inference-based statistical methods through websites.
Continue reading ‘Implement Bayesian inference using PHP’ »
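
Whatever the language, the computation such a page performs can be tiny. Here is a sketch in Python of a grid-based posterior that a PHP loop could implement just as easily; the flat prior, the binomial likelihood, and the 7-successes-in-10-trials data are my own illustrative choices:

    import numpy as np

    def grid_posterior(n_success, n_trials, n_grid=101):
        """Posterior for a binomial proportion on a discrete grid, under a
        flat prior -- a loop simple enough for any server-side language."""
        theta = np.linspace(0.0, 1.0, n_grid)
        prior = np.ones(n_grid)                                   # flat prior
        likelihood = theta**n_success * (1 - theta)**(n_trials - n_success)
        posterior = prior * likelihood
        return theta, posterior / posterior.sum()                 # normalize

    theta, post = grid_posterior(n_success=7, n_trials=10)
    print("posterior mean:", np.sum(theta * post))                # ~0.67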

model vs model

As Alanna pointed out, astronomers and statisticians mean different things when they say “model”. To complicate matters, we have also started to use another term called “data model”. Continue reading ‘model vs model’ »

Provocative Corollary to Andrew Gelman’s Folk Theorem

This is a long comment on the October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.

My thoughts:
“Model” has two meanings here. A physicist or astronomer would automatically read the word as pertaining to a model of the source, or the physics, or the sky. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the full statistical model.

For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first — it was taken as a description of physics model mismatch. Of course what he (and others) meant was statistical model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.
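
To make the mismatch concrete, here is a small simulation of my own construction (not taken from PhyStat2003): background-subtracted source intensities with Gauss-Normal error bars, applied to low-count Poisson data, produce intervals wholly below zero far more often than a nominal 95% coverage would allow.

    import numpy as np

    rng = np.random.default_rng(1)
    s_true, b_known = 0.5, 3.0        # faint source on a known background
    n_trials = 1_000_000

    # Each trial observes total counts N ~ Poisson(s + b).
    n_obs = rng.poisson(s_true + b_known, size=n_trials)

    # Gauss-Normal 95% interval on s after background subtraction,
    # with the common sqrt(N) error (floored at 1 to avoid zero width).
    s_hat = n_obs - b_known
    err = 1.96 * np.sqrt(np.maximum(n_obs, 1))
    upper = s_hat + err

    # A "negative confidence interval": the entire interval lies below zero,
    # even though the true source intensity is positive.
    print("fraction of intervals wholly below zero:", (upper < 0).mean())

Under a proper Poisson treatment the interval for s respects s ≥ 0 and the problem disappears, which is exactly the statistical model mismatch being described.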

This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theorem:

When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.

One can see this (I claim) in any method that allows one to get a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one based on some kind of semi-parametric bootstrapping) for uncertainty estimates. This can be (not always!) a particular pitfall of “ad-hoc” methods, which may at first appear very speedy and/or visually compelling, but then may not have a statistics/probability structure through which to synthesize the significance of the results in an efficient way.

Quote of the Week, October 3, 2007

From the ever-quotable Andrew Gelman comes this gem, which he calls a Folk Theorem:

When things are hard to compute, often the model doesn’t fit the data. Difficulties in computation are therefore often model problems… [When the computation isn't working] we have the duty and freedom to think about models.

Continue reading ‘Quote of the Week, October 3, 2007’ »

When you observed zero counts, you didn’t not observe any counts

Dong-Woo, who has been playing with BEHR, noticed that the confidence bounds quoted on the source intensities seem to be unchanged when the source counts are zero, regardless of what the background counts are set to. That is, p(s|NS,NB) is invariant when NS=0, for any value of NB. This seems a bit odd, because [naively] one expects that as NB increases, it should/ought to get more and more likely that s gets closer to 0. Continue reading ‘When you observed zero counts, you didn’t not observe any counts’ »
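
A sketch of why this might happen, assuming the simplest Poisson source-plus-background model (my notation; BEHR’s internal parametrization may differ): take N_S ~ Poisson(s + b) for the source region and N_B ~ Poisson(r b) for the background region, with independent priors \pi(s) and \pi(b). When N_S = 0 the source-region likelihood factorizes,

    p(N_S = 0 \mid s, b) \;=\; e^{-(s+b)} \;=\; e^{-s}\, e^{-b},

so the joint posterior separates into an s-part and a b-part,

    p(s, b \mid N_S = 0, N_B) \;\propto\; \bigl[\pi(s)\, e^{-s}\bigr] \times \bigl[\pi(b)\, e^{-b}\, p(N_B \mid b)\bigr],

and marginalizing over b leaves

    p(s \mid N_S = 0, N_B) \;\propto\; \pi(s)\, e^{-s},

which indeed carries no dependence on N_B: with zero observed counts, the background data cannot inform the source intensity.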