Archive for the ‘Imaging’ Category.

A History of Markov Chain Monte Carlo

I’ve been joking about astronomers’ fashion in writing about Markov chain Monte Carlo (MCMC): frequently, MCMC is rendered as “Monte Carlo Markov Chain” in astronomical journals. I was curious about the history of this new creation, and overall I thought it would be worth learning more about the history of MCMC. This paper turned up on arXiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

Background Subtraction, the Sequel [Eqn]

As mentioned before, background subtraction plays a big role in astrophysical analyses. For a variety of reasons, it is not a good idea to subtract out background counts from source counts, especially in the low-counts Poisson regime. What Bayesians recommend instead is to set up a model for the intensity of the source and the background and to infer these intensities given the data. Continue reading ‘Background Subtraction, the Sequel [Eqn]’ »
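The Bayesian recipe described above can be sketched in a few lines. This is an illustrative toy, not from the post itself: the counts, the background-to-source region area ratio r, and the flat priors are all assumptions. With independent flat priors, the joint posterior of the background rate b and the total rate s + b factors into two independent Gamma distributions restricted to s + b > b, so one can sample each and reject draws that violate the constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N counts in the source region, B counts in a
# background region r times larger (all numbers are made up).
N, B, r = 12, 30, 10.0

# Model: B ~ Pois(r * b), N ~ Pois(s + b), flat priors on s and b.
# The joint posterior is Gamma(N+1, 1) for t = s + b and Gamma(B+1, 1/r)
# for b, restricted to t > b; sample both and reject t <= b.
draws = 100_000
b = rng.gamma(B + 1, 1.0 / r, size=draws)
t = rng.gamma(N + 1, 1.0, size=draws)
s = (t - b)[t > b]  # posterior draws of the source intensity

print(f"posterior mean source intensity: {s.mean():.2f}")
print(f"naive subtraction estimate:      {N - B / r:.2f}")
```

Unlike naive subtraction, the inferred source intensity stays non-negative by construction, which is exactly the low-counts regime where subtracting background counts misbehaves.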

Open and Shut [Equation of the Week]

For a discipline that relies so heavily on images, it is rather surprising how little use astronomy makes of the vast body of work on image analysis carried out by mathematicians and computer scientists. Mathematical morphology, for example, can be extremely useful in enhancing, recognizing, and extracting useful information from densely packed astronomical images.

The building blocks of mathematical morphology are two operators, Erode[I|Y] and Dilate[I|Y], Continue reading ‘Open and Shut [Equation of the Week]’ »
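As a rough sketch of those two operators (using scipy.ndimage and a made-up toy image, nothing astronomy-specific): Erode[I|Y] shrinks bright regions by the structuring element Y, Dilate[I|Y] grows them, and the composition Dilate[Erode[I|Y]|Y] is the morphological opening, which removes features smaller than Y while preserving the larger ones.

```python
import numpy as np
from scipy import ndimage

# Toy binary image: two 3x3 blobs plus two single-pixel noise speckles.
img = np.zeros((9, 9), dtype=bool)
img[1:4, 1:4] = True          # blob 1
img[5:8, 5:8] = True          # blob 2
img[0, 8] = img[8, 0] = True  # isolated noise pixels

Y = np.ones((3, 3), dtype=bool)  # structuring element

eroded = ndimage.binary_erosion(img, structure=Y)       # Erode[I|Y]
opened = ndimage.binary_dilation(eroded, structure=Y)   # Dilate[Erode[I|Y]|Y]

# Opening removes the speckles (smaller than Y) but restores both blobs.
print(img.sum(), opened.sum())
```

Closing is the dual composition, Erode[Dilate[I|Y]|Y], which fills in holes smaller than Y instead of removing small features.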

Mexican Hat [EotW]

The most widely used tool for detecting sources in X-ray images, especially Chandra data, is the wavelet-based wavdetect, which uses the Mexican Hat (MH) wavelet. Now, the MH is not a very popular choice among wavelet aficionados because it does not form an orthonormal basis set (i.e., scale information is not well separated), and does not have compact support (i.e., the function extends to infinity). So why is it used here?
Continue reading ‘Mexican Hat [EotW]’ »
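For reference, here is a minimal sketch of the 2-D Mexican Hat, the negative Laplacian of a Gaussian (unnormalized; the grid size and sigma are arbitrary choices, and this is not the wavdetect code). It shows the two properties mentioned above: the kernel sums to zero, so correlating an image with it subtracts a local background for free, and its Gaussian tails never vanish exactly, which is the lack of compact support.

```python
import numpy as np

def mexican_hat_2d(half_size, sigma):
    """2-D Mexican Hat kernel sampled on an integer grid (unnormalized)."""
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    r2 = (x**2 + y**2) / sigma**2
    return (2.0 - r2) * np.exp(-r2 / 2.0)

kern = mexican_hat_2d(half_size=20, sigma=2.0)

print(kern.sum())      # ~0: total weight vanishes (up to tail truncation)
print(kern[0, 0])      # tiny but nonzero: the tails extend to infinity
```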

[ArXiv] 5th week, Apr. 2008

Ever since I first learned Hubble’s tuning fork[1], I have wanted to classify galaxies based on their features (colors and spectra) rather than by labor-intensive human-eye classification (semi-supervised learning seems more suitable for the task). Ironically, at that time I didn’t know there was a field of computer science called machine learning, nor a branch of statistics that does such studies. Upon switching to statistics, hoping to understand the statistical packages implemented in IRAF and IDL and to better learn the contents of Numerical Recipes and Bevington’s book, I found that ignorance was not the enemy; the accessibility of data was. Continue reading ‘[ArXiv] 5th week, Apr. 2008’ »

  1. Wikipedia link: Hubble sequence

A cool website I heard about in Harvard Astronomy Professor Doug Finkbeiner’s class (Principles of Astronomical Measurements) does a complex job of matching your images of unknown locations or coordinates to sources in catalogs. Given your images in various formats, it returns astrometric calibration meta-data and lists of known objects falling inside the field of view. Continue reading ‘’ »

The GREAT08 Challenge

Grand statistical challenges seem to be all the rage nowadays. Following on the heels of the Banff Challenge (which dealt with figuring out how to set the bounds for the signal intensity that would result from the Higgs boson) comes the GREAT08 Challenge (arxiv/0802.1214) to deal with one of the major issues in observational Cosmology, the effect of dark matter. As Douglas Applegate puts it: Continue reading ‘The GREAT08 Challenge’ »

The Digital Universe

Another one in the CXC/CfA Visualizing Astronomy series: “The Digital Universe: Cosmic Cartography and Data Visualization”, by Brian Abbott of Hayden Planetarium & Department of Astrophysics, next Tuesday, Nov 13, at 2pm in Phillips. Continue reading ‘The Digital Universe’ »

compressed sensing and a blog

My friend’s blog led me to Terence Tao’s blog, where a mathematician writes about topics in applied mathematics and beyond. A glance tells me that all the postings are well written. In particular, compressed sensing and single-pixel cameras drew my attention, because the topic stimulates the thinking of astronomers working on the virtual observatory[1] and image processing[2] (it is not an exaggeration that observational astronomy starts with taking pictures, in a broad sense), and of statisticians in multidimensional applications, not to mention engineers in signal and image processing. Continue reading ‘compressed sensing and a blog’ »

  1. see the slog posting “Virtual Observatory”
  2. see the slog posting “The power of wavdetect”

The power of wavdetect

wavdetect is a wavelet-based source detection algorithm in wide use in X-ray data analysis, in particular for finding sources in Chandra images. It came out of the Chicago “Beta Site” of the AXAF Science Center (what the CXC used to be called before launch). Despite the fancy name, the complicated mathematics, and the devilish details, it is really not much more than a generalization of the earlier local cell detect, where a local background is estimated around a putative source and the question is asked: is whatever signal is being seen in this pixel significantly higher than expected? However, unlike previous methods that used a flux measurement as the criterion for detection (e.g., using the signal-to-noise ratio as a proxy for a significance threshold), it tests the hypothesis that the observed signal could be obtained as a fluctuation from the background. Continue reading ‘The power of wavdetect’ »
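The hypothesis test described above can be sketched as a Poisson tail probability (a toy illustration with made-up numbers, not the actual wavdetect implementation): given an estimated local background b in a detection cell, ask how often a pure background fluctuation would produce at least the observed number of counts, and claim a source when that probability falls below a significance threshold.

```python
from scipy.stats import poisson

def tail_prob(n_obs, b_local):
    """P(X >= n_obs) for X ~ Poisson(b_local): chance that background
    alone fluctuates up to the observed counts or more."""
    return poisson.sf(n_obs - 1, b_local)

b = 2.3  # hypothetical local background estimate (counts per cell)
for n in (5, 10, 15):
    p = tail_prob(n, b)
    print(f"n = {n:2d}: P(background >= n) = {p:.2e}")
```

The more counts observed over the same background, the smaller the tail probability, and a detection is declared once it drops below a preset threshold (wavdetect’s threshold is typically chosen so that roughly one false source is expected per image).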

Provocative Corollary to Andrew Gelman’s Folk Theorem

This is a long comment on October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.

My thoughts:
The word “model” has two meanings here. A physicist or astronomer automatically reads it as pertaining to a model of the source, or the physics, or the sky. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the full statistical model.

For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first — it was taken as a description of physics model mismatch. Of course what he (and others) meant was statistical model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.
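A toy numerical illustration of the mismatch just described (all numbers made up): apply the Gauss-Normal recipe, an estimate n − b with a sqrt(n) error bar, to a low-count observation, and the “confidence interval” for the intrinsically non-negative source intensity dips below zero.

```python
import math

# Hypothetical low-count observation: n observed counts, b expected background.
n, b = 3, 4.0

# Gauss-Normal recipe: point estimate n - b with a sqrt(n) error bar.
s_hat = n - b
sigma = math.sqrt(n)
lo, hi = s_hat - 1.96 * sigma, s_hat + 1.96 * sigma
print(f"95% 'confidence interval' for the source intensity: [{lo:.2f}, {hi:.2f}]")
# The lower limit lands well below zero, although the intensity cannot be
# negative; a Poisson treatment of the same counts keeps the interval
# at or above zero, and the "negative interval" problem disappears.
```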

This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theorem:

When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.

One can see this (I claim) in any method that allows one to get a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one based on some kind of semi-parametric bootstrapping) for uncertainty estimates. This can be (not always!) a particular pitfall of “ad-hoc” methods, which may at first appear very speedy and/or visually compelling, but then may not have a statistics/probability structure through which to synthesize the significance of the results in an efficient way.

Spurious Sources

[arXiv:0709.2358] Cleaning the USNO-B Catalog through automatic detection of optical artifacts, by Barron et al.

Statistically speaking, “false sources” are generally in the domain of Type I errors, defined by the probability of detecting a signal where there is none. But what if there is a clear signal, but it is not real? Continue reading ‘Spurious Sources’ »

Visualizing Astronomy

The CXC Education & Outreach Program at the CfA hosts a series of lectures on Visualizing Astronomy, and the first of this season’s is scheduled for Sep 18 at 1:30pm at Phillips:

Date & Time: Tuesday, September 18, 1:30pm
Location: Phillips Auditorium
Speaker: Alyssa Goodman (Harvard)
Title: Amazing New Tools for Exploring Astronomical Data

Continue reading ‘Visualizing Astronomy’ »

[ArXiv] Google Sky, Sept. 05, 2007

Ah, Sky in Google Earth made an arXiv appearance [arxiv/astro-ph:0709.0752]: Sky in Google Earth: The Next Frontier in Astronomical Data Discovery and Visualization by R. Scranton et al.

[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007

From arxiv/astro-ph:0708.4030v1
Deep ACS Imaging in the Globular Cluster NGC 6397: The Cluster Color Magnitude Diagram and Luminosity Function by H.B. Richer

This paper presents an observational study of the globular cluster NGC 6397 that is enhanced and more informative than previous observations, in the sense that 1) a truncation in the white dwarf cooling sequence occurs at magnitude 28, 2) the cluster main sequence seems to terminate approximately at the hydrogen-burning limit predicted by two independent stellar evolution models, and 3) luminosity functions (LFs) and mass functions (MFs) are well defined. Nothing statistical here, but the ideas of defining color magnitude diagrams (CMDs) and LFs described in the paper, together with the improved (ACS imaging) measurements of stars in NGC 6397, will assist in developing suitable statistics for CMD and LF fitting problems.
Continue reading ‘[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007’ »