The AstroStat Slog » spectrum
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

Redistribution
http://hea-www.harvard.edu/AstroStat/slog/2008/redistribution/
Sat, 01 Nov 2008 16:41:48 +0000, by vlk

RMF. It is a wørd to strike terror even into the hearts of the intrepid. It refers to the spread in the measured energy of an incoming photon, and even astronomers often stumble over what it is and what it contains. It essentially sets down the measurement error for registering the energy of a photon in the given instrument.

Thankfully, its usage is robustly built into analysis software such as Sherpa or XSPEC, and most people don’t have to deal with the nitty-gritty on a daily basis. But given the profusion of statistical software being written for astronomers, it is perhaps useful to go over what it means.

The Redistribution Matrix File (RMF) is, at its most basic, a description of how the detector responds to incoming photons. It describes the transformation from the photons impinging on the detector to the counts recorded by the instrument electronics. Ideally, one would want a one-to-one mapping between a photon’s incoming energy and the recorded energy, but in the real world detectors are not ideal. The process of measuring the energy introduces a measurement error, which is encoded as the probability that an incoming photon at energy E is read out in detector channel i. Thus, for each energy E, there is an array of probabilities p(i|E) such that the observed counts in channel i,
$$c_i|d_E \sim {\rm Poisson}(p(i|E) \cdot d_E) \,,$$
where d_E is the expected counts at energy E, and is the product of the source flux at the telescope and the effective area of the telescope+detector combination. Equivalently, the expected counts in channel i,
$${\rm E}(c_i|d_E) = p(i|E) \cdot d_E \,.$$
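To make the Poisson model above concrete, here is a minimal sketch in Python (NumPy only) of how the expected channel counts follow from the expected energy-bin counts d_E and the redistribution probabilities p(i|E). The toy matrix, bin sizes, and variable names are made up for illustration and are not from any real instrument.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy problem: 5 true-energy bins, 4 detector channels.
# rmf_matrix[j, i] = p(i | E_j): probability that a photon at energy E_j
# is recorded in channel i.  Each row sums to (at most) 1.
rmf_matrix = np.array([
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.1, 0.9],
])

# d_E: expected counts in each energy bin
# (source flux x effective area x exposure, already integrated per bin).
d_E = np.array([100.0, 80.0, 60.0, 40.0, 20.0])

# E(c_i) = sum_j p(i | E_j) * d_{E_j}  -- a matrix-vector product.
expected_counts = rmf_matrix.T @ d_E

# One Poisson realization of the observed spectrum.
observed_counts = rng.poisson(expected_counts)

print(expected_counts)
print(observed_counts)
```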

The full format of how the arrays p(i|E) are stored in files is described in a HEASARC memo, CAL/GEN/92-002a. Briefly, it is a FITS file with two tables, of which only the first one really matters. This first table (“SPECRESP MATRIX”) contains the energy grid boundaries {E_j; j=1..N_E}, where each entry j corresponds to one set of p(i|E_j). The arrays themselves are stored in compressed form, as the smallest possible array that excludes all the zeros. An ideal detector, where $$p(i|E_j) \equiv \delta_{ij}$$, would be compressed to a matrix of size N_E × 1. The FITS extension also contains additional arrays to help uncompress the matrix, such as the index of the first non-zero element and the number of non-zero elements for each p(i|E_j).
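As a rough illustration of the uncompression, the sketch below reads an RMF with astropy.io.fits and expands it into a dense N_E × N_channels array. It assumes the standard OGIP column names (N_GRP, F_CHAN, N_CHAN, MATRIX) and glosses over details such as the TLMIN keyword that sets the first channel number; a production reader, like the ones inside Sherpa or XSPEC, handles more corner cases.

```python
import numpy as np
from astropy.io import fits

def expand_rmf(filename, n_channels, first_chan=1):
    """Expand a compressed OGIP RMF into a dense (N_E x N_channels) array.

    Simplified sketch: assumes standard OGIP column names and that channels
    are numbered starting at `first_chan` (check TLMIN on F_CHAN in a real file).
    """
    with fits.open(filename) as hdus:
        # The redistribution table is usually named 'MATRIX' or 'SPECRESP MATRIX'.
        try:
            tab = hdus['SPECRESP MATRIX'].data
        except KeyError:
            tab = hdus['MATRIX'].data

        rmf = np.zeros((len(tab), n_channels))
        for j, row in enumerate(tab):
            f_chan = np.atleast_1d(row['F_CHAN'])   # first channel of each non-zero group
            n_chan = np.atleast_1d(row['N_CHAN'])   # width of each group
            vals = np.asarray(row['MATRIX'])        # the stored non-zero probabilities
            pos = 0
            for g in range(int(row['N_GRP'])):      # number of channel groups in this row
                start = int(f_chan[g]) - first_chan
                width = int(n_chan[g])
                rmf[j, start:start + width] = vals[pos:pos + width]
                pos += width
    return rmf
```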

The second extension (“EBOUNDS”) contains an energy grid {e_i; i=1..N_channels} that maps to the channels i. This grid is fake! Do not use it for anything except display purposes or for convenient shorthand! It is a mapping of the average detector gain to the true energy, such that it lists the most likely energy of the photons registered in each channel. This grid allows astronomers to specify filters on the spectrum in convenient units that are semi-invariant across instruments (such as [Å] or [keV]) rather than in detector channel numbers, which are unique to each instrument. But keep in mind that this is a convenient fiction and should never be taken seriously. It is useful when the width of p(i|E) spans only a few channels, and completely useless for lower-resolution detectors.
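For instance, a filter expressed in keV can be translated into channel numbers through the EBOUNDS grid, as in the sketch below (assuming the standard E_MIN, E_MAX, and CHANNEL columns); just remember that the selection is on the nominal, gain-mapped energies, not on the true photon energies.

```python
import numpy as np
from astropy.io import fits

def channels_in_band(rmf_filename, lo_kev, hi_kev):
    """Return the channels whose nominal EBOUNDS energies lie in [lo_kev, hi_kev].

    Convenience only: the EBOUNDS grid maps the average detector gain to
    energy; it does not give the true energy of any individual photon.
    """
    with fits.open(rmf_filename) as hdus:
        eb = hdus['EBOUNDS'].data
        mid = 0.5 * (eb['E_MIN'] + eb['E_MAX'])   # nominal energy per channel
        mask = (mid >= lo_kev) & (mid <= hi_kev)
        return np.asarray(eb['CHANNEL'][mask])
```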

[ArXiv] 5th week, Apr. 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-5th-week-apr-2008/
Mon, 05 May 2008 07:08:42 +0000, by hlee

Ever since I first learned Hubble’s tuning fork[1], I have wanted to classify galaxies (semi-supervised learning seems more suitable) based on their features (colors and spectra), instead of relying on labor-intensive classification by human eye. Ironically, at that time I didn’t know that there is a field of computer science called machine learning, nor that statistics does such studies. Upon switching to statistics with the hope of understanding the statistical packages implemented in IRAF and IDL, and of learning better the contents of Numerical Recipes and Bevington’s book, I found that ignorance was not the enemy; the accessibility of data was.

I’m glad to see that this week presented a paper I had dreamed of many years ago, in addition to other interesting papers. Nowadays, I’m realizing more and more that astronomical machine learning is not as simple as what we see in the machine learning and statistical computation literature, which typically adopts data sets from repositories whose characteristics have been well known for many years (for example, the famous iris data; there is no shortage of toy data sets and mock catalogs with well-understood, public characteristics). As the long list of authors indicates, machine learning on massive astronomical data sets was never meant to be a little girl’s dream. With a bit of my sentiment, I offer this week’s list:

  • [astro-ph:0804.4068] S. Pires et al.
    FASTLens (FAst STatistics for weak Lensing) : Fast method for Weak Lensing Statistics and map making
  • [astro-ph:0804.4142] M. Kowalski et al.
    Improved Cosmological Constraints from New, Old and Combined Supernova Datasets
  • [astro-ph:0804.4219] M. Bazarghan and R. Gupta
    Automated Classification of Sloan Digital Sky Survey (SDSS) Stellar Spectra using Artificial Neural Networks
  • [gr-qc:0804.4144] E. L. Robinson, J. D. Romano, A. Vecchio
    Search for a stochastic gravitational-wave signal in the second round of the Mock LISA Data challenges
  • [astro-ph:0804.4483] C. Lintott et al.
    Galaxy Zoo : Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
  • [astro-ph:0804.4692] M. J. Martinez Gonzalez et al.
    PCA detection and denoising of Zeeman signatures in stellar polarised spectra
  • [astro-ph:0805.0101] J. Ireland et al.
    Multiresolution analysis of active region magnetic structure and its correlation with the Mt. Wilson classification and flaring activity

A relevant slog post on machine learning for galaxy morphology can be found at svm and galaxy morphological classification.

<Added: 3rd week, May 2008> [astro-ph:0805.2612] S. P. Bamford et al.
Galaxy Zoo: the independence of morphology and colour

  1. Wikipedia link: Hubble sequence
Photometric Redshifts
http://hea-www.harvard.edu/AstroStat/slog/2007/photometric-redshifts/
Wed, 25 Jul 2007 06:28:40 +0000, by hlee

Since I began subscribing to the arXiv/astro-ph abstracts, one of the most frequent topics, from an astrostatistical point of view, has been photometric redshifts. Photometric redshifts have become a popular topic as catalogs of photometric observations of remote objects multiply in volume and multi-band sky survey projects lead to virtual observatories (VO; to be discussed in a later posting). Simply searching for photometric redshifts on Google Scholar and arXiv.org turns up more than 2000 articles since 2000.

The redshift is one of the key astronomical measures, used to identify the type of an object as well as to provide its distance. Typically, measuring redshifts requires spectral data, which are quite expensive in many respects compared to photometric data. Let me briefly explain what spectral data and photometric data are, to enhance understanding for non-astronomers.

Collecting photometric data starts with taking pictures through different filters. Through blue, yellow, and red optical filters, or infrared, ultraviolet, and X-ray filters, objects look different (or have different light intensities), and various astronomical objects can be identified by investigating pictures taken in many filter combinations. On the other hand, collecting spectral data starts with dispersing light through a specially designed prism. Because of this light dispersion, it takes longer to collect light from an object, and fewer objects are recorded on a picture plate than when collecting photometric data. A nice feature of these expensive spectral data is that they provide the physical conditions of the object directly: first, the distance, from the relative shifts of spectral lines; second, the abundance (the metallic composition of the object), temperature, and type of the object, also from spectral lines. Therefore, utilizing photometric data to infer measures normally available only from spectral data is a very attractive topic in astronomy.

However, there are many challenges. The massive volume of data and sampling biases*, like the Malmquist bias (wiki) and the Lutz-Kelker bias, hinder traditional regression techniques, and numerous statistical and machine learning methods have been introduced to make the most of these photometric data and infer distances economically and quickly (a sketch of one such approach follows the footnote below).

* For a reference regarding these biases and astronomical distances, please check Distance Estimation in Cosmology by Hendry, M. A. and Simmons, J. F. L., Vistas in Astronomy, vol. 39, issue 3, pp. 297-314.
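To give a flavor of how such methods work, here is a minimal sketch of a machine-learning photometric-redshift estimator: a random forest regression from broad-band colors to redshift, trained on objects that also have spectroscopic redshifts. The catalog, band names, and model choice below are purely illustrative (simulated data, scikit-learn defaults), not any particular published method, and the sketch ignores the selection biases mentioned above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical training catalog: magnitudes in five bands (u, g, r, i, z)
# for objects that also have spectroscopic redshifts.
n = 5000
z_spec = rng.uniform(0.0, 1.5, n)
mags = 20.0 + np.outer(z_spec, [1.5, 1.0, 0.7, 0.5, 0.4]) + rng.normal(0, 0.1, (n, 5))

# Use colors (adjacent-band magnitude differences) as features.
colors = -np.diff(mags, axis=1)

X_train, X_test, z_train, z_test = train_test_split(
    colors, z_spec, test_size=0.25, random_state=1)

# Regress spectroscopic redshift on colors; predict for objects with photometry only.
model = RandomForestRegressor(n_estimators=200, random_state=1)
model.fit(X_train, z_train)
z_phot = model.predict(X_test)

print("RMS error:", np.sqrt(np.mean((z_phot - z_test) ** 2)))
```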
