Jul 12th, 2009| 07:21 pm | Posted by hlee

For approximately a decade, there have been journals dedicated to **bioinformatics.** In contrast, there is none in astronomy, although astronomers have a long history of compiling huge volumes of catalogs and data archives. Prof. Bickel's comments during his plenary lecture at the IMS-APRM, particularly on **sparse matrix** and **philosophical issues on choosing principal components**, led me to wonder why astronomers do not discuss **astroinformatics**.

Jan 20th, 2009| 01:59 pm | Posted by hlee

Someone emailed me asking for the globular cluster data sets I used in a proceedings paper, which was about how to determine multi-modality (multiple populations) with well-known and new information criteria, without binning the luminosity functions. I spent quite some time understanding the data sets with suspicious numbers of globular cluster populations. On the other hand, obtaining the globular cluster data sets was easy because of available data archives such as VizieR. When data sets appear in charts or tables, I acquire them from VizieR. To understand the science behind those data sets, I check ADS. Well, actually it happens the other way around: I check the scientific background first to assess whether there is room for statistics, then search for available data sets.

Oct 27th, 2008| 09:24 am | Posted by hlee

The notions of **missing data** differ overall between the two communities. I tend to think missing data carry as much information as observed data. Astronomers… I'm not sure how they think, but my impression so far is that when one attribute/variable of an object/observation/informant is missing, all other attributes of that object become useless, because the object is not considered in the scientific data analysis or model evaluation process. For example, it is hard to find any discussion of **imputation** in astronomical publications, or any statistical justification of the treatment of missing data with respect to inference strategies. Instead, they talk about **incompleteness** within different variables. To replace this vague argument with a concrete example, consider a catalog of multiple magnitudes. To draw a color magnitude diagram, one needs both color and magnitude. If one attribute is missing, that star will not appear in the color magnitude diagram, and any inference from that diagram will not include that star. Nonetheless, one will try to understand how different proportions of stars are observed according to different colors and magnitudes.
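The color magnitude diagram example can be sketched in a few lines. This is only an illustration under my own assumptions: the tiny catalog is made up, the band names B and V are chosen for convenience, and `cmd_points` is a hypothetical helper, not anyone's pipeline. It shows how a star with one missing magnitude silently drops out of the diagram.

```python
import math

def cmd_points(catalog):
    """Split (B, V) pairs into stars usable in a B-V vs. V diagram and the rest."""
    usable, dropped = [], []
    for b, v in catalog:
        if math.isnan(b) or math.isnan(v):
            # One missing magnitude is enough to lose the whole star.
            dropped.append((b, v))
        else:
            usable.append((b - v, v))  # (color, magnitude)
    return usable, dropped

nan = float("nan")
# Made-up magnitudes, for illustration only.
catalog = [(15.2, 14.6), (nan, 16.0), (17.1, nan), (18.4, 17.9)]

usable, dropped = cmd_points(catalog)
completeness = len(usable) / len(catalog)
print(f"{len(usable)} of {len(catalog)} stars plotted, completeness = {completeness}")
```

The dropped stars still carry one observed magnitude each, which is exactly the information an imputation approach would try to use.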

Oct 1st, 2008| 04:16 pm | Posted by hlee

People with experience would argue, differently and wisely, against what I'm going to discuss now. This post only combines two small cross sections, one from a branch of each of two trees, astronomy and statistics.

Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion deduced from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse **classification and clustering** and to mix up their methodologies. They tend to think that any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithms, no matter what, look like a **black box**. I mean a black box as in a neural network, which is one of the classification algorithms.

Sep 16th, 2008| 03:20 pm | Posted by hlee

A nice book by **Christopher Bishop.**

While I was reading abstracts and papers from astro-ph, I saw many applications of algorithms from **pattern recognition and machine learning (PRML).** The frequency will increase as large scale survey projects multiply, so recommending a good textbook or a reference in the field seems timely. Continue reading ‘[Book] pattern recognition and machine learning’ »

Jun 8th, 2008| 09:45 pm | Posted by hlee

Although it contains no statistics-related discussion, a paper comparing XSPEC and ISIS, two open source spectral analysis applications, might interest high energy astrophysicists this week.

Apr 21st, 2008| 11:56 pm | Posted by hlee

Because of the extensive work by Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles’ book in the cosmology literature), the 2 (or 3) point correlation function is much more dominant than any other mathematical or statistical method for understanding the structure of the universe. Unusually, this week brings an astro-ph paper written by a statistics professor that uses the K-function to explore the mystery of the universe.
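For readers who have not met the K-function, here is a minimal sketch of the textbook naive estimator in two dimensions, K(r) ≈ A / (n(n-1)) × (number of ordered pairs closer than r), with no edge correction. It is not the estimator used in the paper; the function name `ripley_k` and the four test points are my own assumptions for illustration.

```python
import math

def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r (2D, no edge correction)."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            # Count ordered pairs (i, j), i != j, separated by at most r.
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    return area * pairs / (n * (n - 1))

# Four corners of a unit square, a made-up toy pattern.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    print(r, ripley_k(points, r, area=1.0))
```

Under complete spatial randomness in the plane, K(r) equals πr², so values above that suggest clustering at scale r and values below suggest regularity. A real analysis would add an edge correction, since points near the survey boundary have fewer observable neighbors.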

[astro-ph:0804.3044] J.M. Loh

**Estimating Third-Order Moments for an Absorber Catalog**

Continue reading ‘[ArXiv] Ripley’s K-function’ »

Jan 11th, 2008| 03:44 pm | Posted by hlee

It is notable that there’s an astronomy paper that contains **AIC, BIC**, and **Bayesian evidence** in its title. The topic of the paper, unsurprisingly, is cosmology, like other astronomy papers that discuss these (statistical) information criteria. (I have found only a couple of papers on model selection applied to astronomical data analysis that do not involve the CMB. Note that I exclude the Bayes factor as a model selection tool here.)
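As a reminder of what the two criteria compute, here is a small sketch using the standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, n the sample size, and ln L the maximized log-likelihood. The log-likelihood values and parameter counts below are made up for illustration and do not come from the paper.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits to the same n = 100 data points.
n = 100
simple = {"log_lik": -104.0, "k": 2}   # e.g. a two-parameter model
complex_ = {"log_lik": -100.0, "k": 5}  # e.g. a five-parameter model

for name, fit in (("simple", simple), ("complex", complex_)):
    print(name, aic(fit["log_lik"], fit["k"]), bic(fit["log_lik"], fit["k"], n))
```

Lower values are preferred for both criteria. BIC penalizes extra parameters more heavily than AIC once n exceeds about 7, so the two can disagree about which model to keep.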

To find the paper or other interesting ones, click

Sep 19th, 2007| 02:21 pm | Posted by vlk

[arXiv:0709.2358] Cleaning the USNO-B Catalog through automatic detection of optical artifacts, by Barron et al.

Statistically speaking, "false sources" are generally in the domain of ~~Type II~~ **Type I** errors, defined by the probability of detecting a signal where there is none. But what if there is a clear signal, but it is not real?

Jul 25th, 2007| 01:46 pm | Posted by hlee

From arxiv/astro-ph:0707.3413

**The Sixth Data Release of the Sloan Digital Sky Survey** by … many people …

The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at http://www.sdss.org/dr6. Additionally, the Catalog Archive Service (CAS) and its SQL interface for accessing the catalog would be useful to statisticians searching for data. Simple SQL commands, which are well documented, can narrow down the size of the data and the spatial coverage.
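To give a flavor of such a command, here is a small sketch that only builds a query string and does not contact any server. The helper `box_query` is hypothetical, and the table and column names (`PhotoObj`, `objID`, `ra`, `dec`, `g`, `r`) are written from my memory of the SDSS schema, so please check them against the CAS documentation before use.

```python
def box_query(ra_min, ra_max, dec_min, dec_max, r_max, limit=100):
    """Build a SQL string for objects in a sky box brighter than r_max."""
    return (
        f"SELECT TOP {int(limit)} objID, ra, dec, g, r\n"
        "FROM PhotoObj\n"
        f"WHERE ra BETWEEN {float(ra_min)} AND {float(ra_max)}\n"
        f"  AND dec BETWEEN {float(dec_min)} AND {float(dec_max)}\n"
        f"  AND r < {float(r_max)}"
    )

# A one-degree box near the celestial equator, chosen arbitrarily.
query = box_query(180.0, 181.0, -0.5, 0.5, 19.0)
print(query)
```

The `TOP` clause limits the number of rows returned, and the `WHERE` conditions restrict the spatial coverage and the magnitude range, which is usually enough to bring a survey-sized table down to something a laptop can handle.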

[ArXiv] SDSS DR6, July 23, 2007

Jul 16th, 2007| 12:15 pm | Posted by hlee

From arxiv/astro-ph:0707.1900v1

**The complete catalogue of gamma-ray bursts observed by the Wide Field Cameras on board BeppoSAX** by Vetere et al.

This paper intends to publicize the largest data set of Gamma Ray Burst (GRB) X-ray afterglows (light curves after the event), which is available from http://www.asdc.asi.it. It is claimed to be a complete on-line catalog of GRBs observed by the two Wide Field Cameras on board BeppoSAX in the period 1996-2002. It comprises 77 bursts and 56 GRBs with X-ray light curves, covering the energy range 40-700 keV. A brief introduction to the instrument, data reduction, and catalog description is given.

