For roughly a decade, there have been journals dedicated to bioinformatics. Astronomy, on the other hand, has none dedicated to astroinformatics, although astronomers have a long history of compiling huge volumes of catalogs and data archives. Prof. Bickel’s comments during his plenary lecture at the IMS-APRM, particularly on sparse matrices and on the philosophical issues in choosing principal components, led me to wonder why astronomers do not discuss astroinformatics.

Nevertheless, I’ve noticed that a few astronomers rigorously apply principal component analysis (PCA) to reduce the dimensionality of a data set. An evident example of PCA application in astronomy is photo-z (photometric redshift estimation). In contrast to this wide use of PCA, almost no publication studies its statistical adequacy by investigating the properties of the covariance matrix and its estimation method, particularly when the matrix is sparse. Even worse, measurement errors are improperly handled, since statisticians’ dimension reduction methodology has never confronted astronomers’ measurement errors. How to choose components is seldom discussed, since significance in a physical model rarely agrees with statistical significance. This disagreement often lengthens scientific writing and makes it hard to please readers. As a compromise, the statistical parts are omitted, which makes the publication feel incomplete to me.
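To make the question of choosing components concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data are synthetic, the function name `pca` is hypothetical, and the 95% cumulative-variance cutoff is an arbitrary convention rather than a recommendation. The sketch also ignores measurement errors entirely, which is exactly the gap described above.

```python
import numpy as np


def pca(data, variance_threshold=0.95):
    """Plain PCA via SVD for data of shape (n_samples, n_features).

    Returns the principal axes (as rows), the fraction of variance
    explained by each axis, and the smallest number of components whose
    cumulative explained variance reaches `variance_threshold`.
    """
    centered = data - data.mean(axis=0)
    # Rows of `vt` are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (data.shape[0] - 1)
    explained_ratio = variances / variances.sum()
    cumulative = np.cumsum(explained_ratio)
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    k = min(k, len(explained_ratio))  # guard against round-off at the top end
    return vt, explained_ratio, k


# Synthetic example (assumption): 5 observed features driven by 2 latent
# factors, plus small independent noise of equal size in every feature.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

components, ratio, k = pca(data)
print("explained variance ratio:", np.round(ratio, 4))
print("components kept at the 95% cutoff:", k)
```

The cutoff here is purely statistical. Nothing in it says whether the retained components correspond to physically meaningful quantities, and with heteroscedastic measurement errors the sample covariance itself would need a different treatment.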

In wavelet multiscale imaging, which is easy to visualize via intuitive scales, the coarse-to-fine scale approach and the assumption of independent noise make it possible to clean a noisy image and to accentuate features in it. Likewise, principal components and other dimension reduction methods in statistics capture certain features via transformed metrics and regularized, or penalized, objective functions. These features do not necessarily match the important features in astrophysics unless the likelihood function and selected priors match the physics models. To my knowledge, the astronomical literature that exploits PCA for prediction rarely explains why PCA was chosen for dimensionality reduction, how to compensate for sparsity in the covariance matrix, and other such questions, which are often major topics in bioinformatics. In the bioinformatics literature, these questions are explored to explain the particular selection of gene attributes or biomarkers for a certain response, such as blood pressure or type of cancer. Instead of binning and chi-square minimization, statisticians explore strategies to compensate for sparsity in the data set in order to get unbiased best fits and proper error bars, based on assumptions and theory that match the data.

Luckily, some renowned astronomers are working to form an astroinformatics community. At the dawn of bioinformatics, genetic scientists were responsible for the bio part and statisticians for the informatics part, until young scientists were educated well enough to carry out bioinformatics by themselves. Observing this trend, partially from statistics conferences, made me feel that it is my responsibility to ponder why statisticians have been so little involved in astronomy despite its plethora of catalogs and data archives with a long history. A few postings will follow about what I felt while working among astronomers. I hope this small bridging effort narrows the gap between the two communities. My personal wish is to see astroinformatics prosper like bioinformatics.

One Comment
  1. Doug Burke:

    There are a number of us trying to push the AstroInformatics band wagon out into the spotlight (to mix up a few metaphors).

    For more information have a look at

    There is going to be an AAS session on AstroInformatics at the 2010 Winter AAS in Washington, DC and we plan to “re-brand” the Practical Semantic Astronomy workshop series to Practical AstroInformatics.

    08-10-2009, 4:03 pm