The AstroStat Slog » correlation function http://hea-www.harvard.edu/AstroStat/slog Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders Fri, 09 Sep 2011 17:05:33 +0000

[ArXiv] 5th week, Apr. 2008 http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-5th-week-apr-2008/ Mon, 05 May 2008 07:08:42 +0000 hlee

Ever since I first learned about Hubble’s tuning fork[1], I have wanted to classify galaxies (semi-supervised learning seems more suitable) based on their features (colors and spectra), instead of relying on labor-intensive classification by human eye. Ironically, at that time I did not know that machine learning in computer science, and statistics, were fields devoted to such studies. After I switched to statistics, hoping to understand the statistical packages implemented in IRAF and IDL and to better grasp the contents of Numerical Recipes and Bevington’s book, ignorance was no longer the enemy; the accessibility of data was.

I’m glad to see that this week brought a paper I had dreamed of many years ago, in addition to other interesting papers. Nowadays, I realize more and more that astronomical machine learning is not as simple as what we see in the machine learning and statistical computation literature, which typically adopts data sets from repositories whose characteristics have been well known for many years (for example, the famous iris data; there are toy data sets and mock catalogs, and no shortage of data sets with publicly known characteristics). As the long lists of authors indicate, machine learning on massive astronomical data sets was never meant to be a little girl’s dream. With a bit of sentiment, I offer this week’s list:

  • [astro-ph:0804.4068] S. Pires et al.
    FASTLens (FAst STatistics for weak Lensing) : Fast method for Weak Lensing Statistics and map making
  • [astro-ph:0804.4142] M. Kowalski et al.
    Improved Cosmological Constraints from New, Old and Combined Supernova Datasets
  • [astro-ph:0804.4219] M. Bazarghan and R. Gupta
    Automated Classification of Sloan Digital Sky Survey (SDSS) Stellar Spectra using Artificial Neural Networks
  • [gr-qc:0804.4144] E. L. Robinson, J. D. Romano, A. Vecchio
    Search for a stochastic gravitational-wave signal in the second round of the Mock LISA Data challenges
  • [astro-ph:0804.4483] C. Lintott et al.
    Galaxy Zoo : Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
  • [astro-ph:0804.4692] M. J. Martinez Gonzalez et al.
    PCA detection and denoising of Zeeman signatures in stellar polarised spectra
  • [astro-ph:0805.0101] J. Ireland et al.
    Multiresolution analysis of active region magnetic structure and its correlation with the Mt. Wilson classification and flaring activity

A related slog post on machine learning and galaxy morphology can be found at “svm and galaxy morphological classification”.

<Added: 3rd week May 2008> [astro-ph:0805.2612] S. P. Bamford et al.
Galaxy Zoo: the independence of morphology and colour

  1. Wikipedia link: Hubble sequence
[ArXiv] Ripley’s K-function http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-ripleys-k-function/ Tue, 22 Apr 2008 03:56:33 +0000 hlee

Because of the extensive work by Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles’s book cited in the cosmology literature), the 2 (or 3) point correlation function is far more dominant than any other mathematical or statistical method for understanding the structure of the universe. Unusually, this week brings an astro-ph paper written by a statistics professor that uses the K-function to explore the mystery of the universe.

[astro-ph:0804.3044] J.M. Loh
Estimating Third-Order Moments for an Absorber Catalog

Instead of going into the detailed contents, which are left to the readers, I’d rather cite a few key points without math symbols. The script K denotes the third-order K-function, from which the three-point and reduced three-point correlation functions are derived. The paper describes the benefits of the script K function over these correlation functions with regard to bin size and edge correction. Yet the author does not encourage using the script K function alone, but rather using all the tools. The paper also mentions that computing third- or higher-order measures of clustering has become feasible thanks to larger data sets and advances in computing. In the appendix, the unbiasedness of the estimator of the script K is proved.

The reason for bringing up this K-function comes from my early experience in learning statistics. My memory of learning the 2 point correlation function in an undergraduate cosmology class is very vague, but the basic idea of modeling this function gave me an epiphany during a spatial statistics class several years ago, when Ripley’s K-function was introduced. I vividly remember setting up my own project to use this K-function to characterize the spatial distribution of GRBs. I selected GRBs instead of galaxies for two reasons: 1. I was able to find the data set on the internet on my own (the BATSE catalog; astronomers may think accessing data archives is easy, but statistics students were generally not aware that astronomical data sets are available via the internet, and for data sets they depend heavily on data providers, or clients), and 2. I recalled a paper by Professors Efron and Petrosian (1995, ApJ, 449:215-223, Testing Isotropy versus Clustering of Gamma-ray Bursts), who utilized the nearest neighbor approach. After a few weeks, I made another discovery: people had measured GRB redshifts and had begun to understand the cosmological origin of GRBs more deeply. In other words, 2D spatial statistics was not the way to find the origins of GRBs. Because of a few shortcomings, one of which was the latitude-dependent observation of BATSE (as a second-year graduate student, I had not yet encountered the ideas of censoring and truncation), I discontinued my personal project, discouraged that I could not make any contribution (the data themselves, such as newly measured distances, speak louder than statistical inferences made without distances).

I was delighted to see the work by Prof. Loh on Ripley’s K function. Those curious about the K function may consult the book by Martinez and Saar, Statistics of the Galaxy Distribution (Amazon Link). Many statistical publications that cover Ripley’s K function are also available under the topics of spatial statistics and point processes.
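
For readers who prefer code to formulas, here is a minimal sketch of a second-order Ripley’s K estimate for points in a plane. It is only an illustration: it is not the third-order estimator from Prof. Loh’s paper, and it applies no edge correction. The function name ripley_k, the unit-square window, and the simulated uniform points are all assumptions made for this example. For a completely random (Poisson) pattern, K(r) should be close to pi r^2, and values well above that suggest clustering at scale r.

```python
import math
import random


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate for 2D points (hypothetical helper, no edge correction)."""
    n = len(points)
    counts = [0] * len(radii)
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            for k, r in enumerate(radii):
                if d <= r:
                    # Each unordered pair contributes two ordered pairs (i->j and j->i).
                    counts[k] += 2
    # K_hat(r) = area / (n * (n - 1)) * (number of ordered pairs closer than r)
    return [area * c / (n * (n - 1)) for c in counts]


# Assumed toy data: 400 points drawn uniformly in the unit square.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(400)]
radii = [0.02, 0.05, 0.1]
k_hat = ripley_k(pts, radii, area=1.0)

for r, k in zip(radii, k_hat):
    print(f"r={r:.2f}  K_hat={k:.5f}  pi*r^2={math.pi * r * r:.5f}")
```

Because K is cumulative in r, no binning of pair separations is needed, which relates to the bin size point mentioned above. Without edge correction, the estimate is biased low for points near the window boundary, so a real analysis would need a proper correction.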
