[ArXiv] Spatial Correlation in the Scan Statistic

Accounting for Spatial Correlation in the Scan Statistic by Loh & Zhu [stat.AP:0712.1458] offers a picture that helps explain excessive false alarms in source detection when the image data are modeled as a Poisson point process. Having no hands-on experience in source detection analysis, I cannot speak empirically about the detection statistics or the p-values of particular detection methods. However, given that Poisson count data are often over-dispersed and that their spatial correlation is unknown before the detection analysis, we could guess that false discoveries of sources occur more often than expected.

To incorporate overdispersion and spatial correlation when detecting clusters in a spatial Poisson point process, the authors propose a null model that includes spatial correlation, referred to as the spatial generalized linear mixed model (SGLMM) or generalized spatial linear model (GSLM). They show theoretically that overdispersion and spatial correlation lead to an increase in false positives, and they verify this through a simulation study. Their simulation study also illustrates that their modified scan statistic substantially reduces false alarms.
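To get a feel for why ignoring overdispersion and spatial correlation inflates false alarms, here is a rough toy simulation. It is not the authors' model or their modified scan statistic. It is only a sketch under my own assumptions: a 1D strip of cells instead of an image, a simplified scan statistic (the largest count in any window of adjacent cells), and a Poisson log-normal null whose log-intensity carries a moving-average Gaussian random effect. All function names and parameter values (n, width, mu, sigma, span, runs, seed) are hypothetical choices for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for modest lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def max_window_count(counts, width):
    """Simplified scan statistic: largest total count in any window of `width` adjacent cells."""
    current = sum(counts[:width])
    best = current
    for i in range(width, len(counts)):
        current += counts[i] - counts[i - width]  # slide the window by one cell
        best = max(best, current)
    return best


def iid_counts(rng, n, mu):
    """Independent Poisson(mu) counts: the null usually assumed by a plain scan statistic."""
    return [poisson(rng, mu) for _ in range(n)]


def correlated_counts(rng, n, mu, sigma, span):
    """Poisson counts whose log-intensity has a spatially correlated Gaussian effect (assumed form)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + span - 1)]
    counts = []
    for i in range(n):
        # Moving average of white noise: unit variance, correlated with nearby cells.
        z = sum(noise[i:i + span]) / math.sqrt(span)
        # Subtracting sigma^2 / 2 keeps the expected intensity equal to mu.
        lam = mu * math.exp(sigma * z - 0.5 * sigma ** 2)
        counts.append(poisson(rng, lam))
    return counts


def false_alarm_rates(n=200, width=5, mu=5.0, sigma=0.8, span=5, runs=400, seed=1):
    """Calibrate a 5% threshold under the iid null, then apply it to both kinds of source-free data."""
    rng = random.Random(seed)
    null = sorted(max_window_count(iid_counts(rng, n, mu), width) for _ in range(runs))
    threshold = null[(95 * runs) // 100 - 1]  # empirical 95th percentile of the null statistic
    far_iid = sum(
        max_window_count(iid_counts(rng, n, mu), width) > threshold for _ in range(runs)
    ) / runs
    far_corr = sum(
        max_window_count(correlated_counts(rng, n, mu, sigma, span), width) > threshold
        for _ in range(runs)
    ) / runs
    return threshold, far_iid, far_corr


if __name__ == "__main__":
    threshold, far_iid, far_corr = false_alarm_rates()
    print("threshold:", threshold)
    print("false alarm rate, iid Poisson data:", far_iid)
    print("false alarm rate, correlated overdispersed data:", far_corr)
```

Neither data set contains a real cluster, so every exceedance of the threshold is a false alarm. With these made-up settings the second rate should come out far above the nominal 5%, which is the qualitative effect the paper describes. The exact numbers depend entirely on the assumed parameters.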

Because of the complexity of the models and theory, the authors rely heavily on simulation. However, the figures that illustrate the posterior distributions of the scan statistic are difficult to read (they are low-resolution scanned images). Please keep this in mind while reading the paper.

    In the late 1970s and early 1980s, Professor Ripley (I think this is the same Ripley who authored well-known books on S/S-PLUS for statistical data analysis in the 1990s and early 2000s) published papers on spatial statistics applied to astronomical image data. My impression from years back is that the data quality was not good enough to draw statistical conclusions. Since then, although a few groups have sporadically applied spatial statistics to astronomical data, applications of spatial statistics have drifted toward epidemiological and geological data sets. Personally, I believe now is the perfect time to draw spatial statisticians’ attention back. There is a plethora of data from the many survey projects, particularly SDSS and the upcoming LSST. Only astronomical data challenge spatial statistics with 3D data, with uncertainty in distance, and with sampling bias due to truncation, all of which call for revolutionary statistical modeling.

