#### [SPS] Testing Completeness

There will be a special session at the 213th AAS meeting on meaning from surveys and population studies (SPS). Until then, it might be useful to pull out some interesting and relevant papers and questions/challenges as a preliminary to the meeting. I will not simply list astronomical catalogs and surveys, which are literally countless these days. I will, however, bring some up if they have changed the way science is done, along with a description of the catalog (to my knowledge, the best example would be SDSS, the Sloan Digital Sky Survey).

The main focus of this series of postings is to introduce some statistical challenges, including data management, that are likely to arise from astronomical surveys and population studies. (I'm not sure how many postings there will be; the [SPS] series might end after this season.) My paper selection is based on the group discussions of the SPS working group during the SAMSI astrostatistics program in 2006 (the group leaders were G. Babu, Director of CASt, and T. Loredo).

Completeness – I. Revised, reviewed and revived by Johnston, Teodoro, and Hendry
MNRAS, 376(4), pp. 1757-176
Abstract (abridged to the first paragraph) We have extended and improved the statistical test recently developed by Rauzy for assessing the completeness in apparent magnitude of magnitude-redshift surveys. Our improved test statistic retains the robust properties – specifically independence of the spatial distribution of galaxies within a survey – of the Tc statistic introduced in Rauzy’s seminal paper, but now accounts for the presence of both a faint and bright apparent magnitude limit. We demonstrate that a failure to include a bright magnitude limit can significantly affect the performance of Rauzy’s Tc statistic. Moreover, we have also introduced a new test statistic, Tv, defined in terms of the cumulative distance distribution of galaxies within a redshift survey. These test statistics represent powerful tools for identifying and characterizing systematic errors in magnitude-redshift data.

One of the authors was an active participant in the SPS working group at SAMSI. The following three quotes are genuinely statistical in content, even though the paper was published in MNRAS.

It is straightforward to show from this definition that the random variable η has a uniform distribution on the interval [0,1], and furthermore that η and Z are statistically independent.
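The uniformity and independence claim is easy to check numerically. The Python sketch below assumes that η is the cumulative luminosity function F rescaled to the window of absolute magnitudes observable at distance modulus Z, i.e. $$\eta = [F(M) - F(m_b - Z)]/[F(m_f - Z) - F(m_b - Z)]$$, with bright and faint apparent-magnitude limits $$m_b$$ and $$m_f$$. That form, the Gaussian luminosity function, the uniform distribution of Z, and every name in the code are my own illustrative assumptions, not the authors' code. Because F is known in a simulation, η can be computed exactly for each galaxy.

```python
import math
import random

MU, SIGMA = -20.0, 1.0          # hypothetical Gaussian luminosity function parameters
M_BRIGHT, M_FAINT = 12.0, 17.0  # hypothetical bright and faint apparent-magnitude limits


def lum_cdf(M):
    """Cumulative luminosity function F(M); erfc keeps precision far into the tails."""
    return 0.5 * math.erfc(-(M - MU) / (SIGMA * math.sqrt(2.0)))


def simulate(n, seed=0):
    """Draw independent (M, Z) pairs and keep those whose m = M + Z lies within the limits."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        Z = rng.uniform(30.0, 40.0)  # toy spatial distribution; the result should not depend on it
        M = rng.gauss(MU, SIGMA)
        if M_BRIGHT - Z <= M <= M_FAINT - Z:
            sample.append((M, Z))
    return sample


def eta(M, Z):
    """Assumed eta: position of F(M) between F at the bright and faint limits for this Z."""
    lo = lum_cdf(M_BRIGHT - Z)  # F at the brightest absolute magnitude observable at Z
    hi = lum_cdf(M_FAINT - Z)   # F at the faintest absolute magnitude observable at Z
    return (lum_cdf(M) - lo) / (hi - lo)


sample = simulate(20000)
etas = [eta(M, Z) for M, Z in sample]
zs = [Z for _, Z in sample]

n = len(etas)
mean_eta, mean_z = sum(etas) / n, sum(zs) / n
var_eta = sum((e - mean_eta) ** 2 for e in etas) / n
var_z = sum((z - mean_z) ** 2 for z in zs) / n
cov = sum((e - mean_eta) * (z - mean_z) for e, z in zip(etas, zs)) / n
corr = cov / math.sqrt(var_eta * var_z)

print(f"mean(eta)    = {mean_eta:.3f}   (uniform on [0,1]: 0.5)")
print(f"var(eta)     = {var_eta:.4f}  (uniform on [0,1]: {1 / 12:.4f})")
print(f"corr(eta, Z) = {corr:+.3f}  (independent: about 0)")
```

The printed mean, variance, and correlation should sit close to 0.5, 1/12, and 0. Replacing the uniform Z draw with another spatial distribution should leave them essentially unchanged, which is the robustness property mentioned in the abstract.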

If the sample is complete in apparent magnitude, for a given pair of trial magnitude limits, then Tc should be normally distributed with mean zero and variance unity. If, on the other hand, the trial faint (bright) magnitude limit is fainter (brighter) than the true limit, Tc will become systematically negative, due to the systematic departure of the $$\hat{\eta}_i$$ distribution from uniform on the interval [0,1].
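In practice F is unknown, so $$\hat{\eta}_i$$ is estimated from ranks. Below is a sketch of the single (faint) limit construction as I understand it from Rauzy's original paper; the bright-limit extension by Johnston et al. is not reproduced, and the exact comparison sets should be treated as my assumption. For galaxy i, the comparison set holds galaxies no farther than i ($$Z_j \le Z_i$$) that would still be visible at $$Z_i$$ ($$M_j \le m_{\rm lim} - Z_i$$). With $$n_i$$ its size and $$r_i$$ the number with $$M_j \le M_i$$ (counting i itself), $$\hat{\eta}_i = r_i/(n_i+1)$$ has mean 1/2 and variance $$V_i = (n_i-1)/[12(n_i+1)]$$ under completeness, and $$T_c = \sum_i(\hat{\eta}_i - 1/2)/\sqrt{\sum_i V_i}$$. The toy survey (true faint limit 17, Gaussian luminosity function, uniform Z) and all names are hypothetical.

```python
import math
import random


def simulate(n, m_true, seed=1):
    """Toy survey with a single faint limit: keep (M, Z) with M <= m_true - Z."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        Z = rng.uniform(34.0, 40.0)  # hypothetical distance-modulus range
        M = rng.gauss(-20.0, 1.0)    # hypothetical Gaussian luminosity function
        if M <= m_true - Z:
            sample.append((M, Z))
    return sample


def tc_statistic(sample, m_trial):
    """Rank-based Tc for one trial faint limit; O(n^2) for clarity rather than speed."""
    ordered = sorted(sample, key=lambda g: g[1])  # nearest first, so Z_j <= Z_i means j <= i
    num, var = 0.0, 0.0
    for i, (M_i, Z_i) in enumerate(ordered):
        M_lim = m_trial - Z_i        # faintest absolute magnitude visible at Z_i
        if M_i > M_lim:
            continue                 # galaxy lies beyond the trial limit
        n_i = r_i = 0
        for M_j, _ in ordered[: i + 1]:
            if M_j <= M_lim:         # would be seen at Z_i, so it joins the comparison set
                n_i += 1
                if M_j <= M_i:       # at least as bright as galaxy i (counts i itself)
                    r_i += 1
        num += r_i / (n_i + 1) - 0.5              # eta_hat_i minus its null mean
        var += (n_i - 1) / (12.0 * (n_i + 1))     # null variance of eta_hat_i
    return num / math.sqrt(var)


sample = simulate(2000, m_true=17.0)
results = {m: tc_statistic(sample, m) for m in (16.5, 17.0, 18.0)}
for m, tc in results.items():
    print(f"trial faint limit {m}: Tc = {tc:+.2f}")
```

With the trial limit at or brighter than the true one, Tc should look like a draw from N(0,1); a trial limit 1 mag too faint should push it well below zero, matching the behaviour described in the quote.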

If the sample is complete in apparent magnitude, for a given pair of trial magnitude limits, then Tv should be normally distributed with mean zero and variance unity. If, on the other hand, the trial faint (bright) magnitude limit is fainter (brighter) than the true limit, in either case Tv will become systematically negative, due to the systematic departure of the $$\hat{\tau}_i$$ distribution from uniform on the interval [0,1].
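The paper's Tv is built from the cumulative distance distribution instead of the luminosity function. The sketch below is only a rank-based analogue that I obtain by swapping the roles of M and Z in a Rauzy-style Tc construction, not necessarily the authors' exact estimator: for galaxy i, the comparison set holds galaxies at least as bright ($$M_j \le M_i$$) that lie within the distance at which galaxy i would still be seen ($$Z_j \le m_{\rm lim} - M_i$$), and $$\hat{\tau}_i$$ is the normalized rank of $$Z_i$$ in that set. The toy survey (true faint limit 17, Gaussian luminosity function, uniform Z) and all names are hypothetical.

```python
import math
import random


def simulate(n, m_true, seed=1):
    """Toy survey with a single faint limit: keep (M, Z) with M <= m_true - Z."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        Z = rng.uniform(34.0, 40.0)  # hypothetical distance-modulus range
        M = rng.gauss(-20.0, 1.0)    # hypothetical Gaussian luminosity function
        if M <= m_true - Z:
            sample.append((M, Z))
    return sample


def tv_statistic(sample, m_trial):
    """Rank-based analogue of Tv for one trial faint limit (roles of M and Z swapped)."""
    ordered = sorted(sample, key=lambda g: g[0])  # brightest first, so M_j <= M_i means j <= i
    num, var = 0.0, 0.0
    for i, (M_i, Z_i) in enumerate(ordered):
        Z_lim = m_trial - M_i        # farthest distance modulus at which M_i is visible
        if Z_i > Z_lim:
            continue                 # galaxy lies beyond the trial limit
        n_i = r_i = 0
        for _, Z_j in ordered[: i + 1]:
            if Z_j <= Z_lim:         # brighter galaxy inside galaxy i's visibility volume
                n_i += 1
                if Z_j <= Z_i:       # at least as near as galaxy i (counts i itself)
                    r_i += 1
        num += r_i / (n_i + 1) - 0.5              # tau_hat_i minus its null mean
        var += (n_i - 1) / (12.0 * (n_i + 1))     # null variance of tau_hat_i
    return num / math.sqrt(var)


sample = simulate(2000, m_true=17.0)
results_tv = {m: tv_statistic(sample, m) for m in (16.5, 17.0, 18.0)}
for m, tv in results_tv.items():
    print(f"trial faint limit {m}: Tv = {tv:+.2f}")
```

Under completeness, brighter galaxies are visible throughout galaxy i's visibility volume, so the rank of $$Z_i$$ is uniform; a trial limit that is too faint leaves galaxy i with a small rank in its set and drags Tv negative.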

Their statistics serve as diagnostic tools: the estimated value indicates whether the sample is complete at a given magnitude. Beyond that, asymptotic properties could have been studied in depth, so that people who use Tc and Tv could obtain p-values (for hypothesis testing) and confidence intervals. The authors, however, computed the means and variances and stated that these statistics are standard normal without rigorous proofs. On the other hand, the process of estimating Tc and Tv is nonparametric, so further statistical inference, such as showing that Tc and Tv are asymptotically normal, can be very challenging unless strong assumptions about the (probabilistic) models and/or priors are made. Overall, for testing completeness, I find these statistics more statistically appealing than other ratio-based methods.
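If one is willing to take the standard normal claim at face value (which is exactly the part lacking a rigorous proof), a p-value follows directly. Since both kinds of mis-specified trial limit drive Tc and Tv negative, a lower-tail test is the natural choice. A minimal Python sketch; the function name is hypothetical:

```python
import math


def one_sided_p_value(t):
    """P(T <= t) for T ~ N(0, 1); small values flag a systematically negative Tc or Tv."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


for t in (0.0, -1.0, -1.645, -3.0):
    print(f"T = {t:+.3f}  ->  p = {one_sided_p_value(t):.4f}")
```

A value near -1.645 corresponds to p of about 0.05. Without an asymptotic result, a safer route would be to calibrate the null distribution by simulation under an assumed luminosity function and spatial distribution, which brings back exactly the modeling assumptions the nonparametric statistics were meant to avoid.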

Testing completeness no longer seems a difficult task, thanks to these statistics, extensive survey catalogs, and a better understanding of populations. However, uncertainties in the k-correction, e-correction, and extinction correction still make the statistics fuzzy and the results difficult to interpret. Changes in the statistics due to these uncertainties are hard to characterize. Furthermore, obtaining good (point) estimators for these correction terms remains almost unconquered.

Beyond the completeness tests described in the paper above, I've seen efforts in x-ray astronomy to model incompleteness, largely based on a power law whose slope parameter is an indicator of the cosmological model. Unfortunately, incompleteness complicates the slope estimation, and a lot of effort goes into searching for and estimating a model that reflects this incompleteness in the observations as a function of redshift or magnitude. With a complete data set, it would simply be a matter of fitting an ordinary linear regression model.
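To make this concrete, here is a toy Python sketch with hypothetical numbers throughout: fluxes are drawn so that the cumulative counts follow $$N(>S) \propto S^{-1.5}$$, the slope is recovered by ordinary least squares on $$\log N(>S)$$ versus $$\log S$$, and the same fit is repeated after applying an invented detection probability $$1 - e^{-S/2}$$ that thins out faint sources. None of this is taken from a specific x-ray analysis.

```python
import math
import random

S_MIN, ALPHA = 1.0, 1.5  # hypothetical flux floor and power-law slope


def simulate_fluxes(n, seed=2):
    """Fluxes with N(>S) proportional to S**(-ALPHA) above S_MIN (inverse-CDF Pareto draws)."""
    rng = random.Random(seed)
    return [S_MIN * (1.0 - rng.random()) ** (-1.0 / ALPHA) for _ in range(n)]


def detect(fluxes, s_c=2.0, seed=3):
    """Hypothetical incompleteness: keep a source with probability 1 - exp(-S / s_c)."""
    rng = random.Random(seed)
    return [s for s in fluxes if rng.random() < 1.0 - math.exp(-s / s_c)]


def ols_slope(fluxes, s_lo=1.0, s_hi=10.0, n_grid=20):
    """Ordinary least squares slope of log10 N(>S) versus log10 S on a log-spaced grid."""
    xs, ys = [], []
    for k in range(n_grid):
        s = s_lo * (s_hi / s_lo) ** (k / (n_grid - 1))
        xs.append(math.log10(s))
        ys.append(math.log10(sum(1 for f in fluxes if f > s)))
    mx, my = sum(xs) / n_grid, sum(ys) / n_grid
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


complete = simulate_fluxes(50000)
observed = detect(complete)
slope_complete = ols_slope(complete)
slope_observed = ols_slope(observed)
print(f"complete sample:   slope = {slope_complete:.2f}  (true: {-ALPHA})")
print(f"incomplete sample: slope = {slope_observed:.2f}")
```

The complete sample should return a slope close to -1.5, while the thinned sample gives a visibly flatter one, so the incompleteness has to be modeled rather than ignored. Because neighbouring points of a cumulative count share sources, their errors are correlated; the least-squares fit here is only an illustration, and maximum-likelihood fits to the individual fluxes are generally preferred.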

I believe that someday incompleteness will be modeled stochastically (parameterized to extract information and to offer good predictions), going beyond testing, and that this will offer a better understanding of the visible universe ("visible" here is a very broad concept, not meaning only what can be seen with the naked eye). So far, (in)completeness has remained a concept, a word with meaning, to which no compact mathematical formulation or statistical model has been attached for testing and understanding uncertainties.

p.s. I have been paying a lot of attention to citation style; nevertheless, as you may have noticed, my citations are far from consistent. Two noticeable differences between the citation styles of statistics and astronomy are the abbreviation of journal names and the inclusion of titles. Astronomers' citations are compact, concise, and uniform across astronomical journals; statisticians' citations, by contrast, are lengthy, informative (because of the titles), and vary across statistics and applied statistics journals. MNRAS reminded me of a paper by a very renowned statistician that cited an MNRAS paper but expanded the name as "Monograph National Royal Astronomical Society" (it stands for Monthly Notices of the Royal Astronomical Society). I hope you can now be gracious about my citation style.

[disclaimer] I have seen various population studies in astronomy across a broad wavelength range, each with different objectives, targets, obstacles, and study designs (even the telescopes, detectors, data pipelines, and sampling schemes differ), and (in)completeness studies are designed to reflect those differences. I'm afraid that I'm reporting only a tiny fraction of all the efforts related to (in)completeness. Your comments are most welcome. I also hope to see your posts and comments on (in)completeness, volume/magnitude-limited samples, survey studies, upper limits, missing values in surveys, clustering, spatial distribution, large-scale structure, etc. in the near future.