survey and design of experiments

People with more experience would wisely disagree with what I'm about to discuss. This post only compares two small cross sections, one from each of two trees: astronomy and statistics.

When it comes to surveys, the first thing that comes to my mind is the census packet. I have only seen it once (an easy way to disguise my age, but this is true), yet the questionnaire layout was so carefully and extensively done that it left a strong impression on me. Such a survey is designed before any data are collected, so that after collection the data can be analyzed with statistical methodology suited to the survey design. Strategies for quantifying responses are also included (0/1 for yes/no, responses on a 0 to 10 scale, bracketed salaries and age groups, handling of missing data, and so on), so that elaborate statistical analysis can proceed without subjective data transformations or arbitrary outlier elimination.
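To make the idea of fixing the response coding ahead of time concrete, here is a minimal sketch in Python. The field names (`employed`, `satisfaction`, `salary`), the bracket edges, and the function names are all hypothetical, invented only for illustration; they are not taken from any actual census form. The point is merely that every rule, including what counts as missing, is written down before any data arrive.

```python
from bisect import bisect_right

# Hypothetical bracket edges, chosen only for illustration.
SALARY_EDGES = [25_000, 50_000, 100_000]


def encode_yes_no(answer):
    """Code yes/no as 1/0; anything else is recorded as missing (None)."""
    if not isinstance(answer, str):
        return None
    return {"yes": 1, "no": 0}.get(answer.strip().lower())


def encode_scale(value, low=0, high=10):
    """Keep an integer rating within [low, high]; otherwise record missing."""
    if isinstance(value, int) and low <= value <= high:
        return value
    return None


def encode_salary(salary, edges=SALARY_EDGES):
    """Return the index of the salary bracket, or None if not reported."""
    if salary is None:
        return None
    # Bracket 0 is below the first edge; bracket len(edges) is at or above the last.
    return bisect_right(edges, salary)


def encode_response(raw):
    """Apply the fixed coding rules to one questionnaire (a dict)."""
    return {
        "employed": encode_yes_no(raw.get("employed")),
        "satisfaction": encode_scale(raw.get("satisfaction")),
        "salary_bracket": encode_salary(raw.get("salary")),
    }


if __name__ == "__main__":
    print(encode_response({"employed": "Yes", "satisfaction": 7, "salary": 42_000}))
    print(encode_response({"employed": "no", "satisfaction": 11}))
```

Missing or out-of-range answers are kept as `None` rather than silently dropped or replaced, so the later analysis can decide how to treat them.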

In contrast, a survey in astronomy means designing a mesh, not questionnaires, and this mesh cannot readily be transcribed into statistical models. The mesh has multiple layers, such as telescope, detector, and source detection algorithm, and eventually produces a catalog. Designing the statistical methodology that draws interpretable conclusions is not part of it. Collecting what goes through that mesh is the astronomical survey. Analyzing the catalog does not necessarily involve sophisticated statistics; it often relies on chi-sq fitting and on casting away unpleasant or uninteresting data points.
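As a rough illustration of the chi-sq fitting mentioned above, the sketch below fits a straight line by weighted least squares and reports the chi-square before and after one point is cast away. The data values and the function name `fit_line` are made up for this example and do not come from any real catalog; it is a toy under those assumptions, not a description of how any particular survey team works.

```python
def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, chi2)."""
    w = [1.0 / s**2 for s in sigma]  # weights from the measurement errors
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    b = (S * Sxy - Sx * Sy) / delta    # slope
    chi2 = sum(((yi - a - b * xi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    return a, b, chi2


if __name__ == "__main__":
    # Made-up data: a line y = 1 + 2x with one discrepant point at x = 2.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.0, 3.0, 15.0, 7.0, 9.0]
    sigma = [0.5] * 5

    print("all points:    ", fit_line(x, y, sigma))

    # Cast away the discrepant point by hand and refit.
    keep = [i for i in range(len(x)) if i != 2]
    print("one point cut: ", fit_line([x[i] for i in keep],
                                      [y[i] for i in keep],
                                      [sigma[i] for i in keep]))
```

The fit looks far better once the point is removed, which is exactly why the choice of what to discard deserves a rule fixed in advance rather than a decision made after looking at the residuals.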

Survey is one of several conflicts in jargon. (The simplest example is Ho: I used to know it as the Hubble constant, but now I recognize it first as the notation for a null hypothesis.) As with measurement error, I expect knowledgeable astrostatisticians to clarify the term survey, so that more statisticians become involved in the grand survey projects soon to come. Luckily, the first opportunity will come soon, at the Special Session: Meaning from Surveys and Population Studies: BYOQ during the 213th AAS meeting at Long Beach, California, on Jan. 5th, 2009.

  1. vlk:

    I would say it is unfair to compare a census with an astro survey. To begin with, a census has very limited objectives, and it is easy to design the questions to gather exactly the kind of data that are needed. And, they have been running it every decade for two centuries, and have had plenty of time to perfect their procedure. Not to mention, the census takers do not have to worry about the Malmquist bias. Even so, consider a typical astro survey nowadays, say the SDSS — you get number counts of stars, fluxes in five passbands, measurement errors on each of them, and a host of well-defined secondary characteristics such as source position, extent, variability, data quality, etc.

    So, let me pose this question instead — what is missing? What should be done in an Astro survey that is not being done now?

    10-01-2008, 8:22 pm
  2. hlee:

    I was trying to expose a subtle difference, based purely on listening to a few talks about LSST. They emphasize PB/day (?) data collection. Somehow it'll be processed down to a level that computer memory can handle, but no one mentioned how this massive data set will be handled. I never learned how much reduction is achieved through source detection or compression. Furthermore, since data are collected daily over years, some sort of streaming data analysis scheme must be laid out, but no one mentioned it. I'm not saying something is missing in astro surveys, only that the surveys differ. Statisticians are more interested in how to analyze data, and their surveys reflect this objective; astronomers are interested in what to extract for science. The how-to comes after collecting the data, and generally means using chi sq after data reduction (throwing out outliers: even after one throws data points away, one still has thousands of data points, and chi square does work).

    10-02-2008, 2:25 pm
  3. vlk:

    You are right, astronomy surveys are probably more concerned with measurement than analysis. The analysis, other than the primary goal for which the survey would have been set up, is usually left as an exercise for grad students downstream.

    There are also “accidental surveys”, which are not planned at all beforehand, but simply make use of a large body of existing observations to serendipitously catalog what is out there. Some examples are the Einstein Slew Survey, the Chandra Multiwavelength Project, X-Atlas, etc.

    10-03-2008, 2:46 pm
