The Burden of Reviewers
http://hea-www.harvard.edu/AstroStat/slog/2008/the-burden-of-reviewers/
Thu, 17 Apr 2008 16:17:39 +0000

Astronomers write literally thousands of proposals each year to observe their favorite targets with their favorite telescopes. Every proposal must be accompanied by a technical justification, in which the proposers demonstrate that their goal is achievable, usually via a simulation. Surprisingly, a large number of these justifications are statistically unsound. Guest slogger Simon Vaughan describes the problem and shows what you can do to make reviewers happy (and you definitely want to keep reviewers happy).

The feasibility analysis is one of the most important sections of any observing-time proposal. For example, the XMM-Newton AO7 Policies & Procedures document says: “A realistic estimate of the observing time is a major selection criterion for the OTAC.”

The central part of the feasibility is usually a justification of the requested exposure time: an analytical or numerical model of the observing process, together with some assumptions about the expected data, is used to demonstrate that the observation can answer the scientific question at hand. Telescope time is expensive and precious, so it is important not to waste it; requesting too much means wasting observing time that could be spent on other science, and requesting too little means potentially wasting the entire observation if the science goals cannot be achieved.
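As a minimal sketch of this kind of calculation, assuming a simple point-source detection, a Gaussian approximation to the counting noise, and entirely invented count rates (none of this is taken from any real proposal or exposure-time calculator), the standard exposure-time justification might look like this in Python:

```python
import numpy as np

# Back-of-envelope exposure estimate for a point-source detection.
# All numbers are hypothetical; significance is approximated as S / sqrt(S + B).
source_rate = 0.005      # assumed source count rate (counts/s)
background_rate = 0.001  # assumed background count rate (counts/s)
target_sigma = 5.0       # desired detection significance

def detection_significance(exposure):
    """Approximate detection significance for a given exposure (seconds)."""
    s = source_rate * exposure
    b = background_rate * exposure
    return s / np.sqrt(s + b)

# Shortest exposure on a grid that reaches the target significance.
exposures = np.logspace(3, 6, 500)   # 1 ks to 1 Ms
reaches_target = detection_significance(exposures) >= target_sigma
t_req = exposures[np.argmax(reaches_target)]
print(f"Requested exposure: {t_req:.0f} s (based on expected counts alone)")
```

Note that this produces only the expected significance for the expected counts; as argued below, it says nothing about how often a real observation of this source would actually clear the 5-sigma bar.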

Unfortunately, the procedure that is standard in X-ray astronomy is missing one important piece of information: a proposal will usually feature a calculation of the ‘significance’ of the desired result expected for a typical observation, but no figure will be given for the chance that the result might be missed. (The ‘significance’ itself is a random variable; for many proposals a large proportion of random realizations of the observation may yield insignificant results.) This is compounded by the fact that many proposals make use of just one random simulation in the calculation – one simulation that gives a satisfactory ‘result’. What is important is the distribution of expected data. What proportion of realizations show a ‘significant’ result?

A proposal with a ‘failure probability’ of 0.8 is clearly a riskier proposition than one with a failure probability below 0.1, yet it would not be difficult to produce a single, convincing-looking simulation in either case. I do not propose any particular value as a benchmark for the failure probability; the selection panel must make that judgement based on the perceived worth of the science goals of the proposal. The point is that the ‘failure probability’ is a piece of information that should be made available to proposal panels in order for them to make reasonable judgements about the best use of telescope time.
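To make the idea concrete, here is a minimal Monte Carlo sketch of a ‘failure probability’ calculation for a simple detection problem (Python; the exposure, count rates, and detection threshold are all hypothetical, not taken from any real proposal): simulate many realizations of the proposed observation and count how often the detection criterion is not met.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical set-up: a weak source observed against background in one aperture.
exposure = 5_000.0                            # seconds (assumed)
source_rate, background_rate = 0.002, 0.001   # counts/s (assumed)
n_sims = 10_000

mu_bkg = background_rate * exposure                   # expected background counts
mu_tot = (source_rate + background_rate) * exposure   # expected total counts

# Detection criterion: counts exceed the background expectation by ~3 sigma
# (Gaussian approximation to the Poisson background, purely for illustration).
threshold = mu_bkg + 3.0 * np.sqrt(mu_bkg)

# Simulate many realizations of the proposed observation.
counts = rng.poisson(mu_tot, size=n_sims)
failure_probability = np.mean(counts <= threshold)
print(f"Estimated failure probability: {failure_probability:.2f}")
```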

Broadly speaking there are three types of science goals for telescope proposals: (1) testing hypotheses; (2) estimating parameters of a specific model; (3) exploratory observations with no specific model to test (so-called ‘fishing expeditions’). The above argument is clearest for the first type, but is equally valid for the second. A proposal to estimate some model parameter(s) will usually justify the requested exposure time on the basis that it will provide a confidence (or credible) region for the parameter that is small enough for the estimate to be scientifically useful. In practice most proposal writers use either the expected values of the confidence limits, or the values obtained from a single simulation. In either case there may be a high, and unspecified, probability that the observation will in fact produce a confidence interval larger than required (in which case the observation again does not meet the requirements of the proposal).
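The same bookkeeping works for parameter estimation. As a hypothetical sketch (Python again, with invented numbers), suppose the science goal is to measure a source count rate to better than 10%: the quantity worth reporting is the fraction of simulated observations whose confidence interval comes out wider than that requirement.

```python
import numpy as np

rng = np.random.default_rng(1)

exposure = 10_000.0          # seconds (assumed)
true_rate = 0.01             # counts/s (assumed model value)
required_fraction = 0.10     # science goal: measure the rate to +/- 10%
n_sims = 10_000

# Simulate observed counts and form an approximate 68% interval on the rate.
counts = rng.poisson(true_rate * exposure, size=n_sims)
rate_hat = counts / exposure
rate_err = np.sqrt(counts) / exposure    # Gaussian approximation to the error

# Fraction of realizations whose interval is wider than the science requirement.
too_wide = np.mean(rate_err / rate_hat > required_fraction)
print(f"Probability the interval misses the requirement: {too_wide:.2f}")
```

With these invented numbers the expected fractional error is exactly 10%, so roughly half of the simulated observations fail the requirement, even though a calculation based on expected values alone would declare the observation feasible.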

The solution to this problem is to include, as standard, a ‘power’ calculation in the feasibility section of proposals. A power calculation for an X-ray telescope proposal would be relatively straightforward. Instead of simulating one dataset at a given exposure time, one would generate a large number of simulations and ensure that the fraction of non-detections (or of confidence intervals that are too wide) was sufficiently small. Of course, in order to perform such a calculation one must have a well-defined hypothesis to test (e.g. a new source of given brightness, or a spectral line at a particular location and strength). This should be true of almost all proposals except perhaps the exploratory ‘fishing expeditions’. If a hypothesis is not completely specified (i.e. has parameters with uncertain values), one might still be able to perform simulations using a plausible distribution of parameter values (‘predictive’ simulations?). If such power calculations were included as standard in most X-ray telescope proposals, the selection panel would have a valuable piece of additional information on which to base their judgements. Power calculations are a staple of experimental design in fields such as medical research, where sample sizes are set at a level that gives a reasonable probability of detecting the effect being sought, if it is real.
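A power calculation then simply wraps such a simulation in a loop over candidate exposure times. A minimal sketch (Python, with the same hypothetical rates and detection criterion as the failure-probability example above) might look like this:

```python
import numpy as np

rng = np.random.default_rng(7)

source_rate, background_rate = 0.002, 0.001   # counts/s (assumed)
n_sims = 5_000
target_power = 0.90                           # required probability of success

def power(exposure):
    """Fraction of simulated observations giving a >3-sigma excess over background."""
    mu_bkg = background_rate * exposure
    mu_tot = (source_rate + background_rate) * exposure
    threshold = mu_bkg + 3.0 * np.sqrt(mu_bkg)
    counts = rng.poisson(mu_tot, size=n_sims)
    return np.mean(counts > threshold)

# Scan candidate exposures and report the shortest one meeting the target power.
for exposure in np.arange(2_000, 50_001, 2_000):
    if power(exposure) >= target_power:
        print(f"~{exposure/1e3:.0f} ks gives at least {target_power:.0%} power")
        break
```

The same loop can just as easily report the full power-versus-exposure curve, which also shows how sensitive the request is to the assumed source brightness.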

Unfortunately, there is no mechanism for properly reviewing the success of completed observing proposals, and so systematic over- or under-estimates of the exposure times may not be immediately apparent. The fact that so many observations do provide interesting results may tell us more about the richness of Nature (providing unplanned results), or the ingenuity of observers in making use of the available data, than about the design goals of the original proposals.
