The AstroStat Slog (http://hea-www.harvard.edu/AstroStat/slog)

Kaplan-Meier Estimator (Equation of the Week)
http://hea-www.harvard.edu/AstroStat/slog/2008/eotw-kaplan-meier/
Wed, 09 Jul 2008 17:00:54 +0000, by vlk

The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover, because the method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so many of our measurements are only limits, and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-1980s by Jurgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ, 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ, 306, 490). [See also Hyunsook's primer.] It has been coded up and is available for use as part of the ASURV package.

Consider a simple case where you have luminosity measurements for a sample of N sources. Let us say that all N sources have been detected and their luminosities are estimated to be L_i, i=1..N, and that they are ordered such that L_i < L_{i+1}. Then it is easy to see that the fraction of sources above each L_i can be written as the sequence

{ N-1, N-2, N-3, … 2, 1, 0}/N

The K-M estimator is a generalized form that describes this sequence, and is written as a product. The probability that an object in the sample has luminosity greater than L_k is

S(L > L_1) = (N-1)/N
S(L > L_2) = (N-1)/N * ((N-1)-1)/(N-1) = (N-1)/N * (N-2)/(N-1) = (N-2)/N
S(L > L_3) = (N-1)/N * ((N-1)-1)/(N-1) * ((N-2)-1)/(N-2) = (N-3)/N
...
S(L > L_k) = Π_{i=1..k} (n_i - 1)/n_i = (N-k)/N

where n_k is the number of objects still remaining at luminosity level L ≥ L_k (here n_k = N-k+1), and each term removes one object to account for the drop in the sample size.
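
The telescoping of the product can be checked numerically. The following is a minimal Python sketch, assuming all N objects are detected and have distinct luminosities; the function name `survival_all_detected` is made up for this illustration and does not come from ASURV.

```python
from fractions import Fraction

def survival_all_detected(n_total):
    """Return [S(L > L_1), ..., S(L > L_N)] for N detected objects
    with distinct luminosities, built up as a running product."""
    survival = []
    running = Fraction(1)
    for k in range(1, n_total + 1):
        n_k = n_total - k + 1              # objects with L >= L_k
        running *= Fraction(n_k - 1, n_k)  # one object drops out at L_k
        survival.append(running)
    return survival

N = 5
product_form = survival_all_detected(N)
direct_form = [Fraction(N - k, N) for k in range(1, N + 1)]
assert product_form == direct_form  # the product telescopes to (N-k)/N
print([str(s) for s in product_form])
```

Exact fractions are used rather than floats so that the comparison with (N-k)/N is not affected by rounding.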

That was for the case when all the objects are detected. But now suppose some are not, and only upper limits to their luminosities are available. A specific value of L cannot be assigned to these objects, and the only thing we can say is that they will “drop out” of the set at some stage. In other words, the sample will be “censored”. The K-M estimator is easily altered to account for this by changing the decrement in each term of the product to include the censored points. Thus, the general K-M estimator is

S(L > L_k) = Π_{i=1..k} (n_i - c_i)/n_i

where c_i is the number of objects that drop out between L_{i-1} and L_i.
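
For readers who want to experiment, here is a minimal Python sketch of a product-limit estimator. It is not the ASURV code, and it uses the standard textbook bookkeeping for right-censored data: each factor is (n_i - d_i)/n_i, where d_i is the number of exact measurements at the i-th distinct value, and censored objects only shrink the at-risk count n_i for later factors. That bookkeeping is an assumption of the sketch and may differ in detail from the c_i notation above. The function name `kaplan_meier` and the toy data are likewise made up for illustration. Upper limits are left-censored; one common trick is to negate the values (or subtract them from a constant) so that they become right-censored before applying a routine like this.

```python
from fractions import Fraction

def kaplan_meier(values, observed):
    """Product-limit estimate of S(x) = P(X > x) for right-censored data.

    values   : measured value or censoring limit of each object
    observed : True for an exact measurement, False for a censored one
    Returns a list of (x, S(x)) at each distinct exactly-measured value.
    """
    data = sorted(zip(values, observed))
    n_at_risk = len(data)          # objects with value >= current x
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        x = data[i][0]
        n_events = 0               # exact measurements equal to x
        n_here = 0                 # all objects (exact or censored) at x
        while i < len(data) and data[i][0] == x:
            n_events += int(data[i][1])
            n_here += 1
            i += 1
        if n_events > 0:
            survival *= Fraction(n_at_risk - n_events, n_at_risk)
            curve.append((x, survival))
        n_at_risk -= n_here        # censored objects leave the risk set here
    return curve

# Toy data (invented for illustration): five objects, two of them censored.
toy_values = [1, 2, 3, 4, 5]
toy_observed = [True, False, True, True, False]
for x, s in kaplan_meier(toy_values, toy_observed):
    print(x, s)
```

With every entry of `observed` set to True, the function reduces to the (N-k)/N sequence of the fully detected case.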

Note that the K-M estimator is a maximum likelihood estimator of the cumulative probability (actually one minus the cumulative probability as it is usually understood), and uncertainties on it must be estimated via Monte Carlo or bootstrap techniques [or not.. see below].
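
As a rough illustration of the bootstrap route, the sketch below resamples the objects with replacement and recomputes the estimate each time. It assumes right-censored data and the standard product-limit form; the function names, the percentile interval, the 68% level, and the toy data are all choices made for this example rather than anything prescribed by the papers cited above. Analytic variance estimates such as Greenwood's formula also exist.

```python
import random

def km_survival_at(values, observed, x0):
    """Product-limit estimate of P(X > x0) for right-censored data."""
    survival = 1.0
    event_values = {v for v, obs in zip(values, observed) if obs and v <= x0}
    for x in sorted(event_values):
        n_at_risk = sum(1 for v in values if v >= x)
        n_events = sum(1 for v, obs in zip(values, observed) if obs and v == x)
        survival *= (n_at_risk - n_events) / n_at_risk
    return survival

def bootstrap_interval(values, observed, x0, n_boot=1000, level=0.68, seed=0):
    """Percentile bootstrap interval for the K-M estimate at x0."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    pairs = list(zip(values, observed))
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]  # draw with replacement
        vals, obs = zip(*resample)
        estimates.append(km_survival_at(vals, obs, x0))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    return estimates[lo_idx], estimates[hi_idx]

# Toy data (invented for illustration): five objects, two of them censored.
toy_values = [1, 2, 3, 4, 5]
toy_observed = [True, False, True, True, False]
point = km_survival_at(toy_values, toy_observed, 3)
low, high = bootstrap_interval(toy_values, toy_observed, 3)
print(point, low, high)
```

With a sample this small the interval is very coarse; the point is only to show the resampling loop.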
