Kaplan-Meier Estimator (Equation of the Week)

vlk — Wed, 09 Jul 2008 17:00:54 +0000

The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover because this method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so much of our data are limited by detection thresholds, and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-1980s by Jurgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ, 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ, 306, 490). [See also Hyunsook's primer.] It has been coded up and is available for use as part of the ASURV package.

Consider a simple case where you have observations of the luminosities of N sources. Let us say that all N sources have been detected and their luminosities are estimated to be L_i, i=1..N, and that they are ordered such that L_i < L_i+1. Then, it is easy to see that the fraction of sources above each L_i can be written as the sequence

{ N-1, N-2, N-3, … 2, 1, 0}/N

The K-M estimator is a generalized form that describes this sequence, and is written as a product. The probability that an object in the sample has luminosity greater than L_k is

S(L>L₁) = (N-1)/N
S(L>L₂) = (N-1)/N * ((N-1)-1)/(N-1) = (N-1)/N * (N-2)/(N-1) = (N-2)/N
S(L>L₃) = (N-1)/N * ((N-1)-1)/(N-1) * ((N-2)-1)/(N-2) = (N-3)/N
…
S(L>L_k) = Π_i=1..k (n_i-1)/n_i = (N-k)/N

where n_i = N-i+1 is the number of objects still remaining at luminosity level L ≥ L_i, and at each stage the count is decremented by one to account for the drop in the sample size.
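As a quick numerical check of the product form above, the following Python sketch builds the running product for a fully detected sample and compares it with (N-k)/N at every step. The function name `km_all_detected` and the sample luminosities are made up for illustration, and exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def km_all_detected(lums):
    """Return [(L_k, S(L>L_k))] for a fully detected sample with distinct values."""
    n_total = len(lums)
    surv = Fraction(1)
    out = []
    for i, lum in enumerate(sorted(lums), start=1):
        n_i = n_total - i + 1            # objects with L >= L_i
        surv *= Fraction(n_i - 1, n_i)   # one object drops out at L_i
        out.append((lum, surv))
    return out

sample = [3.1, 0.7, 5.4, 1.9, 2.6]       # hypothetical luminosities
for k, (lum, s) in enumerate(km_all_detected(sample), start=1):
    # The telescoping product should reduce to (N-k)/N.
    assert s == Fraction(len(sample) - k, len(sample))
    print(f"L_{k} = {lum}: S = {s}")
```

Each factor cancels the denominator of the next one, which is why the product collapses to the simple counting sequence.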

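Now that was for the case when all the objects are detected. But suppose some are not, and only limits on their luminosities are available. A specific value of L cannot be assigned to these objects, and the only thing we can say is that they “drop out” of the set at some stage. In other words, the sample is “censored”. The K-M estimator is easily altered to account for this: each term of the product still decrements only for the objects detected at that level, while the censored points are removed from the count of objects remaining once their limits are passed. Thus, the general K-M estimator is

S(L>L_k) = Π_i=1..k (n_i-d_i)/n_i

where d_i is the number of objects detected at L_i and n_i is the number of objects, detected or censored, still remaining at L ≥ L_i. Written this way, with the product running upward in L, the estimator handles lower limits. For upper limits the same product is built from the brightest detection downward and gives the fraction of objects below each level; Feigelson & Nelson (1985) do this by subtracting the data from a constant.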
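A minimal Python sketch of the product-limit calculation in its standard textbook form may help make the bookkeeping concrete. It assumes the censored values are lower limits, so that the product can run upward in L. Here `n_at_risk` counts every object, detected or not, whose value is at least the current level, and only detections at that level enter the decrement, so a censored object lowers later counts without contributing a factor of its own. Upper limits are conventionally handled by subtracting all values from a constant and running the same calculation, which then gives the fraction below each level. The function name `kaplan_meier` and the toy data are illustrative assumptions and are not taken from ASURV.

```python
from fractions import Fraction

def kaplan_meier(values, detected):
    """Product-limit estimate of S(L > L_k) at each distinct detected value.

    values   : measured luminosities, or lower limits where not detected
    detected : True for a detection, False for a lower limit
    """
    data = list(zip(values, detected))
    surv = Fraction(1)
    curve = []
    for level in sorted({v for v, d in data if d}):
        # Objects still in the sample at this level (detected or censored).
        n_at_risk = sum(1 for v, _ in data if v >= level)
        # Only detections at this level reduce the survival fraction.
        n_detected = sum(1 for v, d in data if d and v == level)
        surv *= Fraction(n_at_risk - n_detected, n_at_risk)
        curve.append((level, surv))
    return curve

# Hypothetical sample: five detections and two lower limits (False).
values   = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0]
detected = [True, True, False, True, False, True, True]
for level, s in kaplan_meier(values, detected):
    print(f"S(L > {level}) = {s}")
```

When every flag is True the function reduces to the all-detected sequence (N-k)/N, which is a convenient sanity check.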

Note that the K-M estimator is a maximum likelihood estimator of the cumulative probability (actually one minus the cumulative probability as it is usually understood). Uncertainties on it can be estimated via Monte Carlo or bootstrap techniques, or analytically with Greenwood's formula for the variance.
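The bootstrap route can be sketched roughly as follows: resample the (value, detection flag) pairs with replacement, recompute the estimate for each resample, and take the scatter of the results as the uncertainty. This is only an illustration under stated assumptions: the censored values are treated as lower limits, and the function names `km_survival_at` and `bootstrap_km`, the toy data, the number of resamples, and the seed are all hypothetical choices.

```python
import random

def km_survival_at(values, detected, level):
    """Product-limit estimate of S(L > level); censored values are lower limits."""
    surv = 1.0
    pairs = list(zip(values, detected))
    for lv in sorted({v for v, d in pairs if d and v <= level}):
        n_at_risk = sum(1 for v, _ in pairs if v >= lv)
        n_detected = sum(1 for v, d in pairs if d and v == lv)
        surv *= (n_at_risk - n_detected) / n_at_risk
    return surv

def bootstrap_km(values, detected, level, n_boot=1000, seed=0):
    """Resample (value, flag) pairs with replacement; return mean and std of S."""
    rng = random.Random(seed)
    pairs = list(zip(values, detected))
    draws = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        vals, flags = zip(*resample)
        draws.append(km_survival_at(vals, flags, level))
    mean = sum(draws) / n_boot
    var = sum((x - mean) ** 2 for x in draws) / (n_boot - 1)
    return mean, var ** 0.5

# Hypothetical sample: five detections and two lower limits (False).
values   = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0]
detected = [True, True, False, True, False, True, True]
estimate = km_survival_at(values, detected, 3.0)
mean, std = bootstrap_km(values, detected, 3.0)
print(f"S(L > 3.0) = {estimate:.3f}, bootstrap mean = {mean:.3f}, std = {std:.3f}")
```

Resampling whole pairs keeps each value attached to its detection flag, so the censoring pattern of the sample is carried into every resample.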

