Survival Analysis: A Primer

hlee — Tue, 08 Jul 2008 23:27:38 +0000

Astronomers confront various kinds of censored and truncated data. Often these types of data are named after the famous scientists who generalized them, as with Eddington bias. When censored or truncated data become a subject of study in statistics, statisticians model them rather than name them, so that the uncertainty can be quantified. This area is called survival analysis. If your library subscribes to The American Statistician and you are an astronomer who handles censored or truncated data sets, this primer would be useful for quickly grasping the statistical jargon of survival analysis and for characterizing the uncertainties residing in your data.

Survival Analysis: A Primer by David A. Freedman
The American Statistician, May 2008, Vol. 62, No.2, pp. 110-119

This article explains the basics of survival analysis and criticizes some previously conducted studies. Since the examples come from medical studies, astronomers may not be interested in reading the whole article. Nonetheless, Freedman gives the definitions used in survival analysis, such as the survival function, the hazard rate, the Kaplan-Meier estimator, and the proportional hazards model, with clarity and conciseness. For example, if τ (a positive random variable indicating the waiting time to failure) is Weibull, the hazard rate takes exactly the form of the power law celebrated in astronomy. (I think modifying pdfs to reflect censoring and truncation may lead to more robust results than fitting power laws, unless the parameters of the power laws have astrophysical implications and survival analysis approaches cannot provide the same parametrization.)
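
To make two of these definitions concrete, here is a minimal Python sketch. It is not taken from Freedman's article; the function names `weibull_hazard` and `kaplan_meier` and the `shape`/`scale` parametrization are my own assumptions for illustration. With survival function S(t) = exp(-(t/scale)^shape), the Weibull hazard is h(t) = (shape/scale) (t/scale)^(shape-1), a power law in t with index shape-1. The Kaplan-Meier part handles right censoring only.

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    """Hazard h(t) = f(t)/S(t) of a Weibull waiting time.

    Assumes S(t) = exp(-(t/scale)**shape); the result is a pure
    power law in t with index (shape - 1).
    """
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def kaplan_meier(times, observed):
    """Kaplan-Meier estimate of the survival function (right censoring).

    times    : time recorded for each object (failure or censoring time)
    observed : True where the failure was seen, False where censored
    Returns the distinct failure times and the estimate of S(t)
    just after each of them.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(times[observed])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(times >= t)                 # still under observation at t
        failures = np.sum((times == t) & observed)   # failures exactly at t
        s *= 1.0 - failures / at_risk
        surv[i] = s
    return event_times, surv

# Toy data: five objects, two of them censored.
t_event, s_hat = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

On a log-log plot `weibull_hazard` is a straight line of slope `shape - 1`, which is the sense in which the hazard is a power law.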

The commonality between power laws and Pareto distributions, together with the frequent appearance of power laws in astronomical journals, raises the expectation that survival analysis would be applied frequently to astronomical data; on the contrary, there are not many such applications.

Though there are more, here are a few references relevant to survival analysis that use examples from astronomy or appeared in astronomical journals:

Nonparametric Methods for Doubly Truncated Data by B Efron and V Petrosian. (subscription required)
Journal of the American Statistical Association, Vol. 94, pp. 824-834 (1999)
Survival Analysis of the Gamma-Ray Burst Data by B Efron and V Petrosian. (subscription required)
Journal of the American Statistical Association, Vol. 89, pp. 452-464 (1994)
A simple test of independence for truncated data with applications to redshift surveys by B Efron and V Petrosian
ApJ, Vol. 399, pp.345-352 (1992)
Statistical methods for astronomical data with upper limits. I – Univariate distributions by Feigelson and Nelson
ApJ, Vol. 293, pp.192-206 (1985)
Nonparametric Estimation of the Slope of a Truncated Regression by Bhattacharya, Chernoff, and Yang (subscription required)
The Annals of Statistics, Vol. 11(2), pp. 505-514 (1983)

Note that these papers dealt only with particular statistical questions, each giving a general introduction to survival analysis and definitions of estimators, and were based on data sets of relatively small sample size. Facing massive survey data with truncation and heterogeneous measurement errors, astronomy could open a new era of survival analysis.

Lastly, there are studies of the Pareto distribution, some of which are presented in the slog. (Use “search” with Pareto. More statistical papers on survival analysis in astronomy are welcome; please let me know.)

[ArXiv] Pareto Distribution

hlee — Thu, 03 Apr 2008 20:55:04 +0000

Astronomy is ruled by the Gaussian distribution, with a duchy held by the Poisson distribution. From time to time, ranks are awarded to other distributions, though without territories of their own to govern. Among these distributions, the Pareto deserves a high rank. A preprint on the Pareto distribution appeared this week:

On the Truncated Pareto Distribution with applications by Zaninetti and Ferraro [astro-ph:0804.0308]

From the abstract:

This note deals with an application of the Pareto distribution to astrophysics and more precisely to the statistical analysis of mass of stars and of diameters of asteroids. In particular a comparison between the usual Pareto distribution and its truncated version is presented.

The paper introduces the pdf, cdf, mean, variance, higher moments, and survival function of the (truncated) Pareto distribution, with applications to star masses from the Hipparcos data^[1] and to asteroid sizes, and with simulations of the primeval nebula^[2]. It concludes that the truncated Pareto works better than the usual Pareto. The Pareto distribution is simple and intuitive.
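
For readers who want to experiment, here is a minimal Python sketch of a Pareto distribution truncated to an interval [a, b]. The parametrization (lower bound `a`, upper bound `b`, shape `c`) and the function names are my own assumptions and may not match the notation of the preprint. The pdf is the usual Pareto density c a^c x^-(c+1) renormalized by 1 - (a/b)^c, and sampling uses the inverse of the cdf.

```python
import numpy as np

def truncated_pareto_cdf(x, a, b, c):
    """CDF of a Pareto(a, c) distribution truncated to [a, b]."""
    x = np.clip(np.asarray(x, dtype=float), a, b)
    return (1.0 - (a / x) ** c) / (1.0 - (a / b) ** c)

def truncated_pareto_pdf(x, a, b, c):
    """PDF: the Pareto density renormalized on [a, b], zero outside."""
    x = np.asarray(x, dtype=float)
    inside = (x >= a) & (x <= b)
    dens = c * a**c * x ** (-(c + 1.0)) / (1.0 - (a / b) ** c)
    return np.where(inside, dens, 0.0)

def truncated_pareto_survival(x, a, b, c):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - truncated_pareto_cdf(x, a, b, c)

def truncated_pareto_sample(n, a, b, c, rng=None):
    """Draw n values by inverting the CDF at uniform random numbers."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)
    return a * (1.0 - u * (1.0 - (a / b) ** c)) ** (-1.0 / c)

# Hypothetical parameter values, chosen only for illustration.
draws = truncated_pareto_sample(1000, a=1.0, b=10.0, c=1.5,
                                rng=np.random.default_rng(0))
```

Letting `b` grow very large recovers the usual Pareto, so the two versions can be compared on the same data by varying `b` alone.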

ps. Not many astronomy papers cite recent statistical publications. In my observation, most astronomical papers have no need to cite statistics papers, but when they do, the references tend to be four to five decades old, even though the books among them were revised in the 1990s or later and articles with modern perspectives are available (exceptions are seminal papers that introduced statistical methods to the community, such as the EM algorithm). It is quite encouraging to see that an article from JASA 2006 was cited in [astro-ph:0804.0308].

[1] A Pareto or power-law distribution does not seem to be a good model for star masses.
[2] Mass accretion follows a probabilistic model, I guess.
