Books – a boring title

I have been noticing some misconceptions in astronomy about statistics and about how statistical nomenclature has evolved, which, I believe, can be attributed to a lack of references in the astronomical community. There are textbooks designed for junior/senior science and engineering students that are likely unknown to astronomers, although, to my knowledge, their examples are not well suited to astronomy. I never expect astronomers to study standard graduate (mathematical) statistics textbooks, but I do wish astronomers would go beyond Numerical Recipes (W. H. Press, S. A. Teukolsky, W. T. Vetterling, & B. P. Flannery) and Data Reduction and Error Analysis for the Physical Sciences (P. R. Bevington & D. K. Robinson). Here are some good ones written by astronomers, engineers, and statisticians:

This post was motivated by Vinay's recommendation of Practical Statistics for Astronomers (J. V. Wall and C. R. Jenkins), which provides many statistical insights and caveats that astronomers tend to ignore. Without looking at the error distribution and the properties of their data, astronomers jump into chi-square and correlation. A reader of this book will be more careful about adopting the statistics of common practice in astronomy, which were developed many decades ago and are founded on strong assumptions that are not compatible with modern data sets. The book addresses many concerns about astronomical practice that have been growing in my mind, and it introduces various statistical methods applicable in astronomy.

Astronomers who have had no in-class statistics education but have read this book in full would see it differently from me. The book mentions unbiasedness, consistency, closedness, and robustness of statistics, which are normally neither discussed nor proved in astronomy papers. Therefore, those readers may miss the insights, caveats, and between-the-lines content of the book that I care about. To narrow this gap, for a quick and easy understanding of classical statistics, I recommend The Cartoon Guide to Statistics (Larry Gonick & Woollcott Smith) as a first step. This cartoon book covers the fundamentals of statistics in a fun and friendly manner, and provides everything that rudimentary textbooks offer.

If someone wants to go beyond classical statistics (so-called frequentist statistics) and learn about popular Bayesian statistics, I recommend Bayesian Logical Data Analysis for the Physical Sciences by astronomy professor Phil Gregory. If one would like to know a little more about modern statistics, both frequentist and Bayesian, All of Statistics (Larry Wasserman) is recommended. I realize that textbooks for non-statistics students are too thick to go through in a short time. (The book I used for teaching senior engineering students at Penn State was Probability and Statistics for Engineering and the Sciences by Jay L. Devore, 4th and 5th editions, at about 600 pages; the current edition is 736 pages.) One well-received textbook for graduate students in electrical engineering is Probability, Random Variables and Stochastic Processes (A. Papoulis & S. U. Pillai). I remember that the book offers a rather less abstract definition of measure, along with practical examples. (Personally, I found its material on Hermite polynomials useful.)

For a casual reading about statistics and its 20th century history, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (D. Salsburg) is quite nice.

Statistics is not just for best-fit analysis and error bars. It is a wonderful telescope that extracts correct information when it is operated carefully, pointed at the right target, and used according to the manual. When statistics is understood properly, it removes atmospheric and other blurring factors. It is neither a black box nor magic, as many people think.

The era of treating everything as Gaussian ended decades ago. Because of the central limit theorem and the delta method (log-transformation is a good example), many statistics asymptotically follow the normal (Gaussian) distribution, but there are many other families of distributions. Because of possible bias in the chi-square method, an error bar cannot guarantee its nominal coverage, such as 95%. There are also nonparametric statistics, known for robustness; they may be less efficient than statistics that assume a distribution family, but they do not require a model assumption. Also, Bayesian statistics works wonderfully if correct prior information, suitable likelihood models, and enough computing power for hierarchical models and numerical integration are provided.
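The delta method can be checked by simulation. The sketch below is only an illustration under assumed settings (exponential data with mean 3, samples of size 200, 20,000 repetitions; all values are hypothetical). For g = log, the delta method says the standard deviation of log(sample mean) is about g'(mu) times SD(sample mean), which for exponential data reduces to 1/sqrt(n). The code compares that approximation with the simulated spread and counts how often a nominal 95% interval on the log scale actually covers log(mu).

```python
import numpy as np

def delta_method_check(mu=3.0, n=200, n_rep=20_000, seed=1):
    """Simulate log(sample mean) for exponential data with mean mu and compare
    its spread and interval coverage with the delta-method approximation."""
    rng = np.random.default_rng(seed)
    # n_rep independent samples of size n from a skewed, non-Gaussian law
    samples = rng.exponential(scale=mu, size=(n_rep, n))
    log_means = np.log(samples.mean(axis=1))
    # Delta method with g = log: Var[g(Xbar)] ~ g'(mu)^2 * Var[Xbar]
    #   = (1/mu)^2 * (mu^2 / n) = 1/n for exponential data
    delta_sd = np.sqrt((1.0 / mu) ** 2 * mu**2 / n)
    sim_sd = log_means.std(ddof=1)
    # Fraction of nominal 95% intervals log(Xbar) +/- 1.96*SD that cover log(mu)
    coverage = np.mean(np.abs(log_means - np.log(mu)) <= 1.96 * delta_sd)
    return sim_sd, delta_sd, coverage

sim_sd, delta_sd, coverage = delta_method_check()
print(f"simulated SD {sim_sd:.4f}, delta-method SD {delta_sd:.4f}, coverage {coverage:.3f}")
```

Rerunning with a small n, say n=5, is a quick way to see how the approximation behaves before the asymptotics have kicked in.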

Before jumping into the chi-square for fitting and testing at the same time, which can introduce bias, one should carry out exploratory data analysis to understand the data better and to find a suitable statistic and its assumptions. Exploratory data analysis starts with simple scatter plots and box plots. A little statistical care for data and a genuine interest in the truth behind statistical methods are all I am asking for. I do hope these books help make my wishes come true.
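A box plot is drawn from a handful of summary numbers. The sketch below computes them, assuming NumPy's default (linear-interpolation) quantiles and the conventional 1.5 × IQR whisker rule due to Tukey; other software may define quartiles slightly differently. The data are hypothetical.

```python
import numpy as np

def box_plot_summary(x, whis=1.5):
    """Return the numbers a box plot draws: quartiles, whisker ends, outliers."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Tukey fences: points beyond whis * IQR from the box are flagged
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # whiskers extend to the most extreme points still inside the fences
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": np.sort(x[(x < low_fence) | (x > high_fence)]),
    }

# Hypothetical data: nine well-behaved values and one wild point
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the median is 5.5 while the mean is 14.5, and the value 100 is flagged as an outlier; a glance at the box plot shows this before any chi-square is computed. matplotlib's `boxplot` uses the same 1.5 × IQR rule by default (`whis=1.5`).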

[1.] Most of the links to books are from but I have no personal affiliation with the company.

[2.] In addition to the previous post on chi-square, "what is so special about chi square in astronomy," I'd like to mention possible bias in chi-square fitting and testing. It is well known that using the same data set both for fitting, which yields parameter estimates (called best-fit values and error bars in astronomy), and for testing based on those estimates introduces bias: the best fit is biased away from the true parameter value, and the error bar does not achieve the intended coverage. See Aneta's post, "an example of chi2 bias in fitting x-ray spectra," for an illustration of the problem.

[3.] More book recommendations are welcome.
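Note [2.] mentions bias in chi-square fitting. Aneta's x-ray example is not reproduced here; the sketch below is a separate toy illustration of one well-known mechanism. It assumes Poisson counts in 50 bins with a constant true rate of 5 and a chi-square that uses the observed counts as variances (often called Neyman's chi-square). Flooring the variance at 1 for empty bins is an illustrative choice, not a rule taken from any particular package.

```python
import numpy as np

def chi2_vs_mle(mu_true=5.0, n_bins=50, n_rep=5000, seed=2):
    """Fit a constant Poisson rate to binned counts in two ways and return
    the average of each estimate over n_rep simulated data sets."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(mu_true, size=(n_rep, n_bins))
    # Neyman-type chi-square: sigma_i^2 = observed counts, floored at 1 so
    # empty bins do not divide by zero. Minimising sum (n_i - mu)^2 / w_i
    # over mu gives the weighted mean sum(n_i / w_i) / sum(1 / w_i).
    w = np.maximum(counts, 1)
    mu_chi2 = (counts / w).sum(axis=1) / (1.0 / w).sum(axis=1)
    # Poisson maximum likelihood for a constant rate: the plain sample mean
    mu_mle = counts.mean(axis=1)
    return mu_chi2.mean(), mu_mle.mean()

mu_chi2, mu_mle = chi2_vs_mle()
print(f"average chi-square fit {mu_chi2:.2f}, average ML fit {mu_mle:.2f}, truth 5")
```

Bins that fluctuate low receive larger weights, so the chi-square estimate is pulled well below the true rate of 5, while the Poisson maximum-likelihood estimate averages close to it.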

  1. vlk:

    A very nice and useful post, Hyunsook.
    In the interests of completeness, I should mention the third of the triumvirs that astronomers have generally relied on in the past, in addition to Bevington and NumRec: Statistical Methods in Experimental Physics, by Eadie et al. (1971). Peter Freeman swears by it.

    01-25-2008, 3:26 pm
  2. hlee:

    I must see the book, then, to lift my prejudice. It seems that the second edition was published 35 years later. Thanks!

    [After spending 30 minutes in the library] It's a great book! I'd like to keep it near me for quick reference. This one book covers a lot. However, I am skeptical about "one of the triumvirs." If that were true, my post would be useless. My impression is that the book could overwhelm young astronomers and drive them away from statistics. I am glad I didn't know about the book when I was studying astronomy.

    01-25-2008, 9:41 pm
  3. Mauro:

    Very interesting post.
    I suggest another “old” book: Statistical Astronomy, by R.J. Trumpler, H.F. Weaver, 1962, Dover.

    01-29-2008, 10:51 am
  4. hlee:

    Well, I spent five minutes or so in the library scanning the book. Because of its age, Statistical Astronomy needs a title makeover, like "Statistical methods for Gaussian astronomical data without a computer." I'm so spoiled by modern tools that I would say the techniques the book describes are not much discussed in statistics anymore. Statistics has evolved a lot over the years and is moving forward quickly. The book seems to rely heavily on Gaussianity and small samples. Also, the astronomical examples seem to have been implemented in astronomical tools already, since I never came across such examples last year (although I learned Jacobians and spherical coordinate representations from an old professor more than 10 years ago). Quite likely I formed a wrong impression and prejudice because the library copy seemed untouched since it was purchased more than 40 years ago. However, I'm very grateful to you, because I learned that astrostatistics has quite a long history.

    So there have been statistics references for astronomers suited to their statistical needs; there is no lack of references.

    01-30-2008, 2:29 am
  5. Mauro:

    There are two other very "old" books: "Lehrbuch der Stellarstatistik" by E. von der Pahlen, F. Gondolatsch, L. Hufnagel, 1937, and "Stellar Movements and the Structure of the Universe" by A. S. Eddington, 1914. The two books are not useful for modern astronomy, but they are a record of what astronomers have done with statistics.

    01-30-2008, 5:33 am
  6. TomLoredo:

    To clarify re: Statistical Astronomy, it is a classic worth knowing about, not so much for general statistical analysis of data, but for statistical modeling of astronomical populations. For example, it’s the classic reference for the so-called fundamental equation for star counts (perhaps Trumpler and Weaver gave it that name).

    Mauro’s mention of Eddington is interesting in another respect: I don’t know if it’s covered in his 1914 book, but by that time he had written papers that introduced something like a shrinkage estimator—decades before Stein! This might make a neat paper for someone interested in the history of statistics, since shrinkage estimation is one of the key developments of 20th century statistics.

    02-07-2008, 5:15 pm
  7. Mauro:

    Regarding the shrinkage estimator, do you mean this paper:
    "On a formula for correcting statistics for the effects of a known error of observation"? If so, this estimator is widely used in Eddington's book.

    03-08-2008, 1:05 pm
  8. vlk:

    Wow, is that the original description of the eponymous Eddington Bias?! Fascinating to see it worked out. Astonishing to see it cited only 85 times. Astronomers, mark this for future reference [sic]: Eddington, A.S., 1913, MNRAS, 73, 359

    I have always looked upon it as a realization of Poisson statistics, but I notice that the treatment is in the Gaussian regime, and is quite general. It strikes me that it should work perfectly well in other situations, e.g., to non-parametrically deconvolve moderate resolution grating spectra, such as those from EUVE/*W or Chandra/LETG.

    03-10-2008, 3:03 am
  9. Simon Vaughan:

    Following Vinay's and Hyunsook's posts: there's a second edition of 'Statistical Methods in Experimental Physics' (Eadie et al. 1971) with improved typesetting but mostly the same material. (The original looks *very* dated because of the typesetting.) It is by only one of the original authors, F. James.

    There’s also a nice book ‘Statistical Data Analysis’ by Glen Cowan that covers a lot of the same material, but is again aimed at particle physicists.

    07-09-2008, 6:42 am
  10. hlee:

    Dear Simon,

    For high energy astrophysicists, books for particle physicists seem very useful and closely related. I'll get the book later and read or scan it. Thanks a bunch!

    I wish there were astronomical data repositories where no data reduction is required, so that one could apply various statistical analyses to the data to learn and compare statistical methods. I always have trouble with data reduction because I lack hands-on instruction in reducing the data sets that are nowadays available from websites and the virtual observatory. I see tens of thousands of points in an archive, but papers reduce them to a few hundred or a few thousand before statistical analysis. These reduced data sets are not available to a person like me.

    Once reduced data sets from various fields of astronomy are put into a common repository, they'll be useful for teaching statistics and data analysis to young astronomy students while exposing them to various fields of astronomy. They would also help up-to-date astrostatistics textbooks get written, so that astronomers need not rely on books published four to five decades ago.

    07-14-2008, 11:29 am
  11. brianISU:

    Dear hlee,

    this website might be what you are looking for. I believe it is run by NASA and the University of Maryland. For a class project, I used a data set from it to try to understand the lifetime distribution of stars from supernova data. There seem to be many interesting data sets to glance through. I hope this helps.

    07-14-2008, 8:48 pm
  12. Simon Vaughan:

    Hyunsook, I agree entirely about the usefulness of a data repository. It would be really nice to have a range of data types (spectra, images, time series – from optical, X-ray, radio etc.) in common formats, from real observations of different sources, that can be used to demonstrate with or experiment on. Most of our ‘textbook’ examples are rather ancient!
    There are a few websites I use, but the range is rather limited. For time series, there is:

    – “A Sample of Astronomical Time Series” –

    The Astrostatistics summer school at PSU maintains a page with some datasets

    – “Astronomical datasets for statistical analysis” –

    The UK Swift data centre hosts a light curve repository where you can download ASCII time series for virtually any GRB observed with the Swift/XRT

    – "Swift/XRT GRB lightcurve repository" –

    I wonder if any AstroStat readers know of any more like this…?

    07-15-2008, 5:27 am
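The Eddington bias and shrinkage discussed in the comments above can be illustrated with a toy simulation. This is not a reproduction of Eddington's 1913 formula, which works with the observed frequency distribution as a whole; it is a minimal sketch assuming a Gaussian population of true values with known mean m and spread tau, and Gaussian measurement errors with known sigma (all values hypothetical). Selecting sources whose measured value exceeds a threshold favours positive errors, so the selected measurements overestimate their true values on average. In this Gaussian case the posterior mean of a true value given its measurement shrinks the measurement toward m, a correction related in spirit to the Stein-type shrinkage mentioned above.

```python
import numpy as np

def eddington_bias_demo(m=0.0, tau=1.0, sigma=0.5, threshold=1.5, n=200_000, seed=4):
    """Toy Eddington bias: select noisy measurements above a threshold and
    compare their mean with the mean of the corresponding true values."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(m, tau, size=n)            # true values (hypothetical population)
    x = theta + rng.normal(0.0, sigma, size=n)    # measured values, known Gaussian error
    selected = x > threshold                      # e.g. a flux-limited sample
    # Gaussian-population posterior mean: shrink each measurement toward m.
    # Equivalent to x + sigma^2 * d/dx log g(x), with g the N(m, tau^2 + sigma^2) density.
    shrink = tau**2 / (tau**2 + sigma**2)
    corrected = m + shrink * (x - m)
    return x[selected].mean(), theta[selected].mean(), corrected[selected].mean()

obs_mean, true_mean, corrected_mean = eddington_bias_demo()
print(f"observed {obs_mean:.3f}, true {true_mean:.3f}, corrected {corrected_mean:.3f}")
```

Because selection depends only on the measured values, averaging the posterior means over the selected sources recovers the average true value of that sample, while the raw measurements overshoot it.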