The AstroStat Slog » probability
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

A short note on Probability for astronomers
Posted by hlee, Mon, 28 Dec 2009
http://hea-www.harvard.edu/AstroStat/slog/2009/a-short-note-on-probability-for-astronomers/

I often feel irked whenever I see a function normalized over a feasible parameter space and then used as a probability density function (pdf) for further statistical inference. To be a proper pdf, the normalization has to be done over a measurable space, not merely over a feasible parameter space. This practice often yields biased best fits (biased estimators) and improper error bars. On the other hand, validating a measurable space on physical grounds is complicated; to put it precisely, we are often lost in translation.

When I taught statistics, the courses were nominally undergraduate ones, yet they drew both undergraduate and graduate students from various fields, with the notable exception of astrophysics majors. I wondered why astronomy students were not encouraged to take basic statistics while they were encouraged to take computer science courses. Since many astronomers are good at programming and designing tools, I am sure that encouraging students to take statistics courses would renovate astronomical data analysis procedures (beyond Bevington's book) and the theories behind them (statistics and mathematics per se, not physical laws).

Here is an interesting lecture on developing a computer science curriculum for the new era, and on why basic probability theory and statistics are important for raising versatile computer scientists. It may be a bit outdated now, since I saw it several months ago.

A little more than halfway through the lecture, he emphasizes that a probability course should be part of the computer science curriculum. I wonder whether any astronomy professor makes a similar argument and stresses the need for young future astrophysicists to learn basic probability theory, in order to prevent the many misuses of statistics that appear in the astronomical literature. In particular, confusion between fitting (estimation) and inference (both model assessment and uncertainty quantification) is frequent in papers whose authors claim superior statistics and statistical data analysis. I sometimes attribute this confusion to the lack of distinction between what is random and what is deterministic, or to a strong belief that observed and processed data are free of errors and of a probabilistic nature.

Many introductory books use interesting problems, many with historical origins (and many anecdotes), to introduce probability theory. One can look up the very basics, the probability axioms, and measurable functions on Wikipedia. Seen through examples, probability is high-school-level mathematics that you already know, but the jargon takes repetition; you will want to recite the lexicon many times to get used to the foundations, the basics, and the theory behind them.
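For completeness, here is a minimal statement of the axioms the paragraph alludes to: a probability measure P on a measurable space (Ω, F) satisfies
$$P(A)\ge 0 \ \ \forall A\in\mathcal{F}, \qquad P(\Omega)=1, \qquad P\Big(\bigcup_{i=1}^{\infty}A_i\Big)=\sum_{i=1}^{\infty}P(A_i) \ \ \mbox{for disjoint } A_1,A_2,\ldots\in\mathcal{F}.$$
Everything else, random variables included, is built on the triple (Ω, F, P).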

Statisticians often say "measurable" when discussing random variables, uncertainties, and distributions, to avoid verbosity. "Assume a measurable space …" saves multiple paragraphs in an article and changes the structure of the writing. This short adjective carries many assumptions, which depend on the statistical models and equations you are using for best fits and error bars.

Consider a luminosity function (LF) that is truncated due to observational limits. The common practice I have seen is to draw a histogram with adaptive binning so that the overall shape reflects a partial bell curve. Thanks to its smoothed look, scientists impose a Gaussian curve on the partially observed data and find the parameter estimates that determine the shape of this Gaussian. There is no imputation step to stand in for the unobserved points and recover the full probability space. The parameter space of the Gaussian frequently does not coincide with the physically feasible space; yet such discrepancies are rarely discussed in the astronomical literature, and the resulting bias seems to be a taboo subject.
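A minimal numerical sketch of the point, with made-up data and a hypothetical detection limit (nothing here is taken from the post): fitting a plain Gaussian to truncated data biases the location and scale, whereas a likelihood that conditions on the truncation does not.

```python
# Sketch: bias from ignoring truncation (hypothetical data and detection limit).
# Data below the limit are unobserved; fitting a plain Gaussian to the surviving
# points shifts the mean and shrinks the scale, while a likelihood that conditions
# on the truncation recovers the true parameters.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(42)
mu_true, sigma_true, limit = 0.0, 1.0, -0.5   # hypothetical detection limit
full = rng.normal(mu_true, sigma_true, 5000)
obs = full[full > limit]                      # truncated sample

# Naive fit: sample mean and standard deviation of the truncated data (biased high/low).
print("naive mean, std:", obs.mean(), obs.std(ddof=1))

# Truncated-Gaussian likelihood: normalize the pdf over the observable region only.
def neg_loglike(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    logpdf = norm.logpdf(obs, mu, sigma) - norm.logsf(limit, mu, sigma)
    return -logpdf.sum()

res = minimize(neg_loglike, x0=[obs.mean(), np.log(obs.std())])
print("truncated-likelihood mean, std:", res.x[0], np.exp(res.x[1]))
```

The fix is not to fake the unobserved points but to write the likelihood of the points actually observed, normalized over the observable region.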

Although astronomers emphasize the importance of uncertainties, neither the factorization nor the stratification of uncertainties has ever been made clear (model uncertainty; systematic uncertainty, or bias; statistical uncertainty, or variance). Hierarchical relationships or correlations among these different uncertainties are never addressed in full measure. The basics of probability theory and an understanding of random variables would help characterize uncertainties in both a mathematical and an astrophysical sense. This knowledge would also assist in quantifying the uncertainties so characterized.
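One standard way to make the factorization concrete (my addition, phrased in the usual estimator language rather than the post's) is the mean-squared-error decomposition for an estimator $\hat{\theta}$ of a quantity θ:
$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta}-\theta)^2\big] = \mathrm{Var}(\hat{\theta}) + \big(E[\hat{\theta}]-\theta\big)^2,$$
where the variance term plays the role of the statistical uncertainty and the squared bias that of the systematic one; model uncertainty enters when the model family producing $\hat{\theta}$ is itself in doubt.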

Statistical models are rather simple compared with astrophysical models. However, statistics is the science of understanding uncertainty and randomness, so strategies are needed for transcribing complicated astrophysical models into statistical models that reflect the probabilistic nature of the observations (or of the parameters, in Bayesian modeling). Both raw and processed data manifest the behavior of random variables. Their underlying processes determine not only the physics models but also the statistical models, written in terms of random variables and the link functions connecting physics and uncertainty. To the best of my understanding, bridging the two and inventing statistical models for astrophysical research is difficult because of this lack of awareness of the basics of probability theory.

I once had a chance to observe a Decadal Survey meeting, which covered diverse areas of astronomy. The participants discussed new projects, advancing current projects, career development, and a little about educating professional astronomers, apart from public outreach (which often receives more attention than the university curriculum; I do believe that widespread public awareness of astronomy is very important). What I missed while observing the meeting were interdisciplinary knowledge-transfer efforts to broaden the field of astronomy and astrophysics, and ideas for curriculum design. Because of its long history, I used to think of astronomy as a science of everything. Marching along its own path for so long has made astronomy more or less the most isolated and exclusive of sciences.

Perhaps asking astronomy majors to take multiple statistics courses is too burdensome; it is more realistic, and it is what I anticipate, for faculty who specialize in (statistical) data analysis to organize a data analysis course that incorporates several hours of basic probability. With a few hours devoted to the fundamental notions of random variables and probability, claims of "statistically rigorous methods and powerful results" would become more appropriate. Statistics is a science, but in the astronomy literature it looks more or less like an adjective modifying methods and results, like "powerful," "superior," "excellent," "better," "useful," and so on. The basics of probability are easily incorporated into an introduction to algorithms for experimental design and optimization, methods which are currently used in a brute-force fashion[1].

Occasionally I see gems on arXiv written by astronomers. Their expertise in astronomy and their interest in statistics have produced intriguing accounts of statistically rigorous data analysis and inference procedures. Their papers include explanations of the fundamentals of statistics and probability that are more appropriate for astronomers than statistics textbooks aimed at scientists and engineers in other fields. I wish more astronomers would join this venture, learning the basics and the diversity of statistics, to rectify the many unconscious misuses of statistics by authors who argue that their choice of statistic is the most powerful one on the strength of plausible results.

  1. By a brute-force fashion I mean trying all the methods listed in the software manual and then stating that method A gave the most plausible values, the ones that match the data in a scatter plot.

Borel Cantelli Lemma for the Gaussian World
Posted by hlee, Wed, 03 Dec 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/bcl/

Almost two years of scrutinizing publications by astronomers has given me the impression that astronomers live in a Gaussian world. You are likely to object to this statement by pointing out that astronomers know and use the Poisson, binomial, Pareto (power law), Weibull, exponential, Laplace (Cauchy), Gamma, and other distributions.[1] This is true. I see these distributions referred to in many publications; however, when it comes to obtaining "BEST FIT estimates for the parameters of interest" and "their ERROR (BARS)", suddenly everything goes back to the Gaussian world.[2]

Borel Cantelli Lemma (from PlanetMath): because of the mathematical symbols, only a link was given, but any probability book states the lemma with its proof and a description.
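For readers who do not follow the link, a standard statement of the lemma is: for a sequence of events $A_1, A_2, \ldots$,
$$\sum_{i=1}^{\infty}P(A_i)<\infty \ \Rightarrow\ P(A_i \mbox{ i.o.})=0, \qquad\qquad \sum_{i=1}^{\infty}P(A_i)=\infty \ \mbox{ with the } A_i \mbox{ independent} \ \Rightarrow\ P(A_i \mbox{ i.o.})=1,$$
where $\{A_i \mbox{ i.o.}\}=\bigcap_{n\ge 1}\bigcup_{i\ge n}A_i$ is the event that infinitely many of the $A_i$ occur.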

I believe that I live in a RANDOM world. It is not necessarily always Gaussian, but with large probability it looks Gaussian, thanks to large-sample theory. Here is the question: do astronomers believe the Borel Cantelli Lemma (BCL) for their Gaussian world? Is the bottom line of adopting the Gaussian on almost all occasions, experiments, and data analyses an attempt to prove this lemma for the Gaussian world? If not, one would want to be more cautious and to reason more before adopting chi-square goodness-of-fit methods. At the least, I think one should not claim that a chi-square method is statistically rigorous or statistically sophisticated; to me, "astronomically rigorous and sophisticated" seems adequate, but no one says so. Probably, saying "statistically rigorous" is an effort to avoid self-praise, a helpless attribution to statistics. Truly, the data processing strategies are elaborate and difficult to understand, and I do not see why astronomers praise their beautiful and complex data sets and analysis results under the name of statistics. Often I stop to catch my breath and ask why a simple chi-square goodness-of-fit method is claimed to be statistically rigorous when all I see is the complexity of the data handling performed before anything is fed into the chi-square function.

The reason for my request to take this one step back before the chi-square method is that the astronomers' Gaussian world is only one part of a multi-distributional universe, each part of which has non-negative probability measure.[3] Despite its relatively large probability, the Gaussian world is just one realization from the set of distribution families; it is not an almost sure observation. There is therefore no need to dive into chi-square fitting methods that intrinsically assume Gaussianity, particularly when one knows the exact data distribution, as with Poisson photon counts.
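As a hedged illustration of the low-count case, with simulated data rather than anything from a real analysis: a chi-square fit using the common sqrt(N) error bars is biased low for Poisson counts, while the Poisson (Cash-type) likelihood recovers the sample mean.

```python
# Sketch: fitting a constant source rate to low-count Poisson data (simulated).
# The chi-square estimator with sigma_i = sqrt(N_i) is biased low when counts are
# small; the Poisson likelihood estimator is just the sample mean and is unbiased.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
rate_true = 3.0                       # mean counts per bin (low-count regime)
counts = rng.poisson(rate_true, 100)

# Chi-square with sigma_i^2 = max(N_i, 1), the common Gaussian shortcut
def chisq(mu):
    sigma2 = np.maximum(counts, 1.0)
    return np.sum((counts - mu) ** 2 / sigma2)

# Negative Poisson log-likelihood (Cash statistic up to a constant)
def neg_loglike(mu):
    return np.sum(mu - counts * np.log(mu))

mu_chi = minimize_scalar(chisq, bounds=(0.1, 20), method="bounded").x
mu_poi = minimize_scalar(neg_loglike, bounds=(0.1, 20), method="bounded").x
print("chi-square estimate        :", mu_chi)  # tends to sit below rate_true
print("Poisson-likelihood estimate:", mu_poi)  # equals the sample mean
```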

This ordeal of the chi-square method being called statistically rigorous gives me the impression that astronomers are on a mission to settle a grand challenge by providing as many fitting results as possible based on the Gaussian assumption. The grand challenge is to prove the Borel-Cantelli Lemma empirically for the Gaussian world, or, by extension:

Based on the consensus that astronomical experiments and observations ($A_i$) occur in the Gaussian world, that their number grows rapidly (i=1,…,n with n going to infinity), and that each experiment and observation is independent (iid), showing that $$\sum_{i=1}^\infty P(A_i) =\infty$$ would prove the grand challenge: that $P(A_i \mbox{ i.o.})=1$, i.e., that the Gaussian world is almost surely to be expected from any experiment or observation.

Collecting as many results based on chi-square methods as possible would supply the sufficient condition for this lemma. I do not mean to ridicule anyone, and I exaggerate a bit by saying "the grand challenge." By all means, I am serious and would like to know why astronomers are almost obsessed with chi-square methods and the Gaussian world. I prefer to think plainly that adopting a chi-square method blindly is just a tradition, not a grand challenge to prove P(Gaussian_n i.o.)=1. Luckily, analyzing data in the Gaussian world has not yet led to a catastrophic scientific fallacy. "So why bother to think about a robust method applicable in any type of distributional world?"

Fortunately, I sometimes see astronomers who are not interested in this grand challenge of proving the Borel Cantelli Lemma for the Gaussian world. They challenge the traditional chi-square methods despite limited resources, that is, a lack of proper examples and support. Please don't get me wrong: although I praise them, I am not asking every astronomer to become one of these outsiders. Statisticians need jobs!!! Nevertheless, a paragraph and a diagnostic plot, i.e., a short discussion justifying the chi-square, would be very much appreciated, to convey that the Gaussian world is the right choice for your data analysis.

Lastly, I would like to raise some questions. How confident are you that the residuals between the observations and the model are normally distributed when you have only a dozen data points and their measurement errors? Is least-squares fitting the only way to find the best fit in your data analysis? When you know the data distribution is skewed, are you still willing to use Δχ2 to estimate σ just because it is the only way Numerical Recipes offers to estimate σ? I know that people work on their projects for months and years. Making an appointment with the folks at your institution's statistical consulting center and spending an hour or so will not delay your project. Those consultants may or may not confirm that chi-square or least-squares fitting is the best and most convenient approach. You may think statistical consulting is a waste of time because the consultants do not understand your problems; yet your patience will pay off. In either the Gaussian or a non-Gaussian world, you will be laying a sound middle stone toward a complete and long-lasting tower. You have already laid precious corner stones.
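On the first question, here is a small simulated check (my example, not the post's): with only a dozen residuals, a formal normality test such as Shapiro-Wilk has little power, so failing to reject normality says very little.

```python
# Sketch: how informative is a normality test on a dozen residuals? (simulated data)
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(7)
gauss_resid = rng.normal(size=12)
skew_resid = rng.exponential(size=12) - 1.0   # deliberately non-Gaussian residuals

print("Gaussian residuals: p =", shapiro(gauss_resid).pvalue)
print("Skewed residuals  : p =", shapiro(skew_resid).pvalue)
# With n = 12 the test may or may not reject the skewed sample; a non-rejection
# at this sample size is weak evidence for normality, not a confirmation of it.
```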

  1. It is a bit disappointing that few mention the t distribution, even when fewer than 30 observations are available.
  2. To stay out of this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say it is the only escape, which is sometimes true and sometimes not; I personally lean toward the view that Bayesian methods are not always more robust than frequentist ones, contrary to how astronomers often discuss robust methods.
  3. This non-negativity is an assumption, proven neither philosophically nor mathematically. My experience tells me that a Poisson world exists, so that P(Poisson world)>0 and therefore P(Gaussian world)<1 in reality.

A lecture note of great utility
Posted by hlee, Wed, 27 Aug 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/a-lecture-note-of-great-utility/

I did not realize this post had been sitting for a month, during which I almost neglected the slog. Just as there are great books on probability and information theory for statisticians and engineers, I believe there are great statistical physics books for physicists. Relatively few books, however, introduce one subject to the other audience. In this regard, I thought this lecture note could be useful.

[arxiv:physics.data-an:0808.0012]
Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract: These lectures deal with the problem of inductive inference, that is, the problem of reasoning under conditions of incomplete information. Is there a general method for handling uncertainty? Or, at least, are there rules that could in principle be followed by an ideally rational mind when discussing scientific matters? What makes one statement more plausible than another? How much more plausible? And then, when new information is acquired how do we change our minds? Or, to put it differently, are there rules for learning? Are there rules for processing information that are objective and consistent? Are they unique? And, come to think of it, what, after all, is information? It is clear that data contains or conveys information, but what does this precisely mean? Can information be conveyed in other ways? Is information physical? Can we measure amounts of information? Do we need to? Our goal is to develop the main tools for inductive inference–probability and entropy–from a thoroughly Bayesian point of view and to illustrate their use in physics with examples borrowed from the foundations of classical statistical physics.

[ArXiv] 4th week, Apr. 2008
Posted by hlee, Sun, 27 Apr 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-4th-week-apr-2008/

The last paper in the list discusses MCMC for time series analysis, applied to sunspot data. There are six additional papers on statistics and data analysis from the week.

  • [astro-ph:0804.2904]M. Cruz et al.
    The CMB cold spot: texture, cluster or void?

  • [astro-ph:0804.2917] Z. Zhu, M. Sereno
    Testing the DGP model with gravitational lensing statistics

  • [astro-ph:0804.3390] Valkenburg, Krauss, & Hamann
    Effects of Prior Assumptions on Bayesian Estimates of Inflation Parameters, and the expected Gravitational Waves Signal from Inflation

  • [astro-ph:0804.3413] N.Ball et al.
    Robust Machine Learning Applied to Astronomical Datasets III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX (Another related publication [astro-ph:0804.3417])

  • [astro-ph:0804.3471] M. Cirasuolo et al.
    A new measurement of the evolving near-infrared galaxy luminosity function out to z~4: a continuing challenge to theoretical models of galaxy formation

  • [astro-ph:0804.3475] A.D. Mackey et al.
    Multiple stellar populations in three rich Large Magellanic Cloud star clusters

  • [stat.ME:0804.3853] C. Röver, R. Meyer, N. Christensen
    Modelling coloured noise (MCMC & sunspot data)

Statistics is the study of uncertainty
Posted by hlee, Mon, 31 Mar 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/statistics-is-the-study-of-uncertainty/

I began studying statistics with the notion that statistics is the study of information (retrieval) and that part of information is uncertainty, which is taken for granted in our random world. Probably it is the other way around: information is a part of uncertainty. Could this be the difference between Bayesian and frequentist?

"The statistician's task is to articulate the scientist's uncertainties in the language of probability, and then to compute with the numbers found," quoted from The Philosophy of Statistics by Dennis V. Lindley (2000), The Statistician, 49(3), pp. 293-337. The article is a very good read (no theorems and proofs; it does not begin with "Assume that …").

The author opens the article by positing that statistics is the study of uncertainty, and the rest is very agreeable, as the quotes above and below suggest.

Because you do not know how to measure the distance to our moon, it does not follow that you do not believe in the existence of a distance to it. Scientists have spent much effort on the accurate determination of length because they were convinced that the concept of distance made sense in terms of krypton light. Similarly, it seems reasonable to attempt the measurement of uncertainty.

significance level – the probability of some aspect of the data, given H is true
probability – your probability of H, given the data
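In symbols, and this is my paraphrase of the distinction rather than Lindley's own notation, the significance level is a statement about the data, $p = P(\mbox{data this extreme or more}\,|\,H)$, whereas the probability of the hypothesis requires Bayes' theorem:
$$P(H\,|\,\mbox{data}) = \frac{P(\mbox{data}\,|\,H)\,P(H)}{P(\mbox{data})}.$$
The two numbers answer different questions and need not be close to each other.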

Many people, especially in scientific matters, think that their statements are objective, expressed through the probability, and are alarmed by the intrusion of subjectivity. Their alarm can be alleviated by considering reality and how that reality is reflected in the probability calculus.

I have often seen the stupid question posed ‘what is an appropriate prior for the variance σ2 of a normal (data) density?’ It is stupid because σ is just a Greek letter.

The statistician’s role is to articulate the client’s preferences in the form of a utility function, just as it is to express their uncertainty through probability,

where "clients" can be replaced with "astronomers."

Once we accept that statistics is the study of uncertainty, we had better think about what this uncertainty is. Depending on how the uncertainty, or the probability, is described, its quantification changes. As the author mentions, statisticians formulate the transcription of the client's uncertainty, a responsibility I think astronomers should share. Nevertheless, I have come to suspect that astronomers do not care about the subtleties of uncertainty. Generally, the probability model for this uncertainty is built on an independence assumption and at some point is approximated by a Gaussian distribution. Yet this tradition is changing, and I frequently see on arXiv:astro-ph that astronomers are using Bayesian modeling for observed phenomena and reflecting non-Gaussian uncertainty.

I hear that efforts to visualize uncertainty are in progress. Before codifying anything, I wish those astronomers would be careful about the meaning of the uncertainty and the choice of statistics, i.e., how the uncertainty is modeled.

An alternative to MCMC?
Posted by vlk, Sun, 19 Aug 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/an-alternative-to-mcmc/

I think of Markov-Chain Monte Carlo (MCMC) as a kind of directed staggering about, a random walk with a goal. (Sort of like driving in Boston.) It is conceptually simple to grasp as a way to explore the posterior probability distribution of the parameters of interest by sampling only where it is worth sampling from. Thus, a major savings over brute force Monte Carlo, and far more robust than downhill fitting programs. It also gives you the error bar on the parameter for free. What could be better?

Feroz & Hobson (2007, arXiv:0704.3704) describe a technique called Nested Sampling (Skilling 2004), one that could give MCMC a run for its money. It takes the one inefficient part of MCMC — the burn-in phase — and turns that into a virtue. The way it seems to work is to keep track of how the parameter space is traversed as the model parameters {theta} reach the mode of the posterior, and to take the sequence of likelihoods thus obtained L(theta), and turn it around to get theta(L). Neat.

Two big (computational) problems that I see are (1) the calculation of theta(L), and (2) the sampling to discard the tail of L(theta). The former, it seems to me, becomes intractable exactly when the likelihood surface gets complicated. The latter, again, it seems you have to run through just as many iterations as in MCMC to get a decent sample size. Of course, if you have a good theta(L), it does seem to be an improvement over MCMC in that you won’t need to run the chains multiple times to make sure you catch all the modes.
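Here is a toy sketch of the nested sampling loop as I read the description above; it is my own illustration with a hypothetical one-dimensional likelihood, not the Feroz & Hobson implementation, and it uses the naive rejection step that a real code would replace with something cleverer.

```python
# Toy nested sampling sketch (illustration only): shrink a set of "live points"
# toward higher likelihood and accumulate the evidence Z ~ sum_i L_i * dX_i over
# the prior volume X, with dX_i approximated by the expected shrinkage exp(-i/N)/N.
import numpy as np

rng = np.random.default_rng(0)

def loglike(theta):                 # hypothetical 1-D Gaussian likelihood
    return -0.5 * (theta / 0.1) ** 2

n_live, n_iter = 100, 1000
live = rng.uniform(-1, 1, n_live)   # prior: uniform on [-1, 1]
live_logL = loglike(live)
logZ_terms = []

for i in range(n_iter):
    worst = np.argmin(live_logL)    # lowest-likelihood live point
    logL_star = live_logL[worst]
    log_dX = -i / n_live            # expected log prior-volume shrinkage
    logZ_terms.append(logL_star + log_dX)
    # Replace the worst point with a fresh prior draw above the likelihood threshold
    # (naive rejection sampling; the inefficient part a production code improves on).
    while True:
        cand = rng.uniform(-1, 1)
        if loglike(cand) > logL_star:
            live[worst], live_logL[worst] = cand, loglike(cand)
            break

logZ = np.logaddexp.reduce(logZ_terms) - np.log(n_live)
print("log evidence ~", logZ)
```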

I think the main advantage of MCMC is that it produces and keeps track of marginalized posteriors for each parameter, whereas in this case, you have to essentially keep a full list of samples from the joint posterior and then marginalize over it yourself. The larger the sample size, the harder this gets, and in fact it is a bit difficult to tell whether the nested sampling method is competing with MCMC or Monte Carlo integration.
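For what it is worth, marginalizing a stored joint sample "yourself" is short work once the weights are in hand; the sketch below uses stand-in draws and equal weights (for nested sampling output the weights would instead be proportional to L_i times the prior-volume shrinkage).

```python
# Sketch: marginal posterior of one parameter from a stored joint sample.
import numpy as np

samples = np.random.default_rng(3).normal(size=(5000, 2))  # stand-in joint draws
weights = np.ones(len(samples)) / len(samples)              # equal weights (MCMC case)
marginal, edges = np.histogram(samples[:, 0], bins=40, weights=weights, density=True)
print(marginal.sum() * np.diff(edges)[0])                    # integrates to ~1
```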

Is there any reason why this should not be combined with MCMC? i.e., can we use nested sampling from the burn-in phase to figure out the proposal distributions for Metropolis or Metropolis-Hastings in situ, and get the best of both worlds?

An excerpt from “A Conversation with Leo Breiman”
Posted by hlee, Fri, 02 Mar 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/an-excerpt-from-a-conversation-with-leo-breiman/

Leo Breiman (1928-2005) was one of the most influential statisticians of the 20th century. He was well known for his textbook on probability theory as well as for his contributions to machine learning, such as CART (Classification and Regression Trees), bagging (bootstrap aggregation), and Random Forests. He was the founding father of statistical machine learning. His works can be found at http://www.stat.berkeley.edu/~breiman/

An excerpt from “A Conversation with Leo Breiman” by Richard Olshen (2001), Statistical Science, 16(2), pp. 184-198, prompts second thoughts on the direction of statistical research:

Alice in Wonderland. That is, I knew what was going on out in industry and government in terms of uses of statistics, but what was going on in academic research seemed light years away. It was proceeding as though it were some branch of abstract mathematics. One of our senior faculty members said a while back, “We have to keep alive the spirit of Wald.” But before the good old days of Wald and the divorce of statistics from data, there were the good old days of Fisher, who believed that statistics existed for the purposes of prediction and explanation and working with data.

He foresaw where statistics was heading. His more extended critique can be found in Statistical Modeling: The Two Cultures, by Leo Breiman (2001), Statistical Science, Vol. 16, No. 3 (Aug. 2001), pp. 199-215.

The reason for presenting this excerpt is to emphasize that the efforts of CHASC follow Leo Breiman's wishes: statistical research for (astronomical) data.

[Oct. 14, 2008] The link to Statistical Modeling: The Two Cultures, by Leo Breiman (2001), Statistical Science, Vol. 16, No. 3 (Aug. 2001), pp. 199-215, has been updated to a pdf file.
