The AstroStat Slog

Borel Cantelli Lemma for the Gaussian World

Dec 3rd, 2008| 12:31 am | Posted by hlee

Almost two years of scrutinizing publications by astronomers has left me with the strong impression that astronomers live in the Gaussian world. You are likely to object to this statement by saying that astronomers know and use Poisson, binomial, Pareto (power law), Weibull, exponential, Laplace, Cauchy, Gamma, and other distributions.^[1] This is true. I have seen these distributions referred to in many publications; however, when it comes to obtaining “BEST FIT estimates for the parameters of interest” and “their ERROR (BARS)”, suddenly everything goes back to the Gaussian world.^[2]

Borel Cantelli Lemma (from Planet Math): because of the mathematical symbols, I link to the statement rather than typing it out here, but any probability textbook has the lemma with proofs and descriptions.
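For readers without the link at hand, the lemma in its standard textbook form is given below. The notation is the usual one and is my choice rather than Planet Math's: A_1, A_2, … are events on one probability space, and “i.o.” abbreviates “infinitely often”.

$$\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$$

$$\sum_{n=1}^\infty P(A_n) < \infty \;\Longrightarrow\; P(A_n \text{ i.o.}) = 0$$

$$A_1, A_2, \dots \text{ independent and } \sum_{n=1}^\infty P(A_n) = \infty \;\Longrightarrow\; P(A_n \text{ i.o.}) = 1$$

Note that the second implication needs independence, while the first does not.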

I believe that I live in the RANDOM world. It is not necessarily always Gaussian, but with large probability it looks Gaussian thanks to Large Sample Theory. Here’s the question: “Do astronomers believe the Borel Cantelli Lemma (BCL) for their Gaussian world? And is their bottom line in adopting the Gaussian on almost all occasions, experiments, and data analyses to prove this lemma for the Gaussian world?” Otherwise, one would be more cautious and would reason more before adopting chi-square goodness of fit methods. At the least, I think that one should not claim that their chi-square methods are statistically rigorous or statistically sophisticated. To me, “astronomically rigorous and sophisticated” seems adequate, but no one says so. Probably, saying “statistically rigorous” is an effort to avoid self-praise, which helplessly attributes the credit to statistics. Truly, their data processing strategies are very elaborate and difficult to understand. I don’t see why astronomers praise their beautiful and complex data sets and analysis results in the name of statistics. Oftentimes, I stop for a breath to find an answer to why a simple chi-square goodness of fit method is claimed to be statistically rigorous, while I only see the complexity of the data handling that precedes feeding the data into the chi-square function.

The reason for my request to take one step back before applying the chi-square method is that the astronomers’ Gaussian world is only one part of a multi-distributional universe, each part of which has positive probability measure.^[3] Despite its relatively large probability, the Gaussian world is just one realization from the set of distribution families. It is not an almost sure observation. Therefore, there is no need to dive into chi-square fitting methods that intrinsically assume Gaussianity, particularly when one knows the exact data distribution, as with Poisson photon counts.
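As a small illustration of the point about Poisson photon counts, here is a minimal sketch in Python (standard library only). It fits a constant rate to simulated low-count data twice: once by minimizing a chi-square with data-based variances max(n_i, 1), and once by maximizing the Poisson likelihood. The seed, the true rate of 3 counts per bin, the number of bins, and all function names are assumptions made for this illustration; none of it comes from a particular astronomical analysis.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fit_constant_chisq(counts):
    """Minimize sum (n_i - mu)^2 / max(n_i, 1); the solution is a weighted mean."""
    weights = [1.0 / max(n, 1) for n in counts]
    return sum(w * n for w, n in zip(weights, counts)) / sum(weights)


def fit_constant_poisson(counts):
    """Maximize the Poisson likelihood for a constant rate; the solution is the sample mean."""
    return sum(counts) / len(counts)


rng = random.Random(2008)   # hypothetical seed
true_rate = 3.0             # hypothetical counts per bin
counts = [sample_poisson(true_rate, rng) for _ in range(2000)]

mu_chisq = fit_constant_chisq(counts)
mu_poisson = fit_constant_poisson(counts)
print(f"true rate          : {true_rate:.2f}")
print(f"chi-square estimate: {mu_chisq:.2f}")
print(f"Poisson MLE        : {mu_poisson:.2f}")
```

For a constant model the chi-square minimizer is a weighted mean whose weights favor the bins with few counts, so it tends to land below the true rate, while the Poisson maximum likelihood estimate is simply the sample mean.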

This ordeal of the chi-square method being called statistically rigorous gives me the impression that astronomers are on a mission to meet a grand challenge by providing as many fitting results based on the Gaussian assumption as possible. This grand challenge is proving the Borel-Cantelli Lemma empirically for the Gaussian world, or by extension:

Suppose, by consensus, that astronomical experiments and observations (A_i) occur in the Gaussian world, that they are independent (iid), and that their number increases rapidly (i=1,…,n where n goes to infinity). Then, by showing $$\sum_{i=1}^\infty P(A_i) =\infty,$$ one can prove the grand challenge that P(A_n i.o.)=1, that is, that the Gaussian world occurs infinitely often with probability one across experiments and observations.
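To see what the condition on the sum is doing, the sketch below (Python, standard library only) simulates independent events with two assumed choices of P(A_i): 1/i, whose sum diverges, and 1/i^2, whose sum converges. These probabilities, the seed, and the function names are illustrative assumptions, not anything measured from astronomical practice.

```python
import random


def partial_sum(prob, n_events):
    """Sum of P(A_i) for i = 1..n_events."""
    return sum(prob(i) for i in range(1, n_events + 1))


def simulate_occurrences(prob, n_events, rng):
    """Indices i at which the independent event A_i occurs in one simulated run."""
    return [i for i in range(1, n_events + 1) if rng.random() < prob(i)]


def p_divergent(i):
    return 1.0 / i        # the sum of these probabilities diverges


def p_convergent(i):
    return 1.0 / i ** 2   # the sum of these probabilities converges to pi^2 / 6


rng = random.Random(12)   # hypothetical seed
n_events = 100_000
for name, prob in [("1/i", p_divergent), ("1/i^2", p_convergent)]:
    hits = simulate_occurrences(prob, n_events, rng)
    print(f"P(A_i) = {name}: partial sum = {partial_sum(prob, n_events):.3f}, "
          f"occurrences = {len(hits)}, last at i = {hits[-1]}")
```

In a typical run the 1/i events keep occurring at ever larger indices, while the 1/i^2 events stop early, which is the contrast between the two halves of the lemma. A finite simulation can only suggest this; it proves nothing about “infinitely often”.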

Collecting as many results based on chi-square methods as possible is a sufficient condition for this lemma. I didn’t mean to ridicule, but I exaggerated a bit by saying “the grand challenge.” In all seriousness, I would like to know why astronomers are almost obsessed with chi-square methods and the Gaussian world. I want to think plainly that adopting a chi-square method blindly is just a tradition, not a grand challenge to prove P(Gaussian_n i.o.)=1. Luckily, analyzing data in the Gaussian world hasn’t yet run into a catastrophic scientific fallacy. “So, why bother to think about a robust method applicable in any type of distributional world?”

Fortunately, I sometimes see astronomers who are not interested in this grand challenge of proving the Borel Cantelli Lemma for the Gaussian world. They challenge the traditional chi-square methods with limited resources: a lack of proper examples and support. Please don’t get me wrong. Although I praise them, I’m not asking every astronomer to become one of these outsiders. Statisticians need jobs!!! Nevertheless, a paragraph and a diagnostic plot, i.e. a short discussion justifying the chi-square method, would be very much appreciated to convey the idea that the Gaussian world is the right choice for your data analysis.
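One possible diagnostic, sketched below in Python with only the standard library, is a normal Q-Q check of the residuals: sort them, pair them with standard normal quantiles, and see how close the pairs are to a straight line. The correlation coefficient used as a one-number summary, the plotting positions (k - 0.5)/n, the simulated residuals, and the function names are all assumptions of this sketch; plotting the returned pairs with your favorite tool gives the diagnostic plot itself.

```python
import math
import random
from statistics import NormalDist


def normal_qq_points(residuals):
    """Pair sorted residuals with standard normal quantiles at positions (k - 0.5) / n."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return theoretical, sorted(residuals)


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def qq_correlation(residuals):
    """Correlation of the Q-Q pairs; values near 1 mean the points lie near a line."""
    return pearson_r(*normal_qq_points(residuals))


rng = random.Random(3)  # hypothetical seed
gaussian_resid = [rng.gauss(0.0, 1.0) for _ in range(200)]       # simulated, Gaussian
skewed_resid = [rng.expovariate(1.0) - 1.0 for _ in range(200)]  # simulated, skewed

print(f"Q-Q correlation, Gaussian residuals: {qq_correlation(gaussian_resid):.4f}")
print(f"Q-Q correlation, skewed residuals  : {qq_correlation(skewed_resid):.4f}")
```

A correlation close to 1 is consistent with the Gaussian assumption but does not prove it, and with only a dozen points the check has little power, so it is best read alongside the plot rather than instead of it.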

Lastly, I’d like to raise some questions. “How confident are you that the residuals between observations and the model are normally distributed, with only a dozen data points and their measurement errors?” “Is least squares fitting the only way to find the best fit for your data analysis?” “When you know the data distribution is skewed, are you willing to use Δχ² for estimating σ because it is the only way Numerical Recipes offers to estimate σ?” I know that people work on their projects for many months and years. Making an appointment with the folks at the statistical consulting center of your institution and spending an hour or so won’t delay your project. Those consultants may or may not confirm that chi-square or least squares fitting is the best and most convenient strategy. You may think statistical consulting is a waste of time because those consultants do not understand your problems. Yet your patience will pay off. Whether in the Gaussian or the non-Gaussian world, you are putting a correct middle stone in place to build a complete and long-lasting tower. You have already laid precious cornerstones.
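To make the Δχ² question concrete, here is a minimal sketch in Python (standard library only) of the Δχ² = 1 recipe for a one-parameter fit: a constant model with a known, common measurement error. The data values, the error of 0.4, and the function names are hypothetical and chosen only for illustration.

```python
import math


def chisq_constant(mu, data, sigma):
    """Chi-square of a constant model mu with a common measurement error sigma."""
    return sum(((y - mu) / sigma) ** 2 for y in data)


def delta_chisq_halfwidth(data, sigma, tol=1e-10):
    """Offset above the best-fit mu at which chi-square has risen by 1 (bisection)."""
    best = sum(data) / len(data)          # least squares solution for a constant
    chisq_min = chisq_constant(best, data, sigma)
    lo, hi = 0.0, sigma                   # at offset sigma the rise is n >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chisq_constant(best + mid, data, sigma) - chisq_min < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical measurements and measurement error.
data = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.0, 10.6, 9.7, 10.3, 10.1]
sigma = 0.4

half = delta_chisq_halfwidth(data, sigma)
print(f"delta chi-square = 1 half-width: {half:.6f}")
print(f"sigma / sqrt(n)                : {sigma / math.sqrt(len(data)):.6f}")
```

For this model the rise in χ² is n(μ - ȳ)²/σ², so the recipe returns σ/√n whatever the data look like. The interval is symmetric by construction and cannot reflect skewness in the data, which is the worry behind the third question.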

[1] It is a bit disappointing that not many mention the t distribution, even when fewer than 30 observations are available.
[2] To stay out of this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say that it is the only escape, which is sometimes true and sometimes not. Personally, I lean toward the view that Bayesian methods are not always more robust than frequentist ones, contrary to what astronomers’ discussions of robust methods suggest.
[3] The assumption that each of these worlds has nonzero probability is not proven, philosophically or mathematically. My experience tells me that a Poisson world exists, so that P(Poisson world)>0 and therefore P(Gaussian world)<1 in reality.

Tags: Borel Cantelli Lemma, CLT, families of distributions, gaussian, grand challenge, measure, non-Gaussian, probability, statisticians
Category: arXiv, Astro, Bad AstroStat, Cross-Cultural, Frequentist, Jargon, News, Quotes, Stat, Uncertainty
