The AstroStat Slog » Bayesian: Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

coin toss with a twist Sun, 26 Dec 2010 22:27:50 +0000 vlk Here’s a cool illustration of how to use Bayesian analysis in the limit of very little data, when inferences are necessarily dominated by the prior. The question, via Tom Moertel, is: suppose I tell you that a coin always comes up heads, and you proceed to toss it and it does come up heads — how much more do you believe me now?

He also has the answer worked out in detail.
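A minimal sketch of the update, assuming the only alternative to “always heads” is a fair coin (my simplification, not necessarily Moertel’s exact setup; the function name is mine):

```python
def posterior_always_heads(prior_truth):
    """Posterior belief that the coin always comes up heads,
    after observing a single head (Bayes' rule)."""
    # Likelihood of one head: 1 under the claim, 0.5 under a fair coin.
    like_true, like_false = 1.0, 0.5
    num = prior_truth * like_true
    return num / (num + (1.0 - prior_truth) * like_false)

# If you believed me with probability 0.5, one head raises it to 2/3.
print(posterior_always_heads(0.5))
```

With almost no data, the answer is driven almost entirely by the prior: a skeptic at 0.01 only moves to about 0.02.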

(h/t Doug Burke)

From Quantile Probability and Statistical Data Modeling Sat, 21 Nov 2009 10:06:24 +0000 hlee by Emanuel Parzen in Statistical Science 2004, Vol 19(4), pp.652-662 JSTOR

I teach that statistics (done the quantile way) can be simultaneously frequentist and Bayesian, confidence intervals and credible intervals, parametric and nonparametric, continuous and discrete data. My first step in data modeling is identification of parametric models; if they do not fit, we provide nonparametric models for fitting and simulating the data. The practice of statistics, and the modeling (mining) of data, can be elegant and provide intellectual and sensual pleasure. Fitting distributions to data is an important industry in which statisticians are not yet vendors. We believe that unifications of statistical methods can enable us to advertise, “What is your question? Statisticians have answers!”

I couldn’t help liking this paragraph because of its bitter-sweetness. I hope you appreciate it as much as I did.

The chance that A has nukes is p% Fri, 23 Oct 2009 17:26:07 +0000 hlee I watched a movie in which one of the characters said, “country A has nukes with 80% chance” (perhaps it was not exactly 80%, but it was a high percentage). Another claim in that episode was that people will stop eating lettuce if even a 1% chance of E. coli contamination is reported. Therefore, the argument went, with such a high chance of having nukes, it is right to send troops to A. This episode immediately brought to my mind astronomers’ null hypothesis probability and their ways of drawing conclusions from chi-square goodness-of-fit tests, likelihood ratio tests, or F-tests.

First of all, I’d like to ask: how would you estimate the chance that a country has nukes? What does this 80% imply? But before getting to that question, I’d like to discuss computing the chance of E. coli infection first.

From the frequentist perspective, computing the chance of E. coli infection means investigating a sample of lettuce and counting the infected heads: n is the number infected and N is the total sample size, so the estimated rate is n/N, and 1% means one among 100. Such percentage reports and their uncertainties are a familiar scene to everyone during election periods. From the Bayesian perspective, Pr(p|D) ∝ L(D|p)π(p); by properly choosing the likelihood and the prior, one can estimate the chance of E. coli infection and its uncertainty. Understanding of the sampled population and prior knowledge help determine the likelihood and the prior.
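As a hedged illustration of the Bayesian route, assume a flat Beta(1,1) prior on the infection rate p and a binomial likelihood for n infected among N sampled; conjugacy then gives a Beta(n+1, N-n+1) posterior in closed form (the function name and numbers below are my own):

```python
def beta_binomial_posterior(n, N, a=1.0, b=1.0):
    """Posterior mean and standard deviation for an infection rate p,
    given n infected out of N and a Beta(a, b) prior; the posterior is
    Beta(a + n, b + N - n) by conjugacy."""
    a_post, b_post = a + n, b + (N - n)
    post_mean = a_post / (a_post + b_post)
    # Variance of a Beta(a, b) distribution.
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return post_mean, var ** 0.5

mean, sd = beta_binomial_posterior(1, 100)  # "1% means one among 100"
print(mean, sd)
```

The posterior mean (n+1)/(N+2) is close to n/N for large samples, and the posterior standard deviation gives the uncertainty directly.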

How about the chance that country A has nukes? Do we have replicates of country A so that a committee can investigate each replicate and count the ones with nukes to compute the chance? We cannot do that. The traditional frequentist approach, based on counting, does not work here. Either a fiducial likelihood approach or a Bayesian approach, i.e., carefully choosing an adequate likelihood function (priors are only for the Bayesian), allows one to compute such a probability of interest. In other words, those computed chances depend heavily on the choice of model and are quite subjective.

So, here’s my concern. It seems that astronomers want to know the chance of their spectral data being described by a model (A*B+C)*D (each letter stands for one of the models listed in, e.g., Sherpa Models). This is more like computing the chance that country A has nukes, not counting frequencies of an event’s occurrence. On the other hand, the p-value from goodness-of-fit tests, LRTs, or F-tests is a number from the traditional frequentist counting approach. In other words, the p-value accounts for how many times one would observe the event (say, reduced chi^2 > 1.2) under the null hypothesis (the (A*B+C)*D model is the right choice, so that the residuals are Gaussian) if the experiment were repeated N times. The problem is that we have only a single experiment and one spectrum with which to verify that (A*B+C)*D is true. A goodness-of-fit test or LRT only tells the goodness or badness of the model, not a statistically and objectively quantified chance.
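What that frequentist p-value quantifies can be sketched by brute-force simulation; the threshold 1.2 and the 100 degrees of freedom below are arbitrary illustrations, not tied to any particular spectrum:

```python
import random

def pvalue_reduced_chisq(threshold=1.2, dof=100, n_sims=20000, seed=42):
    """Frequentist p-value by simulation: the fraction of repeated
    experiments, under Gaussian residuals (the null), whose reduced
    chi^2 exceeds the observed threshold."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        chisq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))
        if chisq / dof > threshold:
            exceed += 1
    return exceed / n_sims

p = pvalue_reduced_chisq()
print(p)
```

The result (around 0.08 for these settings) is a statement about hypothetical repetitions under the null, not the probability that the model is true.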

In order to know the chance of the model (A*B+C)*D, like “A has nukes with p%,” one should not rely on p-values. If you have multiple models, you can compute pairwise relative chances, i.e., odds ratios or Bayes factors. However, this does not provide the uncertainty of that chance (astronomers have a tendency to report uncertainties for any point estimate, even when the procedure is statistically meaningless and the quantified uncertainty is not a statistical uncertainty, as in using delta chi^2=1 to report 68% confidence intervals). There are various model selection criteria that cater to various conditions embedded in the data to make the right choice among candidate models. In addition, post-inference for astronomical models is still a very difficult problem.

Reporting the proper chance of (A*B+C)*D requires more elaborate statistical modeling, which always brings fierce discussions between frequentists and Bayesians because of priors and likelihoods. Although it can be a tedious process, I would rather astronomers leave the problem to statisticians than use inappropriate test statistics and make creative interpretations of them.

Please keep this question in mind when you report a probability: what kind of chance are you computing? The chance of E. coli infection? Or the chance that A has nukes? Make sure to understand that the p-value from a data analysis package does not tell you that the chance the model (A*B+C)*D is true is (one minus p-value)%. You don’t want to report one minus the p-value of a chi-square test statistic as the chance that A has nukes.

[MADS] logistic regression Tue, 13 Oct 2009 20:15:08 +0000 hlee Although some time has elapsed since my post space weather, which mentioned that logistic regression is used for prediction, it still appears to be true that logistic regression is rarely used in astronomy. Where something similar has been done, it has appeared not under the same statistical jargon but within Bayesian modeling procedures.

Maybe some astronomers will want to check out this versatile statistical method, wiki:logistic regression, to see whether they can fit their data with it in order to model/predict observed rates, unobserved rates, undetected rates, detected rates, absorbed rates, and so on in terms of what is observed plus additional/external observations, knowledge, and theories. I wonder what it would look like if the following were fit using logistic regression: detection limits, Eddington bias, incompleteness, absorption, differential emission measures, opacity, etc., plus brute-force Monte Carlo simulations emulating the data likely to be fit. The responses would then be the probability of observed vs. not observed as a function of redshift, magnitudes, counts, flux, wavelength/frequency, and other measurable or latent variables.

My simple reasoning, that astronomers observe partially and will never have a complete sample, had given me the prejudice that logistic regression would appear in the astronomical literature rather frequently. Against my bet, it was [MADS]. All statistical software has packages or modules for logistic regression; therefore, once you have a data set, the application is very straightforward.

Although logistic regression models are given in many good tutorials, literature, or websites, it might be useful to have a simple but intuitive form of logistic regression for sloggers.

When you have binary responses, metal-poor star (Y=1) vs. metal-rich star (Y=0), and predictors such as colors, distance, parallax, precision, and other columns in catalogs (X is a matrix comprised of these variables), the model is
\mathrm{logit}(\Pr(Y=1|X)) = \log \frac{\Pr(Y=1|X)}{1-\Pr(Y=1|X)} = \beta_0 + \mathbf{X}^T \boldsymbol{\beta}.
Just as astronomers fit a linear regression model to get the intercept and slope, the same approach is applied to get the intercept and coefficients of a logistic regression model.
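As a minimal sketch of that fit (one predictor, an invented toy data set, and plain gradient ascent on the log-likelihood; a real analysis would use the logistic/GLM routine of a statistics package rather than this hand-rolled loop):

```python
import math

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logit(Pr(Y=1|x)) = b0 + b1*x by gradient ascent on the
    Bernoulli log-likelihood (single predictor, for illustration)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += yi - p          # gradient wrt intercept
            g1 += (yi - p) * x    # gradient wrt slope
        b0 += lr * g0 / len(X)
        b1 += lr * g1 / len(X)
    return b0, b1

# Invented toy data: larger color index goes with metal rich (Y=0).
color = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
metal_poor = [1, 1, 1, 1, 0, 0, 0, 0]
b0, b1 = fit_logistic(color, metal_poor)
print(b0, b1)  # b1 comes out negative: Pr(metal poor) falls with color
```

The fitted curve maps any predictor value to a probability between 0 and 1, which is exactly the detection/non-detection kind of response discussed above.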

[Books] Bayesian Computations Fri, 11 Sep 2009 20:40:23 +0000 hlee A number of practical Bayesian data analysis books are available these days. Here, I’d like to introduce two that were published relatively recently. I like the fact that they are technical rather than theoretical, that their practical examples are close to astronomical data, and that they include R code so that one can try the algorithms on the fly instead of cramming probability theory.

Bayesian Computation with R
Author:Jim Albert
Publisher: Springer (2007)

As the title says, the accompanying R package LearnBayes is available (clicking the name will take you to the package download). Furthermore, the last chapter is about WinBUGS. (Please check out the resources listed in BUGS for other flavors of BUGS, Bayesian inference Using Gibbs Sampling.) Overall, it is quite practical and instructional. If a young astronomer would like to enter the competition posted below because of sophisticated data requiring non-traditional statistical modeling, this book can be a good start. (Here, traditional methods include brute-force Monte Carlo simulations, chi^2/weighted least squares fitting, and test statistics with rigid underlying assumptions.)

I picked out one quote because of a comment from an astronomer, “Bayesian is robust but frequentist is not,” which I couldn’t agree with at the time.

A Bayesian analysis is said to be robust to the choice of prior if the inference is insensitive to different priors that match the user’s beliefs.

Since there’s no discussion of priors in frequentist methods, Bayesian robustness cannot be matched against and compared with frequentist robustness. Similar to my discussion in Robust Statistics, I keep the notion that robust statistics is insensitive to outliers or to the iid Gaussian model assumption. The latter, in particular, is almost always assumed in astronomical data analysis unless other models and probability densities are explicitly stated, like Poisson counts or a Pareto distribution. New Bayesian algorithms are being invented to achieve robustness not limited to the choice of prior, covering topics from frequentists’ robust statistics as well.

The introduction to Bayesian computation focuses on analytical, simple parametric models and well-known probability densities. These models and their Bayesian analysis produce interpretable results. The Gibbs sampler, Metropolis-Hastings algorithms, and a few hybrids can handle scientific problems as long as the scientific models and the uncertainties, in both observations and parameters, are transcribed into well-known probability density functions. I think astronomers will like to check Chap. 6 (MCMC) and Chap. 9 (Regression Models). Often, in order to demonstrate a strong correlation between two variables, astronomers adopt simple linear regression models and fit the data to them. A priori knowledge enhances the flexibility of the fitting analysis, in which Bayesian computation works robustly, unlike straightforward chi-square methods. The book has neither sophisticated algorithms nor deep theory; it offers only the necessities and foundations for accommodating Bayesian computation to scientific needs.

The other book is

Bayesian Core: A Practical Approach to Computational Bayesian Statistics.
Author: J. Marin and C.P.Robert
Publisher: Springer (2007).

Although the book is written by statisticians, the very first real-data example is CMBdata (cosmic microwave background data; instead of “cosmic,” the book uses “cosmological.” I’m not sure which is correct, but I’m used to CMB as cosmic microwave background). Surprisingly, the CMB becomes a very easy topic in statistics, in terms of testing normality and extreme values. Seeing real astronomy data first in the book was my primary reason for introducing it. Also, it’s a relatively small volume (about 250 pages) compared to other Bayesian textbooks, with broad coverage of topics in Bayesian computation. There are other practical real data sets illustrating Bayesian computation in the book, and these example data sets can be found on the book website.

The book begins with R, then normal models, regression and variable selection, generalized linear models, capture-recapture experiments, mixture models, dynamic models, and image analysis are covered.

I felt exuberant when I found that the book describes the law of large numbers (LLN), which justifies Monte Carlo methods. The LLN appears whenever integration is approximated by summation, which astronomers do a lot without referring to this law by name. For more information, here is a Wikipedia link: Law of Large Numbers.
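A one-line instance of the LLN at work: approximating ∫₀¹ x² dx = 1/3 by the average of the integrand over uniform draws (a toy of my own, not from the book):

```python
import random

def mc_integral(f, n=100000, seed=1):
    """Approximate the integral of f over [0, 1] by the sample mean of
    f(U) with U ~ Uniform(0, 1); the LLN guarantees convergence."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

est = mc_integral(lambda x: x * x)
print(est)  # close to 1/3
```

Every MCMC posterior mean or marginalization reported in the two books rests on exactly this replace-integral-by-average step.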

Several MCMC algorithms can be mixed together within a single algorithm using either a circular or a random design. While this construction is often suboptimal (in that the inefficient algorithms in the mixture are still used on a regular basis), it almost always brings an improvement compared with its individual components. A special case where a mixed scenario is used is the Metropolis-within-Gibbs algorithm: When building a Gibbs sampler, it may happen that it is difficult or impossible to simulate from some of the conditional distributions. In that case, a single Metropolis step associated with this conditional distribution (as its target) can be used instead.

The description in Sec. 4.2, Metropolis-Hastings Algorithms, should be especially appreciated and well comprehended by astronomers because of the historical origins of these topics: the detailed balance equation and the random walk.
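The random-walk construction can be sketched in a few lines, here targeting a standard normal for concreteness (my toy example, not the book’s code):

```python
import math
import random

def metropolis_normal(n_samples=20000, step=1.0, seed=7):
    """Random-walk Metropolis targeting N(0, 1): propose a Gaussian
    step and accept with probability min(1, target ratio); detailed
    balance makes N(0, 1) the stationary distribution."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        # Log target ratio for N(0,1): (x^2 - prop^2) / 2.
        if rng.random() < math.exp(min(0.0, 0.5 * (x * x - prop * prop))):
            x = prop
        chain.append(x)
    return chain

chain = metropolis_normal()
mean = sum(chain) / len(chain)
print(mean)  # near 0
```

Swapping in a log-posterior for the quadratic above turns this into the workhorse sampler the book builds on.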

My personal favorite is Chapter 6 on mixture models. Astronomers handle data from multiple populations (multiple epochs of star formation, single or multiple broken power laws, linear or quadratic models, metallicities from merging or formation triggers, backgrounds+sources, environment-dependent point spread functions, and so on), and the chapter discusses the difficulties of the label switching problem (an identifiability issue when codifying data into MCMC or EM algorithms).

A completely different approach to the interpretation and estimation of mixtures is the semiparametric perspective. To summarize this approach, consider that since very few phenomena obey probability laws corresponding to the most standard distributions, mixtures such as \sum_{i=1}^k p_i f(x|\theta_i) (*) can be seen as a good trade-off between fair representation of the phenomenon and efficient estimation of the underlying distribution. If k is large enough, there is theoretical support for the argument that (*) provides a good approximation (in some functional sense) to most distributions. Hence, a mixture distribution can be perceived as a type of basis approximation of unknown distributions, in a spirit similar to wavelets and splines, but with a more intuitive flavor (for a statistician at least). This chapter mostly focuses on the “parametric” case, when the partition of the sample into subsamples with different distributions f_j does make sense from the dataset point of view (even though the computational processing is the same in both cases).

We must point at this stage that mixture modeling is often used in image smoothing but not in feature recognition, which requires spatial coherence and thus more complicated models…
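Sampling from the mixture \sum_i p_i f(x|\theta_i) in (*) is straightforward: pick a component with probability p_i, then draw from it. A sketch with Gaussian components (the background+source numbers are invented for illustration):

```python
import random

def sample_mixture(weights, means, sds, n, seed=3):
    """Draw n samples from sum_i p_i N(mu_i, sigma_i^2): choose a
    component by its weight, then draw from that component."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for p, mu, sd in zip(weights, means, sds):
            acc += p
            if u < acc:
                out.append(rng.gauss(mu, sd))
                break
        else:
            # Guard against floating-point rounding of the weights.
            out.append(rng.gauss(means[-1], sds[-1]))
    return out

# e.g., broad background centered at 0 plus a narrow source at 5
xs = sample_mixture([0.7, 0.3], [0.0, 5.0], [1.0, 0.5], 10000)
print(sum(xs) / len(xs))  # near 0.7*0 + 0.3*5 = 1.5
```

The hidden component label drawn inside the loop is exactly the latent variable whose relabeling causes the label switching problem in MCMC.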

My patience ran out before I could comprehend every detail of the book, but the sections on reversible jump MCMC, hidden Markov models (HMM), and Markov random fields (MRF) look very useful. These topics appear often in image processing, a field in which astronomers have their own algorithms. Adaptation of, and comparison across, image analysis methods promises new directions for scientific imaging data analysis beyond subjective denoising, smoothing, and segmentation.

For readers considering more advanced Bayesian computation and a rigorous treatment of MCMC methodology, I’d like to point to a textbook frequently mentioned by Marin and Robert.

Monte Carlo Statistical Methods Robert, C. and Casella, G. (2004)
Springer-Verlag, New York, 2nd Ed.

There are a few more practical, introductory Bayesian analysis books recently published or soon to be published. Some readers may prefer those ink-on-paper books. Perhaps there is, or will be, a Bayesian Computation with Python, IDL, Matlab, Java, or C/C++ for those who never intend to use R. By the way, Mathematica users may like to check out Phil Gregory’s book, which I introduced in [books] a boring title. My point is that applied statistics has become friendlier to non-statisticians through these good introductory books and free online materials. I hope more astronomers apply statistical models in their data analysis without much trouble in executing Bayesian methods. Some might want to check out BUGS, introduced in [BUGS]; that posting contains resources on how to use BUGS and the packages available in various languages.

2010 SBSS STUDENT PAPER COMPETITION Fri, 21 Aug 2009 03:14:46 +0000 chasc The Section on Bayesian Statistical Science (SBSS) of the American Statistical Association (ASA) would like to announce its 2010 student paper competition.  Winners of the competition will receive partial support for attending the 2010 Joint Statistical Meetings (JSM) in Vancouver, BC.


The candidate must be a member of SBSS (URL: or ISBA (International Society for Bayesian Analysis). Those candidates who have previously received travel support from SBSS are not eligible to participate. In addition, the candidate must be a full-time student (undergraduate, Masters, or Ph.D.) on or after September 1, 2009.

A manuscript, suitable for journal submission, is required for entry. The candidate must be the lead author on the paper, and hold the primary responsibility for the research and write-up.

The candidate must have separately submitted an abstract for JSM 2010 through the regular abstract submission process,  to present applied, computational, or theoretical Bayesian work. Papers should be submitted for presentation at the JSM as topic contributed or invited papers. Those papers not already a part of a session should be submitted online using the following settings:

(at URL:

* Abstract Type: Topic contributed
* Sub Type: Papers
* Sponsor: Section on Bayesian Statistical Science
* Organizer:  Alyson Wilson
* Organizer e-mail: agw -at-

Application Process

The deadline for application is Feb. 1 (same as the JSM 2010 abstract submission deadline). A formal application including the following materials should be emailed to Prof. Vanja Dukic (vanja -at-

a)      CV
b)      Abstract number (from the ASA JSM 2010 abstract submission)
c)      Letter from the major professor (advisor) or faculty co-author, verifying the student status of the candidate, and briefly describing the candidate’s role in the research and writing of the paper
d)      The manuscript, suitable for journal submission, in .pdf format.

Selection of Winners

Papers will be reviewed by a committee determined by the officers of the SBSS. Criteria for selection will include, but are not limited to, significance and potential impact of the research.  Decisions of the committee are final, and will be announced in the Spring before the JSM.


Prizes will consist of a certificate to be presented at the SBSS section meeting and partial support (up to $1000) for attending the JSM.  Please note that the awards may be unable to cover the entirety of any winner’s travel, so winning candidates may need to supplement the SBSS award with other funds. To receive a monetary prize, the winner will need to provide proof of membership and submit travel receipts to the SBSS treasurer after the JSM.

[ArXiv] Cross Validation Wed, 12 Aug 2009 23:03:43 +0000 hlee Statistical resampling methods are rather unfamiliar to astronomers. Bootstrapping may be an exception, but I feel it is still underrepresented. Seeing a recent review paper on cross validation on [arXiv], which describes the basic notions in theoretical statistics, I couldn’t resist mentioning it here. Cross validation has been used in various statistical problems such as classification, density estimation, model selection, and regression, to name a few.

A survey of cross validation procedures for model selection by Sylvain Arlot

Nonetheless, I will not review the paper itself beyond a few quotes:

-CV is a popular strategy for model selection, and algorithm selection.
-Compared to the resubstitution error, CV avoids overfitting because the training sample is independent from the validation sample.
-As noticed in the early 30s by Larson (1931), training an algorithm and evaluating its statistical performance on the same data yields overoptimistic results.

There are books on statistical resampling methods covering more general topics, not limited to model selection. Instead, I decided to do a little search on how CV is used in astronomy. These are the ADS search results. More publications than I expected.

One can easily see that many adopted CV in the machine learning context, but the application of CV, and of bootstrapping, is not limited to machine learning. As Arlot’s title says, CV is used for model selection. When it comes to model selection in high energy astrophysics, the standard procedure is not CV but reduced chi^2 measures and eyeballing the fitted curve. Hopefully, a renovated model selection procedure via CV or another statistically robust strategy will soon challenge reduced chi^2 and eyeballing. On the other hand, I doubt it will come soon. Remember, the eye is the best classifier, so it won’t be an easy task.
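For concreteness, here is a bare-bones k-fold CV for choosing between two models, a constant and a straight line, on synthetic data (real model selection would of course involve the actual candidate models):

```python
import random

def kfold_cv_mse(x, y, fit, k=5):
    """k-fold cross validation: fit on k-1 folds, score squared error
    on the held-out fold, and average over all folds."""
    n, total = len(x), 0.0
    for i in range(k):
        hold = set(range(i, n, k))          # simple interleaved folds
        xt = [x[j] for j in range(n) if j not in hold]
        yt = [y[j] for j in range(n) if j not in hold]
        predict = fit(xt, yt)
        for j in hold:
            total += (y[j] - predict(x[j])) ** 2
    return total / n

def fit_constant(x, y):
    c = sum(y) / len(y)
    return lambda t: c

def fit_line(x, y):                          # ordinary least squares
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda t: a + b * t

rng = random.Random(0)
x = [i / 10 for i in range(50)]
y = [2.0 * xi + rng.gauss(0.0, 0.3) for xi in x]
print(kfold_cv_mse(x, y, fit_constant), kfold_cv_mse(x, y, fit_line))
```

Because each prediction is scored on data the model never saw, the held-out error penalizes both underfitting and overfitting, which is what makes CV a defensible alternative to reduced chi^2 eyeballing.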

Robust Statistics Mon, 18 May 2009 17:18:09 +0000 hlee My understanding of “robustness” from my education in statistics and my understanding of it from communicating with astronomers share little common ground. Can anyone help me build a robust bridge across this abyss?

First, since it’s short, let’s quote a comment from an astronomer that might reflect the notion of robust statistics in astronomy.

Bayesian is robust.

Is every Bayesian method robust while its counterpart from classical statistics is not? I know that the popular statistics in astronomy are, generally speaking, not robust, and that those popular statistics were born before the notion of robustness in statistics was recognized and discussed.

I do understand why such a partisan comment was produced. Astronomers always report their data analysis results via best fits, error bars, probabilities, or some significance levels (they don’t explicitly say p-values, powers, type I or type II errors, unbiased estimates, and other statistical jargon of inference problems), and those frequently used classical methods have caused frustration due to their lack of robustness. On the contrary, MCMC algorithms for estimating posterior distributions produce easily interpretable results for reporting a best fit (the mode) and an error bar (the HPD interval).

My understanding of robustness as a statistician does not draw a line between Bayesians and frequentists. The following is quoted from the website of the Katholieke Universiteit Leuven, whose mathematics department has a focus group for robust statistics.

Robust statistical methods and applications.
The goal of robust statistics is to develop data analytical methods which are resistant to outlying observations in the data, and hence which are also able to detect these outliers. Pioneering work in this area has been done by Huber (1981), Hampel et al. (1986) and Rousseeuw and Leroy (1987). In their work, estimators for location, scale, scatter and regression play a central role. They assume that the majority of the data follow a parametric model, whereas a minority (the contamination) can take arbitrary values. This approach leads to the concept of the influence function of an estimator which measures the influence of a small amount of contamination in one point. Other measures of robustness are the finite-sample and the asymptotic breakdown value of an estimator. They tell what the smallest amount of contamination is which can carry the estimates beyond all bounds.

Nowadays, robust estimators are being developed for many statistical models. Our research group is very active in investigating estimators of covariance and regression for high-dimensional data, with applications in chemometrics and bio-informatics. Recently, robust estimators have been developed for PCA (principal component analysis), PCR (principal component regression), PLS (partial least squares), classification, ICA (independent component analysis) and multi-way analysis. Also robust measures of skewness and tail weight have been introduced. We study robustness of kernel methods, and regression quantiles for censored data.

My understanding of “robustness” from my statistics education is broad, covering both Bayesian and frequentist methods. Any method or model that is insensitive or immune to outliers is a robust method or statistic. For example, the median is more robust than the mean, since the breakdown point of the median is 1/2 while that of the mean is 0, asymptotically. Both the mean and the median are estimable by Bayesian and frequentist methods. Instead of the standard deviation, tactics like lower and upper quartiles for error bars, or Winsorization (or trimming) to obtain a standard deviation for the error bar, are adopted regardless of being Bayesian or frequentist. Instead of chi-square goodness-of-fit tests, which assume Gaussian residuals, nonparametric or distribution-free tests can be utilized.
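The breakdown-point contrast is easy to demonstrate; the numbers below are invented for illustration:

```python
def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [100.0]   # one wild outlier

print(mean(clean), median(clean))                # both near 10
print(mean(contaminated), median(contaminated))  # mean dragged far away
```

A single bad value carries the mean beyond all bounds (breakdown point 0), while the median barely moves; the same contrast applies whether the estimators arise from frequentist or Bayesian machinery.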

The notion that frequentist methods are not robust might have developed from the frustration that those chi-square-related methods in astronomy do not show robust characteristics, since data are prone to breaking the normality (Gaussianity) assumption. Also, the limited library of nonparametric methods in data analysis packages and software suggests that frequentist methods are not robust. An additional unattractive aspect of frequentist methods is that their descriptions seem too mathematical, too abstract, and too difficult to codify, full of unfriendly jargon, whereas Bayesian methods provide step-by-step modeling procedures with explanations of why a given likelihood and prior were chosen based on external knowledge from physics and observations (the MCMC algorithms in astronomical papers are adaptations of already-proven algorithms from statistics and algorithms journals).

John Tukey said:

Robustness refers to the property of a procedure remaining effective even in the absence of usual assumptions such as normality and no incorrect data values. In simplest terms the idea is to improve upon the use of the simple arithmetic average in estimating the center of a distribution. As a simple case one can ask: Is it ever better to use the sample median than the sample mean, and if so, when?

I don’t think the whole set of frequentist methods is the complement of the Bayesian set. Personally, I feel quite embarrassed whenever I am told that frequentist methods are not robust compared to Bayesian methods. Bayesian methods become robust when a priori knowledge (subjective priors) allows the results to be unaffected by outliers under a general class of likelihoods. Regardless of being frequentist or Bayesian, statistics has been developed to be less sensitive to outliers and to perform optimal inference, i.e., to achieve the goal of robustness.

Ah… there are various other accounts of robust methods/statistics in astronomy, not limited to “Bayesian is robust.” Just as I get confused whenever I see “statistically rigorous” in the literature when the method is a simple chi-square minimization, I believe astronomers can educate me about the notion of robustness and the definition of robust statistics in various ways. I hope for some sequels from astronomers about robustness and robust statistics to this post of limited view.

systematic errors Fri, 06 Mar 2009 19:42:18 +0000 hlee Ah ha~ I once asked, “what is systematic error?” (see [Q] systematic error). Thanks to L. Lyons’ work discussed in [ArXiv] Particle Physics, I found this paper, titled Systematic Errors, describing the concept of, and the statistical inference related to, systematic errors in the field of particle physics. It gladly shares a lot of similarity with high energy astrophysics.

Systematic Errors by J. Heinrich and L. Lyons
in Annu. Rev. Nucl. Part. Sci. (2007) Vol. 57, pp. 145-169

The characterization of the two error types, systematic and statistical, is illustrated with a simple physics experiment, the pendulum. The authors describe distinct sources of systematic errors.

…the reliable assessment of systematics requires much more thought and work than for the corresponding statistical error.
Some errors are clearly statistical (e.g. those associated with the reading errors on T and l), and others are clearly systematic (e.g., the correction of the measured g to its sea level value). Others could be regarded as either statistical or systematic (e.g., the uncertainty in the recalibration of the ruler). Our attitude is that the type assigned to a particular error is not crucial. What is important is that possible correlations with other measurements are clearly understood.

Section 2 contains a very nice review, in English rather than mathematical symbols, of the basics of Bayesian and frequentist statistics for inference in particle physics, with practical accounts. A comparison of the Bayesian and frequentist approaches is provided. (I was happy to see that χ2 is said not to belong to frequentist methods; it is just a popular method in references on data analysis in astronomy, not in modern statistics. If someone insists, statisticians could study the χ2 statistic under assumptions and conditions that suit the properties of astronomical data, investigate the efficiency and completeness of the Gaussian approximation to grouped Poisson counts within the χ2 minimization process, check the degree of information loss, and so forth.)

To a Bayesian, probability is interpreted as the degree of belief in a statement. …
In contrast, frequentists define probability via a repeated series of almost identical trials;…

Section 3 clarifies the notion of p-values as such:

It is vital to remember that a p-value is not the probability that the relevant hypothesis is true. Thus, statements such as “our data show that the probability that the standard model is true is below 1%” are incorrect interpretations of p-values.

This reminds me of the null hypothesis probability that I often encounter in the astronomical literature or in discussions reporting X-ray spectral fitting results. I believe astronomers using the null hypothesis probability are confusing Bayesian and frequentist concepts: the computation is based on the frequentist idea of a p-value, but the interpretation is Bayesian. A separate posting on the null hypothesis probability will come shortly.

Section 4 describes both Bayesian and frequentist ways to include systematics. Through parameterization (for the Gaussian case, via additive error terms or nonzero off-diagonal elements in the full covariance matrix), systematic uncertainty is treated as nuisance parameters in the likelihood for Bayesian and frequentist alike, although the term “nuisance” comes from frequentist likelihood principles. Obtaining the posterior distribution of the parameter(s) of interest requires marginalization over the uninteresting parameters, which are the nuisance parameters of frequentist methods.

The miscellaneous section (Sec. 6) is the most useful part for understanding the nature of systematic errors and the strategies for handling them. Instead of copying the whole section, here are two interesting quotes:

When the model under which the p-value is calculated has nuisance parameters (i.e. systematic uncertainties) the proper computation of the p-value is more complicated.

The contribution from a possible systematic can be estimated by seeing the change in the answer a when the nuisance parameter is varied by its uncertainty.

As warned, it is not recommended to combine a calibrated systematic error and an estimated statistical error in quadrature, since we cannot assume those errors are uncorrelated at all times. Setting aside disputes over the choice of prior distribution, the Bayesian strategy works better, since the posterior distribution is the distribution of the parameter of interest, from which one directly gets the uncertainty in the parameter. Remember: in Bayesian statistics parameters are random, whereas in frequentist statistics observations are random. The χ2 method only approximates the uncertainty as Gaussian with respect to the best fit (equivalent to the posterior under a Gaussian likelihood centered at the best fit with a flat prior) and combines different uncertainties in quadrature. Neither strategy is almost always superior to the other for statistical inference in general; case by case, however, we can say that one functions better than the other. The issue is how to define a model (a distribution, a distribution family, or a class of functionals) prior to deploying the various methodologies, and therefore understanding systematic errors in terms of the model, the parameterization, the estimating equation, or robustness becomes important. Unfortunately, systematic descriptions of systematic errors from the statistical inference perspective are absent from astronomical publications. Strategies for handling systematic errors with statistical care are really hard to come by.
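The quadrature caveat can be illustrated in a few lines: with nonzero correlation, the combined variance picks up a cross term that quadrature silently drops. A toy sketch, with numbers of my own choosing:

```python
import numpy as np

# Toy values (my own): a statistical and a systematic uncertainty
# on the same measured quantity.
sigma_stat, sigma_sys = 0.8, 0.6

# Quadrature assumes zero correlation between the two contributions:
sigma_quad = np.hypot(sigma_stat, sigma_sys)

def combined_sigma(s1, s2, rho):
    """Combined uncertainty including the correlation cross term."""
    return np.sqrt(s1**2 + s2**2 + 2.0 * rho * s1 * s2)

sigma_corr = combined_sigma(sigma_stat, sigma_sys, rho=0.5)
print(sigma_quad, sigma_corr)  # quadrature underestimates when rho > 0
```

For rho = 0 the two formulas agree exactly; for positively correlated errors, quadrature understates the true combined uncertainty.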

Still, I think their inclusion of systematic errors is limited to parametric methods; in other words, without a parameterization of the systematic errors, one cannot assess or quantify them properly. So what if such a parameterization of the systematics is not available? I thought some general semi-parametric methodology might assist in developing methods of incorporating systematic errors in spectral model fitting. Our group has developed a simple semi-parametric way to incorporate systematic errors in X-ray spectral fitting; if you would like to know how it works, please check out my poster in pdf. It may be viewed as too conservative, a projection of sorts, since instead of parameterizing the systematics, the posterior was empirically marginalized over them, that is, over the hypothetical space formed by a simulated sample of calibration products.

I believe publications about handling systematic errors will flourish in astronomy and statistics as long as complex instruments keep collecting data. Beyond combination in quadrature or Gaussian approximation, systematic errors can be incorporated in more sophisticated fashion, parametrically or nonparametrically. Particularly for the latter, statisticians’ knowledge and contributions are in great demand.

]]> 0
[ArXiv] Particle Physics Fri, 20 Feb 2009 23:48:39 +0000 hlee

Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he is very kind and a good listener. I was delighted to discover, via an [arxiv:stat] email subscription, his introductory article about particle physics and its statistical challenges.

Descriptions of various particles from modern particle physics are briefly given (I like such brevity and conciseness that still delivers the necessities. If you want more physics, look to famous bestsellers like The first three minutes, A brief history of time, The elegant universe, or Feynman’s books, and to undergraduate textbooks on modern physics and particle physics). The Large Hadron Collider (LHC, hereafter; LHC related slog postings: LHC first beam, The Banff challenge, Quote of the week, Phystat – LHC 2008) is introduced along with the statistical challenges it poses from the data collecting/processing perspective, since it is expected to collect 10^10 events. Visit the LHC website to find out more about the LHC.

My one line summary of the article: solving particle physics problems through hypothesis testing, or more broadly through classical statistical inference. I most enjoyed reading sections 5 and 6, particularly the subsection titled Why 5σ? Here are some excerpts I would like to share with you from the article:

It is hoped that the approaches mentioned in this article will be interesting or outrageous enough to provoke some Statisticians either to collaborate with Particle Physicists, or to provide them with suggestions for improving their analyses. It is to be noted that the techniques described are simply those used by Particle Physicists; no claim is made that they are necessarily optimal (Personally, I like such openness and candidness.).

… because we really do consider that our data are representative as samples drawn according to the model we are using (decay time distributions often are exponential; the counts in repeated time intervals do follow a Poisson distribution, etc.), and hence we want to use a statistical approach that allows the data “to speak for themselves,” rather than our analysis being dominated by our assumptions and beliefs, as embodied in Bayesian priors.

Because experimental detectors are so expensive to construct, the time-scale over which they are built and operated is so long, and they have to operate under harsh radiation conditions, great care is devoted to their design and construction. This differs from the traditional statistical approach for the design of agricultural tests of different fertilisers, but instead starts with a list of physics issues which the experiment hopes to address. The idea is to design a detector which will provide answers to the physics questions, subject to the constraints imposed by the cost of the planned detectors, their physical and mechanical limitations, and perhaps also the limited available space. (My personal belief is that what segregates the physical sciences from other sciences requiring statistical thinking is that uncontrolled circumstances are quite common in physics and astronomy, whereas many statistical methodologies are developed under assumptions of controllable circumstances, traceable subjects, and collectible additional samples.)

…that nothing was found, it is more useful to quote an upper limit on the sought-for effect, as this could be useful in ruling out some theories.

… the nuisance parameters arise from the uncertainties in the background rate b and the acceptance ε. These uncertainties are usually quoted as σb and σε, and the question arises of what these errors mean. … they would express the width of the Bayesian posterior or of the frequentist interval obtained for the nuisance parameter. … they may involve Monte Carlo simulations, which have systematic uncertainties as well as statistical errors …

Particle physicists usually convert p into the number of standard deviations σ of a Gaussian distribution, beyond which the one-sided tail area corresponds to p. Thus, 5σ corresponds to a p-value of 3e-7. This is done simply because it provides a number which is easier to remember, and not because Gaussians are relevant for every situation.
Unfortunately, p-values are often misinterpreted as the probability of the theory being true, given the data. It sometimes helps colleagues clarify the difference between p(A|B) and p(B|A) by reminding them that the probability of being pregnant, given the fact that you are female, is considerably smaller than the probability of being female, given the fact that you are pregnant.
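The σ-to-p conversion in the quote is a one-liner; this sketch (using scipy, with function names of my own) reproduces the 5σ ↔ 3e-7 correspondence:

```python
from scipy.stats import norm

def sigma_to_p(nsig):
    """Number of Gaussian sigmas -> one-sided tail probability."""
    return norm.sf(nsig)

def p_to_sigma(p):
    """One-sided tail probability -> number of Gaussian sigmas."""
    return norm.isf(p)

print(sigma_to_p(5.0))    # ~2.9e-7, the "3e-7" quoted for 5 sigma
print(p_to_sigma(0.05))   # ~1.64 sigma
```

Note the one-sided convention: the particle physics 5σ is the upper tail only, not the two-sided area.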

… the situation is much less clear for nuisance parameters, where error estimates may be less rigorous, and their distribution is often assumed to be Gaussian (or truncated Gaussian) by default. The effect of these uncertainties on very small p-values needs to be investigated case-by-case.
We also have to remember that p-values merely test the null hypothesis. A more sensitive way to look for new physics is via the likelihood ratio or the differences in χ2 for the two hypotheses, that is, with and without the new effect. Thus, a very small p-value on its own is usually not enough to make a convincing case for discovery.

If we are in the asymptotic regime, and if the hypotheses are nested, and if the extra parameters of the larger hypothesis are defined under the smaller one, and in that case do not lie on the boundary of their allowed region, then the difference in χ2 should itself be distributed as a χ2, with the number of degrees of freedom equal to the number of extra parameters. (I’ve seen many papers in astronomy ignoring these caveats when applying likelihood ratio tests.)
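These conditions (Wilks’ theorem, in statistical language) can be checked by simulation. The toy sketch below, my own example rather than the paper’s, fits a Gaussian mean under the null and under the alternative and verifies that Δχ2 follows a χ2 distribution with one degree of freedom:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, ntrials = 50, 5000

# Data generated under the null hypothesis: N(0, 1), sigma known.
x = rng.standard_normal((ntrials, n))

# chi2(null: mu = 0) - chi2(alt: mu free)
#   = sum x^2 - sum (x - xbar)^2 = n * xbar^2
delta_chi2 = n * x.mean(axis=1) ** 2

# Wilks: delta_chi2 ~ chi2 with 1 dof (one extra parameter).
# Check the 5% tail fraction:
frac = (delta_chi2 > chi2.isf(0.05, df=1)).mean()
print(frac)  # close to 0.05 under the null
```

When the conditions fail, for instance when the extra parameter sits on a boundary (a non-negative line flux), this reference distribution is wrong, which is the caveat ignored in the papers mentioned above.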

The standard method loved by Particle Physicists (astronomers alike) is χ2. This, however, is only applicable to binned data (i.e., in a one or more dimensional histogram). Furthermore, it loses its attractive feature that its distribution is model independent when there are not enough data, which is likely to be so in the multi-dimensional case. (High energy astrophysicists deal with low count data in multi-dimensional parameter spaces; the total number of bins is larger than the number of parameters, but to me binning/grouping seems to be done so aggressively to reach a good S/N that detailed information about the parameters gets lost.)

…, the σi are supposed to be the true accuracies of the measurements. Often, all that we have available are estimates of their values (I have also noticed astronomers confusing the true σ with the estimated σ). Problems arise in situations where the error estimate depends on the measured value a (the parameter of interest). For example, in counting experiments with Poisson statistics, it is typical to set the error as the square root of the observed number. Then a downward fluctuation in the observation results in an overestimated weight, and the best-fit a is biased downward. If instead the error is estimated as the square root of the expected number a, the combined result is biased upward; the increased error reduces S at large a. (I think astronomers are aware of this problem but have not yet taken action to rectify it. Unfortunately, not all astronomers take the problem seriously, and some blindly apply 3*sqrt(N) as a threshold for 99.7% (two sided) or 99.9% (one sided) coverage.)
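The downward bias from taking σ as the square root of the observed count can be demonstrated in a short simulation: fitting a constant by χ2 with weights 1/n_i reduces to a harmonic mean, which sits below the true rate. The numbers are toy values of my own:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, nbins, ntrials = 100.0, 50, 2000   # toy values (my own)

bias = np.empty(ntrials)
for t in range(ntrials):
    # High enough mean that zero counts are essentially impossible here
    n = rng.poisson(mu_true, nbins).astype(float)
    # chi2 fit of a constant a with sigma_i^2 = observed counts n_i:
    # the weighted mean sum(w*n)/sum(w) with w = 1/n is the harmonic
    # mean, so downward fluctuations get too much weight.
    w = 1.0 / n
    a_hat = (w * n).sum() / w.sum()
    bias[t] = a_hat - mu_true

print(bias.mean())  # systematically negative, roughly -var/mu = -1 here
```

The symmetric trap, weighting by the square root of the expected number, biases the fit upward instead, as the quote notes; either way the weights depend on the quantity being fit.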

Background estimation, particularly when the observed n is less than the expected background b, is discussed in the context of upper limits derived from both statistical streams, Bayesian and frequentist. The statistical focus of particle physicists’ concern is classical statistical inference problems, like hypothesis testing or estimating confidence intervals (these intervals need not be closed), under extreme physical circumstances. The author discusses various approaches, with modern touches from both statistical disciplines, to the question of how to obtain upper limits with statistically meaningful quantification.
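As a sketch of the Bayesian route to such an upper limit, here is the classic flat-prior Poisson calculation for an observation below the expected background; the numbers are hypothetical and the grid method is deliberately naive:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical numbers (my own): observe n = 2 events when the
# expected background alone is b = 4.
n_obs, b = 2, 4.0

# Bayesian upper limit with a flat prior on the signal s >= 0:
#   posterior(s) proportional to Poisson(n_obs | s + b)
s_grid = np.linspace(0.0, 30.0, 3001)
post = poisson.pmf(n_obs, s_grid + b)
cdf = np.cumsum(post)
cdf /= cdf[-1]                         # normalized posterior CDF

upper95 = s_grid[np.searchsorted(cdf, 0.95)]
print(f"95% upper limit on s: {upper95:.2f}")  # positive even though n < b
```

The Bayesian limit stays sensible when n < b, whereas a naive frequentist construction can return an empty or negative interval, which is one reason this regime attracts so much discussion.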

As described, many physicists work on the grand challenge of finding a new particle, and this challenge is framed concisely in statistical terms: p-values, upper limits, null hypotheses, test statistics, and confidence intervals with peculiar nuisance parameters, or rather a lack of straightforward priors, all of which lead to lengthy discussions among scientists and produce various research papers. In contrast, the challenge astronomers have is not just establishing the existence of new particles but going beyond that, or juxtaposing. Astronomers like to parameterize what they observe by selecting suitable source models; the collected photons are the result of modifications caused by their journey and by obstacles in their path. Such parameterization allows astronomers to explain the driving sources of photon emission/absorption. It enables the prediction of other important features, temperature to luminosity, magnitudes to metallicity, and many other rules of conversion.

Because of these differing objectives, one finding a needle that looks like hay in a haystack and the other defining photon generating mechanisms (which may lead to finding a new kind of celestial object), this article may not interest astronomers. Yet, given the common ground of physics and statistics, it is a dash of enlightenment, surveying various statistical methods applied to physical data analysis in the service of refining physics. I recall that my posts on coverage, and the references therein, might be helpful: interval estimation in exponential families and [arxiv] classical confidence interval.

I felt, from their papers, that some astronomers are not aware of the problems with χ2 minimization, nor of the underlying assumptions of the method. This paper conveys some of the dangers of χ2 with real examples from physics, more convincing for astronomers than statisticians’ hypothetical examples built on controlled Monte Carlo simulations.

And there are more reasons to check this paper out!

]]> 0