The AstroStat Slog » coverage
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

[MADS] plug-in estimator
http://hea-www.harvard.edu/AstroStat/slog/2009/mads-plug-in-estimator/
Tue, 21 Apr 2009, by hlee

I asked a couple of astronomers whether they had heard the term "plug-in estimator," and none of them gave me a positive answer.

When one computes the sample mean (xbar) and the sample variance (s^2) to obtain the interval (xbar-s, xbar+s) and claims that this interval covers 68%, the sample mean, the sample variance, and the interval itself are all plug-in estimators. Once the form of the sampling distribution is clarified, or once it is verified that the estimators called sample mean and sample variance truly match the true mean and true variance, I can drop the "plug-in" part, because I know that asymptotically such an interval (estimator) will cover 68%.
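To make the plug-in idea concrete, here is a minimal simulation sketch of my own (assuming numpy and scipy; all numbers are hypothetical): for each simulated dataset, the plug-in interval (xbar-s, xbar+s) is formed and its true probability content is computed. The average content approaches 68%, but for a small sample any single interval can be far from it.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    mu, sigma, n, B = 0.0, 1.0, 8, 10_000      # true parameters, small sample size, trials
    x = rng.normal(mu, sigma, (B, n))
    xbar = x.mean(axis=1)                      # plug-in estimate of the mean
    s = x.std(axis=1, ddof=1)                  # plug-in estimate of the standard deviation
    # true probability content of each plug-in interval (xbar - s, xbar + s)
    content = stats.norm.cdf(xbar + s, mu, sigma) - stats.norm.cdf(xbar - s, mu, sigma)
    print(content.mean(), content.std())       # mean near 0.68, but with a wide spread at n = 8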

When the sample size is insufficient, or the (theoretical) assumptions are not satisfied, instead of saying 1-σ one should say s, the plug-in error estimator. Without knowing the true distribution (asymptotically, the empirical distribution), the label 1-σ misleads one into thinking that the best fit and its error bar assure 68% coverage, which is not necessarily true. What is computed/estimated is s, a plug-in estimator defined, for instance, via Δχ2=1. Generally, the Greek letter σ in statistics indicates a parameter, not a function of the data (an estimator) such as the sample standard deviation (s), the root mean square error (rmse), or the solution of Δχ2=1.
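To see what the Δχ2=1 recipe computes, here is a toy sketch of my own (all numbers hypothetical): a constant model is fit to a few Gaussian measurements, and the plug-in error s is read off as half the width of the region where χ2 stays within 1 of its minimum.

    import numpy as np

    y = np.array([4.8, 5.3, 5.1, 4.6])           # hypothetical measurements
    sig = np.array([0.3, 0.4, 0.3, 0.5])         # their quoted errors
    a = np.linspace(4.0, 6.0, 2001)              # grid over the constant model a
    chi2 = (((y[:, None] - a[None, :]) / sig[:, None]) ** 2).sum(axis=0)
    best = a[chi2.argmin()]                      # best fit
    region = a[chi2 <= chi2.min() + 1.0]         # the Delta chi-square = 1 region
    s = (region.max() - region.min()) / 2.0      # the plug-in error estimator, not sigma itself
    print(best, s)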

I often see extended uses of statistics and related terms in the astronomical literature that lead to unnecessary comments and creative interpretations to account for unpleasant numbers. Because of the plug-in nature, the interval may not cover the value expected from physics. This can be due to the χ2 minimization (the best fit can be biased) and to data quality (there are chances that the data contain outliers or have been modified by instruments and processing). Unless robust statistics are employed (outliers can shift best fits, and robust statistics are less sensitive to outliers) and calibration uncertainty or other correction tools are suitably implemented, strange intervals need not be followed by creative comments, nor discarded. Those intervals are byproducts of employing plug-in estimators whose statistical properties are unknown at the stage of the astronomer's data analysis. Instead of imaginative interpretations, one should investigate those plug-in estimators and try to devise/rectify them in order to make sure they lead close to the truth.

For example, instead of the simple average (xbar = f(x_1,…,x_n); the average is a function of the data, just as the χ2 minimization method is another function of the data), whose breakdown point is asymptotically zero and which can be far off from the truth, the median (another function of the data) can serve better (its breakdown point is 1/2), as the sketch below illustrates. We know that the χ2 methods are based on the L2 norm (e.g., variations of least squares methods). Instead, one can develop methods based on the L1 norm, as in quantile regression or least absolute deviation (LAD, in short; see the Wikipedia entry). Many statistics are available to work around the shortcomings of popular plug-in estimators when the sampling distribution is not (perfectly) Gaussian or an analytic solution does not exist.
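The breakdown-point difference takes only a few lines to demonstrate; a minimal sketch assuming numpy, with hypothetical numbers:

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.normal(10.0, 1.0, 100)
    x[:5] = 1000.0                # contaminate 5% of the sample with outliers
    print(np.mean(x))             # dragged far from 10 (breakdown point asymptotically 0)
    print(np.median(x))           # barely moves (breakdown point 1/2)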

systematic errors
http://hea-www.harvard.edu/AstroStat/slog/2009/systematic-errors/
Fri, 06 Mar 2009, by hlee

Ah ha~ I once asked, "what is systematic error?" (see [Q] systematic error.) Thanks to L. Lyons' work discussed in [ArXiv] Particle Physics, I found this paper, titled Systematic Errors, which describes the concept of, and statistical inference related to, systematic errors in the field of particle physics. Gladly, it shares a lot of similarity with high energy astrophysics.

Systematic Errors by J. Heinrich and L. Lyons
Annu. Rev. Nucl. Part. Sci. (2007) Vol. 57, pp. 145-169 [http://adsabs.harvard.edu/abs/2007ARNPS..57..145H]

The characterization of the two error types, systematic and statistical, is illustrated with a simple physics experiment, the pendulum. The authors describe two distinct sources of systematic errors.

…the reliable assessment of systematics requires much more thought and work than for the corresponding statistical error.
Some errors are clearly statistical (e.g. those associated with the reading errors on T and l), and others are clearly systematic (e.g., the correction of the measured g to its sea level value). Others could be regarded as either statistical or systematic (e.g., the uncertainty in the recalibration of the ruler). Our attitude is that the type assigned to a particular error is not crucial. What is important is that possible correlations with other measurements are clearly understood.

Section 2 contains a very nice review, in English rather than in mathematical symbols, of the basics of Bayesian and frequentist statistics for inference in particle physics, with practical accounts. A comparison of the Bayesian and frequentist approaches is provided. (I was happy to see it said that χ2 does not belong to frequentist methods. It is just a popular method in references about data analysis in astronomy, not in modern statistics. If someone insists, statisticians could study the χ2 statistic under assumptions and conditions that suit the properties of astronomical data, investigate the efficiency and completeness of grouped Poisson counts for the Gaussian approximation within the χ2 minimization process, check degrees of information loss, and so forth.)

To a Bayesian, probability is interpreted as the degree of belief in a statement. …
In contrast, frequentists define probability via a repeated series of almost identical trials;…

Section 3 clarifies the notion of p-values as follows:

It is vital to remember that a p-value is not the probability that the relevant hypothesis is true. Thus, statements such as “our data show that the probability that the standard model is true is below 1%” are incorrect interpretations of p-values.

This reminds me of the null hypothesis probability that I often encounter in the astronomical literature or in discussions of reporting X-ray spectral fitting results. I believe astronomers using the null hypothesis probability are confusing Bayesian and frequentist concepts: the computation is based on the frequentist idea of a p-value, but the interpretation is Bayesian. A separate posting on the null hypothesis probability will come shortly.

Section 4 describes both Bayesian and frequentist ways to include systematics. Through parameterization (for a Gaussian, parameterization is achieved with additive error terms, or nonzero elements in the full covariance matrix), systematic uncertainty is treated as nuisance parameters in the likelihood for Bayesians and frequentists alike, although the term "nuisance" appears in the frequentist's likelihood principles. Obtaining the posterior distribution of the parameter(s) of interest requires marginalization over the uninteresting parameters, which are seen as nuisance parameters in frequentist methods.
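The marginalization step can be sketched numerically. This is my own toy, not an example from the paper: a single measurement y of mu is shifted by a systematic offset b with prior N(0, sigma_b), and summing the joint posterior over b widens the posterior of mu from sigma_y to sqrt(sigma_y^2 + sigma_b^2).

    import numpy as np

    y, sigma_y, sigma_b = 10.0, 1.0, 0.5         # hypothetical datum and uncertainties
    mu = np.linspace(5.0, 15.0, 1001)
    b = np.linspace(-3.0, 3.0, 601)
    M, Bb = np.meshgrid(mu, b, indexing="ij")
    # joint posterior (flat prior on mu): likelihood N(y; mu + b, sigma_y) times prior N(b; 0, sigma_b)
    joint = np.exp(-0.5 * ((y - M - Bb) / sigma_y) ** 2) * np.exp(-0.5 * (Bb / sigma_b) ** 2)
    post = joint.sum(axis=1)                     # marginalize over the nuisance parameter b
    post /= post.sum() * (mu[1] - mu[0])
    mean = (mu * post).sum() * (mu[1] - mu[0])
    sd = np.sqrt(((mu - mean) ** 2 * post).sum() * (mu[1] - mu[0]))
    print(mean, sd)                              # sd ~ sqrt(1.0**2 + 0.5**2) ~ 1.12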

The miscellaneous section (Sec. 6) is the most useful part for understanding the nature of systematic errors and strategies for handling them. Instead of copying the whole section, here are two interesting quotes:

When the model under which the p-value is calculated has nuisance parameters (i.e. systematic uncertainties) the proper computation of the p-value is more complicated.

The contribution from a possible systematic can be estimated by seeing the change in the answer a when the nuisance parameter is varied by its uncertainty.

As warned, it is not recommended to combine the calibrated systematic error and the estimated statistical error in quadrature, since we cannot assume those errors are uncorrelated all the time. Setting aside the disputes about choosing a prior distribution, the Bayesian strategy works better, since the posterior distribution is the distribution of the parameter of interest, from which one directly gets the uncertainty in the parameter. Remember: in Bayesian statistics parameters are random, whereas in frequentist statistics observations are random. The χ2 method only approximates the uncertainty as Gaussian with respect to the best fit (equivalent to the posterior with a Gaussian likelihood centered at the best fit and a flat prior) and combines different uncertainties in quadrature. Neither strategy is almost always superior to the other in general terms of performing statistical inference; however, case by case, we can say that one functions better than the other. The issue is how to define a model (a distribution, a distribution family, or a class of functionals) prior to deploying various methodologies, and therefore understanding systematic errors in terms of the model, the parametrization, the estimating equation, or robustness becomes important. Unfortunately, systematic descriptions of systematic errors from the statistical inference perspective are not present in astronomical publications. Strategies for handling systematic errors with statistical care are really hard to come by.
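The danger of blind quadrature is easy to show numerically; a hedged toy of my own, not from the paper: when two error sources are positively correlated, the spread of their sum exceeds the quadrature value sqrt(s1^2 + s2^2).

    import numpy as np

    rng = np.random.default_rng(3)
    rho, s1, s2 = 0.6, 1.0, 2.0                        # hypothetical correlated error sources
    cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
    e = rng.multivariate_normal([0.0, 0.0], cov, 100_000)
    print(e.sum(axis=1).std())                         # ~ sqrt(s1^2 + s2^2 + 2*rho*s1*s2) ~ 2.72
    print(np.hypot(s1, s2))                            # the quadrature answer ~ 2.24, too small here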

Still, I think their inclusion of systematic errors is limited to parametric methods; in other words, without a parametrization of the systematic errors, one cannot assess/quantify them properly. So what if such a parametrization of the systematics is not available? I thought some general semi-parametric methodology could assist in developing methods of incorporating systematic errors in spectral model fitting. Our group has developed a simple semi-parametric way to incorporate systematic errors in X-ray spectral fitting. If you would like to know how it works, please check out my poster in pdf. It may be viewed as too conservative, like a projection, since instead of parameterizing the systematics, the posterior was empirically marginalized over the systematics, the hypothetical space formed by a simulated sample of calibration products.
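I cannot reproduce the poster here, but the flavor of empirical marginalization can be conveyed with a toy of my own, in which every number and model is hypothetical: posterior draws of a source rate A are generated separately for each simulated calibration product c_k, and pooling the draws marginalizes over the calibration systematic.

    import numpy as np

    rng = np.random.default_rng(5)
    counts, exposure = 480, 100.0                  # toy data with E[counts] = A * c * exposure
    cal_sample = rng.normal(1.0, 0.1, 50)          # simulated sample of calibration products c_k
    pooled = []
    for c in cal_sample:
        # posterior of A given counts under a flat prior: Gamma(counts + 1, scale = 1/(c*exposure))
        pooled.append(rng.gamma(counts + 1, 1.0 / (c * exposure), 200))
    pooled = np.concatenate(pooled)
    print(pooled.mean(), pooled.std())             # wider than the posterior at any single fixed c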

I believe publications about handling systematic errors will enjoy prosperity in astronomy and statistics as long as complex instruments collect data. Beyond combining in quadrature or the Gaussian approximation, systematic errors can be incorporated in a more sophisticated fashion, parametrically or nonparametrically. Particularly for the latter, statisticians' knowledge and contributions are in great demand.

[ArXiv] Particle Physics
http://hea-www.harvard.edu/AstroStat/slog/2009/arxiv-particle-physics/
Fri, 20 Feb 2009, by hlee

[stat.AP:0811.1663]
Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he is very kind and attentive. I was delighted to see his introductory article about particle physics and its statistical challenges through an [arxiv:stat] email subscription.

Descriptions of various particles from modern particle physics are briefly given (I like such brevity and conciseness that still delivers the necessities. If you want more on the physics, find those famous bestselling books like The First Three Minutes, A Brief History of Time, The Elegant Universe, or Feynman's books and undergraduate textbooks of modern physics and of particle physics). The Large Hadron Collider (LHC, hereafter; LHC-related slog postings: LHC first beam, The Banff challenge, Quote of the week, Phystat – LHC 2008) is introduced along with its statistical challenges from the data collecting/processing perspective, since it is expected to collect 10^10 events. Visit the LHC website to find out more about the LHC.

My one-line summary of the article: solving particle physics problems via hypothesis testing or, more broadly, classical statistical inference approaches. I most enjoyed reading Sections 5 and 6, particularly the subsection titled Why 5σ? Here are some excerpts I would like to share with you from the article:

It is hoped that the approaches mentioned in this article will be interesting or outrageous enough to provoke some Statisticians either to collaborate with Particle Physicists, or to provide them with suggestions for improving their analyses. It is to be noted that the techniques described are simply those used by Particle Physicists; no claim is made that they are necessarily optimal (Personally, I like such openness and candidness.).

… because we really do consider that our data are representative as samples drawn according to the model we are using (decay time distributions often are exponential; the counts in repeated time intervals do follow a Poisson distribution, etc.), and hence we want to use a statistical approach that allows the data “to speak for themselves,” rather than our analysis being dominated by our assumptions and beliefs, as embodied in Bayesian priors.

Because experimental detectors are so expensive to construct, the time-scale over which they are built and operated is so long, and they have to operate under harsh radiation conditions, great care is devoted to their design and construction. This differs from the traditional statistical approach for the design of agricultural tests of different fertilisers, but instead starts with a list of physics issues which the experiment hopes to address. The idea is to design a detector which will provide answers to the physics questions, subject to the constraints imposed by the cost of the planned detectors, their physical and mechanical limitations, and perhaps also the limited available space. (My personal belief is that what segregates the physical sciences from other sciences requiring statistical thinking is that uncontrolled circumstances are quite common in physics and astronomy, whereas various statistical methodologies were developed under assumptions of controllable circumstances, traceable subjects, and collectible additional samples.)

…that nothing was found, it is more useful to quote an upper limit on the sought-for effect, as this could be useful in ruling out some theories.

… the nuisance parameters arise from the uncertainties in the background rate b and the acceptance ε. These uncertainties are usually quoted as σb and σε, and the question arises of what these errors mean. … they would express the width of the Bayesian posterior or of the frequentist interval obtained for the nuisance parameter. … they may involve Monte Carlo simulations, which have systematic uncertainties as well as statistical errors …

Particle physicists usually convert p into the number of standard deviations σ of a Gaussian distribution, beyond which the one-sided tail area corresponds to p. Thus, 5σ corresponds to a p-value of 3×10^-7. This is done simply because it provides a number which is easier to remember, and not because Gaussians are relevant for every situation.
Unfortunately, p-values are often misinterpreted as the probability of the theory being true, given the data. It sometimes helps colleagues clarify the difference between p(A|B) and p(B|A) by reminding them that the probability of being pregnant, given the fact that you are female, is considerably smaller than the probability of being female, given the fact that you are pregnant.
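As an aside on the excerpt above, the conversion between p-values and σ is a one-liner (assuming scipy):

    from scipy import stats

    print(stats.norm.sf(5.0))      # one-sided tail beyond 5 sigma: ~2.9e-7, quoted as 3x10^-7
    print(stats.norm.isf(3e-7))    # and back: ~5 sigma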

… the situation is much less clear for nuisance parameters, where error estimates may be less rigorous, and their distribution is often assumed to be Gaussian (or truncated Gaussian) by default. The effect of these uncertainties on very small p-values needs to be investigated case-by-case.
We also have to remember that p-values merely test the null hypothesis. A more sensitive way to look for new physics is via the likelihood ratio or the differences in χ2 for the two hypotheses, that is, with and without the new effect. Thus, a very small p-value on its own is usually not enough to make a convincing case for discovery.

If we are in the asymptotic regime, and if the hypotheses are nested, and if the extra parameters of the larger hypothesis are defined under the smaller one, and in that case do not lie on the boundary of their allowed region, then the difference in χ2 should itself be distributed as a χ2, with the number of degrees of freedom equal to the number of extra parameters (I've seen many papers in astronomy ignoring these caveats when applying likelihood ratio tests).
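Under exactly those nested-hypothesis conditions, the p-value from the difference in χ2 is a one-line computation; a sketch with hypothetical numbers, assuming scipy:

    from scipy import stats

    delta_chi2 = 9.2                          # hypothetical improvement from adding 2 parameters
    print(stats.chi2.sf(delta_chi2, df=2))    # Wilks: p-value ~ 0.01 under the smaller model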

The standard method loved by Particle Physicists (astronomers alike) is χ2. This, however, is only applicable to binned data (i.e., in a one or more dimensional histogram). Furthermore, it loses its attractive feature that its distribution is model independent when there are not enough data, which is likely to be so in the multi-dimensional case. (High energy astrophysicists deal with low-count data on multi-dimensional parameter spaces; the total number of bins is larger than the number of parameters, but to me binning/grouping seems to be done aggressively to meet a good S/N, so that detailed information about the parameters is lost from the data.)

…, the σi are supposed to be the true accuracies of the measurements. Often, all that we have available are estimates of their values (I have also noticed astronomers confusing the true σ and the estimated σ). Problems arise in situations where the error estimate depends on the measured value a (the parameter of interest). For example, in counting experiments with Poisson statistics, it is typical to set the error as the square root of the observed number. Then a downward fluctuation in the observation results in an overestimated weight, and the best fit a is biased downward. If instead the error is estimated as the square root of the expected number a, the combined result is biased upward – the increased error reduces S at large a. (I think astronomers are aware of this problem but haven't taken action yet to rectify the issue. Unfortunately, not all astronomers take the problem seriously, and some blindly apply 3*sqrt(N) as a threshold for 99.7% (two-sided) or 99.9% (one-sided) coverage.)
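The downward bias from using the square root of the observed counts as the error is easy to see in a toy simulation of my own (hypothetical numbers): weighting by 1/n turns the weighted mean into a harmonic mean, which sits below the true rate.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, m, B = 100.0, 10, 20_000               # true rate, measurements combined, repeated trials
    n = rng.poisson(mu, (B, m))
    w = 1.0 / n                                # weights 1/sigma^2 with sigma = sqrt(observed n)
    est = (w * n).sum(axis=1) / w.sum(axis=1)  # the weighted mean reduces to a harmonic mean
    print(est.mean())                          # ~99, biased below mu = 100
    print(n.mean())                            # the plain mean stays ~100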

Background estimation, particularly when the observed n is less than the expected background b, is discussed in the context of upper limits derived from both statistical streams – Bayesian and frequentist. The statistical focus of particle physicists' concern is classical statistical inference problems, like hypothesis testing or estimating confidence intervals (these intervals are not necessarily closed), under extreme physical circumstances. The author discusses various approaches, with modern touches from both statistical disciplines, to tackle how to obtain upper limits with statistically meaningful quantification.

As described, many physicists endeavor on the grand challenge of finding a new particle, but this challenge is put concisely in statistical terms: p-values, upper limits, null hypotheses, test statistics, and confidence intervals with peculiar nuisance parameters or a lack of straightforward priors, all of which lead to lengthy discussions among scientists and produce various research papers. In contrast, the challenges astronomers face are not just establishing the existence of new particles but going beyond that, or juxtaposing it. Astronomers like to parameterize: they select suitable source models, from which the collected photons result after modification caused by their journey and the obstacles in their path. Such parameterization allows them to explain the driving sources of photon emission/absorption. It enables them to predict other important features: temperature to luminosity, magnitudes to metallicity, and many other rules of conversion.

Due to the different objectives (one is finding a needle that looks like hay in a haystack, the other is defining photon-generating mechanisms, which may lead to finding a new kind of celestial object), this article may not interest astronomers. Yet, given the common ground of physics and statistics, it offers a dash of enlightenment about the various statistical methods applied to physical data analysis in pursuit of one goal: refining physics. I recall that my posts on coverages and the references therein might be helpful: interval estimation in exponential families and [arxiv] classical confidence interval.

I felt from their papers that some astronomers are not aware of the problems with χ2 minimization, nor of the underlying assumptions of the method. This paper conveys some of the dangers of χ2 with real examples from physics, more convincing for astronomers than statisticians' hypothetical examples via controlled Monte Carlo simulations.

And there are more reasons to check this paper out!

[ArXiv] Post Model Selection, Nov. 7, 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-post-model-selection-nov-7-2007/
Wed, 07 Nov 2007, by hlee

Today's arxiv-stat email included papers by Poetscher and Leeb, who have been working on post-model-selection inference. Sometimes model selection is mistaken for a part of statistical inference. Simply put, model selection can be considered a step prior to inference. How do you know your data are from a chi-square distribution, or a gamma distribution? (This is a model selection problem with nested models.) Should I estimate the degrees of freedom k from the chi-square, or α and β from the gamma, to know the mean and error? Will the errors of the mean be the same from both distributions?

Prior to estimating the means and errors of parameters, one wishes to choose a model in which the parameters of interest are properly embedded. The problem that arises is that one uses the same data to choose a model (e.g., choosing the model with the largest likelihood value or Bayes factor) and to perform statistical inference (estimating parameters, calculating confidence intervals, and testing hypotheses), which inevitably introduces bias. Such bias has been neglected in general (a priori one is told which model to choose: e.g., the 2nd-order polynomial is the absolute truth and the residuals are realizations of the error term; by the way, how can one be sure that the error follows a normal distribution?). Asymptotics enables this bias to be O(n^m), where m is smaller than zero. Estimating this bias has been popular since Akaike introduced the AIC (one of the best-known model selection criteria). Numerous works are found in the field of robust penalized likelihood. Variable selection has been a very hot topic in recent decades. Beyond my knowledge, there have been more approaches to cope with this bias so as not to contaminate the inference results.
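A tiny simulation, my own toy rather than anything from these papers, shows how severe the bias can be: estimate a mean, set it to zero whenever it is "insignificant" (a pre-test, i.e., model selection), and then use the naive 95% interval around the selected estimate.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, se, B = 2.0, 1.0, 100_000                   # true mean near the selection threshold
    x = rng.normal(mu, se, B)
    est = np.where(np.abs(x) < 1.96 * se, 0.0, x)   # pre-test: drop the parameter if insignificant
    covered = np.abs(est - mu) <= 1.96 * se         # naive interval that ignores the selection step
    print(covered.mean())                           # ~0.49, far below the nominal 0.95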

The works by Professors Poetscher and Leeb looked unique to me in the line of resolving the intrinsic bias arising from inference after model selection. Instead of being listed in my weekly arxiv lists, their arxiv papers deserved to be listed in a separate posting. I have also included some more general references.

The list of papers from today's arxiv:

  • [stat.TH:0702703] Can one estimate the conditional distribution of post-model-selection estimators? by H. Leeb and B. M. Pötscher
  • [stat.TH:0702781] The distribution of model averaging estimators and an impossibility result regarding its estimation by B. M. Pötscher
  • [stat.TH:0704.1466] Sparse Estimators and the Oracle Property, or the Return of Hodges’ Estimator by H. Leeb and B. M. Pötscher
  • [stat.TH:0711.0660] On the Distribution of Penalized Maximum Likelihood Estimators: The LASSO, SCAD, and Thresholding by B. M. Pötscher and H. Leeb
  • [stat.TH:0701781] Learning Trigonometric Polynomials from Random Samples and Exponential Inequalities for Eigenvalues of Random Matrices by K. Gröchenig, B. M. Pötscher, and H. Rauhut

Other resources:

[Added on Nov.8th] There were a few more relevant papers from arxiv.

  • [stat.AP:0711.0993] Upper bounds on the minimum coverage probability of confidence intervals in regression after variable selection by P. Kabaila and K. Giri
  • [stat.ME:0710.1036] Confidence Sets Based on Sparse Estimators Are Necessarily Large by B. M. Pötscher
Coverage issues in exponential families
http://hea-www.harvard.edu/AstroStat/slog/2007/interval-estimation-in-exponential-families/
Thu, 16 Aug 2007, by hlee

I've heard so much about coverage problems from astrostat/phystat groups, without knowing the fundamental reasons (most likely physics). This paper might be of interest to those people: Interval Estimation in Exponential Families by Brown, Cai, and DasGupta; Statistica Sinica (2003), 13, pp. 19-49

Abstract summary:
The authors investigate issues in interval estimation of the mean in the exponential family, such as the binomial, Poisson, negative binomial, normal, gamma, and NEF-GHS distributions. The poor performance of the Wald interval has been known not only for discrete cases but also for nonnormal continuous cases, with significant negative bias. Their computations suggest that the equal-tailed Jeffreys interval and the likelihood ratio interval are the best alternatives to the Wald interval.

Brief summary of the paper without equations:
The objective of this paper is interval estimation of the mean in the natural exponential family (NEF) with quadratic variance functions (QVF); particular focus is given to the discrete NEF-QVF families consisting of the binomial, negative binomial, and Poisson distributions. It is well known that the Wald interval for a binomial proportion suffers from a systematic negative bias and oscillation in its coverage probability even for large n and p near 0.5, which seems to arise from the lattice nature and the skewness of the binomial distribution. They exemplify this systematic bias and oscillation with Poisson cases to illustrate the poor and erratic behavior of the Wald interval in lattice problems. They prove bias expressions for the three discrete NEF-QVF distributions and add a disconcerting graphical illustration of this negative bias.
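Because of the lattice nature, the exact coverage can be computed by summing the binomial pmf over k; a minimal sketch of my own assuming scipy, with hypothetical n and p:

    import numpy as np
    from scipy import stats

    n, p = 50, 0.2
    k = np.arange(n + 1)
    pmf = stats.binom.pmf(k, n, p)
    phat = k / n
    se = np.sqrt(phat * (1 - phat) / n)
    wald = pmf[(phat - 1.96 * se <= p) & (p <= phat + 1.96 * se)].sum()
    jlo = stats.beta.ppf(0.025, k + 0.5, n - k + 0.5)    # equal-tailed Jeffreys interval
    jhi = stats.beta.ppf(0.975, k + 0.5, n - k + 0.5)
    jeffreys = pmf[(jlo <= p) & (p <= jhi)].sum()
    print(wald, jeffreys)     # Wald undershoots the nominal 0.95 here; Jeffreys sits closer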

Interested readers should check Figure 4, where the performances of the Wald, score, likelihood ratio (LR), and Jeffreys intervals are compared. Figure 5 illustrates the limits of those four intervals: the LR and Jeffreys intervals are indistinguishable. They derive the coverage probabilities of the four intervals via Edgeworth expansions. The nonoscillating O(n^-1) terms from the Edgeworth expansions are studied to compare the coverage properties of these four intervals. Figure 6 shows that the Wald interval has a serious negative bias, whereas the nonoscillating term in the score interval is positive for all three distributions: binomial, negative binomial, and Poisson. The negative bias of the Wald interval is also found in continuous distributions like the normal, gamma, and NEF-GHS distributions (Figure 7).

In conclusion, they reconfirm their findings that the LR and Jeffreys intervals are the best alternatives to the Wald interval in terms of the negative bias in coverage and the length. The Rao score interval has the merit of easy presentation, but its performance is inferior to the LR and Jeffreys intervals, although it is better than the Wald interval. Still, the authors leave room for users: choosing one of these intervals is a personal choice.

[Addendum] I wonder if the statistical properties of Gehrels' confidence limits have been studied since that publication. I'll try to post findings about the statistics of Gehrels' confidence limits shortly (hopefully).

[ArXiv] Classical confidence intervals, June 25, 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-classical-confidence-intervals-june-25-2007/
Wed, 27 Jun 2007, by hlee

From arXiv:physics.data-an/0706.3622v1:
Comments on the unified approach to the construction of classical confidence intervals

This paper comments on classical confidence intervals and upper limits, and on the so-called flip-flopping problem. The two are related asymptotically (when n is large enough) by definition, but one cannot be converted into the other while preserving the same coverage, due to the Poisson nature of the data.
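The flip-flopping problem shows up in a short simulation; this is my own Gaussian toy rather than the paper's setting: decide from the data whether to quote a 90% upper limit or a 90% central interval, and the coverage dips below 90% for some true values.

    import numpy as np

    rng = np.random.default_rng(11)
    for mu in [0.5, 1.5, 2.5, 3.5]:                     # true means to scan
        x = rng.normal(mu, 1.0, 200_000)
        significant = x >= 3.0                          # flip-flop: central interval if "significant"
        cover = np.where(significant,
                         np.abs(x - mu) <= 1.645,       # 90% central interval x +/- 1.645
                         mu <= x + 1.282)               # otherwise a 90% upper limit
        print(mu, cover.mean())                         # dips to ~0.85 near the switch point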

I've heard a few discussions about classical confidence intervals and upper limits from particle physicists and theoretical statisticians. Nonetheless, not having been in the business from the beginning (1. the point in time when particle physicists became aware of statistics for obtaining coverages and upper limits, or 2. Neyman's publication (1937), Phil. Trans. Royal Soc. London A, 236, p. 333) makes it hard to grasp the essence of this flip-flopping problem. On the other hand, I can sense that lots of statistical challenges (for both classical and Bayesian statisticians) reside in this flip-flopping problem, and I wish for some tutorials or chronological reviews on the subject.
