[MADS] Law of Total Variance
http://hea-www.harvard.edu/AstroStat/slog/2009/mads-law-of-total-variance/
Fri, 29 May 2009 04:54:20 +0000 hlee

This simple law, despite my attempt at a full-text search, does not show up in ADS. As discussed in systematic errors, astronomers, like physicists, report their error components in two additive terms: statistical error + systematic error. To explain such a decomposition and to make error analysis statistically rigorous, the law of total variance (LTV) seems indispensable.

V[X] = V[E[X|Y]] + E[V[X|Y]]

(X and Y are random variables; X denotes the observed data, and V and E stand for variance and expectation, respectively. Instead of X, f(X_1,…,X_n) can be plugged in to represent a best fit; in other words, a best fit is a solution of the chi-square minimization, which is a function of the data.) For a Bayesian, the uncertainty in theta, the parameter of interest, is

V[theta]=V[E[theta|Y]] + E[V[theta|Y]]

Suppose Y is related to systematics. E[theta|Y] is a function of Y, so V[E[theta|Y]] indicates the systematic error. V[theta|Y] is the statistical error given Y, which reflects the fact that unless the parameters of interest and the systematics are independent, statistical error cannot be quantified into a single factor next to a best fit. If the parameter of interest, theta, is independent of Y and Y is fixed, then the uncertainty in theta comes solely from statistical uncertainty (let’s not consider “model uncertainty” for the time being).
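To make the decomposition concrete, here is a minimal numerical sketch (Python with numpy; the three-state mixture for Y is hypothetical, chosen only for illustration) that checks V[X] = V[E[X|Y]] + E[V[X|Y]] by simulation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: Y is a discrete "systematics" state, X | Y ~ N(mu[Y], sd[Y]^2).
    mu = np.array([0.0, 2.0, 5.0])      # E[X|Y=y]
    sd = np.array([1.0, 0.5, 2.0])      # sqrt(V[X|Y=y])
    py = np.array([0.5, 0.3, 0.2])      # P(Y=y)

    n = 1_000_000
    y = rng.choice(3, size=n, p=py)
    x = rng.normal(mu[y], sd[y])

    lhs = x.var()                                                          # V[X], by simulation
    v_of_e = np.average((mu - np.average(mu, weights=py))**2, weights=py)  # V[E[X|Y]]
    e_of_v = np.average(sd**2, weights=py)                                 # E[V[X|Y]]
    print(lhs, v_of_e + e_of_v)                                            # the two should nearly agree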

In conjunction with astronomers’ decomposition into systematic and statistical errors, or their representation of uncertainties in quadrature (error^2_total = error^2_stat + error^2_sys), statisticians use the mean square error (MSE) as the total error, in which the variance matches the statistical error and the squared bias matches the systematic error.

MSE = variance + bias^2
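As a small numerical check of this identity (the estimator below, the 1/n sample variance of a known Gaussian population, is a hypothetical example of a biased estimator):

    import numpy as np

    rng = np.random.default_rng(1)

    sigma2_true = 4.0                  # true variance of the hypothetical population
    n, n_rep = 20, 200_000

    # Biased estimator of sigma^2: the 1/n (maximum-likelihood) sample variance.
    samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(n_rep, n))
    est = samples.var(axis=1, ddof=0)

    mse = np.mean((est - sigma2_true)**2)
    variance = est.var()
    bias2 = (est.mean() - sigma2_true)**2
    print(mse, variance + bias2)       # MSE ~ variance + bias^2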

Now a question arises: is systematic error bias? Methods based on quadrature, or on parameterizing systematics for marginalization, treat systematic error as bias, although no account explicitly says so. According to the law of total variance, unless the components are orthogonal/independent, quadrature is not a proper way to handle the systematic uncertainties prevailing in all instruments. Generally, parameters (data) and systematics are nonlinearly correlated and hard to factorize (instrument-specific empirical studies exist that offer correction factors for systematics; however, such factors work only in specific cases, and the process of defining them is hard to generalize). Because of the varying nature of systematics over the parameter space, instead of the MSE,

MISE = E \int [\hat f(x) - f(x)]^2 dx

or the mean integrated squared error, might be of use. The estimator \hat f(x) of f(x) can be obtained either parametrically or nonparametrically while incorporating systematics and their correlation structure with the statistical errors as a function of the domain x. MISE can be viewed as a robust version of chi-square methods, but the details have not been explored to account for the following identity.

MISE = \int [E\hat f(x) - f(x)]^2 dx + \int Var[\hat f(x)] dx

This equation may or may not look simple. Perhaps the expansion of the above identity could explain more about the error decomposition.

MISE(\hat f) = E \int [\hat f(x) - f(x)]^2 dx
= \int E[\hat f(x) - f(x)]^2 dx = \int MSE_x(\hat f) dx
= \int [E\hat f(x) - f(x)]^2 dx + \int Var[\hat f(x)] dx
= integrated squared bias + integrated variance (overall systematic error + overall statistical error)
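The identity can also be checked by simulation. Below is a minimal sketch (the true density and the fixed-bin histogram estimator are hypothetical choices, purely for illustration) that approximates the integrated squared bias, the integrated variance, and the MISE of a histogram density estimate, the kind of piecewise horizontal fit mentioned below.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)

    # Hypothetical truth: f(x) is standard normal; the estimator is a fixed-bin histogram.
    edges = np.linspace(-4, 4, 17)                 # 16 bins of width 0.5
    width = edges[1] - edges[0]
    grid = 0.5 * (edges[:-1] + edges[1:])          # bin centers, used as the x-grid
    f_true = norm.pdf(grid)

    n, n_rep = 200, 5000
    fhat = np.empty((n_rep, grid.size))
    for i in range(n_rep):
        counts, _ = np.histogram(rng.normal(size=n), bins=edges)
        fhat[i] = counts / (n * width)             # histogram density estimate on the grid

    # Approximate the integrals by Riemann sums over the grid.
    int_sq_bias = np.sum((fhat.mean(axis=0) - f_true)**2) * width   # overall systematic error
    int_var     = np.sum(fhat.var(axis=0)) * width                  # overall statistical error
    mise        = np.mean(np.sum((fhat - f_true)**2, axis=1) * width)
    print(mise, int_sq_bias + int_var)             # the two should nearly agree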

Furthermore, MISE robustly characterizes uncertainties from systematics, i.e., calibration uncertainties, in data analysis. Note that estimating f(x) by \hat f(x) can reflect complex structures in the uncertainty analysis; whereas the chi-square minimization estimates f(x) via piecewise horizontal lines, assumes homogeneous error in each piece (bin), forces statistical and systematic errors to be orthogonal, and, as a consequence, inflates the size of the error or produces biased best fits.

Whether via LTV, MSE, or MISE, even if we do not know the true model f(x), setting up proper models for \hat f(x) or \theta|Y would help in assessing the total error in a more realistic manner than chi-square minimization, additive errors, errors combined in quadrature, or subjective expertise on systematics. (If the true model is unknown, assessing statistical analysis results such as confidence levels/intervals may not be feasible; the reason chi-square methods offer best fits and their N-sigma error bars is that they assume the true model is Gaussian, i.e. N(f(x),\sigma^2), or E(Y|X)=f(X)+\epsilon with V(\epsilon)=\sigma^2, where f(x) is a source model. On the other hand, Monte Carlo simulations, resampling methods like the bootstrap, or posterior predictive probability (ppp) allow one to infer the truth and to evaluate a p-value indicating one’s confidence in the fitting result in a nonparametric fashion.) The underlying notions and related theoretical statistics methodologies of LTV, MSE, and MISE could clarify questions like how to quantify systematic errors and how systematic uncertainties are related to statistical uncertainties. Well, nothing would make me and astronomers happier than those errors being independent and additive. Even better if systematic uncertainty can be factorized.
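For instance, here is a minimal sketch of a nonparametric bootstrap error bar (the data, drawn from a skewed distribution, and the fitted statistic, simply the median, are hypothetical placeholders for a real fitting problem):

    import numpy as np

    rng = np.random.default_rng(3)

    def fit_statistic(x):
        # Hypothetical "best fit"; in practice this would be the output of a real fit.
        return np.median(x)

    # Hypothetical observed data drawn from a skewed, non-Gaussian distribution.
    data = rng.lognormal(mean=0.0, sigma=0.8, size=150)

    n_boot = 5000
    boot = np.array([fit_statistic(rng.choice(data, size=data.size, replace=True))
                     for _ in range(n_boot)])

    lo, hi = np.percentile(boot, [16, 84])   # ~68% interval, the usual "1 sigma" coverage
    print(fit_statistic(data), lo, hi)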

a century ago
http://hea-www.harvard.edu/AstroStat/slog/2009/a-century-ago/
Thu, 07 May 2009 19:22:37 +0000 hlee

Almost 100 years ago, A. S. Eddington stated in his book Stellar Movements (1914) that

…in calculating the mean error of a series of observations it is preferable to use the simple mean residual irrespective of sign rather than the mean square residual

Such an eminent astronomer was already advocating least absolute deviation over chi-square, if I match the simple mean residual and the mean square residual to the corresponding methodologies, in that order.

I guess there is a reason everything is done with chi-square, although a small fraction of the astronomy community is aware that chi-square minimization is not the only utility function for finding best fits. The assumption that the residuals “(Observed – Expected)/sigma”, as written in chi-square methods, are (asymptotically) normal, i.e. Gaussian, is freely granted for astronomical data (by the astronomers who analyze them), mainly because of their high volume. The worst case is that, even though checking procedures for Gaussianity are available in the statistical literature, applying those procedures to astronomical data is either difficult or ignored. Anyway, if one is sure that the data/parameters of interest are sampled from a normal distribution, Eddington’s statement should be reversed because of sufficiency. We also know the asymptotic efficiency of the sample standard deviation when the probability density function satisfies more general regularity conditions than the Gaussian density.
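For what it’s worth, such Gaussianity checks take only a few lines in standard tools; a minimal sketch (the residuals here are simulated and deliberately heavy-tailed, purely for illustration) using two tests from scipy:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)

    # Stand-in for the residuals (Observed - Expected)/sigma from some fit;
    # drawn from a t distribution, so deliberately non-Gaussian.
    residuals = rng.standard_t(df=3, size=500)

    w, p_shapiro = stats.shapiro(residuals)        # Shapiro-Wilk test of normality
    ad = stats.anderson(residuals, dist='norm')    # Anderson-Darling test

    print("Shapiro-Wilk p-value:", p_shapiro)
    print("Anderson-Darling statistic:", ad.statistic,
          "5% critical value:", ad.critical_values[2])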

As a statistician, it is easy to say, “assume that data are iid standard normal, wlog.” Then one develops a statistic, investigates its characteristics, and compares it with other statistics. If the statistic does not show promising results in the comparison and suffers badly under the normality assumption, statisticians will attempt to make it robust by checking and relaxing the assumptions and the math. On the other hand, I’m not sure how comfortable astronomers are with this Gaussianity assumption, given that most of their data are married to statistics or statistical tools built on the normal assumption. How often have such efforts of devising a statistic and trying different statistics taken place?

Without imposing the Gaussianity assumption, I think Eddington’s comment is extremely insightful. Commonly cited statistical methods in astronomy, like chi-square methods, are built on the Gaussianity assumption, under which the sample standard deviation is used for σ, the scale parameter of the normal distribution that maps to 68% coverage; multiples of the sample standard deviation correspond to the well-known percentiles given in Numerical Recipes. In the end, I think statistical analysis in the astronomy literature suffers from a dilemma, “which came first, the chicken or the egg?” On the other hand, I feel set back that such an insightful comment from one of the most renowned astrophysicists did not gain much weight after many decades. My understanding that Eddington’s suggestion was ignored is drawn from reading only a fraction of the publications in major astronomical journals; therefore, I might be wrong. Probably astronomers use LAD and do robust inference more often than I think.

Unless one is sure about the Gaussianity of the data (covering the sampling distribution, the residuals between observed and expected, and some transformations), for inference problems the sample standard deviation may not be appropriate for getting error bars with matching coverage. Estimating posterior distributions is a well-received approach among some astronomers, and there are good tutorials and textbooks about Bayesian data analysis for astronomers. For those familiar with the basics of statistics, pyMC and its tutorial (or another link from python.org) will be very useful for proper statistical inference. If Bayesian computation sounds too cumbersome, for simplicity, follow Eddington’s advice: instead of the sample standard deviation, use the mean absolute deviation (the simple mean residual, in Eddington’s words) to quantify uncertainty. Perhaps one would want to compare the best fits and error bars from both strategies.
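As a minimal sketch of that comparison (the measurements and outliers are simulated, purely for illustration): the mean absolute deviation, rescaled by sqrt(pi/2) so that it also estimates sigma under a Gaussian model, sits beside the sample standard deviation.

    import numpy as np

    rng = np.random.default_rng(5)

    x = rng.normal(loc=10.0, scale=2.0, size=1000)    # hypothetical measurements
    x[::50] += 20.0                                   # sprinkle in a few outliers

    std = x.std(ddof=1)                               # mean square residual route
    mad = np.mean(np.abs(x - x.mean()))               # simple mean residual (Eddington)
    sigma_from_mad = mad * np.sqrt(np.pi / 2)         # rescaled to estimate sigma if Gaussian

    print(std, sigma_from_mad)   # the MAD-based estimate is inflated less by the outliers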

——————————————————
This story was inspired by Studies in the History of Probability and Statistics. XXXII: Laplace, Fisher, and the discovery of the concept of sufficiency by Stigler (1973), Biometrika, v. 60(3), p. 439. The quote of Eddington was adapted from this article. Another quote from this article I would like to share:

Fisher went on to observe that this property of σ2[1] is quite dependent on the assumption that the population is normal, and showed that indeed σ1[2] is preferable to σ2, at least in large samples, for estimating the scale parameter of the double exponential distribution, providing both estimators are appropriately rescaled

By assuming that each observation is normally (Gaussian) distributed with mean (mu) and variance (sigma^2), and that the object was to estimate sigma, Fisher proved that the sample standard deviation (or mean square residual) is more efficient than the mean deviation from the sample mean (or simple mean residual). Laplace had proved it as well. The catch is that the assumptions come first; the sample standard deviation is not automatically the estimator of the error (or sigma) of an unknown distribution. A small simulation of this comparison follows the footnotes below.

  1. sample standard deviation
  2. mean deviation from the sample mean
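A minimal simulation sketch of Fisher’s comparison (the sample size, the populations, and the use of the squared coefficient of variation as the efficiency measure are choices made here for illustration; dividing by the squared mean mimics the “appropriately rescaled” condition):

    import numpy as np

    rng = np.random.default_rng(6)

    def relative_spread(x_reps):
        # sigma_2: sample standard deviation; sigma_1: mean deviation from the sample mean.
        s2 = x_reps.std(axis=1, ddof=1)
        s1 = np.mean(np.abs(x_reps - x_reps.mean(axis=1, keepdims=True)), axis=1)
        # Dividing each estimator's variance by its squared mean cancels any fixed rescaling.
        return s2.var() / s2.mean()**2, s1.var() / s1.mean()**2

    n, n_rep = 50, 100_000

    # Normal population: sigma_2 should show the smaller relative spread.
    print(relative_spread(rng.normal(size=(n_rep, n))))

    # Double exponential (Laplace) population: sigma_1 should win.
    print(relative_spread(rng.laplace(size=(n_rep, n))))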