[MADS] Law of Total Variance
http://hea-www.harvard.edu/AstroStat/slog/2009/mads-law-of-total-variance/
Fri, 29 May 2009, by hlee

This simple law, despite my attempt at a full-text search, does not show up in ADS. As discussed in the post on systematic errors, astronomers, like physicists, report their error budget as two additive terms: statistical error + systematic error. To explain such a decomposition and to make the error analysis statistically rigorous, the law of total variance (LTV) seems indispensable.

V[X] = V[E[X|Y]] + E[V[X|Y]]

(X and Y are random variables, and X denotes the observed data. V and E stand for variance and expectation, respectively. Instead of X, f(X_1,…,X_n) can be plugged in to represent a best fit; in other words, a best fit is the solution of a chi-square minimization, which is a function of the data.) For a Bayesian, the uncertainty of theta, the parameter of interest, is

V[theta]=V[E[theta|Y]] + E[V[theta|Y]]

Suppose Y is related to the systematics. E[theta|Y] is a function of Y, so V[E[theta|Y]] indicates the systematic error. V[theta|Y] is the statistical error given Y, which reflects the fact that unless the parameter of interest and the systematics are independent, the statistical error cannot be quantified as a single factor attached to a best fit. If the parameter of interest, theta, is independent of Y and Y is fixed, then the uncertainty in theta comes solely from the statistical uncertainty (let’s not consider “model uncertainty” for the time being).
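
To see the decomposition in action, here is a minimal numerical sketch (my own toy setup, not part of the argument above): Y stands in for the systematics, theta|Y ~ N(Y, 1), and Y ~ N(0, 2^2), so the conditional mean and conditional variance are known in closed form and the identity can be checked by simulation.

```python
# Toy Monte Carlo check of the law of total variance (assumed toy model, not
# tied to any instrument): V[theta] = V[E[theta|Y]] + E[V[theta|Y]].
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Y plays the role of the systematic component.
y = rng.normal(loc=0.0, scale=2.0, size=n)

# Conditional model: theta | Y=y ~ N(y, 1), so E[theta|Y] = Y and V[theta|Y] = 1.
theta = rng.normal(loc=y, scale=1.0)

total_var = theta.var()          # V[theta]
var_of_cond_mean = y.var()       # V[E[theta|Y]]  ("systematic" term)
mean_of_cond_var = 1.0           # E[V[theta|Y]]  ("statistical" term)

print(total_var, var_of_cond_mean + mean_of_cond_var)   # both close to 5
```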

In parallel with astronomers’ decomposition into systematic and statistical errors, or adding uncertainties in quadrature (error_total^2 = error_stat^2 + error_sys^2), statisticians use the mean squared error (MSE) as the total error, in which the variance corresponds to the statistical error and the squared bias to the systematic error.

MSE = Variance + Bias^2
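
As a sanity check of this identity (a toy example of my own choosing, not drawn from any dataset), the deliberately biased 1/n sample variance of a normal sample reproduces MSE = variance + bias^2 exactly:

```python
# Toy check of MSE = Variance + Bias^2 for a biased estimator (assumption:
# the 1/n sample variance estimating sigma^2 = 4 from normal samples).
import numpy as np

rng = np.random.default_rng(0)
true_sigma2 = 4.0
n, n_sim = 10, 200_000

samples = rng.normal(0.0, np.sqrt(true_sigma2), size=(n_sim, n))
sigma2_hat = samples.var(axis=1, ddof=0)   # biased estimator (divides by n)

mse = np.mean((sigma2_hat - true_sigma2) ** 2)
variance = sigma2_hat.var()                          # "statistical error"
bias_sq = (sigma2_hat.mean() - true_sigma2) ** 2     # "systematic error"

print(mse, variance + bias_sq)   # identical up to floating-point error
```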

Now the question arises: is systematic error bias? Methods based on quadrature, or on parameterizing the systematics for marginalization, treat systematic error as bias, although no account says so explicitly. According to the law of total variance, unless the two components are orthogonal/independent, quadrature is not a proper way to handle the systematic uncertainties prevailing in all instruments. Generally the parameters (data) and the systematics are nonlinearly correlated and hard to factorize (instrument-specific empirical studies exist that offer correction factors for systematics; however, such factors work only in specific cases, and the process of defining them is hard to generalize). Because of the varying nature of the systematics over the parameter space, instead of the MSE,

MISE = E \int [\hat f(x) - f(x)]^2 dx

or the mean integrated squared error, might be of use. The estimator \hat f(x) of f(x) can be obtained either parametrically or nonparametrically, while incorporating the systematics and their correlation structure with the statistical errors as a function of the domain x. The MISE can be viewed as a robust counterpart of chi-square methods, although the details have not been explored in light of the following identity.

MISE = \int [E\hat f(x) - f(x)]^2 dx + \int Var \hat f(x) dx

This equation may or may not look simple. Perhaps expanding the above identity explains the error decomposition more clearly.

MISE(\hat f) = E \int [\hat f(x) - f(x)]^2 dx
= \int E[\hat f(x) - f(x)]^2 dx = \int MSE_x(\hat f) dx
= \int [E\hat f(x) - f(x)]^2 dx + \int Var \hat f(x) dx
= integrated squared bias + integrated variance (overall systematic error + overall statistical error)
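
For illustration only (a toy kernel density estimate of a standard normal with an arbitrary bandwidth; nothing here comes from a real calibration problem), the decomposition can be verified by comparing the Monte Carlo MISE against the sum of the integrated squared bias and the integrated variance:

```python
# Toy check of MISE = integrated squared bias + integrated variance for a
# kernel density estimate of N(0,1); integrals are simple Riemann sums.
import numpy as np

rng = np.random.default_rng(1)
n, n_sim, h = 100, 2000, 0.4                  # sample size, replications, bandwidth
grid = np.linspace(-4.0, 4.0, 401)
dx = grid[1] - grid[0]
f_true = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)

def kde(sample, x, h):
    """Gaussian kernel density estimate evaluated on the grid x."""
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-u**2 / 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

f_hats = np.array([kde(rng.normal(size=n), grid, h) for _ in range(n_sim)])

mise = np.mean(((f_hats - f_true) ** 2).sum(axis=1) * dx)
int_sq_bias = ((f_hats.mean(axis=0) - f_true) ** 2).sum() * dx
int_var = f_hats.var(axis=0).sum() * dx

print(mise, int_sq_bias + int_var)   # equal up to floating-point error
```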

Furthermore, the MISE robustly characterizes uncertainties from systematics, e.g. calibration uncertainties in data analysis. Note that estimating f(x) by \hat f(x) can reflect complex structures in the uncertainty analysis, whereas chi-square minimization estimates f(x) via piecewise horizontal lines, assumes a homogeneous error within each piece (bin), forces statistical and systematic errors to be orthogonal, and, as a consequence, inflates the size of the errors or produces biased best fits.

Whether we use the LTV, the MSE, or the MISE, we generally do not know the true model f(x). (If it is unknown, assessing statistical results such as confidence levels/intervals may not be feasible; the reason chi-square methods can offer best fits and their N-sigma error bars is that they assume the true model is Gaussian, N(f(x), \sigma^2), i.e. E[Y|X] = f(X) + \epsilon with V[\epsilon] = \sigma^2, where f(x) is the source model. On the other hand, Monte Carlo simulations, resampling methods like the bootstrap, or the posterior predictive probability (ppp) allow one to infer the truth nonparametrically and to evaluate a p-value indicating one's confidence in the fitting result.) Setting up proper models for \hat f(x) or theta|Y would help assess the total error more realistically than chi-square minimization, additive errors, adding Gaussian errors in quadrature, or subjective expertise on systematics. The underlying notions and the related theoretical statistics behind the LTV, MSE, and MISE could clarify questions such as how to quantify systematic errors and how systematic uncertainties are related to statistical uncertainties. Well, nothing would make me and astronomers happier than errors that really are independent and additive; happier still, a systematic uncertainty that can be factorized.
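
As a rough illustration of the resampling route (a hypothetical linear model with heavy-tailed noise, not a calibration analysis), a nonparametric bootstrap quantifies the uncertainty of a fitted slope without invoking the Gaussian assumption behind N-sigma error bars:

```python
# Toy nonparametric bootstrap of a fitted slope (assumed model y = 2x + t3 noise);
# the percentile interval requires no Gaussian error assumption.
import numpy as np

rng = np.random.default_rng(7)
n, n_boot = 50, 5000

x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + 0.3 * rng.standard_t(df=3, size=n)   # heavy-tailed, non-Gaussian noise

slopes = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)               # resample (x, y) pairs with replacement
    slopes[b] = np.polyfit(x[idx], y[idx], deg=1)[0]

lo, hi = np.percentile(slopes, [16, 84])           # roughly a 1-sigma-equivalent interval
print(np.polyfit(x, y, deg=1)[0], (lo, hi))
```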

[ArXiv] 1st week, Nov. 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-1st-week-nov-2007/
Fri, 02 Nov 2007, by hlee

To be exact, the title of this posting should say 5th week, Oct., which seems to have been the week of EGRET. In addition to astro-ph papers, I include a few statistics papers which, although not directly related to astrostatistics, may be profitable for astronomical data analysis.

  • [astro-ph:0710.4966]
    Uncertainties of the antiproton flux from Dark Matter annihilation in comparison to the EGRET excess of diffuse gamma rays by Iris Gebauer
  • [astro-ph:0710.5106]
    The dark connection between the Canis Major dwarf, the Monoceros ring, the gas flaring, the rotation curve and the EGRET excess of diffuse Galactic Gamma Rays by W. de Boer et al.
  • [astro-ph:0710.5119]
    Determination of the Dark Matter profile from the EGRET excess of diffuse Galactic gamma radiation by Markus Weber
  • [astro-ph:0710.5171]
    Systematic Bias in Cosmic Shear: Beyond the Fisher Matrix by A. Amara and A. Refregier
  • [astro-ph:0710.5560]
    Principal Component Analysis of the Time- and Position-Dependent Point Spread Function of the Advanced Camera for Surveys by M. J. Jee et al.
  • [astro-ph:0710.5637]
    A method of open cluster membership determination by G. Javakhishvili et al.
  • [stat.CO:0710.5670]
    An Elegant Method for Generating Multivariate Poisson Data by I. Yahav and G. Shmueli
  • [astro-ph:0710.5788]
    Variations in Stellar Clustering with Environment: Dispersed Star Formation and the Origin of Faint Fuzzies by B. G. Elmegreen
  • [math.ST:0710.5749]
    On the Laplace transform of some quadratic forms and the exact distribution of the sample variance from a gamma or uniform parent distribution by T. Royen
  • [math.ST:0710.5797]
    The Distribution of Maxima of Approximately Gaussian Random Fields by Y. Nardi, D. Siegmund and B. Yakir
  • [astro-ph:0711.0177]
    Maximum Likelihood Method for Cross Correlations with Astrophysical Sources by R. Jansson and G. R. Farrar
  • [stat.ME:0711.0198]
    A Geometric Approach to Confidence Sets for Ratios: Fieller’s Theorem, Generalizations, and Bootstrap by U. von Luxburg and V. H. Franz