The AstroStat Slog » likelihood
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

The chance that A has nukes is p%
http://hea-www.harvard.edu/AstroStat/slog/2009/the-chance-that-a-has-nukes-is-p/
Fri, 23 Oct 2009 17:26:07 +0000, hlee

I watched a movie in which one of the characters said, “country A has nukes with an 80% chance” (perhaps it was not 80%, but it was some high percentage). Another claim in that episode was that people will refuse to eat lettuce if even a 1% chance of E. coli contamination, or lower, is reported; therefore, with such a high percentage chance of having nukes, it is right to send troops to A. This episode immediately brought to my mind astronomers’ null hypothesis probability and their ways of drawing conclusions from chi-square goodness of fit tests, likelihood ratio tests, or F-tests.

First of all, I’d like to ask how you would estimate the chance that a country has nukes. What does this 80% imply here? But before getting to that question, I’d like to discuss computing the chance of E. coli infection first.

From the frequentist’s perspective, computing the chance of E. coli infection means investigating a sample of lettuce and counting the specimens that are infected: n is the number of infected specimens and N is the total sample size. 1% means one among 100. Such percentage reports and their uncertainties are a very familiar scene to everyone during election periods. From the Bayesian perspective, Pr(p|D) ∝ L(D|p) π(p); by properly choosing the likelihood and the prior, one can estimate the chance of E. coli infection and its uncertainty. Understanding the sampled specimens and prior knowledge help determine the likelihood and the prior.
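To make the two computations above concrete, here is a minimal sketch (the counts are hypothetical, not from the post): the frequentist estimate n/N alongside a Bayesian Beta-Binomial posterior for the infection probability.

```python
# Hypothetical numbers: n = 1 infected specimen in a sample of N = 100.
from scipy import stats

n, N = 1, 100

# Frequentist point estimate: n / N
p_hat = n / N  # 0.01, i.e. 1%

# Bayesian: a Binomial likelihood with a flat Beta(1, 1) prior gives a
# Beta(n + 1, N - n + 1) posterior for the infection probability p,
# which carries the uncertainty directly.
posterior = stats.beta(n + 1, N - n + 1)
lo, hi = posterior.interval(0.95)  # central 95% credible interval
print(f"point estimate {p_hat:.3f}, 95% interval ({lo:.4f}, {hi:.4f})")
```

The posterior interval is the Bayesian analogue of the percentage-plus-uncertainty reported during elections; a different prior would shift it, which is exactly the subjectivity discussed below.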

How about the chance that country A has nukes? Do we have replicates of country A, so that a committee can investigate each replicate and count the ones with nukes to compute the chance? We cannot. The traditional frequentist approach, based on counting, does not work here. Either a fiducial likelihood approach or a Bayesian approach, i.e., carefully choosing an adequate likelihood function (priors are only for the Bayesian approach), allows one to compute such a probability of interest. In other words, those computed chances depend strongly on the choice of model and are very subjective.

So, here’s my concern. It seems that astronomers want to know the chance of their spectral data being described by a model (A*B+C)*D (each letter stands for a model component, such as those listed in Sherpa Models). This is more like computing the chance that country A has nukes, not counting frequencies of event occurrence. On the other hand, the p-value from goodness of fit tests, LRTs, or F-tests is a number from the traditional frequentist counting approach. In other words, the p-value accounts for how many times one would observe the event (say, reduced chi^2 > 1.2) if the experiment were repeated N times, under the null hypothesis (the (A*B+C)*D model is the right choice, so that residuals are Gaussian). The problem is that we have only a single experiment and one spectrum with which to verify that (A*B+C)*D is true. Goodness of fit or an LRT only tells the goodness or badness of the model, not a statistically and objectively quantified chance.

In order to know the chance of the model (A*B+C)*D, as in “A has nukes with p% chance,” one should not rely on p-values. If you have multiple models, you can compute pairwise relative chances, i.e., odds ratios or Bayes factors. However, this does not provide the uncertainty of the chance (astronomers have a tendency to report uncertainties for any point estimate even if the procedure is statistically meaningless and the quantified uncertainty is not a statistical uncertainty, as in using delta chi^2=1 to report 68% confidence intervals). There are various model selection criteria that cater to various conditions embedded in data for making the right model choice among candidate models. In addition, post-inference for astronomical models is still a very difficult problem.
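As a toy sketch of such a pairwise relative chance (my own example, not from the post; the data and models are made up), the Bayes factor between two nested models can be roughly approximated via the BIC:

```python
# Approximate the Bayes factor between two nested models with the BIC,
# using toy Gaussian data where the truth is a constant.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 + rng.normal(0, 0.5, x.size)  # truth: constant, no slope

def bic(residuals, k, n):
    # Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)
    rss = np.sum(residuals**2)
    return n * np.log(rss / n) + k * np.log(n)

# Model 1: constant; Model 2: straight line (one extra free parameter).
r1 = y - y.mean()
coef = np.polyfit(x, y, 1)
r2 = y - np.polyval(coef, x)

# exp(-0.5 * dBIC) approximates the Bayes factor of model 2 vs model 1.
delta = bic(r2, 2, y.size) - bic(r1, 1, y.size)
print(f"dBIC = {delta:.2f}; approximate Bayes factor B_21 = {np.exp(-0.5 * delta):.3f}")
```

Note this yields only a relative point comparison, which is exactly the limitation mentioned above: it carries no uncertainty on the “chance” itself.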

Reporting a righteous chance for (A*B+C)*D requires more elaborate statistical modeling, which always brings fierce discussion between frequentists and Bayesians because of priors and likelihoods. Although it can be a very boring process, I want astronomers to leave this problem to statisticians instead of using inappropriate test statistics and making creative interpretations of them.

Please keep this question in mind when you report a probability: what kind of chance are you computing? The chance of E. coli infection? Or the chance that A has nukes? Make sure you understand that p-values from data analysis packages do not tell you that the chance of the model (A*B+C)*D is (one minus the p-value)%. You don’t want to report one minus the p-value of a chi-square test statistic as the chance that A has nukes.

systematic errors
http://hea-www.harvard.edu/AstroStat/slog/2009/systematic-errors/
Fri, 06 Mar 2009 19:42:18 +0000, hlee

Ah ha~ I once asked, “what is systematic error?” (see [Q] systematic error). Thanks to L. Lyons’ work discussed in [ArXiv] Particle Physics, I found this paper, titled Systematic Errors, describing the concept of and statistical inference related to systematic errors in the field of particle physics. Gladly, it shares a lot of similarity with high energy astrophysics.

Systematic Errors by J. Heinrich and L. Lyons
Annu. Rev. Nucl. Part. Sci. (2007) Vol. 57, pp. 145-169 [http://adsabs.harvard.edu/abs/2007ARNPS..57..145H]

The characterization of the two error types, systematic and statistical, is illustrated with a simple physics experiment, the pendulum. The authors describe two distinct sources of systematic errors.

…the reliable assessment of systematics requires much more thought and work than for the corresponding statistical error.
Some errors are clearly statistical (e.g. those associated with the reading errors on T and l), and others are clearly systematic (e.g., the correction of the measured g to its sea level value). Others could be regarded as either statistical or systematic (e.g., the uncertainty in the recalibration of the ruler). Our attitude is that the type assigned to a particular error is not crucial. What is important is that possible correlations with other measurements are clearly understood.

Section 2 contains a very nice review, in English rather than in mathematical symbols, of the basics of Bayesian and frequentist statistics for inference in particle physics, with practical accounts. A comparison of the Bayesian and frequentist approaches is provided. (I was happy to see it said that χ2 does not belong to frequentist methods. It is just a popular method in references on data analysis in astronomy, not in modern statistics. If someone insists, statisticians could study the χ2 statistic under assumptions and conditions that suit the properties of astronomical data, investigate the efficiency and completeness of the Gaussian approximation to grouped Poisson counts within the χ2 minimization process, check the degree of information loss, and so forth.)

To a Bayesian, probability is interpreted as the degree of belief in a statement. …
In contrast, frequentists define probability via a repeated series of almost identical trials;…

Section 3 clarifies the notion of p-values as follows:

It is vital to remember that a p-value is not the probability that the relevant hypothesis is true. Thus, statements such as “our data show that the probability that the standard model is true is below 1%” are incorrect interpretations of p-values.

This reminds me of the null hypothesis probability that I often encounter in the astronomical literature, or in discussions reporting X-ray spectral fitting results. I believe astronomers using the null hypothesis probability are confusing Bayesian and frequentist concepts: the computation is based on the frequentist idea, the p-value, but the interpretation is Bayesian. A separate posting on the null hypothesis probability will come shortly.

Section 4 describes both Bayesian and frequentist ways to include systematics. Through parameterization (for the Gaussian case, achieved with additive error terms, or nonzero off-diagonal elements in the full covariance matrix), systematic uncertainty is treated via nuisance parameters in the likelihood for Bayesians and frequentists alike, although the term “nuisance” originates in frequentist likelihood principles. Obtaining the posterior distribution of the parameter(s) of interest requires marginalization over the uninteresting parameters, which are the nuisance parameters of frequentist methods.

The miscellaneous section (Sec. 6) is the most useful part for understanding the nature of systematic errors and strategies for handling them. Instead of copying the whole section, here are two interesting quotes:

When the model under which the p-value is calculated has nuisance parameters (i.e. systematic uncertainties) the proper computation of the p-value is more complicated.

The contribution from a possible systematic can be estimated by seeing the change in the answer a when the nuisance parameter is varied by its uncertainty.

As warned, it is not recommended to combine calibrated systematic error and estimated statistical error in quadrature, since we cannot always assume those errors are uncorrelated. Disputes about setting a prior aside, the Bayesian strategy works better, since the posterior distribution is the distribution of the parameter of interest, from which one directly gets the uncertainty in the parameter. Remember: in Bayesian statistics parameters are random, whereas in frequentist statistics observations are random. The χ2 method only approximates uncertainty as Gaussian about the best fit (equivalent to the posterior with a Gaussian likelihood centered at the best fit and a flat prior) and combines different uncertainties in quadrature. Neither strategy is almost always superior to the other in general terms of performing statistical inference; however, case by case, we can say that one functions better than the other. The issue is how to define a model (a distribution, a distribution family, or a class of functionals) prior to deploying the various methodologies, and therefore understanding systematic errors in terms of a model, a parametrization, an estimating equation, or robustness becomes important. Unfortunately, systematic descriptions of systematic errors from the statistical inference perspective are not present in astronomical publications. Strategies for handling systematic errors with statistical care are really hard to come by.
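A tiny numerical illustration of the warning above (the σ values are made up): quadrature addition assumes zero correlation, and with correlation ρ the combined variance differs.

```python
# Combining a statistical and a systematic error: quadrature assumes
# the two are uncorrelated; with correlation rho, Var = s1^2 + s2^2 + 2*rho*s1*s2.
import math

sigma_stat, sigma_sys = 0.03, 0.04

# Uncorrelated (quadrature) combination:
quad = math.hypot(sigma_stat, sigma_sys)  # sqrt(0.03^2 + 0.04^2) = 0.05

def combined(s1, s2, rho):
    # General combination with correlation coefficient rho in [-1, 1]
    return math.sqrt(s1**2 + s2**2 + 2 * rho * s1 * s2)

print(quad)                                  # 0.05
print(combined(sigma_stat, sigma_sys, 0.5))  # larger: correlation inflates it
```

With ρ = 0.5 the combined error exceeds the quadrature value, so quoting 0.05 would understate the uncertainty; with negative ρ it would overstate it.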

Still, I think their inclusion of systematic errors is limited to parametric methods; in other words, without a parametrization of the systematic errors, one cannot assess/quantify them properly. So what if such a parametrization of the systematics is not available? I thought some general semi-parametric methodology could assist in developing methods of incorporating systematic errors in spectral model fitting. Our group has developed a simple semi-parametric way to incorporate systematic errors in X-ray spectral fitting. If you would like to know how it works, please check out my poster in pdf. It may be viewed as too conservative, like a projection, since instead of parameterizing the systematics, the posterior was empirically marginalized over them: the hypothetical space formed by a simulated sample of calibration products.

I believe publications about handling systematic errors will enjoy prosperity in astronomy and statistics as long as complex instruments collect data. Beyond combining in quadrature or the Gaussian approximation, systematic errors can be incorporated in more sophisticated fashions, parametrically or nonparametrically. Particularly for the latter, statisticians’ knowledge and contributions are in great demand.

A test for global maximum
http://hea-www.harvard.edu/AstroStat/slog/2008/a-test-for-global-maximum/
Wed, 02 Jul 2008 02:10:09 +0000, hlee

If computing the first derivative (score function) and the second derivative (empirical Fisher information) of a (pseudo-)likelihood function is feasible, and checking the regularity conditions is viable, a test for a global maximum (Li and Jiang, JASA, 1999, Vol. 94, pp. 847-854) seems to be a useful reference for verifying a best fit solution.

I have not seen any way to confirm that best fit results from XSPEC or Sherpa are a global maximum short of searching the whole parameter space. My limited understanding tells me that many fitting algorithms do not guarantee a global maximum. By checking that the best fit solution is the global maximum, and that consequently the obtained error bar can be expected to have its nominal coverage, we could save the effort of searching the whole parameter space.
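As a minimal numerical sketch (not the Li & Jiang test itself, which compares two estimates of the Fisher information; this only checks the local, necessary conditions at a candidate fit), one can verify with finite differences that the score vanishes and the observed information is positive at the reported best fit:

```python
# Check the first- and second-order conditions at a candidate maximum
# of a toy Gaussian log-likelihood with known unit variance.
import numpy as np

def loglike(mu, data):
    # Gaussian log-likelihood, sigma = 1, up to an additive constant
    return -0.5 * np.sum((data - mu) ** 2)

data = np.array([0.8, 1.1, 0.9, 1.2])
mu_hat = data.mean()  # analytic MLE for this toy model

h = 1e-5
# Score (first derivative) via central difference: should be ~0 at the MLE
score = (loglike(mu_hat + h, data) - loglike(mu_hat - h, data)) / (2 * h)
# Observed information (negative second derivative): should be positive
info = -(loglike(mu_hat + h, data) - 2 * loglike(mu_hat, data)
         + loglike(mu_hat - h, data)) / h**2

print(score, info)  # score near 0; info near n = 4
```

Passing these checks does not rule out a better maximum elsewhere in the parameter space; that is exactly what the global-maximum test of Li and Jiang adds.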

my first AAS. V. measurement error and EM
http://hea-www.harvard.edu/AstroStat/slog/2008/first-aas-measurement-error-and-em/
Fri, 20 Jun 2008 03:46:05 +0000, hlee

While we were discussing different viewpoints on the term “clustering,” one of the people in the conversation led me to his colleague’s poster. This poster (I don’t remember its title or abstract) was my favorite of all the posters at the meeting.

He rewrote the EM algorithm to include measurement errors in redshifts. Indexed parameters associated with the different redshifts and the corresponding standard deviations (measurement errors, treated as nuisance parameters) were included in a likelihood function that corrected the bias and clearly manifested the bimodality in the luminosity functions at the different evolutionary stages.

I encouraged him to talk to statisticians to characterize and generalize his measurement-error-included likelihoods, and to optimize his EM algorithm. Because of approximations in the algebra and the many parameters arising from the redshift measurement errors, assumptions and constraints were imposed rather heavily, and I thought a collaboration with statisticians would suit getting around those constraints and generalizing his likelihood.

Likelihood Ratio Test Statistic [Equation of the Week]
http://hea-www.harvard.edu/AstroStat/slog/2008/eotw-lrt-statistic/
Wed, 18 Jun 2008 17:00:30 +0000, vlk

From Protassov et al. (2002, ApJ, 571, 545), here is a formal expression for the Likelihood Ratio Test Statistic,

TLRT = -2 ln R(D,Θ0,Θ)

R(D,Θ0,Θ) = [ sup_{θ∈Θ0} p(D|Θ0) ] / [ sup_{θ∈Θ} p(D|Θ) ]

where D is an independent data sample, Θ are the model parameters {θi, i=1,..M,M+1,..N}, and Θ0 forms a subset of the model where θi = θi0, i=1..M are held fixed at their nominal values. That is, Θ represents the full model and Θ0 the simpler model, which is a subset of Θ. R(D,Θ0,Θ) is the ratio of the maximal (technically, supremal) likelihood of the simpler model to that of the full model.

When the standard regularity conditions hold — the likelihoods p(D|Θ) and p(D|Θ0) are thrice differentiable; Θ0 is wholly contained within Θ, i.e., the nominal values {θi0, i=1..M} are not at the boundary of the allowed values of {θi}; and the allowed range of D is not dependent on the specific values of {θi} — the LRT statistic is distributed as a χ2-distribution with the number of degrees of freedom equal to the difference in the number of free parameters between Θ and Θ0. These are important conditions that are not met in some very common astrophysical problems (e.g., one cannot use the LRT to test the significance of the existence of an emission line in a spectrum). In such cases, the distribution of TLRT must be calibrated via Monte Carlo simulations for that particular problem before using it as a test for the significance of the extra model parameters.
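As an illustration of the χ2 limit when the regularity conditions do hold (my own toy sketch, not from Protassov et al.): for Gaussian data with known unit variance, testing μ = 0 against a free μ gives TLRT = n·x̄² in closed form, and its null distribution can be checked against χ2 with one degree of freedom by simulation.

```python
# Monte Carlo check of the chi^2 limit of the LRT for a nested Gaussian
# mean model: H0: mu = 0 vs. free mu, sigma = 1 known.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 20, 5000

t_lrt = np.empty(trials)
for i in range(trials):
    x = rng.normal(0.0, 1.0, n)  # data generated under the null
    # log L(mu) = -0.5 * sum((x - mu)^2); the sup over mu is at xbar,
    # so -2 ln R reduces to n * xbar^2 in closed form.
    t_lrt[i] = n * x.mean() ** 2

# Fraction of simulated statistics above the chi^2_1 95% point (3.84):
frac = np.mean(t_lrt > stats.chi2.ppf(0.95, df=1))
print(f"empirical tail fraction: {frac:.3f} (expect about 0.05)")
```

When the conditions fail (e.g., a line-strength parameter pinned at the boundary), this agreement breaks down, and the same simulation machinery is what calibrates TLRT for the particular problem.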

Of course, an LRT statistic is not obliged to have exactly this form. When it doesn’t, even if the regularity conditions hold, it will not be distributed as a χ2-distribution, and must be calibrated, either via simulations, or analytically if possible. One example of such a statistic is the F-test (popularized among astronomers by Bevington). The F-test uses the ratio of the difference in the best-fit χ2 to the reduced χ2 of the full model, F = Δχ2/χ2_ν, as the statistic of choice. Note that the numerator by itself constitutes a valid LRT statistic for Gaussian data. F is distributed as the F-distribution, which results when a ratio is taken of two quantities each distributed as χ2. Thus, all the usual regularity conditions must hold for it to be applicable, and in addition the data must be in the Gaussian regime.
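A hedged numerical sketch of the F-test computation (the χ2 values and degrees of freedom below are invented for illustration; this uses the common per-degree-of-freedom form, Δχ2/Δν over the reduced χ2 of the full model):

```python
# F-test for two extra free parameters, Gaussian regime assumed.
from scipy import stats

chi2_simple, chi2_full = 58.0, 45.0   # hypothetical best-fit chi^2 values
dof_simple, dof_full = 48, 46         # the full model has 2 extra parameters

d_dof = dof_simple - dof_full
F = ((chi2_simple - chi2_full) / d_dof) / (chi2_full / dof_full)
p_value = stats.f.sf(F, d_dof, dof_full)  # survival function of F(2, 46)
print(F, p_value)
```

The same caveat as for TLRT applies: if the regularity conditions fail (an emission-line normalization at its boundary, say), this p-value is not trustworthy and the statistic must be calibrated by simulation instead.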

[ArXiv] 2nd week, June 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-2nd-week-june-2008/
Mon, 16 Jun 2008 14:47:42 +0000, hlee

As Prof. Speed said, PCA is prevalent in astronomy, particularly this week. Furthermore, one paper explicitly discusses R, a popular statistics package.

  • [astro-ph:0806.1140] N.Bonhomme, H.M.Courtois, R.B.Tully
        Derivation of Distances with the Tully-Fisher Relation: The Antlia Cluster
    (The Tully-Fisher relation is well known and one of many occasions where statistics could help. On the contrary, astronomical biases as well as measurement errors hinder the collaboration.)
  • [astro-ph:0806.1222] S. Dye
        Star formation histories from multi-band photometry: A new approach (Bayesian evidence)
  • [astro-ph:0806.1232] M. Cara and M. Lister
        Avoiding spurious breaks in binned luminosity functions
    (I think that binning is not always necessary and is overused, while alternatives exist.)
  • [astro-ph:0806.1326] J.C. Ramirez Velez, A. Lopez Ariste and M. Semel
        Strength distribution of solar magnetic fields in photospheric quiet Sun regions (PCA was utilized)
  • [astro-ph:0806.1487] M.D.Schneider et al.
        Simulations and cosmological inference: A statistical model for power spectra means and covariances
    (They used R and its Latin hypercube sampling package, lhs.)
  • [astro-ph:0806.1558] Ivan L. Andronov et al.
        Idling Magnetic White Dwarf in the Synchronizing Polar BY Cam. The Noah-2 Project (PCA is applied)
  • [astro-ph:0806.1880] R. G. Arendt et al.
        Comparison of 3.6 – 8.0 Micron Spitzer/IRAC Galactic Center Survey Point Sources with Chandra X-Ray Point Sources in the Central 40×40 Parsecs (K-S test)
[ArXiv] 3rd week, Mar. 2007
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-3rd-week-mar-2007/
Fri, 21 Mar 2008 22:20:33 +0000, hlee

Markov chain Monte Carlo (MCMC) never misses a week in recent astro-ph. A book titled MCMC in Astronomy would be a best seller. There are, in addition, very interesting non-MCMC preprints.

  • [astro-ph:0803.2130] R. Aurich
       A spatial correlation analysis for a toroidal universe (MCMC)

  • [astro-ph:0803.2120] M. Martinez & M. Errando
       A new method to study energy-dependent arrival delays on photons from astrophysical sources (Likelihood function and goodness-of-fit)

  • [astro-ph:0803.2234] G. Dobler et al.
       Lensing Probabilities for Spectroscopically Selected Galaxy-Galaxy Strong Lenses (could it be helpful to lay out GREAT08 challenges statistically?)

  • [astro-ph:0803.2529] M. Bazot, S. Bourguignon and J. Christensen-Dalsgaard
       Estimation of stellar parameters using Monte Carlo Markov Chains (MCMC)

  • [stat.AP:0803.2623] F. Dupé, J. Fadili, and J. L. Starck
       A proximal iteration for deconvolving Poisson noisy images using sparse representations

I’m used to seeing “Markov chain Monte Carlo” (or with a lower-case c in “chain”) in statistical journals, but in astronomical journals “Monte Carlo Markov Chains” seems to be the standard.

[ArXiv] A fast Bayesian object detection
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-a-fast-bayesian-object-detection/
Wed, 05 Mar 2008 21:46:48 +0000, hlee

This is a quite long paper that I separated from [ArXiv] 4th week, Feb. 2008:
      [astro-ph:0802.3916] P. Carvalho, G. Rocha, & M.P. Hobson
      A fast Bayesian approach to discrete object detection in astronomical datasets – PowellSnakes I
As the title suggests, it describes Bayesian source detection, and it gave me a chance to learn the foundations of source detection in astronomy.

First, I’d like to point out that my initial concerns from [astro-ph:0707.1611] Probabilistic Cross-Identification of Astronomical Sources are addressed in Sections 2, 3, and 6, which cover the parameter space, its dimensionality, and priors in Bayesian model selection.

Second, I’d rather concisely list the contents of the paper as follows: (1) priors, of various types, with room left for further investigation in the future; (2) templates (such as the point spread function, I guess), crucial for defining sources, and a Gaussian random field for the noise; (3) optimization strategies for fast computation (source detection implies finding maxima and integrating for the evidence); (4) comparison with other works; (5) an upper bound, tuning the acceptance/rejection threshold to minimize the symmetric loss; (6) the challenge of dealing with likelihoods in Fourier space when incorporating colored noise (as opposed to white noise); (7) decision theory from computing false negatives (undetected objects) and false positives (spurious objects). Many issues in computing the Bayesian evidence, priors, posteriors relevant to the tuning parameter, and the peaks of the maximum likelihood, as well as in approximating templates and backgrounds, are carefully presented. The conclusion summarizes their PowellSnakes algorithm pictorially.

Third, although my understanding of object detection and its link to Bayesian techniques is very superficial, reading this paper tells me that they propose some clever ways of searching the full four-dimensional space via Powell minimization (it seems related to profile likelihoods for fast computation, but this was not explicitly mentioned), and the details could direct statisticians’ attention toward improving computing efficiency and acceleration.

Fourth, I’d like to mention some new knowledge about errors in astronomy that I acquired from this paper. Statisticians are usually surprised that astronomical catalogs in general come with errors next to single measurements. These errors are not measurement errors in the sense of errors calculated from repeated observations; they are obtained from the Fisher information via the Cramer-Rao lower bound. The template likelihood function yields this uncertainty measure for each observation.
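A minimal sketch of that kind of error bar (my own toy model, not the paper’s template likelihood): for N Gaussian measurements with known σ, the Fisher information for the mean gives a Cramer-Rao lower bound without any repeated observations of the object itself.

```python
# Cramer-Rao lower bound from the Fisher information for a toy
# Gaussian likelihood: x_i ~ N(mu, sigma^2), sigma known, N samples.
import numpy as np

sigma, N = 2.0, 25

# Fisher information for mu: I(mu) = N / sigma^2 (does not depend on mu)
fisher_info = N / sigma**2

# CRLB: Var(mu_hat) >= 1 / I(mu); the quoted "error" is its square root
crlb_sigma = 1.0 / np.sqrt(fisher_info)
print(crlb_sigma)  # 0.4
```

This is why a catalog can quote an uncertainty next to a single fitted measurement: the curvature of the likelihood at the fit, not repetition, supplies the error bar.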

Lastly, in astronomy there are many empirical rules, effects, and laws that bear uncommon names. Generally these are sophisticated rules of thumb or approximations of some phenomenon (for instance, Hubble’s law, though it is well known), but they have been a driving-away factor when statisticians read astronomy papers. On the other hand, despite the overwhelming names, when it gets to the point, the objective of mentioning such names is very statistical: regression (fitting), estimating parameters and their uncertainty, goodness-of-fit, truncated data, fast optimization algorithms, machine learning, etc. This paper mentions the Sunyaev-Zel’dovich effect, whose name scared me, but I’d like to emphasize that while this kind of nomenclature may hinder understanding of details, it cannot block collaborations.

[ArXiv] 4th week, Jan. 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-4th-week-jan-2008/
Fri, 25 Jan 2008 16:37:12 +0000, hlee

Only three papers this week. There were a few more with chi-square fitting and its error bars, but they were excluded.

  • [astro-ph:0801.3346] Hipparcos distances of Ophiuchus and Lupus cloud complexes M. Lombardi, C. Lada, & J. Alves (likelihoods and MCMC were used)
  • [astro-ph:0801.3543] Results of the ROTOR-program. II. The long-term photometric variability of weak-line T Tauri stars K.N. Grankin et al. (discusses periodograms)
  • [astro-ph:0801.3822] Estimating the Redshift Distribution of Faint Galaxy Samples M. Lima et al.
[ArXiv] 1st week, Jan. 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-1st-week-jan-2008/
Fri, 04 Jan 2008 16:49:57 +0000, hlee

It’s a rather short list this week, and I hope I can maintain this conciseness afterwards. Happy new year to everyone.

  • [astro-ph:0801.0336] Astronomical Image Subtraction by Cross-Convolution F. Yuan & C. W. Akerlof
  • [math.TH:0801.0158] Frequency estimation based on the cumulated Lomb-Scargle periodogram C. Lévy-Leduc, E. Moulines, & F. Roueff
  • [astro-ph:0801.0451] A cgi synthetic CMD calculator for the YY Isochrones P. Demarque et al.
  • [astro-ph:0801.0554] Likelihood Analysis of CMB Temperature and Polarization Power Spectra S. Hamimeche & A. Lewis