Archive for the ‘Frequentist’ Category.

Quote of the Week, October 3, 2007

From the ever-quotable Andrew Gelman comes this gem, which he calls a Folk Theorem:

When things are hard to compute, often the model doesn’t fit the data. Difficulties in computation are therefore often model problems… [When the computation isn't working] we have the duty and freedom to think about models.


[ArXiv] CMB statistics, Sept. 7, 2007

From arxiv/astro-ph:0709.1144v1:
Cosmic Microwave Background Statistics for a Direction-Dependent Primordial Power Spectrum by A. R. Pullen and M. Kamionkowski

The authors develop cosmic microwave background (CMB) statistics for a primordial power spectrum that depends on both the direction and the magnitude of the Fourier wavevector. The work is motivated by the need to test a common cosmological assumption, namely the statistical isotropy of primordial perturbations. Statistically speaking, the most interesting part is their construction of minimum-variance estimators for the coefficients of a spherical-harmonic expansion of the direction dependence of the primordial power spectrum.
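The paper's estimators are specific to the spherical-harmonic coefficients and are not reproduced here. As a generic illustration of the minimum-variance idea only, here is a sketch that combines several independent, unbiased estimates of one coefficient using inverse-variance weights, which is the minimum-variance unbiased linear combination. The input numbers are hypothetical.

```python
import math

def inverse_variance_combine(estimates, variances):
    """Minimum-variance unbiased linear combination of independent, unbiased
    estimates of a single quantity: weights are proportional to 1/variance."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need matching, non-empty inputs")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total
    combined_variance = 1.0 / total  # never larger than the smallest input variance
    return combined, combined_variance

# Hypothetical estimates of one expansion coefficient from three independent data subsets
est, var = inverse_variance_combine([0.12, 0.08, 0.15], [0.01, 0.04, 0.02])
print(f"combined = {est:.4f} +/- {math.sqrt(var):.4f}")
```

The noisiest estimate (variance 0.04) gets the smallest weight. The combined variance, 1/175, is below the best single variance, 0.01.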

[ArXiv] Identifiability and mixtures of distributions, Aug. 3, 2007

From arxiv/0708.0499v1
Inference for mixtures of symmetric distributions by Hunter, Wang, and Hettmansperger, Annals of Statistics, 2007, Vol. 35(1), pp. 224-251.

Cross-validation for model selection

One of the most frequently cited papers in model selection is An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion by M. Stone, Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1 (1977), pp. 44-47.
(Akaike’s 1974 paper, which introduced the Akaike Information Criterion (AIC), is the most often cited paper on the subject of model selection.)
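Stone's result relates leave-one-out cross-validation to AIC asymptotically. Here is a minimal sketch, under our own assumptions (polynomial least-squares models, Gaussian noise, simulated data with a true degree of 2), that computes both criteria side by side so their rankings can be compared. It illustrates the setup; it does not demonstrate the theorem. For least squares, the leave-one-out residuals come from the hat matrix as e_i / (1 - h_ii), so no refitting is needed.

```python
import numpy as np

def design(x, degree):
    # Polynomial design matrix with columns 1, x, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def loo_cv_mse(x, y, degree):
    """Leave-one-out CV mean squared error via the shortcut e_i / (1 - h_ii)."""
    X = design(x, degree)
    H = X @ np.linalg.pinv(X)            # hat matrix X (X^T X)^{-1} X^T
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))
    return np.mean(loo_resid ** 2)

def aic_gaussian(x, y, degree):
    """AIC for a Gaussian linear model with unknown variance, up to a constant."""
    X = design(x, degree)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = len(y), degree + 2            # polynomial coefficients + noise variance
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.3, size=x.size)  # simulated

for d in range(1, 6):
    print(d, round(loo_cv_mse(x, y, d), 4), round(aic_gaussian(x, y, d), 2))
```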

[ArXiv] Poisson Mixture, Aug. 16, 2007

From arxiv/
Estimating the number of classes by Mao and Lindsay

This study could be linked to identifying the number of lines in Poisson-distributed X-ray count data, which is one of the key interests of astronomers. However, as the authors point out, estimating the number of classes is a difficult statistical problem. I. J. Good[1] said:

I don’t believe it is usually possible to estimate the number of species, but only an appropriate lower bound to that number. This is because there is nearly always a good chance that there are a very large number of extremely rare species.
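Good's remark is why lower-bound estimators are popular in this literature. One widely used example, which is not the Mao and Lindsay method, is Chao's (1984) estimator. It gives a lower bound for the number of classes under a Poisson mixture model, using only the number of classes seen exactly once (f1) and exactly twice (f2). Here is a minimal sketch with hypothetical counts:

```python
from collections import Counter

def chao1_lower_bound(abundances):
    """Chao (1984) lower-bound estimate of the total number of classes.

    abundances: how many times each *observed* class was seen (all >= 1).
    Uses S_obs + f1^2 / (2 f2); when no class was seen exactly twice it falls
    back to the bias-corrected form S_obs + f1 (f1 - 1) / 2.
    """
    counts = Counter(abundances)
    s_obs = len(abundances)
    f1, f2 = counts.get(1, 0), counts.get(2, 0)  # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0

# Hypothetical data: six observed classes seen 1, 1, 1, 2, 2 and 5 times
print(chao1_lower_bound([1, 1, 1, 2, 2, 5]))  # 6 + 9/4 = 8.25
```

The correction depends only on the rare classes. Many singletons relative to doubletons push the bound up, which reflects Good's point about a long tail of rare species.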


  1. Courtesy of the paper Estimating the number of species: A review by Bunge and Fitzpatrick (1993), JASA, 88, 364-373.

Quote of the Week, July 26, 2007

Peter Bickel:

“Bayesian” methods have, I think, rightly gained favor in astronomy as they have in other fields of statistical application. I put “Bayesian” in quotation marks because I do not believe this marks a revival in the sciences in the belief in personal probability. To me it rather means that all information on hand should be used in model construction, coupled with the view of Box [1979 etc.], who considers himself a Bayesian:

Models, of course, are never true but fortunately it is only necessary that they be useful.

The Bayesian paradigm permits one to construct models and hence statistical methods which reflect such information in an, at least in principle, marvellously simple way. A frequentist such as myself feels as at home with these uses of Bayes principle as any Bayesian.

From Bickel, P. J., “An Overview of SCMA II”, in Statistical Challenges in Modern Astronomy II, editors G. Jogesh Babu and Eric D. Feigelson, 1997, Springer-Verlag, New York, p. 360.

[Box 1979] Box, G. E. P., 1979, “Some Problems of Statistics and Everyday Life”, J. Amer. Statist. Assoc., 74, 1-4.

Peter Bickel had so many interesting perspectives in his comments at these SCMA conferences that it was hard to choose just one set.

Quote of the Week, July 19, 2007

Ten years ago, astrophysicist John Nousek gave this answer to Hyunsook Lee’s question “What is so special about chi square in astronomy?”:

The astronomer must also confront the problem that results need to be published and defended. If a statistical technique has not been widely applied in astronomy before, then there are additional burdens of convincing the journal referees and the community at large that the statistical methods are valid.

Certain techniques which are widespread in astronomy and seem to be accepted without any special justification are: linear and non-linear regression (Chi-Square analysis in general), Kolmogorov-Smirnov tests, and bootstraps. It also appears that if you find it in Numerical Recipes (Press et al. 1992) that it will be more likely to be accepted without comment.

…Note an insidious effect of this bias: astronomers will often choose to utilize a widely accepted statistical tool, even into regimes where the tool is known to be invalid, just to avoid the problem of developing or researching appropriate tools.

From p. 205, in “Discussion by John Nousek” (of Edward J. Wegman et al., “Statistical Software, Siftware, and Astronomy”), in “Statistical Challenges in Modern Astronomy II”, editors G. Jogesh Babu and Eric D. Feigelson, 1997, Springer-Verlag, New York.

[ArXiv] Matching Sources, July 11, 2007

From arxiv/astro-ph: 0707.1611 Probabilistic Cross-Identification of Astronomical Sources by Budavari and Szalay

As multi-wavelength studies become more popular, various source-matching methodologies have been discussed. One such method, based on Bayesian ideas, was introduced by Budavari and Szalay, motivated by the need for symmetric algorithms in a unified framework.
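Here is a minimal sketch of the two-catalog case. It assumes the Bayes factor takes the small-angle Gaussian form B = 2/(σ1² + σ2²) · exp(−ψ²/(2(σ1² + σ2²))), where ψ is the separation and σ1, σ2 are the positional errors in radians. The posterior match probability is then P = [1 + (1 − P0)/(B·P0)]⁻¹ for a prior P0. This is our reading of the paper, so check it against the paper before using it. The errors and prior below are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians

def bayes_factor_two(psi, sigma1, sigma2):
    """Assumed small-angle Bayes factor for two detections of the same object:
    B = 2/(s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2))), angles in radians."""
    s2 = sigma1 ** 2 + sigma2 ** 2
    return 2.0 / s2 * math.exp(-psi ** 2 / (2.0 * s2))

def posterior_match(bayes_factor, prior):
    """Posterior probability of a match given the Bayes factor and prior P0."""
    return 1.0 / (1.0 + (1.0 - prior) / (bayes_factor * prior))

sigma1, sigma2 = 0.3 * ARCSEC, 0.5 * ARCSEC  # hypothetical astrometric errors
prior = 1e-6                                 # hypothetical prior match probability
for sep_arcsec in (0.0, 1.0, 2.0, 3.0):
    b = bayes_factor_two(sep_arcsec * ARCSEC, sigma1, sigma2)
    print(sep_arcsec, f"{b:.3g}", round(posterior_match(b, prior), 4))
```

The formula is symmetric in the two catalogs: swapping σ1 and σ2 leaves B unchanged. That matches the symmetry the authors ask for.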

Quote of the Week, July 12, 2007

This is from the very interesting Ingrid Daubechies interview by Dorian Devins, National Academy of Sciences, U.S.A., 2004. It is from part 6, where Ingrid Daubechies speaks of her early mathematics paper on wavelets. She tries to put the impact into context:

I really explained in the paper where things came from. Because, well, the mathematicians wouldn’t have known. I mean, to them this would have been a question that really came out of nowhere. So, I had to explain it …

I was very happy with [the paper]; I had no inkling that it would take off like that… [Of course] the wavelets themselves are used. I mean, more than even that. I explained in the paper how I came to that. I explained both [a] mathematician’s way of looking at it and then to some extent the applications way of looking at it. And I think engineers who read that had been emphasizing a lot the use of Fourier transforms. And I had been looking at the spatial domain. It generated a different way of considering this type of construction. I think, that was the major impact. Because then other constructions were made as well. But I looked at it differently. A change of paradigm. Well, paradigm, I never know what that means. A change of … a way of seeing it. A way of paying attention.

[ArXiv] Spectroscopic Survey, June 29, 2007

From arXiv/astro-ph:0706.4484

Spectroscopic Surveys: Present by C. Yip reviews recent spectroscopic sky surveys and spectral analysis techniques aimed at Virtual Observatories (VO). Besides the number of spectroscopic redshift measurements growing like Moore’s law, the surveys tend to go deeper and aim for completeness. Mainly elliptical galaxy formation has been studied, owing to the greater abundance of ellipticals compared with spirals. The galaxy bimodality in color-color or color-magnitude diagrams is attributed to gas-rich mergers of blue galaxies forming the red sequence. Principal component analysis has been applied to ratios of emission-line strengths to classify Type II AGN and star-forming galaxies. Lyα identifies high-z quasars, and other spectral patterns as a function of z reveal the history of the early universe and the characteristics of quasars. The recent discovery of 10 satellites of the Milky Way is also mentioned.
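The review mentions principal component analysis of emission-line ratios. Here is a generic sketch of PCA via the singular value decomposition of standardized features. It is not the classification scheme from the review, and the "line ratios" are simulated stand-ins, an assumption for illustration only.

```python
import numpy as np

def pca(features):
    """PCA via SVD of the standardized feature matrix (rows = objects)."""
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each feature
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # variance fraction per component
    scores = Z @ Vt.T                                 # projections onto the components
    return explained, Vt, scores

# Simulated stand-ins for three log emission-line ratios of 200 objects,
# all driven by one hidden parameter t plus small noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
ratios = np.column_stack([
    t + 0.1 * rng.normal(size=200),
    0.8 * t + 0.1 * rng.normal(size=200),
    -0.5 * t + 0.1 * rng.normal(size=200),
])
explained, components, scores = pca(ratios)
print(np.round(explained, 3))
```

Standardizing first keeps ratios with large numerical ranges from dominating the components. When one physical parameter drives all the ratios, the first component carries nearly all of the variance.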

[ArXiv] Classical confidence intervals, June 25, 2007

Comments on the unified approach to the construction of classical confidence intervals

This paper comments on classical confidence intervals and upper limits in the context of the so-called flip-flopping problem. The two are related asymptotically (when n is large enough) by definition. However, because of the Poisson nature of the data, one cannot be converted into the other while preserving the same coverage.
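The effect of Poisson discreteness on coverage can be checked numerically. Here is a minimal sketch under our own choices: one-sided 95% classical upper limits, found by bisection on the Poisson CDF. It computes the actual coverage as a function of the true mean. The coverage never falls below 95%, but it is generally not exactly 95%. The sketch does not implement the unified (Feldman and Cousins) construction.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def upper_limit(n, cl=0.95, tol=1e-10):
    """Classical one-sided upper limit: mu_up with P(N <= n | mu_up) = 1 - cl."""
    alpha = 1.0 - cl
    lo, hi = 0.0, n + 10.0 * math.sqrt(n + 1.0) + 10.0  # cdf(n, hi) is far below alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n, mid) > alpha:
            lo = mid  # CDF still too large: the limit lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(mu, cl=0.95, n_max=60):
    """Probability, over Poisson(mu) data, that the upper limit is >= mu."""
    limits = [upper_limit(n, cl) for n in range(n_max + 1)]
    return sum(
        math.exp(-mu) * mu ** n / math.factorial(n)
        for n, up in enumerate(limits)
        if up >= mu
    )

print(round(upper_limit(0), 4))  # -ln(0.05), about 2.9957
for mu in (1.0, 3.5, 5.0, 8.0):
    print(mu, round(coverage(mu), 4))
```

Coverage jumps as μ crosses each upper limit, so it cannot be exactly 95% for every μ. Because coverage is forced to be at least 95%, the limits over-cover in between the jumps.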

[ArXiv] Kernel Regression, June 20, 2007

One of the papers from arxiv/astro-ph (astro-ph/0706.2704) discusses kernel regression and model selection for determining photometric redshifts. It presents three studies. The first chooses the kernel bandwidth via 10-fold cross-validation. The second chooses appropriate models from various combinations of input parameters by estimating root mean square error and AIC. The third compares their kernel regression with other regression and classification methods, using root mean square errors reported in the literature. They conclude that kernel regression is flexible, particularly for data at high z.
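Here is a minimal sketch of the bandwidth-selection step. It assumes a Nadaraya-Watson estimator with a Gaussian kernel, simulated one-dimensional data, and a hypothetical candidate grid. The paper's photometric inputs and kernel are not reproduced here, and the AIC part is omitted.

```python
import numpy as np

def nw_predict(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression with a Gaussian kernel of bandwidth h."""
    d = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return (w @ y_train) / w.sum(axis=1)

def kfold_cv_rmse(x, y, h, k=10, seed=0):
    """Root mean square prediction error from k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(x))
    sq_err = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        pred = nw_predict(x[train], y[train], x[fold], h)
        sq_err.append((pred - y[fold]) ** 2)
    return np.sqrt(np.mean(np.concatenate(sq_err)))

def choose_bandwidth(x, y, candidates, k=10):
    scores = [kfold_cv_rmse(x, y, h, k) for h in candidates]
    return candidates[int(np.argmin(scores))], scores

# Simulated data (an assumption, not photometric redshifts)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.normal(size=200)
candidates = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best_h, scores = choose_bandwidth(x, y, candidates)
print(best_h, np.round(scores, 4))
```

Too small a bandwidth chases the noise and too large a one smooths away the signal. Cross-validated error trades these off without reference to a parametric model.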

Quote of the Week, June 20, 2007

These quotes are in the opposite spirit of the last two Bayesian quotes.
They are from the excellent “R”-based Tutorial on Non-Parametrics given by
Chad Schafer and Larry Wasserman at the 2006 SAMSI Special Semester on AstroStatistics.

Chad and Larry were explaining trees:

For more sophisticated tree-searches, you might try Robert Nowak [and his former student, Becca Willett, especially her "software" pages]. There is even Bayesian CART (Classification And Regression Trees). These can take 8 or 9 hours to “do it right”, via MCMC. BUT [these results] tend to be very close to [less rigorous] methods that take only minutes.

Trees are used primarily by doctors, for patients: it is much easier to follow a tree than a kernel estimator, in person.

Trees are much more ad-hoc than other methods we talked about, BUT they are very user friendly, very flexible.

In machine learning, which is only statistics done by computer scientists, they love trees.
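To make the "easy to follow" point concrete, here is a minimal sketch of the basic step in a regression tree: a single split on one variable, chosen by least squares as in CART. The result reads as one if/else rule. This is not Bayesian CART or the Nowak and Willett methods mentioned above, and the data are made up.

```python
def best_split(x, y):
    """Single CART-style split on one feature: pick the threshold that
    minimizes the total within-node sum of squared errors.
    Returns (cost, threshold, left_mean, right_mean), or None if no split exists."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(ys)

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied x values
        threshold = 0.5 * (xs[i - 1] + xs[i])
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(ys[:i]) / i, sum(ys[i:]) / (n - i))
    return best

# Made-up data with a jump between x = 4 and x = 5
cost, thr, left, right = best_split(list(range(1, 9)), [1, 1, 1, 1, 5, 5, 5, 5])
print(f"if x < {thr}: predict {left} else: predict {right}")
```

A full tree just repeats this search inside each resulting node. Each path from the root to a leaf is a short chain of such rules, which is what makes trees easy to follow in person.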

[ArXiv] Solar Cycle, June 18, 2007

From arxiv/astro-ph, arXiv:0706.2590v1 Extreme Value Theory and the Solar Cycle by Ramos, A. This paper may well attract considerable attention from CHASC members.

On the unreliability of fitting

Despite some recent significant advances in Statistics and its applications to Astronomy (Cash 1976, Cash 1979, Gehrels 1984, Schmitt 1985, Isobe et al. 1986, van Dyk et al. 2001, Protassov et al. 2002, etc.), there still exist numerous problems and limitations in the standard statistical methodologies that are routinely applied to astrophysical data. For instance, the basic algorithms used in non-linear curve-fitting of spectra and images have remained unchanged since the 1960s. These are the downhill simplex method of Nelder & Mead (1965), modified by Powell, and methods of steepest descent exemplified by Levenberg-Marquardt (Marquardt 1963). All non-linear curve-fitting programs currently in general use (Sherpa, XSPEC, MPFIT, PINTofALE, etc.), with the exception of Monte Carlo and MCMC methods, are implementations based on these algorithms and thus share their limitations.
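One well-known limitation shared by local methods such as steepest descent, Levenberg-Marquardt, and the simplex method is that they converge to a nearby minimum, which need not be the global one. Here is a minimal sketch with a toy one-parameter objective, a tilted double well standing in for a multimodal fit statistic. It is an assumption chosen for illustration, not a real fit statistic. Two different starting values give two different "best fits".

```python
def chi2(a):
    """Toy objective with two minima (tilted double well); not from real data."""
    return (a * a - 1.0) ** 2 + 0.3 * a

def grad(a):
    """Analytic derivative of chi2 with respect to a."""
    return 4.0 * a * (a * a - 1.0) + 0.3

def steepest_descent(a0, step=0.01, n_iter=5000):
    """Plain fixed-step steepest descent from the starting value a0."""
    a = a0
    for _ in range(n_iter):
        a -= step * grad(a)  # move downhill along the local gradient
    return a

a_left = steepest_descent(-2.0)
a_right = steepest_descent(2.0)
for start, a_hat in ((-2.0, a_left), (2.0, a_right)):
    print(f"start {start:+.1f} -> a = {a_hat:+.4f}, chi2 = {chi2(a_hat):+.4f}")
```

Both runs stop where the gradient vanishes. Only the run started at −2 finds the lower minimum, and nothing in the output of the other run signals that it is stuck. Random restarts, global search, or sampling methods such as MCMC are the usual ways to expose this.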