Archive for the ‘Stat’ Category.

#### Erich Lehmann

He was one of the frequently cited statisticians in this slog because of his influence in statistics. It is extremely difficult to avoid his textbooks and his establishment of theoretical statistics when one begins to comprehend and to appreciate the modern theoretical statistics. To me, Testing Statistical Hypotheses and Theory of Point Estimation are two pillars of graduate statistical education. In addition, Elements of Large Sample Theory and Nonparametrics: Statistical Methods Based on Ranks are also eye openers. Continue reading ‘Erich Lehmann’ »

#### From Quantile Probability and Statistical Data Modeling

by Emanuel Parzen in Statistical Science 2004, Vol 19(4), pp.652-662 JSTOR

I teach that statistics (done the quantile way) can be simultaneously frequentist and Bayesian, confidence intervals and credible intervals, parametric and nonparametric, continuous and discrete data. My first step in data modeling is identification of parametric models; if they do not fit, we provide nonparametric models for fitting and simulating the data. The practice of statistics, and the modeling (mining) of data, can be elegant and provide intellectual and sensual pleasure. Fitting distributions to data is an important industry in which statisticians are not yet vendors. We believe that unifications of statistical methods can enable us to advertise, “What is your question? Statisticians have answers!”

I couldn’t help liking this paragraph because of its bitter-sweetness. I hope you appreciate it as much as I did.

#### some python modules

I was told to stay away from python and I’ve obeyed the order sincerely. However, I collected the following stuffs several months back at the instance of hearing about import inference and I hate to see them getting obsolete. At that time, collecting these modules and getting through them could help me complete the first step toward the quest Learning Python (the first posting of this slog). Continue reading ‘some python modules’ »

#### [ArXiv] Voronoi Tessellations

As a part of exploring spatial distribution of particles/objects, not to approximate via Poisson process or Gaussian process (parametric), nor to impose hypotheses such as homogenous, isotropic, or uniform, various nonparametric methods somewhat dragged my attention for data exploration and preliminary analysis. Among various nonparametric methods, the one that I fell in love with is tessellation (state space approaches are excluded here). Computational speed wise, I believe tessellation is faster than kernel density estimation to estimate level sets for multivariate data. Furthermore, conceptually constructing polygons from tessellation is intuitively simple. However, coding and improving algorithms is beyond statistical research (check books titled or key-worded partially by computational geometry). Good news is that for computation and getting results, there are some freely available softwares, packages, and modules in various forms. Continue reading ‘[ArXiv] Voronoi Tessellations’ »

#### The chance that A has nukes is p%

I watched a movie in which one of the characters said, “country A has nukes with 80% chance” (perhaps, not 80% but it was a high percentage). One of the statements in that episode is that people will not eat lettuce only if the 1% chance of e coli is reported, even lower. Therefore, with such a high percentage of having nukes, it is right to send troops to A. This episode immediately brought me a thought about astronomers’ null hypothesis probability and their ways of concluding chi-square goodness of fit tests, likelihood ratio tests, or F-tests.

First of all, I’d like to ask how you would like to estimate the chance of having nukes in a country? What this 80% implies here? But, before getting to the question, I’d like to discuss computing the chance of e coli infection, first. Continue reading ‘The chance that A has nukes is p%’ »

#### [ArXiv] classifying spectra

[arXiv:stat.ME:0910.2585]
Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications
by Murphy, Dean, and Raftery

Classifying or clustering (or semi supervised learning) spectra is a very challenging problem from collecting statistical-analysis-ready data to reducing the dimensionality without sacrificing complex information in each spectrum. Not only how to estimate spiky (not differentiable) curves via statistically well defined procedures of estimating equations but also how to transform data that match the regularity conditions in statistics is challenging.
Continue reading ‘[ArXiv] classifying spectra’ »

#### Scatter plots and ANCOVA

Astronomers rely on scatter plots to illustrate correlations and trends among many pairs of variables more than any scientists[1]. Pages of scatter plots with regression lines are often found from which the slope of regression line and errors bars are indicators of degrees of correlation. Sometimes, too many of such scatter plots makes me think that, overall, resources for drawing nice scatter plots and papers where those plots are printed are wasted. Why not just compute correlation coefficients and its error and publicize the processed data for computing correlations, not the full data, so that others can verify the computation results for the sake of validation? A couple of scatter plots are fine but when I see dozens of them, I lost my focus. This is another cultural difference. Continue reading ‘Scatter plots and ANCOVA’ »

1. This is not an assuring absolute statement but a personal impression after reading articles of various fields in addition to astronomy. My readings of other fields tell that many rely on correlation statistics but less scatter plots by adding straight lines going through data sets for the purpose of imposing relationships within variable pairs[]

Although a bit of time has elapsed since my post space weather, saying that logistic regression is used for prediction, it looks like still true that logistic regression is rarely used in astronomy. Otherwise, it could have been used for the similar purpose not under the same statistical jargon but under the Bayesian modeling procedures. Continue reading ‘[MADS] logistic regression’ »

#### Goodness-of-fit tests

When it comes to applying statistics for measuring goodness-of-fit, the Pearson χ2 test is the dominant player in a race and the Kolmogorov-Smirnoff test statistic trails far behind. Although it seems almost invisible in this race, there are more various non-parametric statistics for testing goodness-of-fit and for comparing the sampling distribution to a reference distribution as legitimate race participants trained by many statisticians. Listing their names probably useful to some astronomers when they find the underlying assumptions for the χ2 test do not match the data. Perhaps, some astronomers want to try other nonparametric test statistics other than the K-S test. I’ve seen other test statistics in astronomical journals from time to time. Depending on data and statistical properties, one test statistic could work better than the other; therefore, it’s worthwhile to keep the variety in one’s mind that there are other tests beyond the χ2 test goodness-of-fit test statistic. Continue reading ‘Goodness-of-fit tests’ »

#### [Books] Bayesian Computations

A number of practical Bayesian data analysis books are available these days. Here, I’d like to introduce two that were relatively recently published. I like the fact that they are rather technical than theoretical. They have practical examples close to be related with astronomical data. They have R codes so that one can try algorithms on the fly instead of jamming probability theories. Continue reading ‘[Books] Bayesian Computations’ »

ARCH (autoregressive conditional heteroscedasticity) is a statistical model that considers the variance of the current error term to be a function of the variances of the previous time periods’ error terms. I heard that this model made Prof. Engle a Nobel prize recipient. Continue reading ‘[MADS] ARCH’ »

#### [ArXiv] Statistical Analysis of fMRI Data

[arxiv:0906.3662] The Statistical Analysis of fMRI Data by Martin A. Lindquist
Statistical Science, Vol. 23(4), pp. 439-464

This review paper offers some information and guidance of statistical image analysis for fMRI data that can be expanded to astronomical image data. I think that fMRI data contain similar challenges of astronomical images. As Lindquist said, collaboration helps to find shortcuts. I hope that introducing this paper helps further networking and collaboration between statisticians and astronomers.

List of similarities Continue reading ‘[ArXiv] Statistical Analysis of fMRI Data’ »

Kriging is the first thing that one learns from a spatial statistics course. If an astronomer sees its definition and application, almost every astronomer will say, “Oh, I know this! It is like the 2pt correlation function!!” At least this was my first impression when I first met kriging.

There are three distinctive subjects in spatial statistics: geostatistics, lattice data analysis, and spatial point pattern analysis. Because of the resemblance between the spatial distribution of observations in coordinates and the notion of spatially random points, spatial statistics in astronomy has leaned more toward the spatial point pattern analysis than the other subjects. In other fields from immunology to forestry to geology whose data are associated spatial coordinates of underlying geometric structures or whose data were sampled from lattices, observations depend on these spatial structures and scientists enjoy various applications from geostatistics and lattice data analysis. Particularly, kriging is the fundamental notion in geostatistics whose application is found many fields. Continue reading ‘[MADS] Kriging’ »

#### Beyond simple models-New methods for complex data

This is a special session at the January 2010 meeting of the AAS. It is scheduled for the afternoon of Thursday, Jan 7, 2-3:30pm.

Abstracts are due Sep 17.

Meeting Justification

We propose to highlight the growing use of ‘non-parametric’ techniques to distill meaningful science from today’s astronomical data. Challenges range from Kuiper objects to cosmology. We have chosen just a few ‘teaching’ examples from this lively interdisciplinary area.