Archive for October 2009

[ArXiv] Voronoi Tessellations

While exploring the spatial distribution of particles/objects without approximating it by a (parametric) Poisson or Gaussian process, and without imposing hypotheses such as homogeneity, isotropy, or uniformity, I found myself drawn to various nonparametric methods for data exploration and preliminary analysis. Among them, the one I fell in love with is tessellation (state space approaches are excluded here). In terms of computational speed, I believe tessellation is faster than kernel density estimation for estimating level sets of multivariate data. Furthermore, constructing polygons from a tessellation is conceptually simple. However, coding and improving the algorithms lies beyond statistical research (see books with computational geometry in their titles or keywords). The good news is that several freely available software packages and modules exist, in various forms, for doing the computation.
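To make the idea concrete, here is a minimal sketch, assuming Python with NumPy and SciPy, of one common tessellation-based density estimate: the density at each point is taken as the inverse area of its Voronoi cell. `scipy.spatial.Voronoi` and `ConvexHull` are real SciPy classes; the function name `voronoi_density` and the choice to leave unbounded boundary cells as NaN are illustrative assumptions, not necessarily the approach discussed in the post.

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi


def voronoi_density(points):
    """Estimate local density at each point as 1 / (area of its Voronoi cell).

    Cells on the outer boundary are unbounded; their density is left as NaN.
    """
    points = np.asarray(points, dtype=float)
    vor = Voronoi(points)
    density = np.full(len(points), np.nan)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if len(region) == 0 or -1 in region:
            continue  # unbounded (or empty) cell: no finite area
        # Voronoi cells are convex, so the convex hull of a cell's vertices is
        # the cell itself; in 2-D, ConvexHull.volume is the polygon's area.
        area = ConvexHull(vor.vertices[region]).volume
        density[i] = 1.0 / area
    return density


# Hypothetical data: 500 points drawn uniformly in the unit square.
rng = np.random.default_rng(42)
pts = rng.uniform(0.0, 1.0, size=(500, 2))
dens = voronoi_density(pts)
finite = np.isfinite(dens)
print(f"{finite.sum()} bounded cells, median density {np.median(dens[finite]):.1f}")
```

Level sets can then be read off by thresholding `dens`; the boundary cells need separate treatment (for example, clipping the tessellation to a window), which this sketch skips.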

Do people use Fortran?

I’m quite sure that Fortran is one of the major scientific programming languages. Many functions, modules, and libraries are written in it, and these routines have been ported into many scripting languages, often without users being aware of it. However, I have become curious whether Fortran is still a major force in astronomy or statistics compared with, say, 20 years ago (10 years seems too short a span).

The chance that A has nukes is p%

I watched a movie in which one of the characters said, “country A has nukes with 80% chance” (perhaps not exactly 80%, but it was a high percentage). The argument in that episode was that people stop eating lettuce if a chance of E. coli contamination of only 1%, or even lower, is reported; therefore, with such a high chance of A having nukes, it is right to send troops to A. This episode immediately brought to mind astronomers’ null hypothesis probability and the ways they draw conclusions from chi-square goodness-of-fit tests, likelihood ratio tests, or F-tests.

First of all, I’d like to ask: how would you estimate the chance that a country has nukes? What does this 80% imply? Before getting to that question, however, I’d like to discuss computing the chance of E. coli infection.
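As a hedged illustration of the distinction at stake, the sketch below (Python with SciPy; all numbers are hypothetical) computes the tail probability that fitting packages often label the “null hypothesis probability,” and, separately, a posterior probability from Bayes’ theorem. Only the second kind of number answers a question like “what is the chance that A has nukes?”, and it requires a prior. The helper names are hypothetical.

```python
from scipy.stats import chi2


def null_hypothesis_probability(chisq, dof):
    """P(X >= chisq) for X ~ chi-square with `dof` degrees of freedom.

    This is a tail probability computed *assuming* the model (the null) is
    true; it is not the probability that the model is true.
    """
    return chi2.sf(chisq, dof)


def posterior_probability(prior, like_h1, like_h0):
    """P(H1 | data) by Bayes' theorem for two competing hypotheses H1, H0."""
    numerator = prior * like_h1
    return numerator / (numerator + (1.0 - prior) * like_h0)


# Hypothetical fit: chi-square of 30.0 with 20 degrees of freedom.
p = null_hypothesis_probability(30.0, 20)
print(f"null hypothesis probability: {p:.3f}")

# Hypothetical evidence that is four times more likely under H1 than under H0.
# The same evidence yields very different posteriors under different priors.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(posterior_probability(prior, 0.8, 0.2), 3))
```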

[ArXiv] classifying spectra

Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications
by Murphy, Dean, and Raftery

Classifying or clustering spectra (or semi-supervised learning on them) is a very challenging problem at every stage, from collecting data ready for statistical analysis to reducing the dimensionality without sacrificing the complex information in each spectrum. It is challenging not only to estimate spiky (non-differentiable) curves via statistically well-defined estimating-equation procedures but also to transform the data so that they satisfy the regularity conditions assumed in statistics.
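Murphy, Dean, and Raftery build on Gaussian model-based discriminant analysis with variable selection and updating. The sketch below is only the simplest building block of that idea, under heavy assumptions: one Gaussian per class with a diagonal covariance (one variance per wavelength bin), fit to synthetic “spectra” with a single emission line. It does no variable selection or updating, and the function names are hypothetical.

```python
import numpy as np


def fit_gaussian_classes(X, y):
    """Fit one Gaussian per class with a diagonal covariance."""
    params = {}
    for label in np.unique(y):
        Xk = X[y == label]
        params[label] = (
            Xk.mean(axis=0),
            Xk.var(axis=0) + 1e-6,      # small floor keeps variances positive
            np.log(len(Xk) / len(X)),   # log prior from class frequencies
        )
    return params


def predict(params, X):
    """Assign each row of X to the class with the largest log posterior."""
    labels = list(params)
    scores = []
    for label in labels:
        mean, var, log_prior = params[label]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
        scores.append(log_lik + log_prior)
    return np.asarray(labels)[np.argmax(scores, axis=0)]


# Synthetic spectra: flat continuum + one Gaussian emission line + noise.
rng = np.random.default_rng(0)
wave = np.linspace(0.0, 1.0, 100)


def toy_spectra(n, center):
    line = np.exp(-0.5 * ((wave - center) / 0.02) ** 2)
    return 1.0 + line + rng.normal(0.0, 0.2, size=(n, wave.size))


X_train = np.vstack([toy_spectra(50, 0.3), toy_spectra(50, 0.7)])
y_train = np.repeat([0, 1], 50)
X_test = np.vstack([toy_spectra(20, 0.3), toy_spectra(20, 0.7)])
y_test = np.repeat([0, 1], 20)

params = fit_gaussian_classes(X_train, y_train)
accuracy = np.mean(predict(params, X_test) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

With real spectra the number of bins easily exceeds the number of training spectra, which is exactly why variable selection and constrained covariance models matter.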

Scatter plots and ANCOVA

Astronomers rely on scatter plots to illustrate correlations and trends among many pairs of variables more than any other scientists do.[1] One often finds pages of scatter plots with regression lines, where the slope of each regression line and its error bars serve as indicators of the degree of correlation. Sometimes so many scatter plots make me think that, overall, the resources spent drawing nice scatter plots and the pages on which they are printed are wasted. Why not just compute correlation coefficients and their errors, and publish the processed data used for computing the correlations (not the full data), so that others can verify the results? A couple of scatter plots are fine, but when I see dozens of them, I lose focus. This is another cultural difference.

  1. This is not an absolute statement but a personal impression after reading articles from various fields in addition to astronomy. My reading of other fields suggests that many rely on correlation statistics, and less on scatter plots with straight lines drawn through the data to impose relationships between pairs of variables.
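One standard way to report a correlation coefficient together with its error is Fisher’s z-transform; this is an assumed, reasonable choice rather than something the post prescribes. The minimal sketch below (plain Python, hypothetical function name) returns Pearson’s r and an approximate 95% interval, assuming roughly bivariate-normal data and more than three pairs.

```python
import math


def pearson_with_ci(x, y, z_crit=1.959963984540054):
    """Pearson correlation r and an approximate 95% CI via Fisher's z-transform.

    Assumes roughly bivariate-normal data and n > 3; se(z) = 1 / sqrt(n - 3).
    """
    n = len(x)
    if n != len(y) or n <= 3:
        raise ValueError("need paired samples with n > 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)                  # Fisher z: approximately normal
    se = 1.0 / math.sqrt(n - 3)
    return r, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical paired measurements.
r, lo, hi = pearson_with_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting r, the interval, and n is enough for a reader to check the computation; a perfect correlation (|r| = 1) makes `math.atanh` raise a `ValueError`.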

[MADS] logistic regression

Although some time has passed since my post on space weather, which noted that logistic regression is used for prediction, it still seems true that logistic regression is rarely used in astronomy. Alternatively, it may have been used for similar purposes, not under the same statistical jargon but within Bayesian modeling procedures.
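For readers who have not seen it fitted, here is a minimal sketch of maximum-likelihood logistic regression by Newton-Raphson (iteratively reweighted least squares), assuming Python with NumPy. The data are synthetic, with a made-up predictor and binary outcome standing in for, e.g., event/no-event labels; nothing here comes from the space weather post, and the function name is hypothetical.

```python
import numpy as np


def logistic_regression_irls(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n, p) design matrix (include a column of ones for an intercept).
    y: (n,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted P(y = 1)
        w = prob * (1.0 - prob)                     # IRLS weights
        grad = X.T @ (y - prob)                     # score vector
        hess = X.T @ (X * w[:, None])               # Fisher information
        step = np.linalg.solve(hess, grad)          # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic example: one predictor plus an intercept, true coefficients (-0.5, 2.0).
rng = np.random.default_rng(1)
x = rng.normal(size=5000)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([-0.5, 2.0])
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = logistic_regression_irls(X, y)
print("estimated coefficients:", beta_hat)
```

The inverse of `hess` at the solution gives approximate standard errors; a Bayesian version would add a prior on `beta`.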


Boyle & Smith (1969)

The 2009 Nobel Prize in Physics is shared by Willard Boyle and George Smith, inventors of the charge-coupled device (CCD), along with Charles Kao, who is cited for his work on optical-fiber communication.

The CCD, of course, is the workhorse of modern astronomy. I cannot even imagine how things would be without it.

Goodness-of-fit tests

When it comes to measuring goodness-of-fit, the Pearson χ² test is the dominant player in the race and the Kolmogorov-Smirnov test statistic trails far behind. Although almost invisible in this race, there are many other nonparametric statistics for testing goodness-of-fit and for comparing a sampling distribution to a reference distribution, legitimate race participants trained by many statisticians. Listing their names is probably useful to astronomers who find that the underlying assumptions of the χ² test do not match their data, or who want to try nonparametric test statistics other than the K-S test. I’ve seen other test statistics in astronomical journals from time to time. Depending on the data and their statistical properties, one test statistic could work better than another; therefore, it’s worthwhile to keep in mind that there are other goodness-of-fit tests beyond the χ² test.
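As a sketch of a few of those other race participants, the function below computes the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics for a sample against a fully specified reference CDF, using the standard order-statistic formulas. It assumes Python with NumPy and SciPy (only for the normal CDF) and does not compute p-values; for a fully specified null, `scipy.stats.kstest` and `scipy.stats.cramervonmises` report those. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def edf_statistics(sample, cdf):
    """K-S, Cramér-von Mises, and Anderson-Darling statistics vs. a fully specified cdf."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    u = cdf(x)                          # probability integral transform
    i = np.arange(1, n + 1)
    # Kolmogorov-Smirnov: largest gap between empirical and reference CDFs.
    ks = max(np.max(i / n - u), np.max(u - (i - 1) / n))
    # Cramér-von Mises: integrated squared gap.
    cvm = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
    # Anderson-Darling: squared gap weighted toward the tails.
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1.0 - u[::-1])))
    return ks, cvm, ad


# Hypothetical sample tested against a standard normal reference distribution.
rng = np.random.default_rng(7)
sample = rng.normal(size=200)
ks, cvm, ad = edf_statistics(sample, norm.cdf)
print(f"K-S D = {ks:.4f}, Cramér-von Mises W² = {cvm:.4f}, Anderson-Darling A² = {ad:.4f}")
```

Because Anderson-Darling weights the tails more heavily than K-S, the two can disagree on the same data, which is one concrete reason to keep several tests in mind.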

[MADS] Kalman Filter

A while ago I decided to discuss the Kalman filter on the slog after finding that this popular methodology is rather underrepresented in astronomy. However, it is not completely missing from ADS: the full-text search and the search over all bibliographic sources return more results. Their use of the Kalman filter, though, looked similar to the usage of “genetic algorithms” or “Bayes theorem.” Probably the broad notion of the Kalman filter makes it difficult to find Kalman filter applications in astronomy by name, since wheels are often reinvented (algorithms under different names share the same objective).
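To show how little machinery the basic idea needs, here is a minimal Kalman filter for the simplest state-space model, the local-level (random walk plus noise) model, assuming Python with NumPy. The model choice, the parameter names `q` and `r`, and the diffuse starting variance `p0` are illustrative assumptions.

```python
import numpy as np


def kalman_local_level(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter for the local-level model.

    State:        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation:  y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered means E[x_t | y_1..y_t] and their variances.
    """
    m, p = m0, p0
    means = np.empty(len(y))
    variances = np.empty(len(y))
    for t, obs in enumerate(y):
        p = p + q                 # predict: uncertainty grows by the process noise
        k = p / (p + r)           # Kalman gain
        m = m + k * (obs - m)     # update: move toward the new observation
        p = (1.0 - k) * p         # updated (posterior) variance
        means[t] = m
        variances[t] = p
    return means, variances


# Hypothetical data: a constant true value of 5 observed with unit-variance noise.
rng = np.random.default_rng(3)
y = 5.0 + rng.normal(0.0, 1.0, size=200)
means, variances = kalman_local_level(y, q=0.0, r=1.0)
print(f"final estimate {means[-1]:.3f} +/- {np.sqrt(variances[-1]):.3f}")
```

With `q = 0` the state is constant and, given a diffuse start, the filtered mean is essentially the running sample mean; a larger `q` lets the estimate track a drifting signal.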

data analysis system and its documentation

So far, I haven’t complained much about my “statistician learning astronomy” experience. Instead, I’ve been trying to emphasize how fascinating it is. I hope that more statisticians will join this adventure now that statisticians’ insights are in demand more than ever. However, this positivity does not seem to have worked so far. In the two years of this slog’s life, there has been no posting by a statistician except one about BEHR. Statisticians are busy and easily distracted by other fields with more tangible data sets. Or perhaps, compared with other fields, there are too many obstacles and too high barriers in astronomy for statisticians to participate. I’d like to talk about these challenges from my end.[1]

  1. This is quite an overdue posting. Links and associated content may be outdated.