All models are wrong, but some are useful.

All models are wrong, but some are useful. – George Box



I just saw a web site showing probability plots drawn on probability paper. Is this real? Does anybody still use this type of analysis when everything is done on computers?

Quote from the web page:

“… probability plotting involves a physical plot of the data on specially constructed *probability plotting paper*. This method is easily implemented by hand, given that one can obtain the appropriate probability plotting paper.”

What if R. A. Fisher had been hired by the Royal Observatory, even though his interests were biology and agriculture, or W. S. Gosset^{[1]} had been hired there instead of by a brewery? An article by E.L. Lehmann made me ponder this *what if*. Had that happened, astronomers might handle errors better than they do now.

- Gosset’s pen name was Student, from which the name Student's t, as in the t-distribution or t-test, was derived.

http://www.stanford.edu/group/mmds and I expect the same for this year. The workshop title may not attract astronomers, but the contents, tools, methodologies, and theory are friendly to modern astronomy. Astronomers can motivate, initiate, and push these researchers further at the workshop, which I believe is already happening without broad recognition (interdisciplinary work tends to stay within research groups).

For a discipline that relies so heavily on images, it is rather surprising how little use astronomy makes of the vast body of work on image analysis carried out by mathematicians and computer scientists. Mathematical morphology, for example, can be extremely useful in enhancing, recognizing, and extracting useful information from densely packed astronomical images.

The building blocks of mathematical morphology are two operators, **Erode[I|Y]** and **Dilate[I|Y]**,
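As a rough illustration of these two operators, here is a minimal Python sketch of binary erosion and dilation on a small image. The function names `erode` and `dilate`, the list-of-lists image representation, and the choice to treat pixels outside the image as background are assumptions made for this example, not the notation or implementation of the original post. The structuring element here is symmetric, so the reflection that formal definitions of dilation apply to it makes no difference.

```python
def _values_under(image, r, c, element):
    """Yield the image values covered by `element` centered at (r, c).

    Assumption: pixels outside the image count as 0 (background).
    """
    rows, cols = len(image), len(image[0])
    half_r, half_c = len(element) // 2, len(element[0]) // 2
    for i, element_row in enumerate(element):
        for j, active in enumerate(element_row):
            if active:
                rr, cc = r + i - half_r, c + j - half_c
                inside = 0 <= rr < rows and 0 <= cc < cols
                yield image[rr][cc] if inside else 0


def erode(image, element):
    """A pixel survives only if every covered pixel is set."""
    return [[int(all(_values_under(image, r, c, element)))
             for c in range(len(image[0]))]
            for r in range(len(image))]


def dilate(image, element):
    """A pixel is set if any covered pixel is set."""
    return [[int(any(_values_under(image, r, c, element)))
             for c in range(len(image[0]))]
            for r in range(len(image))]


# A 3x3 block of "source" pixels inside a 5x5 frame.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
element = [[1, 1, 1]] * 3  # 3x3 square structuring element

eroded = erode(image, element)    # shrinks the block
dilated = dilate(image, element)  # grows the block
```

With this input, erosion shrinks the block to its central pixel and dilation grows it to fill the frame, which is the shrink/grow behavior that makes the pair useful for cleaning up crowded images.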

Now it's time for me to write my own astrostat papers instead of spending time sieving them from [arXiv]. It has been an irresistible temptation to scan the daily [arXiv] preprints for astronomy, and sometimes statistics, papers that 1. adopt statistics, 2. contain statistically challenging problems, 3. could be improved by more rigorous statistical applications, 4. look like they abuse statistics, 5. may inspire statisticians with their data sets, or 6. might be useful for astronomers' advancement in data analysis. The temptation grew too large to handle. The number of papers that meet the above selection criteria seems to grow as my understanding widens. Also, the mesh is getting loose and starting to show holes.

One realization of mine during the meeting was related to a cultural difference, so this post has no relation to any presentation at the 212th AAS. Please correct me if you find any wrong statements. I cannot cover all perspectives from both disciplines, but I think there are two distinct fashions in practicing normalization.

What is **systematic error**? Can it be modeled statistically? Is it random? Is it fixed? Is it a bias? Is it …?

While we were discussing different viewpoints on the term *clustering*, one of the participants led me to his colleague's poster. This poster (I don't remember its title or abstract) was my favorite of all the posters at the meeting.

Two attendees, whom I had met before the AAS, asked me whether I could suggest clustering methods relevant to their projects. In the end, we spent quite some time clarifying the term **clustering.**

You all may have heard that GLAST launched on June 11, and the mission is going smoothly. Via Josh Grindlay comes news that Steve Ritz, the GLAST Project Scientist at GSFC, is keeping a weblog dedicated to it at

and intends to post status reports and related information on it.

From Protassov et al. (2002, ApJ, 571, 545), here is a formal expression for the Likelihood Ratio Test Statistic,

T_{LRT}= -2 ln R(D,Θ_{0},Θ)

R(D,Θ_{0},Θ) = [ sup_{θ∈Θ_{0}} p(D|Θ_{0}) ] / [ sup_{θ∈Θ} p(D|Θ) ]

where D is an independent data sample, Θ are the model parameters {θ_{i}, i=1,..M,M+1,..N}, and Θ_{0} forms a subset of the model in which θ_{i} = θ_{i}^{0}, i=1..M, are held fixed at their nominal values. That is, Θ represents the full model and Θ_{0} represents the simpler model, which is a subset of Θ. R(D,Θ_{0},Θ) is the ratio of the maximal (technically, supremal) likelihood of the simpler model to that of the full model.
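To make the definition concrete, here is a minimal Python sketch that evaluates T_{LRT} for a toy problem. It is not taken from Protassov et al. Everything specific in it is an assumption chosen for illustration: the data are Gaussian with known σ = 1, the full model Θ has a free mean μ, the simpler model Θ_{0} fixes μ at a nominal value of 0, and the function names and data values are hypothetical.

```python
import math


def gaussian_loglike(data, mu, sigma=1.0):
    """Log-likelihood of independent Gaussian data with mean mu, known sigma."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )


def lrt_statistic(data, mu0=0.0, sigma=1.0):
    """T_LRT = -2 ln R for the toy models described in the text above.

    Simpler model: mu fixed at mu0, so there is nothing left to maximize.
    Full model: mu free, and the likelihood is maximal at the sample mean.
    """
    mu_hat = sum(data) / len(data)
    loglike_simple = gaussian_loglike(data, mu0, sigma)
    loglike_full = gaussian_loglike(data, mu_hat, sigma)
    return -2.0 * (loglike_simple - loglike_full)


data = [0.3, -0.1, 0.8, 0.5, 0.2]  # made-up sample, for illustration only
t_lrt = lrt_statistic(data)
print(f"T_LRT = {t_lrt:.3f}")
```

Because the simpler model is nested in the full one, the ratio R cannot exceed 1, so T_{LRT} is never negative. In this particular toy case it reduces to n times the squared sample mean, divided by σ².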



As Prof. Speed said, PCA is prevalent in astronomy, particularly this week. Furthermore, a paper explicitly discusses R, a popular statistics package.

Believe it or not, I saw ANOVA (ANalysis Of VAriance) on a poster at the AAS. This acronym is considered to be one of those very statistical pieces of jargon that one would never expect to see at an astronomical meeting. I think you would like to know the story in detail.