Archive for the ‘Bad AstroStat’ Category.

#### Quotes from Common Errors in Statistics

by P.I.Good and J.W.Hardin. Publisher’s website

My astronomer neighbor mentioned this book a while ago, and much later I found some intriguing quotes in it. Continue reading ‘Quotes from Common Errors in Statistics’ »

#### [MADS] plug-in estimator

I asked a couple of astronomers if they had heard the term plug-in estimator, and none of them said yes. Continue reading ‘[MADS] plug-in estimator’ »

#### Use and Misuse of Chi-square

Before using any adaptation of the chi-square statistic, please spend a minute or two pondering whether your strategy with chi-square belongs to one of these categories.

1. Lack of independence among the single events or measures
2. Small theoretical frequencies
3. Neglect of frequencies of non-occurrence
4. Failure to equalize \sum O_i (the sum of the observed frequencies) and \sum M_i (the sum of the theoretical frequencies)
5. Indeterminate theoretical frequencies
6. Incorrect or questionable categorizing
7. Use of non-frequency data
8. Incorrect determination of the number of degrees of freedom
9. Incorrect computations (including a failure to weight by N when proportions instead of frequencies are used in the calculations)

From “Chapter 10: On the Use and Misuse of Chi-square” by K.L.Delucchi in A Handbook for Data Analysis in the Behavioral Sciences (1993). Delucchi attributed these nine principal sources of error to Lewis and Burke (1949), “The Use and Misuse of the Chi-square”, published in Psychological Bulletin. Continue reading ‘Use and Misuse of Chi-square’ »
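Items 4 and 9 in the list above are easy to demonstrate numerically. The following is a minimal sketch, not taken from Delucchi or from Lewis and Burke: the counts are made up for illustration, and the function names are hypothetical. It shows that a chi-square statistic computed from proportions must be multiplied by N to match the one computed from frequencies, and it refuses to proceed when the observed and theoretical totals differ.

```python
def chi_square_from_counts(observed, expected):
    """Pearson chi-square statistic from observed and theoretical frequencies."""
    # Item 4: the two sets of frequencies must have the same total.
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("sum of observed and theoretical frequencies differ")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_from_proportions(p_obs, p_exp, n):
    """Same statistic from proportions; the factor n is the weighting by N (item 9)."""
    return n * sum((po - pe) ** 2 / pe for po, pe in zip(p_obs, p_exp))


# Made-up example: 100 events in four categories, uniform theoretical frequencies.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
n = sum(observed)

p_obs = [o / n for o in observed]
p_exp = [e / n for e in expected]

stat_counts = chi_square_from_counts(observed, expected)
stat_props = chi_square_from_proportions(p_obs, p_exp, n)
# Forgetting to weight by N (passing n=1) shrinks the statistic by a factor of N.
unweighted = chi_square_from_proportions(p_obs, p_exp, 1)

print(stat_counts, stat_props, unweighted)
```

With the weighting in place, the two routes agree; without it, the statistic is too small by a factor of N and almost any model would appear to fit.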

#### 4754 d.f.

I couldn’t believe my eyes when I saw 4754 degrees of freedom (d.f.) and a chi-square test statistic of 4859. I’ve seen large degrees of freedom in astronomy journals often enough, from several hundred to a few thousand, but I have never felt comfortable with these big numbers. Then, to my great shock, 4754 d.f. appeared. I must find out why I feel so bothered by these huge degrees of freedom. Continue reading ‘4754 d.f.’ »
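To put the two numbers side by side, recall that a chi-square variable with k degrees of freedom has mean k and variance 2k, and is approximately normal for large k. The sketch below only does this arithmetic for the quoted values; it assumes the chi-square model applies at all, which is exactly what the post questions, and the variable names are illustrative.

```python
import math

dof = 4754          # degrees of freedom quoted above
chi2_stat = 4859.0  # chi-square test statistic quoted above

# Reduced chi-square: the statistic divided by its degrees of freedom.
reduced = chi2_stat / dof

# Large-d.f. normal approximation: (X - k) / sqrt(2k) is roughly standard normal.
z = (chi2_stat - dof) / math.sqrt(2 * dof)

print(f"reduced chi-square = {reduced:.4f}")
print(f"approximate z-score = {z:.4f}")
```

The normal approximation is only a rough guide; it says nothing about whether thousands of bins or channels really behave as independent Gaussian measurements.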

#### Correlation is not causation

What XKCD says:

The mouseover text on the original says “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.”

It is a bad habit and hard to break; the temptation is great.

#### Borel Cantelli Lemma for the Gaussian World

Almost two years of scrutinizing publications by astronomers has given me the strong impression that astronomers live in the Gaussian world. You are likely to object to this statement by saying that astronomers know and use the Poisson, binomial, Pareto (power law), Weibull, exponential, Laplace, Cauchy, Gamma, and some other distributions.[1] This is true. I have seen these distributions referred to in many publications; however, when it comes to obtaining “BEST FIT estimates for the parameters of interest” and “their ERROR (BARS)”, suddenly everything goes back to the Gaussian world.[2]

Borel-Cantelli Lemma (from Planet Math): I link to it rather than reproduce it here because of the mathematical symbols, but any probability textbook gives the lemma with proofs and descriptions.

1. It is a bit disappointing that not many mention the t distribution, even when fewer than 30 observations are available.
2. To stay out of this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say that it is the only escape, which is sometimes true and sometimes not. Personally, I lean toward the view that Bayesian methods are not always more robust than frequentist methods, contrary to how astronomers discuss robust methods.

#### after “Thanks to Henrietta Leavitt”

Personally, it was a highly anticipated symposium at CfA because I was fascinated by the contributions that the female computers (or astronomers) made here about a century ago, even though at that time women were not considered scientists but mere assistants for tedious jobs. Continue reading ‘after “Thanks to Henrietta Leavitt”’ »

#### [Q] Objectivity and Frequentist Statistics

Is there an objective method to combine measurements of the same quantity obtained with different instruments?

Suppose you have a set of N1 measurements obtained with one detector, and another set of N2 measurements obtained with a second detector. And let’s say you want something as simple as an estimate of the mean of the quantity (say the intensity) being measured. Let us further stipulate that the measurement errors of the points are similar in magnitude and that neither instrument displays any odd behavior. How does one combine the two datasets without appealing to subjective biases about the reliability or otherwise of the two instruments? Continue reading ‘[Q] Objectivity and Frequentist Statistics’ »
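To make the question concrete, here is a minimal sketch of two common ways to combine such datasets. It is not offered as the answer: the data values are invented, the function names are hypothetical, and the inverse-variance version assumes each detector's scatter is well estimated by its own sample variance. Pooling treats every point equally, while inverse-variance weighting lets the data decide how much each detector counts.

```python
from statistics import fmean, variance


def pooled_mean(a, b):
    """Treat all N1 + N2 points as one sample and average them equally."""
    return fmean(list(a) + list(b))


def inverse_variance_mean(a, b):
    """Weight each detector's mean by 1 / (estimated variance of that mean)."""
    # The variance of a sample mean is s^2 / N, so its inverse is N / s^2.
    w_a = len(a) / variance(a)
    w_b = len(b) / variance(b)
    return (w_a * fmean(a) + w_b * fmean(b)) / (w_a + w_b)


# Invented intensities from two hypothetical detectors (N1 = 5, N2 = 3).
detector_1 = [10.1, 9.8, 10.3, 9.9, 10.0]
detector_2 = [10.6, 10.2, 10.9]

print("pooled mean:          ", pooled_mean(detector_1, detector_2))
print("inverse-variance mean:", inverse_variance_mean(detector_1, detector_2))
```

The two estimates generally differ, so choosing between them is itself a decision about how much to trust each instrument, which is the point of the question.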

#### Classification and Clustering

Another conclusion I have drawn from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse classification and clustering and to mix up their methodologies. They tend to think any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithms, no matter what, look like a black box. I mean a black box as in a neural network, which is one of the classification algorithms. Continue reading ‘Classification and Clustering’ »

#### A History of Markov Chain Monte Carlo

I’ve been joking about the astronomers’ fashion in writing Markov chain Monte Carlo (MCMC). Frequently, MCMC is written as Monte Carlo Markov Chain in astronomical journals. I was curious about the history of this new creation. Overall, I thought it would be worthwhile to learn more about the history of MCMC, and this paper was up on arxiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

#### appealing eyes == powerful method

To claim that results are statistically powerful, astronomers rely heavily on eyeballing techniques (which take an apprenticeship to acquire, but look subjective to me without such training). In some cases, I know of actual statistical tests that could support or refute those claims. Hence, I believe astronomers are well aware of those statistical tests. I guess they are afraid that those statistics may reject their claims or are not powerful enough in numeric metrics. Instead, they put their effort into making graphics more appealing. Continue reading ‘appealing eyes == powerful method’ »

#### my first AAS. VI. Normalization

One realization of mine during the meeting was related to a cultural difference; therefore, this post has no relation to any presentation at the 212th AAS. Please correct me if you find wrong statements. I cannot cover all perspectives from both disciplines, but I think there are two distinct fashions in practicing normalization. Continue reading ‘my first AAS. VI. Normalization’ »

#### The LRT is worthless for …

One of the speakers from the google talk series used model based clustering as an example and mentioned the likelihood ratio test (LRT) for determining the number of clusters. Since I’ve seen examples of improperly applied LRTs in astronomical journals, such as testing two clusters vs. three or a higher number of components, I could not resist pointing out that the LRT was used improperly in his illustration. In reply, he said that the citation regarding the LRT was different from his plot and that the test was carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to find another example of the ill-used LRT. Continue reading ‘The LRT is worthless for …’ »

#### The Burden of Reviewers

Astronomers write literally thousands of proposals each year to observe their favorite targets with their favorite telescopes. Every proposal must be accompanied by a technical justification, where the proposers demonstrate that their goal is achievable, usually via a simulation. Surprisingly, a large number of these justifications are statistically unsound. Guest Slogger Simon Vaughan describes the problem and shows what you can do to make reviewers happy (and you definitely want to keep reviewers happy).
Continue reading ‘The Burden of Reviewers’ »

#### Eddington versus Malmquist

During the runup to his recent talk on logN-logS, Andreas mentioned how sometimes people are confused about the variety of statistical biases that afflict surveys. They usually know what the biases are, but often tend to mislabel them, especially the Eddington and Malmquist types. Sort of like using “your” and “you’re” interchangeably, which to me is like nails on a blackboard. So here’s a brief summary: Continue reading ‘Eddington versus Malmquist’ »