The AstroStat Slog

Archive for the ‘Bad AstroStat’ Category.

Quotes from Common Errors in Statistics

Nov 13th, 2009| 12:13 pm | Posted by hlee

by P.I.Good and J.W.Hardin. Publisher’s website

My astronomer neighbor mentioned this book a while ago, and much later I found some intriguing quotes in it. Continue reading ‘Quotes from Common Errors in Statistics’ »

Tags: book, Common Errors, confidence intervals, estimate
Category: Bad AstroStat, Cross-Cultural, Quotes | Comment

[MADS] plug-in estimator

Apr 20th, 2009| 09:34 pm | Posted by hlee

I asked a couple of astronomers whether they had heard the term plug-in estimator, and none of them had. Continue reading ‘[MADS] plug-in estimator’ »
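
For readers who have not met the term either: a plug-in estimator evaluates the quantity of interest on the empirical distribution in place of the unknown true one. The sketch below is illustrative only and is not taken from the post; the sample values and the function name are made up. It contrasts the plug-in variance (divisor n), which is biased, with the usual unbiased version (divisor n - 1).

```python
import statistics


def plug_in_variance(sample):
    """Variance of the empirical distribution: mean squared deviation, divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


# Hypothetical measurements, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

plug_in = plug_in_variance(sample)      # divisor n (biased low)
unbiased = statistics.variance(sample)  # divisor n - 1

print(plug_in, unbiased)
```

The two differ by the factor (n - 1)/n, so the gap matters mainly for small samples.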

Tags: biased, breakdown point, chi-square, confidence interval, coverage, delta chi-square, estimator, LAD, mean, median, plug-in, rmse
Category: Bad AstroStat, Cross-Cultural, Data Processing, Jargon, Uncertainty | 2 Comments

Use and Misuse of Chi-square

Mar 31st, 2009| 03:43 pm | Posted by hlee

Before using any adaptation of the chi-square statistic, please spend a minute or two pondering whether your strategy with chi-square falls into one of these categories.

1. Lack of independence among the single events or measures
2. Small theoretical frequencies
3. Neglect of frequencies of non-occurrence
4. Failure to equalize \sum O_i (the sum of the observed frequencies) and \sum M_i (the sum of the theoretical frequencies)
5. Indeterminate theoretical frequencies
6. Incorrect or questionable categorizing
7. Use of non-frequency data
8. Incorrect determination of the number of degrees of freedom
9. Incorrect computations (including a failure to weight by N when proportions instead of frequencies are used in the calculations)

From “Chapter 10: On the Use and Misuse of Chi-square” by K.L.Delucchi in A Handbook for Data Analysis in the Behavioral Sciences (1993). Delucchi attributes these nine principal sources of error to Lewis and Burke (1949), “The Use and Misuse of the Chi-square,” published in Psychological Bulletin. Continue reading ‘Use and Misuse of Chi-square’ »
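
Two items on the list, small theoretical frequencies (item 2) and unequal sums of observed and theoretical frequencies (item 4), are easy to check mechanically before computing the statistic. The following is a minimal sketch, not taken from Delucchi's chapter; the function name, the tolerance, the example counts, and the threshold of 5 (a common rule of thumb) are all assumptions made here.

```python
def pearson_chi_square(observed, expected, min_expected=5.0):
    """Pearson chi-square for counts, with two sanity checks from the list."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    # Item 4: the observed and theoretical frequencies must have equal totals.
    total_obs, total_exp = sum(observed), sum(expected)
    if abs(total_obs - total_exp) > 1e-9 * max(1.0, total_obs):
        raise ValueError("sum of observed != sum of theoretical frequencies")

    # Item 2: report cells whose theoretical frequency is small.
    small_cells = [i for i, m in enumerate(expected) if m < min_expected]

    stat = sum((o - m) ** 2 / m for o, m in zip(observed, expected))
    return stat, small_cells


# Hypothetical counts in four categories against a uniform model.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]

stat, small_cells = pearson_chi_square(observed, expected)
print(stat, small_cells)
```

The other items on the list (independence, categorizing, degrees of freedom) depend on how the data were collected and modeled, so they cannot be checked this way.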

Tags: chi-square, chi-square statistic, degrees-of-freedom, misuse, use
Category: arXiv, Bad AstroStat, Cross-Cultural, Data Processing, Stat | 1 Comment

4754 d.f.

Mar 17th, 2009| 03:37 pm | Posted by hlee

I couldn’t believe my eyes when I saw 4754 degrees of freedom (d.f.) and a chi-square test statistic of 4859. I’ve often seen large degrees of freedom in astronomy journals, from several hundred to a few thousand, but I have never felt comfortable with such big numbers. Then, to my great shock, 4754 d.f. appeared. I must find out why these huge degrees of freedom bother me so much. Continue reading ‘4754 d.f.’ »
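
For scale, a chi-square variable with k degrees of freedom has mean k and variance 2k, so for large k the statistic can be compared with k in units of sqrt(2k). The back-of-the-envelope check below is an addition, not part of the post. It relies on the normal approximation to the chi-square distribution and says nothing about whether that many degrees of freedom are appropriate in the first place.

```python
import math

dof = 4754      # degrees of freedom quoted in the post
chi2 = 4859.0   # chi-square test statistic quoted in the post

# A chi-square with k d.f. has mean k and standard deviation sqrt(2k).
z = (chi2 - dof) / math.sqrt(2 * dof)

# Upper-tail probability under the normal approximation (assumption).
p_approx = 0.5 * math.erfc(z / math.sqrt(2))

print(f"reduced chi-square = {chi2 / dof:.3f}, z = {z:.2f}, p ~ {p_approx:.2f}")
```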

Tags: Binning, chi-square, chi-square minimization, chi-square optimization, chi-square statistic, class, degrees-of-freedom, equiprobable, goodness-of-fit test, kernel density estimation
Category: Bad AstroStat, Fitting, High-Energy, Methods, Spectral, X-ray | 2 Comments

Correlation is not causation

Mar 6th, 2009| 09:22 am | Posted by vlk

What XKCD says:

The mouseover text on the original says “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.”

It is a bad habit and hard to break; the temptation is great.

Tags: causation, comics, correlation, XKCD
Category: Bad AstroStat, Misc, Quotes | 1 Comment

Borel Cantelli Lemma for the Gaussian World

Dec 3rd, 2008| 12:31 am | Posted by hlee

Almost two years of scrutinizing publications by astronomers has given me the strong impression that astronomers live in the Gaussian world. You are likely to object to this statement by saying that astronomers know and use Poisson, binomial, Pareto (power laws), Weibull, exponential, Laplace (Cauchy), Gamma, and some other distributions.^[1] This is true. I can attest that these distributions are referred to in many publications; however, when it comes to obtaining “BEST FIT estimates for the parameters of interest” and “their ERROR (BARS)”, suddenly everything goes back to the Gaussian world.^[2]

Borel Cantelli Lemma (from Planet Math): because of the mathematical symbols, I link to it rather than reproduce it, but any probability textbook has the lemma with proofs and descriptions.

Continue reading ‘Borel Cantelli Lemma for the Gaussian World’ »

[1] It is a bit disappointing that not many mention the t distribution, even when fewer than 30 observations are available.
[2] To stay out of this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say that it is the only escape, which is sometimes true and sometimes not. Personally, I lean toward the view that Bayesian methods are not always more robust than frequentist methods, contrary to what astronomers’ discussions of robust methods suggest.

Tags: Borel Cantelli Lemma, CLT, families of distributions, gaussian, grand challenge, measure, non-Gaussian, probability, statisticians
Category: arXiv, Astro, Bad AstroStat, Cross-Cultural, Frequentist, Jargon, News, Quotes, Stat, Uncertainty | Comment

after “Thanks to Henrietta Leavitt”

Nov 6th, 2008| 11:22 pm | Posted by hlee

Personally, I had highly anticipated this symposium at CfA because I am fascinated by the contributions of the female computers (or astronomers) who worked here about a century ago, even though at that time women were not considered scientists but mere assistants for tedious jobs. Continue reading ‘after “Thanks to Henrietta Leavitt”’ »

Category: Astro, Bad AstroStat, Data Processing, Methods, Stars, Stat | Comment

[Q] Objectivity and Frequentist Statistics

Sep 29th, 2008| 02:15 am | Posted by vlk

Is there an objective method to combine measurements of the same quantity obtained with different instruments?

Suppose you have a set of N₁ measurements obtained with one detector, and another set of N₂ measurements obtained with a second detector. And let’s say you want something as simple as an estimate of the mean of the quantity (say the intensity) being measured. Let us further stipulate that the measurement errors of the points are similar in magnitude and that neither instrument displays any odd behavior. How does one combine the two datasets without appealing to subjective biases about the reliability or otherwise of the two instruments? Continue reading ‘[Q] Objectivity and Frequentist Statistics’ »
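
One conventional answer, shown here only as a point of reference, is the inverse-variance weighted mean, which reduces to simply pooling all N₁ + N₂ points when the errors are equal. The sketch below assumes independent measurements with known Gaussian errors. The data values, the common error, and the function name are hypothetical, and the sketch does not settle the question of objectivity that the post raises.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical intensities from two detectors with a common error of 0.3.
detector_1 = [10.2, 9.8, 10.1]
detector_2 = [10.4, 9.9]
sigma = 0.3

values = detector_1 + detector_2
mean, err = weighted_mean(values, [sigma] * len(values))

print(mean, err)
```

With equal errors the weights cancel, so the result is the plain average of all five points with standard error sigma / sqrt(5). Any subjective judgment about the instruments would enter through the choice of the sigmas.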

Tags: Bayesian, Frequentist, Ping Zhao, question for statisticians, weighted mean
Category: Bad AstroStat, Fitting, Frequentist, Stat, Uncertainty | 18 Comments

Classification and Clustering

Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion I have drawn from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse classification and clustering and to mix up their methodologies. They tend to think that any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithm, no matter what, look like a black box. I mean a black box as in a neural network, which is one of the classification algorithms. Continue reading ‘Classification and Clustering’ »

Tags: black box, book, catalog, Classification, clustering, haste, outliers, R, Robert Serfling, semi-supervised learning, survey
Category: Algorithms, arXiv, Astro, Bad AstroStat, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Stat | Comment

A History of Markov Chain Monte Carlo

Sep 17th, 2008| 02:11 pm | Posted by hlee

I’ve been joking about the way astronomers write Markov chain Monte Carlo (MCMC): in astronomical journals it frequently appears as Monte Carlo Markov Chain. I was curious about the history of this new creation. Overall, I thought it would be worthwhile to learn more about the history of MCMC, and this paper was up on arxiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

Tags: BUGS, data augmentation, EM, Gibbs sampling, Hasting, history, Metropolis, reversible jump, simulated annealing
Category: Algorithms, arXiv, Bad AstroStat, Bayesian, Cross-Cultural, Data Processing, Imaging, MC, MCMC, Methods, Quotes, Stat | 2 Comments

appealing eyes == powerful method

Sep 12th, 2008| 11:30 pm | Posted by hlee

To claim that results are statistically powerful, astronomers rely heavily on eyeballing techniques (which require an apprenticeship to acquire, but look subjective to me without such training). In some cases, I know of actual statistical tests that could support or refute those claims. Hence, I believe astronomers are well aware of those statistical tests. I guess they are afraid that those statistics may reject their claims or are not powerful enough in numerical terms. Instead, they put their effort into making graphics more appealing. Continue reading ‘appealing eyes == powerful method’ »

Tags: empirical data analysis, eyeballing, powerful test
Category: Bad AstroStat, Cross-Cultural, Data Processing, Jargon | 2 Comments

my first AAS. VI. Normalization

Jun 20th, 2008| 11:58 pm | Posted by hlee

One realization of mine during the meeting concerns a cultural difference; therefore, this post is not related to any presentation at the 212th AAS. Please correct me if you find any wrong statements. I cannot cover all perspectives from both disciplines, but I think there are two distinct fashions in practicing normalization. Continue reading ‘my first AAS. VI. Normalization’ »

Tags: AAS, measure, measure theory, normalization, PDF, pmf
Category: Bad AstroStat, Cross-Cultural, Uncertainty | Comment

The LRT is worthless for …

Apr 25th, 2008| 01:48 am | Posted by hlee

One of the speakers in the Google talk series gave an example of model-based clustering and mentioned the likelihood ratio test (LRT) for determining the number of clusters. Since I’ve seen examples of improperly practiced LRTs in astronomical journals, such as testing two clusters vs. three or a higher number of components, I could not resist pointing out that, judging from his illustration, the LRT was improperly used. In reply, I learned that the citation regarding the LRT was different from his plot and that the test was carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to find another example of the ill-used LRT. Continue reading ‘The LRT is worthless for …’ »

Tags: LRT, mixture
Category: arXiv, Bad AstroStat, CHASC, Frequentist, Stat | Comment

The Burden of Reviewers

Apr 17th, 2008| 12:17 pm | Posted by chasc

Astronomers write literally thousands of proposals each year to observe their favorite targets with their favorite telescopes. Every proposal must be accompanied by a technical justification, where the proposers demonstrate that their goal is achievable, usually via a simulation. Surprisingly, a large number of these justifications are statistically unsound. Guest Slogger Simon Vaughan describes the problem and shows what you can do to make reviewers happy (and you definitely want to keep reviewers happy).
Continue reading ‘The Burden of Reviewers’ »

Tags: proposals, reviewers, Simon Vaughan, Type II
Category: Astro, Bad AstroStat, Uncertainty | 11 Comments

Eddington versus Malmquist

Mar 13th, 2008| 01:53 pm | Posted by vlk

During the run-up to his recent talk on logN-logS, Andreas mentioned how people are sometimes confused about the variety of statistical biases that afflict surveys. They usually know what the biases are, but often mislabel them, especially the Eddington and Malmquist types. Sort of like using “your” and “you’re” interchangeably, which to me is like nails on a blackboard. So here’s a brief summary: Continue reading ‘Eddington versus Malmquist’ »

Tags: bias, detection threshold, Eddington, faint source fluctuations, logN-logS, luminosity function, Malmquist
Category: Astro, Bad AstroStat, Jargon | 3 Comments