Archive for the ‘Bad AstroStat’ Category.

Nov 13th, 2009| 12:13 pm | Posted by hlee

*Common Errors in Statistics* by **P.I. Good** and **J.W. Hardin**.

An astronomer neighbor of mine mentioned this book a while ago, and much later I found some intriguing quotes in it. Continue reading ‘Quotes from *Common Errors in Statistics*’ »

Apr 20th, 2009| 09:34 pm | Posted by hlee

I asked a couple of astronomers whether they had heard of the term **plug-in estimator**, and none of them said yes. Continue reading ‘[MADS] plug-in estimator’ »
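For readers meeting the term for the first time, here is a minimal sketch of the standard textbook idea (the definition is the usual one, not an excerpt from the post): a parameter written as a functional θ = T(F) of the distribution F is estimated by T(F̂_n), the same functional evaluated on the empirical distribution F̂_n, which puts mass 1/n on each observation. The function names are hypothetical. The example also shows why "biased" appears among the tags: the plug-in variance divides by n, not n - 1.

```python
import statistics

def empirical_distribution(sample):
    """Empirical distribution F_hat: each observation gets probability mass 1/n."""
    n = len(sample)
    return list(sample), [1.0 / n] * n

def plug_in_mean(sample):
    # T(F) = E_F[X]; evaluated at F_hat this is sum_i x_i * (1/n).
    xs, ps = empirical_distribution(sample)
    return sum(x * p for x, p in zip(xs, ps))

def plug_in_variance(sample):
    # T(F) = E_F[(X - E_F[X])^2]; evaluated at F_hat this divides by n, not n - 1.
    xs, ps = empirical_distribution(sample)
    mu = plug_in_mean(sample)
    return sum(p * (x - mu) ** 2 for x, p in zip(xs, ps))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(plug_in_mean(data))         # 5.0
print(plug_in_variance(data))     # 4.0, same as statistics.pvariance(data)
print(statistics.variance(data))  # about 4.571, the unbiased (n - 1) version
```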

Tags: biased, breakdown point, chi-square, confidence interval, coverage, delta chi-square, estimator, LAD, mean, median, plug-in, rmse

Category: Bad AstroStat, Cross-Cultural, Data Processing, Jargon, Uncertainty

Mar 31st, 2009| 03:43 pm | Posted by hlee

Before using any adaptation of the chi-square statistic, please spend a minute or two pondering whether your strategy with chi-square falls into one of these categories:

**1**. Lack of independence among the single events or measures

**2**. Small theoretical frequencies

**3**. Neglect of frequencies of non-occurrence

**4**. Failure to equalize $\sum O_i$ (the sum of the observed frequencies) and $\sum M_i$ (the sum of the theoretical frequencies)

**5**. Indeterminate theoretical frequencies

**6**. Incorrect or questionable categorizing

**7**. Use of non-frequency data

**8**. Incorrect determination of the number of degrees of freedom

**9**. Incorrect computations (including a failure to weight by N when proportions instead of frequencies are used in the calculations)

From “**Chapter 10: On the Use and Misuse of Chi-square**” by K.L. Delucchi in *A Handbook for Data Analysis in the Behavioral Sciences* (1993). Delucchi credited these nine principal sources of error to Lewis and Burke (1949), “The Use and Misuse of the Chi-square”, published in *Psychological Bulletin*. Continue reading ‘Use and Misuse of Chi-square’ »
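A few of these pitfalls can be turned into explicit checks. The sketch below is illustrative only: the function names are hypothetical, the example counts are made up, and the expected-frequency threshold of 5 is a common rule of thumb rather than something stated by Delucchi or by Lewis and Burke. It refuses to compute Pearson's statistic when the observed and theoretical totals differ (error 4), warns about small theoretical frequencies (error 2), and shows that a statistic computed from proportions must be weighted by N (error 9).

```python
import warnings

def pearson_chi_square(observed, theoretical, min_expected=5.0):
    """Pearson's X^2 = sum_i (O_i - M_i)^2 / M_i, computed on frequencies (counts)."""
    if len(observed) != len(theoretical):
        raise ValueError("observed and theoretical must have the same length")
    total_o, total_m = sum(observed), sum(theoretical)
    # Error 4: the observed and theoretical totals must be equal.
    if abs(total_o - total_m) > 1e-9 * max(1.0, total_o):
        raise ValueError(f"sum O_i = {total_o} but sum M_i = {total_m}")
    # Error 2: small theoretical frequencies (threshold is a rule-of-thumb assumption).
    if any(m < min_expected for m in theoretical):
        warnings.warn("some theoretical frequencies are small; "
                      "the chi-square approximation may be poor")
    return sum((o - m) ** 2 / m for o, m in zip(observed, theoretical))

def pearson_chi_square_from_proportions(p_observed, p_theoretical, n):
    # Error 9: with proportions instead of frequencies, the sum must be weighted by N;
    # leaving out n makes the statistic n times too small.
    return n * sum((po - pm) ** 2 / pm for po, pm in zip(p_observed, p_theoretical))

counts = [18, 22, 20, 40]
print(pearson_chi_square(counts, [25, 25, 25, 25]))  # about 12.32
print(pearson_chi_square_from_proportions([0.18, 0.22, 0.20, 0.40], [0.25] * 4, 100))  # about 12.32
```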

Mar 17th, 2009| 03:37 pm | Posted by hlee

I couldn’t believe my eyes when I saw 4754 degrees of freedom (d.f.) and a chi-square test statistic of 4859. I have often seen large degrees of freedom in astronomy journals, from several hundred to a few thousand, but I have never felt comfortable with such big numbers. Then, to my great shock, 4754 d.f. appeared. I must find out why these huge degrees of freedom bother me so much. Continue reading ‘4754 d.f.’ »
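For a rough sense of scale (a sketch, not the analysis in the post): a chi-square variable with k degrees of freedom has mean k and variance 2k, so for k in the thousands it is close to normal. The hypothetical helper below uses that normal approximation to put the quoted numbers in context.

```python
import math

def chi_square_summary(chi2, dof):
    """Reduced chi-square, z-score, and approximate upper-tail probability.

    Uses the large-dof normal approximation: chi-square(k) has mean k and
    variance 2k. This is an approximation, not the exact chi-square tail.
    """
    reduced = chi2 / dof
    z = (chi2 - dof) / math.sqrt(2.0 * dof)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))  # approximate P(X >= chi2)
    return reduced, z, p_upper

print(chi_square_summary(4859, 4754))  # reduced ~ 1.022, z ~ 1.08, p ~ 0.14
```

With k = 4754 the reduced chi-square has a sampling standard deviation of only about sqrt(2/k) ≈ 0.02, so at this many degrees of freedom even small departures of the reduced statistic from 1 carry a lot of weight. An exact tail probability would use the chi-square distribution itself rather than this normal approximation.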

Tags: Binning, chi-square, chi-square minimization, chi-square optimization, chi-square statistic, class, degrees-of-freedom, equiprobable, goodness-of-fit test, kernel density estimation

Category: Bad AstroStat, Fitting, High-Energy, Methods, Spectral, X-ray

Mar 6th, 2009| 09:22 am | Posted by vlk

What XKCD says:

The mouseover text on the original says *“Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.”*

It is a bad habit, hard to break, the temptation is great.

Dec 3rd, 2008| 12:31 am | Posted by hlee

Almost two years of scrutinizing publications by astronomers have given me a strong impression that astronomers live in a Gaussian world. You may object to this statement, saying that astronomers know and use the Poisson, binomial, Pareto (power-law), Weibull, exponential, Laplace, Cauchy, Gamma, and other distributions.[1] This is true. I have seen these distributions referred to in many publications; however, when it comes to obtaining “BEST FIT estimates for the parameters of interest” and “their ERROR (BARS)”, suddenly everything goes back to the Gaussian world.[2]

Borel-Cantelli lemma (from PlanetMath): because of the mathematical symbols, I only link to it here, but any probability textbook states the lemma with proofs and discussion.
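For reference, here is the standard textbook statement of the lemma (not reproduced from the linked page). A_1, A_2, ... are events on a common probability space, and the limsup event is the event that infinitely many of them occur.

```latex
% First Borel-Cantelli lemma (no independence needed):
\sum_{n=1}^{\infty} P(A_n) < \infty
  \;\Longrightarrow\;
  P\Big(\limsup_{n\to\infty} A_n\Big)
  = P\Big(\bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} A_n\Big) = 0 .

% Second Borel-Cantelli lemma (the A_n are independent):
\sum_{n=1}^{\infty} P(A_n) = \infty
  \;\Longrightarrow\;
  P\Big(\limsup_{n\to\infty} A_n\Big) = 1 .
```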

Continue reading ‘Borel Cantelli Lemma for the Gaussian World’ »

Tags: Borel Cantelli Lemma, CLT, families of distributions, gaussian, grand challenge, measure, non-Gaussian, probability, statisticians

Category: arXiv, Astro, Bad AstroStat, Cross-Cultural, Frequentist, Jargon, News, Quotes, Stat, Uncertainty

Nov 6th, 2008| 11:22 pm | Posted by hlee

Personally, this was a highly anticipated symposium at the CfA, because I am fascinated by the contributions of the female computers (or rather, astronomers) who worked here about a century ago, even though at that time women were not regarded as scientists but as mere assistants for tedious jobs. Continue reading ‘after “Thanks to Henrietta Leavitt”’ »

Sep 29th, 2008| 02:15 am | Posted by vlk

Is there an objective method to combine measurements of the same quantity obtained with different instruments?

Suppose you have a set of $N_1$ measurements obtained with one detector, and another set of $N_2$ measurements obtained with a second detector. And let’s say you wanted something as simple as an estimate of the mean of the quantity (say, the intensity) being measured. Let us further stipulate that the measurement errors of the individual points are similar in magnitude and that neither instrument displays any odd behavior. How does one combine the two datasets without appealing to subjective biases about the reliability or otherwise of the two instruments? Continue reading ‘[Q] Objectivity and Frequentist Statistics’ »
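To make the stipulated setup concrete, here is a minimal sketch of one conventional frequentist recipe, inverse-variance weighting of the two sample means. It is not offered as the answer to the objectivity question: it assumes independent points, the same per-point error sigma for both detectors, and no systematic offset between instruments, and under exactly those assumptions it reduces to pooling all N_1 + N_2 points. The function names and the data values are hypothetical.

```python
import math
import statistics

def inverse_variance_mean(estimates, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    return mean, math.sqrt(1.0 / total)

def combine_two_detectors(x1, x2, sigma):
    # Assumption: independent points, same per-point error sigma for both detectors,
    # and no systematic offset between the instruments.
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    se1, se2 = sigma / math.sqrt(len(x1)), sigma / math.sqrt(len(x2))
    return inverse_variance_mean([m1, m2], [se1, se2])

x1 = [10.0, 12.0]            # hypothetical detector-1 measurements
x2 = [8.0, 9.0, 10.0, 13.0]  # hypothetical detector-2 measurements
mean, err = combine_two_detectors(x1, x2, sigma=1.0)
print(mean, statistics.fmean(x1 + x2))  # both about 10.333: the pooled mean
```

The weights are only as objective as the error estimates fed into them; if the per-point errors differ between detectors, the same function weights the two means unequally.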

Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion I have drawn from reading preprints listed on arXiv/astro-ph is that astronomers tend to confuse **classification and clustering** and to mix up their methodologies. They tend to think that any classification or clustering algorithm will serve their purpose, since both kinds of algorithms look like a **black box**. By black box I mean something like a neural network, which is one classification algorithm. Continue reading ‘Classification and Clustering’ »
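The distinction can be shown with two toy algorithms (a sketch with hypothetical function names and made-up points, not taken from the post). Classification is supervised: the classes are known in advance, and a rule such as nearest-centroid is learned from labeled examples and then applied to new objects. Clustering is unsupervised: there are no labels, and an algorithm such as k-means (Lloyd's iteration) groups the points by similarity, returning group ids that carry no predefined meaning.

```python
import math
import random

def nearest_centroid_fit(points, labels):
    """Classification (supervised): learn one centroid per known class label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest_centroid_predict(centroids, p):
    # Assign a new point to the known class whose centroid is closest.
    return min(centroids, key=lambda y: math.dist(p, centroids[y]))

def kmeans(points, k, n_iter=100, seed=0):
    """Clustering (unsupervised): no labels; Lloyd's k-means finds k groups."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p)].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            [sum(coord) / len(g) for coord in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p) for p in points]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "B", "B", "B"]  # toy labels, available only to the classifier

centroids = nearest_centroid_fit(points, labels)
print(nearest_centroid_predict(centroids, (9, 9)))  # B
centers, assignment = kmeans(points, 2)
print(assignment)  # two groups with arbitrary integer ids, not "A"/"B"
```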

Tags: black box, book, catalog, Classification, clustering, haste, outliers, R, Robert Serfling, semi-supervised learning, survey

Category: Algorithms, arXiv, Astro, Bad AstroStat, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Stat

Sep 17th, 2008| 02:11 pm | Posted by hlee

I’ve been joking about astronomers’ fashion of writing out **Markov chain Monte Carlo (MCMC)**: in astronomical journals, **MCMC** is frequently expanded as **Monte Carlo Markov Chain**. I was curious about the history of this new coinage. Overall, I thought it would be worth learning more about the history of MCMC, and this paper appeared on arXiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

Tags: BUGS, data augmentation, EM, Gibbs sampling, Hastings, history, Metropolis, reversible jump, simulated annealing

Category: Algorithms, arXiv, Bad AstroStat, Bayesian, Cross-Cultural, Data Processing, Imaging, MC, MCMC, Methods, Quotes, Stat

Sep 12th, 2008| 11:30 pm | Posted by hlee

To claim that their results are statistically powerful, astronomers rely heavily on eyeballing techniques (which seem to require an apprenticeship to acquire, and which look subjective to me without such training). In some cases, I know of actual statistical tests that could support or argue against those claims, so I believe astronomers are well aware of them. I guess they are afraid that those tests may reject their claims, or may not look powerful enough in numerical terms. Instead, they spend their effort on making the graphics more appealing. Continue reading ‘appealing eyes == powerful method’ »

Jun 20th, 2008| 11:58 pm | Posted by hlee

One realization I had during the meeting concerned a cultural difference, so this post is not related to any of the presentations at the 212th AAS meeting. Please correct me if you find any wrong statements. I cannot cover every perspective from both disciplines, but I think there are two distinct ways of practicing normalization. Continue reading ‘my first AAS. VI. Normalization’ »

Apr 25th, 2008| 01:48 am | Posted by hlee

One of the speakers in the Google talk series illustrated model-based clustering and mentioned the likelihood ratio test (LRT) for determining the number of clusters. Since I have seen examples of badly practiced LRTs in astronomical journals, such as testing two clusters vs. three, or higher numbers of components, I could not resist pointing out that the LRT in his illustration was improperly used. In reply, he explained that the LRT result he cited differed from what his plot showed: the test had been carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to have found another example of an ill-used LRT. Continue reading ‘The LRT is worthless for …’ »

Apr 17th, 2008| 12:17 pm | Posted by chasc

*Astronomers write literally thousands of proposals each year to observe their favorite targets with their favorite telescopes. Every proposal must be accompanied by a technical justification, where the proposers demonstrate that their goal is achievable, usually via a simulation. Surprisingly, a large number of these justifications are statistically unsound. Guest Slogger **Simon Vaughan** describes the problem and shows what you can do to make reviewers happy (and you definitely want to keep reviewers happy).*

Continue reading ‘The Burden of Reviewers’ »

Mar 13th, 2008| 01:53 pm | Posted by vlk

During the run-up to his recent talk on logN-logS, Andreas mentioned that people are sometimes confused about the variety of statistical biases that afflict surveys. They usually know what the biases are, but tend to mislabel them, especially the Eddington and Malmquist types. It is sort of like using “your” and “you’re” interchangeably, which to me is like nails on a blackboard. So here’s a brief summary: Continue reading ‘Eddington versus Malmquist’ »
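As a toy illustration of the Eddington type only (not Andreas's analysis; the function name and every parameter value are assumptions), the simulation below draws true fluxes from a power-law integral count N(>S) proportional to S^-1.5, the Euclidean slope, adds Gaussian measurement noise, and keeps the sources whose measured flux exceeds a detection threshold. Because faint sources are far more numerous, more of them scatter up across the threshold than bright sources scatter down, so the detected sources' measured fluxes are biased high on average.

```python
import random

def eddington_bias(n=100_000, slope=1.5, s_min=1.0, sigma=1.0, threshold=3.0, seed=1):
    """Toy Eddington-bias simulation; all parameter values are illustrative assumptions.

    True fluxes follow N(>S) proportional to S**-slope above s_min, each measurement
    adds Gaussian noise of width sigma, and a source counts as detected when its
    measured flux exceeds the threshold.
    """
    rng = random.Random(seed)
    excess = []
    for _ in range(n):
        s_true = s_min * rng.paretovariate(slope)  # P(S > s) = (s / s_min) ** -slope
        s_obs = s_true + rng.gauss(0.0, sigma)
        if s_obs > threshold:
            excess.append(s_obs - s_true)
    return sum(excess) / len(excess), len(excess)

mean_excess, n_detected = eddington_bias()
print(n_detected, mean_excess)  # mean_excess > 0: detected fluxes are biased high
```

Malmquist bias, by contrast, concerns the preferential selection of intrinsically luminous objects at large distances in a flux-limited sample, so simulating it would require a distance distribution as well.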