Archive for the ‘Jargon’ Category.

Tricki

http://www.tricki.org/

The Wikipedia-like repository for mathematical “tricks” has now gone live. From their mission statement:

The main body of the Tricki will be a (large, if all goes according to plan) collection of articles about methods for solving mathematical problems. These will be everything from very general problem-solving tips such as, “If you can’t solve the problem, then try to invent an easier problem that sheds light on it,” to much more specific advice such as, “If you want to solve a linear differential equation, you can convert it into a polynomial equation by taking the Fourier transform.”

Poisson vs Gaussian, Part 2

Probability density functions are another way of summarizing the consequences of assuming a Gaussian error distribution when the true distribution is Poisson. We can compute the posterior probability density of a source's intensity when some number of counts is observed in the source region and the background is estimated from counts observed in a separate region, and then compare it to the equivalent Gaussian.

The figure below (AAS 472.09) compares the pdfs of the Poisson intensity (red curves) and the Gaussian equivalent (black curves) for two cases: 50 counts in the source region (top) and 8 counts (bottom). In both cases the background is 200 counts collected in an area 40× the source area. The hatched region marks the 68% equal-tailed interval for the Poisson case, and the solid horizontal line marks the ±1σ width of the equivalent Gaussian.
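For readers who want to reproduce the flavor of this comparison, here is a minimal sketch (my own, not the code behind the poster), assuming flat priors, counts in a source region, and a background measured over r = 40 times the source area, with the background marginalized numerically:

```python
# A minimal sketch (not the poster's code) of the Poisson-vs-Gaussian
# pdf comparison: flat priors, n_src counts in the source region,
# n_bkg background counts collected over r times the source area.
import numpy as np
from scipy.stats import gamma, norm, poisson

def poisson_posterior(s_grid, n_src, n_bkg, r):
    """p(s | n_src, n_bkg) on a grid, marginalizing the background b."""
    b_grid = np.linspace(1e-6, 5.0 * (n_bkg + 1) / r, 2000)
    p_b = gamma.pdf(b_grid, a=n_bkg + 1, scale=1.0 / r)  # posterior of b
    like = poisson.pmf(n_src, s_grid[:, None] + b_grid[None, :])
    db = b_grid[1] - b_grid[0]
    post = (like * p_b[None, :]).sum(axis=1) * db        # marginalize b
    ds = s_grid[1] - s_grid[0]
    return post / (post.sum() * ds)                      # normalize

n_src, n_bkg, r = 8, 200, 40.0             # the low-count case above
s_grid = np.linspace(0.0, 30.0, 1000)
p_pois = poisson_posterior(s_grid, n_src, n_bkg, r)

# "Equivalent" Gaussian from standard background subtraction:
mu = n_src - n_bkg / r                     # net counts
sigma = np.sqrt(n_src + n_bkg / r**2)      # propagated error
p_gauss = norm.pdf(s_grid, mu, sigma)
```

Plotting p_pois against p_gauss shows the qualitative features discussed next: the Poisson curve is zero below s = 0 and skewed to the right, while the Gaussian spills into negative intensities.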

Clearly, for small counts, the support of the Poisson distribution is bounded below at zero, but that of the Gaussian is not. This introduces a visibly large bias in the interval coverage as well as in the normalization properties. Even at high counts, the Poisson is skewed such that larger values are slightly more likely to occur by chance than in the Gaussian case. This skew can be quite critical for marginal results. Continue reading ‘Poisson vs Gaussian, Part 2’ »

Poisson vs Gaussian

We astronomers are rather fond of approximating our counting statistics with Gaussian error distributions, and a lot of ink has been spilled justifying and/or denigrating this habit. But just how bad is the approximation anyway?

I ran a simple Monte Carlo test to compute the expected bias between a Poisson sample and the “equivalent” Gaussian sample. The result is shown in the plot below.

The jagged red line is the fractional expected bias relative to the true intensity. The typical recommendation in high-energy astronomy is to bin up events until there are about 25 or so counts per bin. This leads to an average bias of about 2% in the estimate of the true intensity. The bias drops below 1% for counts >50. Continue reading ‘Poisson vs Gaussian’ »
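The post does not give the exact recipe behind the plot, but one classic experiment in this spirit is sketched below (my own construction; its numbers need not match the figure): fit a constant intensity to many Poisson bins by chi-square with the customary data-derived weights σi² = ni, and record the fractional bias.

```python
# A hedged sketch of one Monte Carlo that exhibits the Gaussian-
# approximation bias: weighted least squares with sigma_i^2 = n_i
# systematically underestimates the true Poisson intensity.
import numpy as np

rng = np.random.default_rng(42)

def fractional_bias(lam, nbins=100, ntrials=2000):
    total = 0.0
    for _ in range(ntrials):
        n = rng.poisson(lam, nbins)
        n = n[n > 0]                          # zero counts give infinite weights
        w = 1.0 / n                           # chi-square weights, sigma^2 = n
        lam_hat = np.sum(w * n) / np.sum(w)   # weighted-mean estimate
        total += (lam_hat - lam) / lam
    return total / ntrials

for lam in (5, 10, 25, 50, 100):
    print(f"lambda = {lam:4d}: fractional bias = {fractional_bias(lam):+.3f}")
```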

[MADS] Chernoff face

I cannot remember when I first encountered the Chernoff face, but it hooked me instantly. I always hoped to come across multivariate astronomical data suited to this charming EDA method. Then, somehow, that eagerness faded without my realizing it; tragically, this was mainly due to my own absent-mindedness. Continue reading ‘[MADS] Chernoff face’ »
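For the curious, here is a toy sketch of the idea (mine, not from any astronomy application): each row of a multivariate table is mapped onto facial features, one face per object.

```python
# A toy Chernoff-face sketch: map three variables per object onto the
# head width, eye size, and mouth curvature of a cartoon face.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Arc, Ellipse

def face(ax, x):
    """Draw one face from a feature vector x with components in [0, 1]."""
    ax.add_patch(Ellipse((0.5, 0.5), 0.5 + 0.4 * x[0], 0.85, fill=False))  # head
    for cx in (0.38, 0.62):                                # eyes, size <- x[1]
        ax.add_patch(Ellipse((cx, 0.62), 0.10, 0.04 + 0.10 * x[1], fill=False))
    ax.add_patch(Arc((0.5, 0.35), 0.30, 0.05 + 0.30 * x[2],  # mouth <- x[2]
                     theta1=200, theta2=340))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_axis_off()

data = np.random.default_rng(2).random((6, 3))   # six objects, three variables
fig, axes = plt.subplots(1, 6, figsize=(12, 2))
for ax, row in zip(axes, data):
    face(ax, row)
plt.show()
```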

[Book] Elements of Information Theory

by T. Cover and J. Thomas. Website: http://www.elementsofinformationtheory.com/

I have mentioned this book at least once before, in my post on Shannon's most celebrated paper (see the posting). I have since recommended it further in answer to offline inquiries, and it has always been on the list of favorite books I like to teach from. So I am not shy about recommending it to astronomers for its modern perspective and practicality. Before going on with the praise, I must admit that these admiring words do not imply that I understand every line and problem in the book. Information theory has grown fast since Shannon's monumental debut paper (1948), much as astronomers' observational techniques have. Without the contents of this book, most of which came after Shannon (1948), the internet, wireless communication, data compression, and the like could not have been conceived. Since the notion of “entropy,” the core of information theory, is familiar to astronomers (and physicists), the book should read more easily for them than for statisticians. Continue reading ‘[Book] Elements of Information Theory’ »
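To make the “entropy” remark concrete, the core quantity of the book fits in a few lines (the standard definition, nothing specific to the text):

```python
# Shannon entropy of a discrete distribution, in bits:
# H(X) = -sum_i p_i log2 p_i, with 0 log 0 = 0 by convention.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # drop zero-probability outcomes
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))             # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))             # ~0.469 bits: a biased coin
```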

[MADS] Mahalanobis distance

It bears the name of its inventor, Prasanta Chandra Mahalanobis. Unlike the Euclidean distance, a household name, this distance is rarely called by its proper name; instead, many pseudonyms exist, with variations adapted across scientific disciplines and applications. I therefore believe that the Mahalanobis distance, under one name or another, is frequently applied in exploring and analyzing astronomical data. Continue reading ‘[MADS] Mahalanobis distance’ »
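For reference, the definition is the Euclidean distance after whitening by the covariance, d²(x) = (x − μ)ᵀ Σ⁻¹ (x − μ); a minimal sketch with made-up numbers:

```python
# Mahalanobis distance: the Euclidean distance in coordinates where the
# sample covariance is the identity.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=500)

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

print(mahalanobis(np.array([2.0, 1.0])))   # distance of a test point
```

SciPy ships the same computation as scipy.spatial.distance.mahalanobis(u, v, VI), with VI the inverse covariance matrix.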

systematic errors

Ah ha~ I once asked, “what is systematic error?” (see [Q] systematic error). Thanks to L. Lyons’ work discussed in [ArXiv] Particle Physics, I found this paper, titled Systematic Errors, describing the concept of systematic error and the statistical inference associated with it in the field of particle physics. Happily, it shares a great deal with high-energy astrophysics. Continue reading ‘systematic errors’ »

[MADS] Semiparametric

There were (only) four articles in ADS whose abstracts contain the word semiparametric (none in titles). Semiparametric is therefore not exactly [MADS], but almost: one might call it virtually [MADS], or quasi [MADS]. By introducing the term and presenting the rare examples in astronomy, I hope that this scarce term will guide astronomers toward appropriate statistical inference with their data rather than leaving them misguided. Continue reading ‘[MADS] Semiparametric’ »
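To illustrate what the term covers, here is a toy version (mine, not drawn from the ADS articles) of the best-known semiparametric setup, the partially linear model y = βx + f(t) + ε: the slope β is parametric, f is an unspecified smooth function, and a crude backfitting loop estimates both.

```python
# Partially linear model fit by naive backfitting: alternate an OLS
# step for the parametric slope with a kernel-smoothing step for f.
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
t = np.sort(rng.uniform(0.0, 1.0, n))
y = 2.0 * x + np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

def smooth(t, r, width=0.1):
    """Crude running-mean smoother of residuals r over t."""
    return np.array([r[np.abs(t - ti) < width].mean() for ti in t])

beta, f = 0.0, np.zeros(n)
for _ in range(20):                                # backfitting iterations
    beta = np.sum(x * (y - f)) / np.sum(x * x)     # parametric (OLS) step
    f = smooth(t, y - beta * x)                    # nonparametric step
    f -= f.mean()                                  # identifiability constraint

print(beta)   # should land near the true slope, 2.0
```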

accessing data, easier than before but…

Someone emailed me asking for the globular cluster data sets I used in a proceedings paper, which was about determining multi-modality (multiple populations) from luminosity functions, without binning, using both well-known and new information criteria. I spent quite some time trying to understand data sets with suspicious numbers of globular cluster populations. Obtaining the data sets themselves, on the other hand, was easy thanks to archives such as VizieR: for most data sets presented in charts or tables, I acquire the data from VizieR, and to understand the science behind them I check ADS. Actually, it usually happens the other way around: I check the scientific background first to assess whether there is room for statistics, and then search for available data sets. Continue reading ‘accessing data, easier than before but…’ »
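Fetching such tables is scriptable these days; here is a sketch using astroquery's VizieR interface (an addition of mine, postdating this post), pulling the Harris catalog of Milky Way globular clusters as an example:

```python
# Querying VizieR programmatically; VII/202 is the Harris (1996)
# Milky Way globular cluster catalog.
from astroquery.vizier import Vizier

Vizier.ROW_LIMIT = -1                    # lift the default row cap
tables = Vizier.get_catalogs("VII/202")  # returns a TableList
print(tables[0].colnames)                # inspect the available columns
```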

Likelihood Ratio Technique

I wonder what Fisher, Neyman, and Pearson would say upon seeing “Technique” attached to “Likelihood Ratio” instead of “Test.” When a presenter said “likelihood ratio technique” in the context of source identification, I couldn't resist checking it out, since “technique” next to “likelihood” sounded almost derogatory to my ears, and I did not want the founding fathers of the likelihood principle in statistics to be offended. I thank, above all, the speaker, who kindly gave me the reference for this likelihood ratio technique. Continue reading ‘Likelihood Ratio Technique’ »
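For contrast with the “technique,” the textbook likelihood ratio test takes only a few lines; a toy sketch (mine), testing a fixed Poisson rate against a free one, using Wilks' asymptotic χ² calibration:

```python
# Likelihood ratio test: -2 log(L0/L1) is asymptotically chi-square
# under H0, with df = number of parameters freed in H1.
import numpy as np
from scipy.stats import chi2, poisson

rng = np.random.default_rng(3)
data = rng.poisson(5.6, size=40)

lam0 = 5.0                                 # H0: lambda fixed at 5
lam1 = data.mean()                         # H1: lambda free (its MLE)

lr = -2.0 * (poisson.logpmf(data, lam0).sum()
             - poisson.logpmf(data, lam1).sum())
print(lr, chi2.sf(lr, df=1))               # statistic and p-value
```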

Lost in Translation: Measurement Error

You would think that something like “measurement error” is a well-defined concept and that everyone knows what it means. Not so. I have so far counted at least three different interpretations.

Suppose you have measurements X={Xi, i=1..N} of a quantity whose true value is, say, X0. One can then compute the mean and standard deviation of the measurements, E(X) and σX. One can also infer the value of a parameter θ(X), derive the posterior probability density p(θ|X), and obtain confidence intervals on it.

So here are the different interpretations:

  1. Measurement error is σX, or the spread in the measurements. Astronomers tend to use the term in this manner.
  2. Measurement error is X0-E(X), the “error made when you make the measurement”: essentially what is left over beyond mere statistical variation, i.e., the bias term. This is how statisticians seem to use it. To quote David van Dyk:

    For us it is just English. If your measurement is different from the real value. So this is not the Poisson variability of the source for effects or ARF, RMF, etc. It would disappear if you had a perfect measuring device (e.g., telescope).

  3. Measurement error is the width of p(θ|X), i.e., the measurement error of the first type propagated through the analysis. Astronomers use the term in this sense as well.

Who am I to say which is right? But be aware of who you may be speaking with and be sure to clarify what you mean when you use the term!
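A toy simulation (my own) that attaches a number to each of the three usages may make the distinction concrete:

```python
# Three "measurement errors" from one simulated experiment: the spread
# of the measurements, the bias of the device, and the width of the
# inference on the underlying quantity.
import numpy as np

rng = np.random.default_rng(7)
X0 = 10.0                                        # true value
X = X0 + 0.5 + rng.normal(scale=2.0, size=1000)  # biased, noisy device

spread = X.std(ddof=1)                       # usage 1: sigma_X
bias = X0 - X.mean()                         # usage 2: X0 - E(X)
posterior_width = spread / np.sqrt(X.size)   # usage 3: width of p(theta|X)
                                             # (flat-prior Gaussian model)
print(spread, bias, posterior_width)
```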

[MADS] multiscale modeling

A few scientists in our group work on estimating the intensities of gamma-ray observations from sky surveys. This work is distinct from typical image processing, which is mostly concerned with point estimation of the intensity at each pixel and with the size of an overall white-noise-type error; orthogonality between errors and sources and white-noise assumptions are typical features of image-processing utilities and modules. CHASC scientists, on the other hand, tackle broader statistical inference problems in estimating the intensity map, such as the intensity uncertainty at each point and a scientifically informative display of the intensity map with its uncertainty, under a Poisson count model and constraints from physics and the instrument. This is where the field of multiscale modeling comes in. Continue reading ‘[MADS] multiscale modeling’ »
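As a bare-bones illustration of the multiscale idea for counts (my sketch, not CHASC code): aggregate Poisson counts dyadically, so that each coarser scale carries a Poisson total while the split between two children is binomial given their parent; this is the factorization that multiscale Poisson models build on.

```python
# Dyadic aggregation of 1D Poisson counts: the raw counts sit at the
# finest scale, and each coarser level sums adjacent pairs.
import numpy as np

def multiscale_pyramid(counts):
    """Return the list of levels, finest first, for a power-of-two vector."""
    levels = [np.asarray(counts)]
    while levels[-1].size > 1:
        c = levels[-1]
        levels.append(c[0::2] + c[1::2])   # sum adjacent pairs
    return levels

counts = np.random.default_rng(0).poisson(4.0, size=8)
for level in multiscale_pyramid(counts):
    print(level)
```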

[MADS] HMM

MADS stands for “Missing in ADS.” Every astronomer, I believe, knows what ADS is. As we have the [EotW] series and used to have the [ArXiv] series, creating a new series of semi-periodic postings under the well-known name ADS seemed appealing. Continue reading ‘[MADS] HMM’ »

Borel Cantelli Lemma for the Gaussian World

Almost two years of scrutinizing publications by astronomers has given me the firm impression that astronomers live in the Gaussian world. You are likely to object to this statement by saying that astronomers know and use the Poisson, binomial, Pareto (power law), Weibull, exponential, Laplace (Cauchy), Gamma, and other distributions.[1] This is true, and I have seen these distributions referred to in many publications; however, when it comes to obtaining “best-fit estimates for the parameters of interest” and “their errors (error bars),” suddenly everything goes back to the Gaussian world.[2]

Borel-Cantelli Lemma (from PlanetMath): because of the mathematical symbols, I give a link rather than reproduce it here, but any probability book states the lemma with proof and discussion.
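For convenience, the half of the lemma most relevant here fits in one line (the standard statement, as in any probability text):

```latex
% Borel-Cantelli lemma, first part: if the event probabilities are
% summable, then almost surely only finitely many of the events occur.
\[
  \sum_{n=1}^{\infty} P(A_n) < \infty
  \quad\Longrightarrow\quad
  P\Bigl(\limsup_{n\to\infty} A_n\Bigr)
  = P\Bigl(\bigcap_{n=1}^{\infty}\,\bigcup_{k\ge n} A_k\Bigr) = 0.
\]
```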

Continue reading ‘Borel Cantelli Lemma for the Gaussian World’ »

  1. It is a bit disappointing that few mention the t distribution, even when fewer than 30 observations are available.
  2. To escape this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say it is the only way out, which is sometimes true and sometimes not; personally, I lean toward the view that Bayesian methods are not always more robust than frequentist ones, contrary to how robust methods are often discussed among astronomers.

It bothers me.

The full description of “bayes” under Sherpa/CIAO is given at http://cxc.harvard.edu/ciao3.4/ahelp/bayes.html.[1] Some of its sentences kept bothering me, and here is my account of why, given outside the quotes. Continue reading ‘It bothers me.’ »

  1. Note that the current Sherpa is in beta under CIAO 4.0, not CIAO 3.4, and a description of “bayes” from the most recent Sherpa is not yet available, which means this post will need updating once a new release is out.