The AstroStat Slog

Archive for the ‘arXiv’ Category.

An excerpt from …

Feb 26th, 2009| 04:07 pm | Posted by hlee

I’ve been complaining: how can one do machine learning on solar images without a training set? (See my comment at the big picture.) On the other hand, I’m also aware of a challenge in astronomy: data (images) cannot be transformed freely and fed into standard machine learning algorithms. Tailoring data pipelining, cleaning, and processing to currently existing vision algorithms may not be achievable. The hope of automating the detection/identification of interesting features (e.g. flares and loops) and forecasting events on the surface of the Sun is only a dream. Even though the image data stream has reached the level of a tsunami, we might have to depend on human eyes to comb out interesting features on the Sun until a new paradigm arrives: automated feature identification algorithms based on a single image, i.e. without a training set. The good news is that human eyes have done a superb job! Continue reading ‘An excerpt from …’ »

Tags: brains, computer vision, human eyes, Kendall, machine learning, shape theory, Sun, tsunami
Category: arXiv, Astro, Cross-Cultural, Data Processing, Imaging, Quotes | Comment

[ArXiv] Particle Physics

Feb 20th, 2009| 07:48 pm | Posted by hlee

[stat.AP:0811.1663]
Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he is very kind and a good listener. I was delighted to see his introductory article about particle physics and its statistical challenges arrive through an [arxiv:stat] email subscription. Continue reading ‘[ArXiv] Particle Physics’ »

Tags: chi-square, chi-square minimization, coverage, hypothesis testing, L.Lyons, LHC, LRT, particle physics, posterior distribution
Category: arXiv, Bayesian, Cross-Cultural, Data Processing, Frequentist, High-Energy, Methods, Physics, Stat | Comment

Guinness, Gosset, Fisher, and Small Samples

Feb 12th, 2009| 02:03 pm | Posted by hlee

Student’s t-distribution is somewhat underrepresented in the astronomical community. An article with nice stories looks to me like the best way to introduce the t distribution. This article describes historic anecdotes about monumental statistical developments that occurred about 100 years ago.

Guinness, Gosset, Fisher, and Small Samples by Joan Fisher Box
Source: Statist. Sci. Volume 2, Number 1 (1987), 45-52.

No time to read the whole article? I hope you have a few minutes to read the following quotes, which I find quite enchanting. Continue reading ‘Guinness, Gosset, Fisher, and Small Samples’ »

Tags: distribution, error, Gosset, guinness, history, sampling distribution, small sample, student t
Category: arXiv, Fitting, Frequentist, Quotes, Stat | Comment

[ArXiv] Special Issue from Annals of Applied Statistics

Feb 9th, 2009| 06:02 am | Posted by hlee

When I was studying astronomy (during which time I once became a subject of a social science survey about life in a department where gender bias is extreme; I was the only female), people often asked me how to forecast the weather or how to predict the future (boys often get questions about becoming astronauts, in addition to weathermen and astrologers). Relating astronomy to earth science still happens. Statisticians I met at conferences often tried to associate my efforts on astronomical data with those of geologists and meteorologists, who often use stochastic models and spatio-temporal models, dimensional extensions of time series models. Because of this confusion between astronomy and meteorology/geology/oceanology, and because of the longer history of wide statistical applications in the latter subjects (a good counterexample is the least squares method by Gauss, but I cannot think of more examples to contradict my statement that statistics has a rich history of wide use among earth scientists), from time to time I have paid attention to various applications and models in those subjects, hoping to find a thread leading to similar applications in astronomy. Although I don’t like the misconception that astronomy equals meteorology or geoscience, these scientific fields do share at least one commonality: statistical methods are applied to analyzing satellite data. Continue reading ‘[ArXiv] Special Issue from Annals of Applied Statistics’ »

Tags: AoAS, application, geoscience, meteorology, modeling, SOM, stochastic model
Category: arXiv, Cross-Cultural, Stat | 1 Comment

Likelihood Ratio Technique

Jan 15th, 2009| 06:01 pm | Posted by hlee

I wonder what Fisher, Neyman, and Pearson would say if they saw “Technique” after “Likelihood Ratio” instead of “Test.” When a presenter said “Likelihood Ratio Technique” for source identification, I couldn’t resist checking it out, so as not to offend the founding fathers of the likelihood principle in statistics, since “Technique” sounded derogatory to my ears when attached to “Likelihood.” Above all, I thank the speaker, who kindly gave me the reference on this likelihood ratio technique. Continue reading ‘Likelihood Ratio Technique’ »

Tags: Fisher, likelihood principle, likelihood ratio technique, likelihood ratio test, Neyman, Pearson
Category: Algorithms, arXiv, Astro, Bayesian, Cross-Cultural, Data Processing, Fitting, Frequentist, Jargon, Methods, Objects, Stat, Uncertainty | Comment

[MADS] HMM

Dec 7th, 2008| 11:23 pm | Posted by hlee

MADS stands for “Missing in ADS.” Every astronomer, I believe, knows what ADS is. Since we have the [EotW] series and used to have the [ArXiv] series, creating a new series of semi-periodic postings under the well-known name ADS seems interesting. Continue reading ‘[MADS] HMM’ »

Tags: ADS, hidden Markov model, HMM, image processing, MADS, search engine, signal processing
Category: arXiv, Cross-Cultural, Jargon, Misc | 6 Comments

Borel Cantelli Lemma for the Gaussian World

Dec 3rd, 2008| 12:31 am | Posted by hlee

Almost two years of scrutinizing publications by astronomers has given me a strong impression that astronomers live in the Gaussian world. You are likely to object to this statement by saying that astronomers know and use Poisson, binomial, Pareto (power laws), Weibull, exponential, Laplace (Cauchy), Gamma, and some other distributions.^[1] This is true. I have seen these distributions referred to in many publications; however, when it comes to obtaining “BEST FIT estimates for the parameters of interest” and “their ERROR (BARS)”, suddenly everything goes back to the Gaussian world.^[2]

Borel Cantelli Lemma (from Planet Math): because of the mathematical symbols, a link is given instead of the statement itself, but any probability book has the lemma with proofs and descriptions.

Continue reading ‘Borel Cantelli Lemma for the Gaussian World’ »

[1] It is a bit disappointing that not many mention the t distribution, even when fewer than 30 observations are available.
[2] To stay out of this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say that it is the only escape, which is sometimes true and sometimes not. Personally, I lean toward the view that Bayesian methods are not always more robust than frequentist methods, contrary to how astronomers tend to discuss robust methods.

Tags: Borel Cantelli Lemma, CLT, families of distributions, gaussian, grand challenge, measure, non-Gaussian, probability, statisticians
Category: arXiv, Astro, Bad AstroStat, Cross-Cultural, Frequentist, Jargon, News, Quotes, Stat, Uncertainty | Comment

[SPS] Testing Completeness

Nov 19th, 2008| 01:34 am | Posted by hlee

There will be a special session at the 213th AAS meeting on meaning from surveys and population studies (SPS). Until then, it might be useful to pull out some interesting and relevant papers and questions/challenges as a preliminary to the meeting. I will not simply list astronomical catalogs and surveys, which are literally countless these days, but I will bring some out, with a description of the catalog, if they change the way science is performed (the best example, to my knowledge, would be SDSS, the Sloan Digital Sky Survey). Continue reading ‘[SPS] Testing Completeness’ »

Tags: completeness, incompleteness, normal, SPS, test
Category: arXiv, Astro, Cross-Cultural, Frequentist, Methods, News, Quotes, Stat, Uncertainty | Comment

[tutorial] multispectral imaging, a case study

Oct 9th, 2008| 04:28 pm | Posted by hlee

Even without signal processing courses, the following equation should be awfully familiar to astronomers who do photometry and handle data:
$$c_k=\int_\Lambda l(\lambda) r(\lambda) f_k(\lambda) \alpha(\lambda) d\lambda +n_k$$
The terms are, in order, the camera response (c_k), light source (l), spectral radiance by l (r), filter (f_k), sensitivity (α), and noise (n_k), where Λ indicates the range of the spectrum in which the camera is sensitive.
Or simplified to $$c_k=\int_\Lambda \phi_k (\lambda) r(\lambda) d\lambda +n_k$$
where φ_k denotes the combined illuminant and spectral sensitivity of the k-th channel, which goes by the name augmented spectral sensitivity. Well, we can skip spectral radiance r, though. Unfortunately, in astronomical photometry the sensitivity α has multiple layers and is not a simple closed-form function of λ.
Or, in matrix form, $$c=\Theta r +n$$
where c and n stack the channel responses c_k and noise terms n_k, and each row of Θ is a sampled φ_k. Inverting Θ, that is, finding a reconstruction operator such that r=inv(Θ)c, leads to spectral reconstruction, although Θ is, in general, not a square matrix. Otherwise, one approaches the problem through indirect reconstruction. Continue reading ‘[tutorial] multispectral imaging, a case study’ »
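Since Θ is generally not square, one common stand-in for inv(Θ) is the Moore-Penrose pseudo-inverse. The sketch below is only an illustration under assumptions that are not from the tutorial: the channel count, wavelength grid, Gaussian-shaped rows of Θ, the radiance curve, and the noise level are all made up, and `reconstruct_spectrum` is a hypothetical helper name. It stacks the channel responses into a vector c and returns the minimum-norm least-squares estimate of r.

```python
import numpy as np


def reconstruct_spectrum(theta, c):
    """Minimum-norm least-squares estimate of r from c = theta @ r + n.

    Uses the Moore-Penrose pseudo-inverse in place of inv(theta),
    because theta (channels x wavelengths) is usually not square.
    """
    return np.linalg.pinv(theta) @ c


rng = np.random.default_rng(0)

# Hypothetical sizes: 6 camera channels, 31 wavelength samples.
n_channels, n_wavelengths = 6, 31
wavelengths = np.linspace(400.0, 700.0, n_wavelengths)  # nm, assumed range

# Assumed Gaussian-shaped augmented sensitivities, one row per channel.
centers = np.linspace(420.0, 680.0, n_channels)
theta = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / 30.0) ** 2)

# Made-up smooth spectral radiance and small additive noise.
r_true = 0.5 + 0.4 * np.sin(wavelengths / 60.0)
noise = 0.01 * rng.standard_normal(n_channels)
c = theta @ r_true + noise

r_hat = reconstruct_spectrum(theta, c)

# With fewer channels than wavelengths, r_hat reproduces c but is only
# one of infinitely many spectra that do so.
residual = np.linalg.norm(theta @ r_hat - c)
print(f"residual = {residual:.2e}")
```

With fewer channels than wavelength samples the problem is underdetermined, so the estimate matches the measured responses without necessarily matching the true spectrum. That gap is one reason to bring in prior knowledge about r, or to turn to indirect reconstruction.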

Tags: matrix, Mona Lisa, multispectral, noise, signal processing, signal processing magazine, Tutorial
Category: Algorithms, arXiv, Cross-Cultural, Data Processing, Fitting, Imaging, Methods, Quotes, Spectral, Stat, Uncertainty | 2 Comments

Classification and Clustering

Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion deduced from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse classification and clustering and to mix up their methodologies. They tend to think any algorithm from classification or clustering analysis serves their purpose since both kinds of algorithms, no matter what, look like a black box. I mean a black box as in a neural network, which is one of the classification algorithms. Continue reading ‘Classification and Clustering’ »

Tags: black box, book, catalog, Classification, clustering, haste, outliers, R, Robert Serfling, semi-supervised learning, survey
Category: Algorithms, arXiv, Astro, Bad AstroStat, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Stat | Comment

A History of Markov Chain Monte Carlo

Sep 17th, 2008| 02:11 pm | Posted by hlee

I’ve been joking about the astronomers’ fashion of writing Markov chain Monte Carlo (MCMC). Frequently, MCMC has been written as Monte Carlo Markov Chain in astronomical journals. I was curious about the history of this new creation. Overall, I thought it would be worthwhile to learn more about the history of MCMC, and this paper was up on arXiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

Tags: BUGS, data augmentation, EM, Gibbs sampling, Hastings, history, Metropolis, reversible jump, simulated annealing
Category: Algorithms, arXiv, Bad AstroStat, Bayesian, Cross-Cultural, Data Processing, Imaging, MC, MCMC, Methods, Quotes, Stat | 2 Comments

Why Gaussianity?

Sep 10th, 2008| 10:15 am | Posted by hlee

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincare

Continue reading ‘Why Gaussianity?’ »

Tags: CLT, Gaussianity, Henri Poincare, IEEE, normal, signal processing, signal processing magazine, Why
Category: arXiv, Cross-Cultural, Data Processing, Fitting, Frequentist, Methods, Physics, Quotes, Stat, Uncertainty | Comment

A Conversation with Peter Huber

Sep 5th, 2008| 08:46 pm | Posted by hlee

The problem with data analysis is of course that it is a performing art. It is not something you easily write a paper on; rather, it is something you do. And so it is difficult to publish.

quoted from this conversation Continue reading ‘A Conversation with Peter Huber’ »

Tags: art, Babylonian astronomy, computers, computing, computing history, conversation, FFT, history, Peter Huber, project pursuit, robust statistics, robustness
Category: Algorithms, arXiv, Cross-Cultural, Data Processing, Jargon, Languages, Quotes | Comment

A lecture note of great utility

Aug 27th, 2008| 02:35 pm | Posted by hlee

I didn’t realize this post had been sitting for a month, during which I almost neglected the slog. Just as great books about probability and information theory exist for statisticians and engineers, I believe there are great statistical physics books for physicists. On the other hand, relatively few exist that introduce one subject to the other audience. In this regard, I thought this lecture note could be useful.

[arxiv:physics.data-an:0808.0012]
Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract: Continue reading ‘A lecture note of great utility’ »

Tags: Bayes Theorem, Boltzmann, Carnot, Entropy, Gibbs paradox, Information, laws of thermodynamics, lecture note, maximum likelihood, probability, Shannon, statistical physics, Tchebyshev inequality, thermodynamics
Category: arXiv, Bayesian, Cross-Cultural, Data Processing, Fitting, Physics, Stat | Comment

Survival Analysis: A Primer

Jul 8th, 2008| 07:27 pm | Posted by hlee

Astronomers confront various kinds of censored and truncated data. Often these types of data are named after the famous scientists who generalized them, as with Eddington bias. When such censored or truncated data become the subject of study in statistics, instead of naming them, statisticians try to model them so that the uncertainty can be quantified. This area is called survival analysis. If your library has a subscription to The American Statistician and you are an astronomer who handles censored or truncated data sets, this primer would be useful for briefly conceptualizing the statistics jargon of survival analysis and for characterizing the uncertainties residing in your data. Continue reading ‘Survival Analysis: A Primer’ »

Tags: censored, Efron, Feigelson, Freedman, massive data, Nelson, Petrosian, survival analysis, truncated
Category: arXiv, Fitting, Stat | 4 Comments