The AstroStat Slog

Archive for the ‘Data Processing’ Category.

missing data

Oct 27th, 2008| 09:24 am | Posted by hlee

The notion of missing data differs between the two communities. I tend to think missing data carry as much information as observed data. Astronomers… I’m not sure how they think, but my impression so far is that once one attribute/variable of an object/observation/informant has a missing value, all the other attributes of that object become useless, because the object is not considered in scientific data analysis or in the model evaluation process. For example, it is hard to find any discussion of imputation in astronomical publications, or any statistical justification of the treatment of missing data with respect to inference strategies. Instead, they talk about incompleteness within different variables. To replace this vague argument with a concrete example, consider a catalog of multiple magnitudes. To draw a color magnitude diagram, one needs both color and magnitude. If one attribute is missing, that star will not appear in the color magnitude diagram, and any inference method based on that diagram will not include that star. Nonetheless, one will try to understand how different proportions of stars are observed according to different colors and magnitudes. Continue reading ‘missing data’ »
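The color magnitude example above can be made concrete with a small sketch. Everything in it is an assumption for illustration: the six-star catalog, the choice of B and V bands, and the variable names are made up. It only shows how dropping incomplete rows quietly removes stars from the diagram, and how one might at least count what was lost.

```python
import numpy as np

# Hypothetical toy catalog: B and V magnitudes for six stars.
# NaN marks a magnitude that was not measured.
b_mag = np.array([15.2, 16.8, np.nan, 18.1, 17.4, np.nan])
v_mag = np.array([14.6, 16.1, 17.0, np.nan, 16.9, 18.3])

# Complete-case (listwise) deletion: only stars with both magnitudes
# can be placed on a (B - V) versus V color magnitude diagram.
complete = ~np.isnan(b_mag) & ~np.isnan(v_mag)
color = b_mag[complete] - v_mag[complete]
magnitude = v_mag[complete]

# Bookkeeping on what the diagram silently leaves out.
n_total = b_mag.size
n_plotted = int(complete.sum())
frac_missing_b = float(np.isnan(b_mag).mean())
frac_missing_v = float(np.isnan(v_mag).mean())

print(f"{n_plotted} of {n_total} stars reach the diagram")
print(f"missing fraction: B = {frac_missing_b:.2f}, V = {frac_missing_v:.2f}")
```

Whether the dropped stars can be ignored depends on why their magnitudes are missing (the MCAR/MAR distinction in the tags), which this sketch does not address.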

Tags: bootstrap, catalog, Efron, estimator, ignorable, imputation, incompleteness, Little, MAR, MCAR, missing data, nonparametric, Rubin, Schafer, survey
Category: Astro, Cross-Cultural, Data Processing, Stat | 2 Comments

GSL – GNU Scientific Library

Oct 23rd, 2008| 12:13 pm | Posted by hlee

In my pyIMSL post I talked about IMSL, a commercial scientific library. There is a GNU counterpart to IMSL: GSL. I found GSL courtesy of Jiangang, the author of the poster I liked most at the 212th AAS (see My first AAS. V. measurement error and EM and his comment). Continue reading ‘GSL – GNU Scientific Library’ »

Tags: C/C++, GNU, GSL, IMSL, Jiangang
Category: Algorithms, Data Processing, Fitting, Stat, Uncertainty | 5 Comments

[tutorial] multispectral imaging, a case study

Oct 9th, 2008| 04:28 pm | Posted by hlee

Even without signal processing courses, the following equation should be awfully familiar to astronomers who do photometry and handle data:
$$c_k=\int_\Lambda l(\lambda) r(\lambda) f_k(\lambda) \alpha(\lambda) d\lambda +n_k$$
The terms are, in order, camera response (c_k), light source (l), spectral radiance by l (r), filter (f_k), sensitivity (α), and noise (n_k), where Λ indicates the range of the spectrum in which the camera is sensitive.
Or simplified to $$c_k=\int_\Lambda \phi_k (\lambda) r(\lambda) d\lambda +n_k$$
where φ_k denotes the combination of the illuminant and the spectral sensitivity of the k-th channel, which goes by the name of augmented spectral sensitivity. Well, we can skip the spectral radiance r, though. Unfortunately, in astronomical photometry the sensitivity α has multiple layers and is not a simple closed-form function of λ.
Or, stacking the channels into vectors, $$c=\Theta r +n$$
Inverting Θ, that is, finding a reconstruction operator such that r=inv(Θ)c, leads to spectral reconstruction, although Θ is, in general, not a square matrix. Otherwise, one approaches the problem through indirect reconstruction. Continue reading ‘[tutorial] multispectral imaging, a case study’ »
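Since Θ is not square, one common stand-in for inv(Θ) is the Moore-Penrose pseudo-inverse. The sketch below uses that substitution, which is an assumption on my part and not necessarily the operator the tutorial uses. The channel count, wavelength grid, Gaussian sensitivities, radiance spectrum, and noise level are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_channels = 6       # number of camera channels k (assumed)
n_wavelengths = 31   # number of wavelength samples (assumed)
wavelengths = np.linspace(400.0, 700.0, n_wavelengths)  # nm, assumed range

# Hypothetical Gaussian sensitivities phi_k, stacked as the rows of Theta.
centers = np.linspace(420.0, 680.0, n_channels)
theta = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / 30.0) ** 2)

# A made-up smooth radiance spectrum r and noisy camera responses c.
r_true = 1.0 + 0.5 * np.sin(wavelengths / 50.0)
noise = rng.normal(scale=0.01, size=n_channels)
c = theta @ r_true + noise

# Theta is 6 x 31, so it has no ordinary inverse. The pseudo-inverse
# returns the minimum-norm spectrum that reproduces the responses.
r_hat = np.linalg.pinv(theta) @ c

# Residual in response space and error in spectrum space.
response_residual = np.max(np.abs(theta @ r_hat - c))
spectrum_error = np.max(np.abs(r_hat - r_true))
print(f"max response residual: {response_residual:.2e}")
print(f"max spectrum error:    {spectrum_error:.2e}")
```

With fewer channels than wavelength samples the problem is underdetermined: the reconstruction matches the measured responses but need not match the true spectrum, which is why prior knowledge or indirect reconstruction enters.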

Tags: matrix, Mona Lisa, multispectral, noise, signal processing, signal processing magazine, Tutorial
Category: Algorithms, arXiv, Cross-Cultural, Data Processing, Fitting, Imaging, Methods, Quotes, Spectral, Stat, Uncertainty | 2 Comments

survey and design of experiments

Oct 1st, 2008| 04:16 pm | Posted by hlee

People of experience would speak very differently, and more wisely, about what I’m going to discuss now. This post only combines two small cross sections, one from a branch of each of two trees, astronomy and statistics. Continue reading ‘survey and design of experiments’ »

Tags: 213, AAS, Alanna Connors, catalog, census, detection, experimental design, Long Beach, special session, SPS, survey
Category: Astro, CHASC, Cross-Cultural, Data Processing, Jargon, Methods, Misc, News, Stat | 3 Comments

Make3D

Sep 30th, 2008| 01:45 am | Posted by hlee

The conventional belief is that reconstructing a 3D scene requires at least two images. Yet we know that our eyes reconstruct 3D scenes from single snapshots, with just one picture. Relying on our perception and learning ability, or our internal pattern recognition ability, a few groups of people have been trying to reconstruct a 3D image from one still picture. Luckily, you can test such progress, reconstructing a 3D scene from a single still image, at Make3D (a click brings you to Make3D at Stanford). Continue reading ‘Make3D’ »

Tags: make3D, solar images, virtual
Category: Algorithms, Data Processing, Imaging, Misc | 3 Comments

Classification and Clustering

Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion deduced from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse classification and clustering and to mix up methodologies. They tend to think any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithm, no matter what, look like a black box. I mean a black box as in a neural network, which is one of the classification algorithms. Continue reading ‘Classification and Clustering’ »

Tags: black box, book, catalog, Classification, clustering, haste, outliers, R, Robert Serfling, semi-supervised learning, survey
Category: Algorithms, arXiv, Astro, Bad AstroStat, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Stat | Comment

A History of Markov Chain Monte Carlo

Sep 17th, 2008| 02:11 pm | Posted by hlee

I’ve been joking about the astronomers’ fashion in writing Markov chain Monte Carlo (MCMC). Frequently, MCMC is written out as Monte Carlo Markov Chain in astronomical journals. I was curious about the history of this new creation. Overall, I thought it would be worth learning more about the history of MCMC, and this paper was up on arxiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

Tags: BUGS, data augmentation, EM, Gibbs sampling, Hasting, history, Metropolis, reversible jump, simulated annealing
Category: Algorithms, arXiv, Bad AstroStat, Bayesian, Cross-Cultural, Data Processing, Imaging, MC, MCMC, Methods, Quotes, Stat | 2 Comments

BUGS

Sep 16th, 2008| 04:34 pm | Posted by hlee

Astronomers tend to think in a Bayesian way, but their Bayesian implementation is very limited. OpenBUGS, WinBUGS, GeoBUGS (BUGS for geostatistics; for example, modeling spatial distributions), R2WinBUGS (an R wrapper for BUGS), or PyBUGS (a Python wrapper for BUGS) could boost their Bayesian eagerness. Oh, by the way, BUGS stands for Bayesian inference Using Gibbs Sampling. Continue reading ‘BUGS’ »

Tags: openBUGS, PyBUGS, Python, R, toolbox, winBUGS
Category: Algorithms, Bayesian, Data Processing, Languages, MCMC, Methods, News | Comment

[Book] pattern recognition and machine learning

Sep 16th, 2008| 03:20 pm | Posted by hlee

A nice book by Christopher Bishop.
While I was reading abstracts and papers from astro-ph, I saw many applications of algorithms from pattern recognition and machine learning (PRML). The frequency will increase as large-scale survey projects multiply, so recommending a good textbook or reference in the field seems timely. Continue reading ‘[Book] pattern recognition and machine learning’ »

Tags: Bishop, catalog, machine learning, pattern recognition, PCML, SPS, survey
Category: Algorithms, Astro, Cross-Cultural, Data Processing, Jargon | Comment

appealing eyes == powerful method

Sep 12th, 2008| 11:30 pm | Posted by hlee

To claim that results are statistically powerful, astronomers rely heavily on eyeballing techniques (which need an apprenticeship to acquire, but look subjective to me without such training). In some cases, I know of actual statistical tests that could support or refute those claims. Hence, I believe astronomers are well aware of those statistical tests. I guess they are afraid that those statistics may reject their claims or are not powerful enough in numeric metrics. Instead, they spend their efforts on making graphics more appealing. Continue reading ‘appealing eyes == powerful method’ »

Tags: emprical data analysis, eyeballing, powerful test
Category: Bad AstroStat, Cross-Cultural, Data Processing, Jargon | 2 Comments

Why Gaussianity?

Sep 10th, 2008| 10:15 am | Posted by hlee

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincare

Continue reading ‘Why Gaussianity?’ »

Tags: CLT, Gaussianity, Henry Poincare, IEEE, normal, signal processing, signal processing magazine, Why
Category: arXiv, Cross-Cultural, Data Processing, Fitting, Frequentist, Methods, Physics, Quotes, Stat, Uncertainty | Comment

A Conversation with Peter Huber

Sep 5th, 2008| 08:46 pm | Posted by hlee

The problem with data analysis is of course that it is a performing art. It is not something you easily write a paper on; rather, it is something you do. And so it is difficult to publish.

quoted from this conversation Continue reading ‘A Conversation with Peter Huber’ »

Tags: art, Babilonian astronomy, computers, computing, computing history, conversation, FFT, history, Peter Huber, project pursuit, robust statistics, robustness
Category: Algorithms, arXiv, Cross-Cultural, Data Processing, Jargon, Languages, Quotes | Comment

NR, the 3rd edition

Aug 28th, 2008| 08:44 pm | Posted by hlee

Having talked about the limits of Numerical Recipes in my PyIMSL post, I couldn’t resist checking the materials, particularly the updates in the new edition of Numerical Recipes by Press et al. (2007). Continue reading ‘NR, the 3rd edition’ »

Tags: book, computing, methods and techniques, new edition, Numerical Recipes
Category: Algorithms, Data Processing, Languages, Methods | Comment

A lecture note of great utility

Aug 27th, 2008| 02:35 pm | Posted by hlee

I didn’t realize this post had been sitting for a month, during which I almost neglected the slog. Just as great books about probability and information theory exist for statisticians and engineers, I believe there are great statistical physics books for physicists. On the other hand, relatively few introduce one subject to the other kind of audience. In this regard, I thought these lecture notes could be useful.

[arxiv:physics.data-an:0808.0012]
Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract: Continue reading ‘A lecture note of great utility’ »

Tags: Bayes Theorem, Boltzmann, Carnot, Entropy, Gibbs paradox, Information, laws of thermodynamics, lecture note, maximum likelihood, probability, Shannon, statistical physics, Tchebyshev inequality, thermodynamics
Category: arXiv, Bayesian, Cross-Cultural, Data Processing, Fitting, Physics, Stat | Comment

Background Subtraction, the Sequel [Eqn]

Aug 6th, 2008| 01:00 pm | Posted by vlk

As mentioned before, background subtraction plays a big role in astrophysical analyses. For a variety of reasons, it is not a good idea to subtract out background counts from source counts, especially in the low-counts Poisson regime. What Bayesians recommend instead is to set up a model for the intensity of the source and the background and to infer these intensities given the data. Continue reading ‘Background Subtraction, the Sequel [Eqn]’ »
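As a rough sketch of what "set up a model and infer the intensities" can look like, the code below puts a Poisson model on the counts in a source region and in a background region and marginalizes the background on a grid. The counts, the area ratio, the flat priors, and the grid limits are all assumptions made for illustration; the full post may use a different parameterization.

```python
import numpy as np

# Hypothetical data: counts in the source region and in an
# off-source background region ten times larger in area.
n_src = 8
n_bkg = 20
area_ratio = 10.0

def poisson_loglike(counts, mean):
    """Poisson log-likelihood up to the constant -log(counts!)."""
    return counts * np.log(mean) - mean

# Grids over source intensity s and background intensity b.
# They start slightly above zero so that log(mean) stays finite.
s_grid = np.linspace(0.01, 25.0, 500)
b_grid = np.linspace(0.01, 8.0, 400)
S, B = np.meshgrid(s_grid, b_grid, indexing="ij")

# Model: n_src ~ Poisson(s + b), n_bkg ~ Poisson(area_ratio * b),
# with flat priors on s and b (an assumption of this sketch).
log_post = poisson_loglike(n_src, S + B) + poisson_loglike(n_bkg, area_ratio * B)
post = np.exp(log_post - log_post.max())

# Marginalize over the background, then normalize as a density in s.
ds = s_grid[1] - s_grid[0]
post_s = post.sum(axis=1)
post_s /= post_s.sum() * ds

s_mean = float((s_grid * post_s).sum() * ds)
naive = n_src - n_bkg / area_ratio  # direct background subtraction

print(f"posterior mean of s:      {s_mean:.2f}")
print(f"subtracted estimate of s: {naive:.2f}")
```

Unlike the subtracted estimate, the marginal posterior for s stays on positive intensities by construction and carries the background uncertainty with it.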

Tags: background, background marginalization, background subtraction, EotW, Equation, Equation of the Week
Category: Astro, Bayesian, Data Processing, High-Energy, Imaging, Jargon, Stat | Comment