Archive for the ‘Data Processing’ Category.

missing data

The notion of missing data differs between the two communities. I tend to think missing data carry as much information as observed data. Astronomers… I’m not sure how they think, but my impression so far is that when one attribute/variable of an object/observation/informant is missing, all other attributes of that object become useless, because the object is excluded from scientific data analysis or model evaluation. For example, it is hard to find any discussion of imputation in astronomical publications, or any statistical justification of how missing data are treated with respect to inference strategies. Instead, they talk about incompleteness within different variables. To make this vague argument concrete, consider a catalog of multiple magnitudes. To draw a color-magnitude diagram, one needs both color and magnitude. If one attribute is missing, that star will not appear in the color-magnitude diagram, and any inference based on that diagram will not include that star. Nonetheless, one will try to understand how the proportions of observed stars vary with color and magnitude. Continue reading ‘missing data’ »
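The catalog example can be sketched in a few lines. This is only an illustration under assumptions: the five-star catalog, the B and V band names, and the helper functions `complete_cases` and `missing_fraction` are all hypothetical. It shows how stars with any missing magnitude silently drop out of the diagram, while the per-band incompleteness is still reported separately.

```python
import math

# Hypothetical toy catalog: (B magnitude, V magnitude); NaN marks a missing value.
catalog = [
    (15.2, 14.6),
    (16.8, float("nan")),
    (float("nan"), 13.9),
    (17.5, 16.4),
    (14.1, 13.8),
]

def complete_cases(rows):
    """Keep only the stars with every magnitude observed (listwise deletion)."""
    return [row for row in rows if not any(math.isnan(x) for x in row)]

def missing_fraction(rows, column):
    """Fraction of stars whose value in the given column is missing."""
    return sum(math.isnan(row[column]) for row in rows) / len(rows)

# Points of the color-magnitude diagram: (color B-V, magnitude V).
cmd_points = [(b - v, v) for b, v in complete_cases(catalog)]

print(len(catalog), "stars in the catalog,", len(cmd_points), "in the diagram")
print("missing B:", missing_fraction(catalog, 0))
print("missing V:", missing_fraction(catalog, 1))
```

Any inference drawn from `cmd_points` ignores the two incomplete stars, even though each of them still carries one observed magnitude.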

GSL – GNU Scientific Library

I’ve talked about IMSL, a commercial scientific library, in my pyIMSL post. There is a free GNU counterpart, GSL. I owe finding GSL to Jiangang, the author of the poster I liked most at the 212th AAS (see My first AAS. V. measurement error and EM and his comment). Continue reading ‘GSL – GNU Scientific Library’ »

[tutorial] multispectral imaging, a case study

Even without signal processing courses, the following equation should be awfully familiar to astronomers who do photometry and handle data:
$$c_k=\int_\Lambda l(\lambda) r(\lambda) f_k(\lambda) \alpha(\lambda) d\lambda +n_k$$
The terms are, in order, camera response (c_k), light source (l), spectral radiance by l (r), filter (f_k), sensitivity (α), and noise (n_k), where Λ indicates the range of the spectrum in which the camera is sensitive.
Or simplified to $$c_k=\int_\Lambda \phi_k (\lambda) r(\lambda) d\lambda +n_k$$
where φ_k combines the illuminant and the spectral sensitivity of the k-th channel, and goes by the name augmented spectral sensitivity. Well, we can skip spectral radiance r, though. Unfortunately, in astronomical photometry the sensitivity α has multiple layers and is not a simple closed-form function of λ.
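Or, in matrix form, $$c=\Theta r +n$$
where c stacks the channel responses c_k and the k-th row of Θ is φ_k sampled in wavelength. Inverting Θ, that is, finding a reconstruction operator such that r=inv(Θ)c, leads to spectral reconstruction, although Θ is, in general, not a square matrix, so inv(Θ) has to be understood as a generalized inverse. Otherwise, approach the problem through indirect reconstruction. Continue reading ‘[tutorial] multispectral imaging, a case study’ »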
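One way to read "inverting Θ" when Θ is not square is the Moore-Penrose pseudo-inverse. The sketch below is an assumption-laden illustration, not the method of the case study: the channel count, the wavelength grid, the Gaussian bandpasses, and the test spectrum are all made up, and `numpy.linalg.pinv` is swapped in as the reconstruction operator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 6 camera channels, 31 wavelength samples, so Theta is 6 x 31.
n_channels, n_wavelengths = 6, 31
wavelengths = np.linspace(400.0, 700.0, n_wavelengths)  # nm, assumed range Lambda

# Hypothetical augmented sensitivities phi_k: Gaussian bandpasses, one per row.
centers = np.linspace(420.0, 680.0, n_channels)
Theta = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / 30.0) ** 2)

# Hypothetical smooth spectrum r and noisy camera responses c = Theta r + n.
r_true = 0.5 + 0.3 * np.sin(wavelengths / 60.0)
c = Theta @ r_true + rng.normal(scale=1e-3, size=n_channels)

# Pseudo-inverse as reconstruction operator: the minimum-norm r with Theta r = c.
r_hat = np.linalg.pinv(Theta) @ c

print("residual in camera space:", np.linalg.norm(Theta @ r_hat - c))
print("error in spectrum space: ", np.linalg.norm(r_hat - r_true))
```

With fewer channels than wavelength samples the problem is underdetermined: the camera responses are reproduced almost exactly, but the recovered spectrum is only one of many consistent with them, which is why prior knowledge or indirect reconstruction enters.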

survey and design of experiments

People of experience would argue very differently, and wisely, against what I’m going to discuss now. This post only combines two small cross sections, one from a branch of each of two trees, astronomy and statistics. Continue reading ‘survey and design of experiments’ »

Make3D

It is a conventional belief that at least two images are needed to reconstruct a 3D scene. Yet we know that our eyes reconstruct 3D scenes from single snapshots, from just one picture. Building on our perception and learning ability, or our internal pattern recognition ability, a few groups have been trying to reconstruct a 3D scene from one still image. Luckily, you can test this progress, reconstructing a 3D scene from a single still image, at Make3D (a click brings you to Make3D at Stanford). Continue reading ‘Make3D’ »

Classification and Clustering

Another conclusion I drew from reading preprints listed on arxiv/astro-ph is that astronomers tend to confuse classification and clustering and to mix up their methodologies. They tend to think any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithms, no matter what, look like a black box. I mean a black box as in a neural network, which is one of the classification algorithms. Continue reading ‘Classification and Clustering’ »
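The distinction can be made concrete with a toy sketch. Everything here is hypothetical: the 1D feature, the labels, and the function names. A nearest-centroid rule stands in for classification (labels are given and learned from), and a bare-bones Lloyd's k-means stands in for clustering (no labels at all).

```python
# Hypothetical 1D feature (say, a color index) for a handful of objects.
train_x = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
train_y = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]  # labels: supervised
new_x = [0.25, 1.0]

def fit_centroids(xs, ys):
    """Classification: learn one centroid per known label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def classify(x, centroids):
    """Assign the label of the nearest learned centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def kmeans_1d(xs, centers, n_iter=20):
    """Clustering: no labels; alternate assignment and centroid update."""
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

centroids = fit_centroids(train_x, train_y)
labels = [classify(x, centroids) for x in new_x]
cluster_centers = kmeans_1d(train_x, centers=[0.0, 1.0])

print(labels)           # named classes, because training labels were supplied
print(cluster_centers)  # anonymous group centers, no names attached
```

The classifier answers "which known class is this?", while the clustering step only says "these objects group together"; interpreting the groups is left to the analyst.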

A History of Markov Chain Monte Carlo

I’ve been joking about the astronomers’ fashion of writing Markov chain Monte Carlo (MCMC). In astronomical journals, MCMC is frequently written out as Monte Carlo Markov Chain. I was curious about the history of this new creation. Overall, I thought it would be worthwhile to learn more about the history of MCMC, and this paper was up on arxiv: Continue reading ‘A History of Markov Chain Monte Carlo’ »

BUGS

Astronomers tend to think in a Bayesian way, but their Bayesian implementations are very limited. OpenBUGS, WinBUGS, GeoBUGS (BUGS for geostatistics; for example, modeling spatial distributions), R2WinBUGS (an R wrapper for BUGS) or PyBUGS (a Python wrapper for BUGS) could boost their Bayesian eagerness. Oh, by the way, BUGS stands for Bayesian inference Using Gibbs Sampling. Continue reading ‘BUGS’ »

[Book] pattern recognition and machine learning

A nice book by Christopher Bishop.
While I was reading abstracts and papers from astro-ph, I saw many applications of algorithms from pattern recognition and machine learning (PRML). The frequency will increase as large-scale survey projects multiply, so recommending a good textbook or reference in the field seems timely. Continue reading ‘[Book] pattern recognition and machine learning’ »

appealing eyes == powerful method

To claim that results are statistically powerful, astronomers rely heavily on eyeballing techniques (these require an apprenticeship to acquire, but look subjective to me without such training). In some cases, I know of actual statistical tests that could support or refute those claims. Hence, I believe astronomers are well aware of those statistical tests. I guess they are afraid that those statistics may reject their claims or are not powerful enough in numeric metrics. Instead, they put effort into making graphics more appealing. Continue reading ‘appealing eyes == powerful method’ »

Why Gaussianity?

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincare

Continue reading ‘Why Gaussianity?’ »

A Conversation with Peter Huber

The problem with data analysis is of course that it is a performing art. It is not something you easily write a paper on; rather, it is something you do. And so it is difficult to publish.

quoted from this conversation Continue reading ‘A Conversation with Peter Huber’ »

NR, the 3rd edition

Having talked about the limits of Numerical Recipes in my PyIMSL post, I couldn’t resist checking the material, particularly the updates, in the new edition of Numerical Recipes by Press et al. (2007). Continue reading ‘NR, the 3rd edition’ »

A lecture note of great utility

I didn’t realize this post had been sitting for a month, during which I almost neglected the slog. Just as there are great books about probability and information theory for statisticians and engineers, I believe there are great statistical physics books for physicists. On the other hand, relatively few introduce one subject to the other kind of audience. In this regard, I thought these lecture notes could be useful.

Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract: Continue reading ‘A lecture note of great utility’ »

Background Subtraction, the Sequel [Eqn]

As mentioned before, background subtraction plays a big role in astrophysical analyses. For a variety of reasons, it is not a good idea to subtract out background counts from source counts, especially in the low-counts Poisson regime. What Bayesians recommend instead is to set up a model for the intensity of the source and the background and to infer these intensities given the data. Continue reading ‘Background Subtraction, the Sequel [Eqn]’ »
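A minimal sketch of that recommendation, under assumptions not stated in the post: source-region counts follow Poisson(s + b), background-region counts follow Poisson(area_ratio × b), both intensities get flat priors, and the posterior is evaluated on a finite grid. The counts, the grid, and the function names are hypothetical.

```python
import math

def log_poisson(n, mu):
    """Log of the Poisson probability of n counts given expected counts mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def source_posterior(n_src, n_bkg, area_ratio, s_grid, b_grid):
    """Marginal posterior of the source intensity s on s_grid (flat priors).

    Assumed model: n_src ~ Poisson(s + b), n_bkg ~ Poisson(area_ratio * b).
    The background intensity b is summed out over b_grid.
    """
    post = []
    for s in s_grid:
        total = 0.0
        for b in b_grid:
            total += math.exp(
                log_poisson(n_src, s + b) + log_poisson(n_bkg, area_ratio * b)
            )
        post.append(total)
    norm = sum(post)
    return [p / norm for p in post]

# Hypothetical low-count data: 1 count in the source region,
# 10 counts in a background region 4 times larger.
n_src, n_bkg, area_ratio = 1, 10, 4.0
s_grid = [0.05 + 0.1 * i for i in range(200)]  # strictly positive intensities
b_grid = [0.05 + 0.1 * i for i in range(200)]

post = source_posterior(n_src, n_bkg, area_ratio, s_grid, b_grid)
mean_s = sum(s * p for s, p in zip(s_grid, post))

print("naive subtraction:", n_src - n_bkg / area_ratio)
print("posterior mean of s:", mean_s)
```

Direct subtraction gives a negative source intensity for these counts, whereas the modeled intensity stays positive by construction and comes with a full posterior rather than a single number.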