The AstroStat Slog » image processing
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

[ArXiv] Voronoi Tessellations
http://hea-www.harvard.edu/AstroStat/slog/2009/arxiv-voronoi-tessellations/
Wed, 28 Oct 2009, hlee

As a part of exploring the spatial distribution of particles/objects, approximating neither via a Poisson process nor a Gaussian process (parametric), and imposing no hypotheses such as homogeneity, isotropy, or uniformity, various nonparametric methods drew my attention for data exploration and preliminary analysis. Among the various nonparametric methods, the one that I fell in love with is tessellation (state space approaches are excluded here). In terms of computational speed, I believe tessellation is faster than kernel density estimation for estimating level sets of multivariate data. Furthermore, constructing polygons from a tessellation is conceptually and intuitively simple. However, coding and improving the algorithms is beyond statistical research (check books titled or key-worded partially by computational geometry). The good news is that, for computation and getting results, software, packages, and modules are freely available in various forms.

As a part of introducing nonparametric statistics, I wanted to write about applications of computational geometry from the perspective of nonparametric 2- and 3-dimensional density estimation. Also, the following article came along just as I began to collect statistical applications in astronomy (my [ArXiv] series). This [arXiv] paper, in fact, prompted me to investigate Voronoi tessellations in astronomy in general.

[arxiv/astro-ph:0707.2877]
Voronoi Tessellations and the Cosmic Web: Spatial Patterns and Clustering across the Universe
by Rien van de Weygaert

Since then, quite some time has passed. In the meantime, I found more astronomy publications specifically using tessellation as a main tool for nonparametric density estimation and data analysis. Nonetheless, topics in spatial statistics tend to be unrecognized or almost ignored in analyzing astronomical spatial data (I mean data points with coordinate information); many studies seem to utilize statistics only partially or not at all. Some might want to know how often Voronoi tessellation is applied in astronomy. Here are the results of my ADS search, limiting “tessellation” to title keywords:

Then the topic was forgotten for a while, until this recent [arXiv] paper reminded me of my old intention of introducing tessellation for density estimation and for understanding large scale structures or clusters (astronomers’ jargon, not the term from machine or statistical learning).

[arxiv:stat.ME:0910.1473] Moment Analysis of the Delaunay Tessellation Field Estimator
by M.N.M. van Lieshout

Looking into the plots of the papers by van de Weygaert or van Lieshout, one can immediately understand, without mathematical jargon and abstraction, what Voronoi and Delaunay tessellations are (the Delaunay tessellation is also called the Delaunay triangulation (wiki); perhaps you want to check out wiki:Delaunay Tessellation Field Estimator as well). Voronoi tessellations have been adopted in many scientific/engineering fields to describe spatial distributions, and astronomy is no exception; Voronoi tessellation has been used for field interpolation. A minimal construction sketch follows.
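To make the construction concrete, here is a minimal sketch of my own (not from either paper): building a Voronoi tessellation of simulated positions with SciPy. The point set and all parameters are toys.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(500, 2))   # mock 2-d positions
vor = Voronoi(points)

# Each input point owns one Voronoi region; vertex index -1 marks a
# vertex at infinity, i.e., an unbounded boundary cell.
for i in range(5):
    region = vor.regions[vor.point_region[i]]
    print(f"point {i}: {len(region)} vertices, bounded={-1 not in region}")
```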

van de Weygaert described Voronoi tessellations as follows:

  1. the asymptotic frame for the ultimate matter distribution,
  2. the skeleton of the cosmic matter distribution,
  3. a versatile and flexible mathematical model for weblike spatial pattern, and
  4. a natural asymptotic result of an evolution in which low-density expanding void regions dictate the spatial organization of the Megaparsec universe, while matter assembles in high-density filamentary and wall-like interstices between the voids.

van Lieshout derived explicit expressions for the mean and variance of the Delaunay Tessellation Field Estimator (DTFE) and showed that, for stationary Poisson processes, the DTFE is asymptotically unbiased with a variance proportional to the squared intensity.
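For illustration, the following is a rough 2-d sketch of the DTFE idea (my own toy version, ignoring the boundary corrections and normalization details of van Lieshout’s paper): the density estimate at a point is (d+1) divided by the total area of the Delaunay triangles incident to that point.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 1.0, size=(1000, 2))     # mock 2-d positions
tri = Delaunay(pts)

# Area of each Delaunay triangle from the 2-d determinant formula.
a, b, c = (pts[tri.simplices[:, k]] for k in range(3))
d1, d2 = b - a, c - a
area = 0.5 * np.abs(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0])

# Accumulate, for every point, the area of all triangles touching it.
touched = np.zeros(len(pts))
for k in range(3):
    np.add.at(touched, tri.simplices[:, k], area)

density = 3.0 / touched    # DTFE estimate: (d+1)/area with d = 2
print(density[:5])
```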

We have observed voids and filaments of cosmic matter whose patterns no theory yet fully explains. In general, those patterns are manifested via observed galaxies, both directly and indirectly. Individual observed objects, I believe, can be matched to the points that generate Voronoi polygons; each object represents its polygon, and investigating the distributional properties of the polygons helps us understand the formation rules and theories behind those patterns. For that matter, various topics in stochastic geometry, not just Voronoi tessellation, can probably be adopted.

There is a plethora of information available on Voronoi tessellation, such as the website of the International Symposium on Voronoi Diagrams in Science and Engineering; two recent meeting websites are ISVD09 and ISVD08. Also, the following review paper is interesting.

Centroidal Voronoi Tessellations: Applications and Algorithms, by Du, Faber, and Gunzburger (1999), SIAM Review, vol. 41(4), pp. 637-676.

By the way, you may have noticed my preference for Voronoi tessellation over Delaunay, owing to the centroidal Voronoi characteristic that each observation is the center of its own Voronoi cell, as opposed to the Delaunay property that multiple simplices are associated with a single observation/point. However, from the perspective of understanding the distribution of observations as a whole, both approaches offer summaries and insights in a nonparametric fashion, which is what I value most.

More on Space Weather
http://hea-www.harvard.edu/AstroStat/slog/2009/more-on-space-weather/
Tue, 22 Sep 2009, hlee

Thanks to a Korean solar physicist[1], I was able to gather the following websites and some relevant information on space weather forecasting in action, not limited to literature or toy data.


These seem quite informative, and I believe statisticians and data scientists (in signal and image processing, machine learning, computer vision, and data mining) could easily collaborate with solar physicists. All the complexity, as a matter of fact, comes from processing the data to be fed into (machine, statistical) learning algorithms and from defining the objectives of learning. Once those are settled, one can readily apply numerous methods from the field to these time-varying solar images.

I’m writing this short posting because I finally found the interesting articles that I had collected for my previous post on Space Weather. After finding them and scanning through, I realized that, methodology-wise, they have only taken baby steps. You’ll see a limited number of keywords repeated, even though there is a huge community of scientists and engineers in knowledge discovery and data mining.

Note that the objectives of these studies are quite similar. They apply machine learning to automate the detection of features of interest on the Sun and, where possible, to forecast the phenomena that affect our own atmosphere through the associated solar activity (a minimal classification sketch follows the list below).

  1. Automated Prediction of CMEs Using Machine Learning of CME-Flare Associations by Qahwaji et al. (2008) in Solar Phys., vol. 248, pp. 471-483.
  2. Automatic Short-Term Solar Flare Prediction Using Machine Learning and Sunspot Associations by Qahwaji and Colak (2007) in Solar Phys., vol. 241, pp. 195-211.

    Space weather is defined by the U.S. National Space Weather Program (NSWP) as “conditions on the Sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and can endanger human life or health”

    Personally, I think the section on “jackknife” should be replaced with “cross-validation.”

  3. Automatic Detection and Classification of Coronal Mass Ejections by Qu et al. (2006) in Solar Phys., vol. 237, pp. 419-431.
  4. Automatic Solar Filament Detection Using Image Processing Techniques by Qu et al. (2005) in Solar Phys., vol. 228, pp. 119-135.
  5. Automatic Solar Flare Tracking Using Image-Processing Techniques by Qu et al. (2004) in Solar Phys., vol. 222, pp. 137-149.
  6. Automatic Solar Flare Detection Using MLP, RBF, and SVM by Qu et al. (2003) in Solar Phys., vol. 217, pp. 157-172.
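As promised above, here is a hedged sketch of the generic setup these papers share: a Support Vector Machine classifier trained on image-derived features and scored by cross-validation rather than the jackknife. The features and labels below are synthetic stand-ins, not real flare catalogs.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))                       # mock image-derived features
# mock flare / no-flare labels tied loosely to the first two features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")     # the SVM used in these papers
scores = cross_val_score(clf, X, y, cv=5)         # 5-fold cross-validation
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```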

I’d like to add a survey paper on another family of learning methods beyond the Support Vector Machine (SVM) used in almost all the articles above. Luckily, this survey happened to address my concern about the “practices of background subtraction” in high energy astrophysics.

A Survey of Manifold-Based Learning Methods by Huo, Ni, and Smith
[Excerpt] What is Manifold-Based Learning?
It is an emerging and promising approach to nonparametric dimension reduction. The article reviews principal component analysis, multidimensional scaling (MDS), generative topographic mapping (GTM), locally linear embedding (LLE), ISOMAP, Laplacian eigenmaps, Hessian eigenmaps, and local tangent space alignment (LTSA). Apart from these revisits and comparisons, the survey is useful for understanding the danger of background subtraction: homogeneity does not imply a constant background to subtract, and subtracting one often causes negative source observations. A small sketch of one reviewed method follows.
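As a taste of these methods, here is a minimal sketch running one of the reviewed algorithms (ISOMAP) on the standard swiss-roll toy data via scikit-learn; nothing here is specific to the survey’s experiments.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-d points lying on an intrinsically 2-d manifold
X, color = make_swiss_roll(n_samples=1000, random_state=0)

embedding = Isomap(n_neighbors=10, n_components=2)
X2 = embedding.fit_transform(X)     # "unrolled" 2-d coordinates
print(X2.shape)                     # (1000, 2)
```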

More collaboration among multiple disciplines is desired in this relatively new field. For me, it is one of the best data and information science fields of the 21st century, and any progress will benefit humankind.

  1. I must acknowledge him for his kindness and patience. He was my Wikipedia for questions while I was studying the Sun.
[Books] Bayesian Computations
http://hea-www.harvard.edu/AstroStat/slog/2009/books-bayesian-computations/
Fri, 11 Sep 2009, hlee

A number of practical Bayesian data analysis books are available these days. Here, I’d like to introduce two that were published relatively recently. I like the fact that they are more technical than theoretical, that their practical examples are closely related to astronomical data, and that they come with R code so one can try the algorithms on the fly instead of cramming probability theory.

Bayesian Computation with R
Author: Jim Albert
Publisher: Springer (2007)

As the title suggests, an accompanying R package, LearnBayes, is available (clicking the name will take you to the package download). Furthermore, the last chapter is about WinBUGS (please check the resources listed in BUGS for other BUGS variants; BUGS stands for Bayesian inference Using Gibbs Sampling). Overall, it is quite practical and instructional. If a young astronomer wants to enter the competition posted below because of sophisticated data requiring non-traditional statistical modeling, this book can be a good starting point. (Here, traditional methods include brute-force Monte Carlo simulations, chi^2/weighted least squares fitting, and test statistics with rigid underlying assumptions.)

One quote stood out to me because of an astronomer’s comment, “Bayesian is robust but frequentist is not,” which I couldn’t agree with at the time.

A Bayesian analysis is said to be robust to the choice of prior if the inference is insensitive to different priors that match the user’s beliefs.

Since there is no discussion of priors in frequentist methods, Bayesian robustness cannot be matched and compared with frequentist robustness. As in my discussion in Robust Statistics, I keep the notion that robust statistics is insensitive to outliers or to the iid Gaussian model assumption. The latter, in particular, is almost always assumed in astronomical data analysis unless other models and probability densities are explicitly stated, such as Poisson counts or a Pareto distribution. New Bayesian algorithms are being invented to achieve robustness not limited to the choice of prior, covering topics from frequentists’ robust statistics as well.

The introduction to Bayesian computation focuses on analytical, simple parametric models and well-known probability densities; these models and their Bayesian analysis produce interpretable results. Gibbs samplers, Metropolis-Hastings algorithms, and a few hybrids can handle scientific problems as long as the scientific models and the uncertainties, both in observations and in parameters, are transcribed into well-known probability density functions. I think astronomers will like Chap. 6 (MCMC) and Chap. 9 (Regression Models). Oftentimes, to demonstrate a strong correlation between two variables, astronomers adopt simple linear regression models and fit the data to them; a priori knowledge enhances the flexibility of the fitting analysis, where Bayesian computation works robustly, unlike straightforward chi-square methods. The book offers neither sophisticated algorithms nor theory, only the bare necessities and foundations for accommodating Bayesian computation to scientific needs (a Gibbs-sampler sketch follows).
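As promised, a minimal Gibbs-sampler sketch, written here in Python rather than the book’s R: posterior draws of (mu, sigma^2) for normal data under flat priors, alternating the two full conditionals. The data and prior choices are toys of mine, not an example from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)   # mock data
n, ybar = len(y), y.mean()

mu, sig2 = 0.0, 1.0
draws = []
for t in range(5000):
    # mu | sigma^2, y  ~  N(ybar, sigma^2 / n)
    mu = rng.normal(ybar, np.sqrt(sig2 / n))
    # sigma^2 | mu, y  ~  Inv-Gamma(n/2, sum((y - mu)^2)/2)
    sig2 = 1.0 / rng.gamma(n / 2.0, 2.0 / np.sum((y - mu) ** 2))
    draws.append((mu, sig2))

mu_s, sig2_s = np.array(draws[500:]).T          # drop burn-in
print(mu_s.mean(), np.sqrt(sig2_s.mean()))      # near the true 2.0 and 1.5
```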

The other book is

Bayesian Core: A Practical Approach to Computational Bayesian Statistics.
Authors: J.-M. Marin and C.P. Robert
Publisher: Springer (2007).

Although the book is written by statisticians, the very first real data example is CMBdata (cosmic microwave background data; the book says “cosmological” rather than “cosmic,” and I’m not sure which is correct, but I’m used to CMB standing for cosmic microwave background). Surprisingly, the CMB becomes a very easy topic in statistics, treated in terms of testing normality and extreme values. Seeing real astronomy data first in the book was the primary reason for introducing it here. Also, it is a relatively small book (about 250 pages) compared to other Bayesian textbooks, given its broad coverage of topics in Bayesian computation. There are other practical real data sets illustrating Bayesian computation in the book, and these example data sets are available from the book’s website.

The book begins with R; then normal models, regression and variable selection, generalized linear models, capture-recapture experiments, mixture models, dynamic models, and image analysis are covered.

I felt exuberant when I found that the book describes the law of large numbers (LLN), which justifies Monte Carlo methods. The LLN is at work whenever an integration is approximated by a summation, something astronomers do a lot without referring to the law by name (a tiny numerical illustration follows). For more information, I’d rather give a wikipedia link: Law of Large Numbers.
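The tiny illustration, assuming nothing beyond NumPy: the LLN says the sample mean of g(X) converges to E[g(X)]; here g(x) = x^2 with X ~ N(0, 1), whose exact expectation is 1.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10**2, 10**4, 10**6):
    x = rng.standard_normal(n)
    print(n, (x ** 2).mean())   # sample mean -> E[X^2] = 1 as n grows
```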

Several MCMC algorithms can be mixed together within a single algorithm using either a circular or a random design. While this construction is often suboptimal (in that the inefficient algorithms in the mixture are still used on a regular basis), it almost always brings an improvement compared with its individual components. A special case where a mixed scenario is used is the Metropolis-within-Gibbs algorithm: When building a Gibbs sampler, it may happen that it is difficult or impossible to simulate from some of the conditional distributions. In that case, a single Metropolis step associated with this conditional distribution (as its target) can be used instead.

The description in Sec. 4.2, Metropolis-Hastings Algorithms, should be especially appreciated and well understood by astronomers, given the historical origins of these topics: the detailed balance equation and the random walk (a minimal sketch follows).
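The promised sketch: a random-walk Metropolis-Hastings sampler for a toy unnormalized target; the accept/reject ratio is exactly what enforces detailed balance. The target and tuning below are my own toy choices, not the book’s example.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # unnormalized log density of Gamma(shape=3, scale=1), defined for x > 0
    return (2.0 * np.log(x) - x) if x > 0 else -np.inf

x, chain = 1.0, []
for t in range(20000):
    prop = x + rng.normal(scale=1.0)      # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop                          # accept; otherwise keep current x
    chain.append(x)

print(np.mean(chain[2000:]))  # -> approx. 3, the Gamma(3,1) mean
```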

My personal favorite is Chapter 6 on mixture models. Astronomers handle data from multiple populations (multiple epochs of star formation, single or multiple break power laws, linear or quadratic models, metallicities from merging or formation triggers, backgrounds+sources, environment dependent point spread functions, and so on), and the chapter discusses the difficulties of the label switching problem (an identifiability issue in codifying data into MCMC or EM algorithms).

A completely different approach to the interpretation and estimation of mixtures is the semiparametric perspective. To summarize this approach, consider that since very few phenomena obey probability laws corresponding to the most standard distributions, mixtures such as \sum_{i=1}^k p_i f(x|\theta_i) (*) can be seen as a good trade-off between fair representation of the phenomenon and efficient estimation of the underlying distribution. If k is large enough, there is theoretical support for the argument that (*) provides a good approximation (in some functional sense) to most distributions. Hence, a mixture distribution can be perceived as a type of basis approximation of unknown distributions, in a spirit similar to wavelets and splines, but with a more intuitive flavor (for a statistician at least). This chapter mostly focuses on the “parametric” case, when the partition of the sample into subsamples with different distributions f_j does make sense from the dataset point of view (even though the computational processing is the same in both cases).

We must point at this stage that mixture modeling is often used in image smoothing but not in feature recognition, which requires spatial coherence and thus more complicated models…
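To make the mixture \sum_{i=1}^k p_i f(x|\theta_i) concrete in the “parametric” case, here is a minimal sketch of my own: fitting a two-component Gaussian mixture to mock two-population data with scikit-learn’s EM implementation. Note the label switching caveat from above: the fitted component indices 0 and 1 are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300),    # population 1
                    rng.normal(3.0, 0.5, 200)])    # population 2

gm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))
print("weights:", gm.weights_)        # approx. 0.6 and 0.4 (order arbitrary)
print("means:  ", gm.means_.ravel())  # approx. -2 and 3 (order arbitrary)
```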

My patience ran out before I could comprehend every detail of the book, but the sections on reversible jump MCMC, hidden Markov models (HMM), and Markov random fields (MRF) look very useful. These topics appear often in image processing, a field where astronomers have their own algorithms. Adapting and comparing methods across image analysis promises new directions for scientific imaging data analysis beyond subjective denoising, smoothing, and segmentation.

For readers considering more advanced Bayesian computation and a rigorous treatment of MCMC methodology, I’d like to point to a textbook frequently mentioned by Marin and Robert.

Monte Carlo Statistical Methods by Robert, C. and Casella, G. (2004)
Springer-Verlag, New York, 2nd Ed.

There are a few more practical and introductory Bayesian analysis books recently published or soon to be published, and some readers may prefer those newly printed titles. Perhaps there is, or will be, a Bayesian Computation with Python, IDL, Matlab, Java, or C/C++ for those who never intend to use R. By the way, Mathematica users may want to check out Phil Gregory’s book, which I introduced in [books] a boring title. My point is that applied statistics has become friendlier to non-statisticians through these good introductory books and free online materials. I hope more astronomers apply statistical models in their data analysis without much trouble in executing Bayesian methods. Some might want to check BUGS, introduced in [BUGS]; that posting contains resources on how to use BUGS and the packages available under various languages.

[ArXiv] Statistical Analysis of fMRI Data
http://hea-www.harvard.edu/AstroStat/slog/2009/arxiv-statistical-analysis-of-fmri/
Wed, 02 Sep 2009, hlee

[arxiv:0906.3662] The Statistical Analysis of fMRI Data by Martin A. Lindquist
Statistical Science, Vol. 23(4), pp. 439-464

This review paper offers information and guidance on statistical image analysis for fMRI data that can be extended to astronomical image data. I think fMRI data present challenges similar to those of astronomical images. As Lindquist says, collaboration helps in finding shortcuts. I hope that introducing this paper furthers networking and collaboration between statisticians and astronomers.

List of similarities

  • data acquisition: data are read in the frequency domain and the image is reconstructed via an inverse Fourier transform (to my naive eyes, it looks similar to power spectrum analysis of cosmic microwave background (CMB) data); see the toy sketch after this list.
  • amplitudes or coefficients are what get analyzed, not phases or wavelets.
  • understanding the data: brain physiology, or physics such as cosmological models, describes the data generating mechanism.
  • limits in, and trade-offs between, spatial and temporal resolution.
  • understanding/modeling noise and signal.
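The toy sketch referenced in the first bullet, with everything (image, noise level) made up for illustration: an image is sampled in the frequency (k-space) domain and reconstructed with an inverse Fourier transform, after which one works with magnitudes rather than phases.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.poisson(5.0, size=(64, 64)).astype(float)   # mock "true" image

kspace = np.fft.fft2(image)                             # what is recorded
kspace += rng.normal(scale=1.0, size=kspace.shape)      # toy measurement noise
recon = np.abs(np.fft.ifft2(kspace))                    # reconstructed magnitudes

print(np.max(np.abs(recon - image)))                    # small for low noise
```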

These similarities seem common to statistically analyzing images from fMRI or from telescopes. Notwithstanding, astronomers cannot (or do not want to) carry out experimental design; this can be a huge difference between medical and astronomical image analysis. My emphasis is that, because of these commonalities, strategies in preprocessing and data analysis for fMRI data can be shared with astronomical observations, and vice versa. Some sloggers may want to check Section 6, which covers various statistical models and methods for spatial and temporal data.

I’d rather simply end this posting with the following quotes, which say that statisticians play a critical role in scientific image analysis. :)

There are several common objectives in the analysis of fMRI data. These include localizing regions of the brain activated by a task, determining distributed networks that correspond to brain function and making predictions about psychological or disease states. Each of these objectives can be approached through the application of suitable statistical methods, and statisticians play an important role in the interdisciplinary teams that have been assembled to tackle these problems. This role can range from determining the appropriate statistical method to apply to a data set, to the development of unique statistical methods geared specifically toward the analysis of fMRI data. With the advent of more sophisticated experimental designs and imaging techniques, the role of statisticians promises to only increase in the future.

A full spatiotemporal model of the data is generally not considered feasible and a number of short cuts are taken throughout the course of the analysis. Statisticians play an important role in determining which short cuts are appropriate in the various stages of the analysis, and determining their effects on the validity and power of the statistical analysis.

Wavelet-regularized image deconvolution
http://hea-www.harvard.edu/AstroStat/slog/2009/wavelet-regularized-image-deconvolution/
Fri, 12 Jun 2009, hlee

A Fast Thresholded Landweber Algorithm for Wavelet-Regularized Multidimensional Deconvolution
Vonesch and Unser (2008)
IEEE Trans. Image Proc. vol. 17(4), pp. 539-549

Quoting the authors, I too would say that recovering the original image from the observed one is an ill-posed problem. They trace the efforts on wavelet regularization in deconvolution back to a few relatively recent publications by astronomers. Therefore, I guess the topic and the algorithm of this paper could draw some attention from astronomers.

They explain the wavelet-based reconstruction procedure in simple terms: the matrix-vector product w = Wx yields the coefficients of x in the wavelet basis, and W^T w = W^T W x reconstructs the signal from these coefficients.

Their assumed model is

y = H x_orig + b,

where y and x_orig are vectors containing uniform samples of the measured and original signals, respectively, and b represents the measurement error. H is a square (block) circulant matrix that approximates the convolution with the PSF. The problem of deconvolution is then to find an estimate that minimizes the cost function

J(x) = J_data(x) + λ J_reg(x)

They note that “this functional can also be interpreted as a (negative) log-likelihood in a Bayesian statistical framework, and deconvolution can then be seen as a maximum a posteriori (MAP) estimation problem.” The same description of the cost function applies to frequently appearing topics in regression and classification, such as ridge regression, quantile regression, LASSO, LAR, model/variable selection, and state space models from time series and spatial statistics.

The observed image is the d-dimensional convolution of an original image (the characteristic function of the object of interest) with the impulse response (or PSF) of the imaging system.

The notion of regularization, or penalizing the likelihood, seems not well received among astronomers, judging from my observation that chi-square minimization (the simple least squares method) without a penalty is often suggested and used in astronomical data analysis. Since image analysis with wavelets is popular in astronomy, the fast algorithm for wavelet-regularized variational deconvolution introduced in this paper could bring faster results to astronomers and could offer better insights into the underlying physical processes by separating noise and background in a model-based fashion rather than by simple background subtraction (a toy version of the basic iteration appears below).
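The toy version mentioned above: the basic thresholded Landweber iteration (a gradient step on the data term followed by soft-thresholding of wavelet coefficients), which is the un-accelerated core that the paper speeds up. It uses NumPy plus the PyWavelets package; the PSF, step size, and threshold are arbitrary toy choices of mine, and for simplicity the approximation coefficients are thresholded along with the detail coefficients.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
x_true = np.zeros((64, 64))
x_true[20:30, 35:45] = 1.0                       # toy object

# Circulant blur H implemented via FFT of a crude box PSF.
psf = np.zeros((64, 64))
psf[:3, :3] = 1.0 / 9.0
H = np.fft.fft2(psf)

def blur(z):    # apply H
    return np.real(np.fft.ifft2(np.fft.fft2(z) * H))

def blurT(z):   # apply H^T (conjugate filter)
    return np.real(np.fft.ifft2(np.fft.fft2(z) * np.conj(H)))

y = blur(x_true) + rng.normal(scale=0.01, size=x_true.shape)

lam, tau, x = 0.005, 1.0, np.zeros_like(y)
for _ in range(100):
    grad = blurT(y - blur(x))                    # Landweber (gradient) step
    c, s = pywt.coeffs_to_array(pywt.wavedec2(x + tau * grad, "haar", level=3))
    c = pywt.threshold(c, tau * lam, mode="soft")            # shrink coefficients
    x = pywt.waverec2(pywt.array_to_coeffs(c, s, output_format="wavedec2"), "haar")

print(float(np.mean((x - x_true) ** 2)))         # reconstruction error
```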

[MADS] HMM
http://hea-www.harvard.edu/AstroStat/slog/2008/mads-hmm/
Mon, 08 Dec 2008, hlee

MADS stands for “Missing in ADS.” Every astronomer, I believe, knows what ADS is. As we have the [EotW] series and used to have the [ArXiv] series, creating a new series of semi-periodic postings under the well known name ADS seems interesting.

I’m not sure about these days, but when I was studying astronomy a decade ago, ADS was the Google of astronomy. Once I switched to statistics, I was shocked to find that there was no comparable composite search engine for statistical literature and databases. At the time I showed ADS to fellow statistics students to demonstrate how good it was, and compared it with what was available in statistics: JSTOR only had materials five years old and older, neither CiteSeer nor Project Euclid had been born, and Google Scholar was not thinkable at all. I used to dig through the library CD-ROMs to satisfy my craving for more information. Those days are over now, thanks to Google and other scientific search engines. Yet astronomers prefer ADS over any other database and search engine because of its comprehensiveness.

Let’s stop praising ADS here and focus on [MADS]. The key to [MADS] is to introduce something common and popular in other fields that does not appear in ADS. Believe it or not, I sometimes encounter missing elements, most likely jargon from other fields, in this giant and mature data system. HMM is one example, and more will come in the series. HMM stands for Hidden Markov Model. When you search for “Hidden Markov Model” as keywords among refereed astronomical journals[1], you’ll see no results within astronomical publications.

Then, what is a Hidden Markov Model? I’d rather defer my answer to wiki:Hidden Markov Model, the references therein, and image/signal processing textbooks (I learned the term from an undergraduate textbook about a decade ago, so HMM must be a very common and well received methodology; the basic recursion is sketched below). Since astronomers handle images and signals so often, I thought some years back that HMM might be a useful tool for modeling and analyzing astronomical data. Unfortunately, it hasn’t emerged yet.
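The sketch mentioned above: the forward algorithm, the basic HMM recursion that computes the likelihood of an observed sequence given transition and emission matrices. The two-state model below is entirely made up.

```python
import numpy as np

A = np.array([[0.9, 0.1],     # hidden-state transition matrix
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],     # emission probabilities: P(obs | state)
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])     # initial state distribution

obs = [0, 1, 1, 0, 1]         # an observed symbol sequence

alpha = pi * B[:, obs[0]]     # forward recursion over the sequence
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]

print("sequence likelihood:", alpha.sum())
```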

Finding a MADS does not give me a eureka moment; it only makes me wish that this MADS will appear in ADS soon. One of you will soon be the first to adopt HMM in your research and will be cited as a pioneer within the astronomy community.

Well, against all this hope, I might be forced to drop this post if someone finds out that HMM is already described in published astronomy papers and secretly teaches me how to search ADS better.

  1. Otherwise, ADS searches all arXiv papers, which include all of computer science, math, statistics, physics, and more.
compressed sensing and a blog
http://hea-www.harvard.edu/AstroStat/slog/2007/compressed-sensing-and-a-blog/
Thu, 25 Oct 2007, hlee

My friend’s blog led me to Terence Tao’s blog, where a mathematician writes on topics in applied mathematics and beyond. A glance tells me that all the postings are well written. Compressed sensing and single-pixel cameras especially drew my attention, because the topic stimulates the thoughts of astronomers working on the virtual observatory[1] and image processing[2] (it is no exaggeration that observational astronomy starts with taking pictures, in a broad sense), of statisticians in multidimensional applications, and, not to mention, of engineers in signal and image processing.

A particular interest of mine from his post is that compressed sensing could resolve bandwidth problems in astronomy and enable the consequent sequential analysis of astronomical data (streaming data analysis); a toy recovery example appears below. Overall, his list of applications at the end may enlighten scientists probing the sky with telescopes at different wavebands.
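The toy recovery example: a sparse signal reconstructed from far fewer random projections than its length, using orthogonal matching pursuit from scikit-learn as a stand-in for the l1-minimization solvers usually discussed in compressed sensing. All sizes and matrices here are made up.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 256, 64, 5                 # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)   # k-sparse signal

Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random sensing matrix
y = Phi @ x                                  # m << n linear measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(Phi, y)
print(np.max(np.abs(omp.coef_ - x)))         # near-perfect recovery (w.h.p.)
```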

  1. see the slog posting “Virtual Observatory”
  2. see the slog posting “The power of wavedetect”