Oct 22nd, 2009| 07:08 pm | Posted by hlee

[arXiv:stat.ME:0910.2585]

Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications

by *Murphy, Dean, and Raftery*

Classifying or clustering spectra (or applying semi-supervised learning to them) is very challenging at every stage, from collecting data that are ready for statistical analysis to reducing dimensionality without sacrificing the complex information in each spectrum. The challenge lies not only in estimating spiky (non-differentiable) curves with statistically well-defined procedures such as estimating equations, but also in transforming the data so that they satisfy the regularity conditions that statistical methods assume.

Continue reading ‘[ArXiv] classifying spectra’ »

Tags: BIC, Classification, clustering, cross-validation, curse of dimensionality, discriminant analysis, graphical model, mclust, model based, semi-supervised learning, statistical learning, variable selection

Category: Algorithms, arXiv, Cross-Cultural, Data Processing, Jargon, Methods, Spectral, Stat | Comment

Oct 6th, 2009| 08:30 pm | Posted by hlee

Tags: Classification, clustering, factor analysis, Hubble, multivariate analysis, principal component analysis, SING, Spitzer, tuning fork

Category: Algorithms, Astro, Cross-Cultural, Data Processing, Galaxies, Jargon, Methods, Objects, Stars, Stat | Comment

Jul 29th, 2009| 01:02 am | Posted by hlee

Speaking of XAtlas from my previous post, I tried another visualization tool, **Parallel Coordinates**, on the Capella observations and on two stars with multiple observations (AR Lac and IM Peg). As discussed in [MADS] Chernoff face, a full description of the catalog can be found on the XAtlas website. I chose these stars because, among low-mass stars, IM Peg (HD 21648, 8 times) and AR Lac (6 times, although at different phases) are the most frequently observed after Capella (I showed 16). I was curious which variation is dominant: within a star (statistical variation) or between stars (Capella, IM Peg, AR Lac). What would they look like in the parameter space of High Resolution Grating Spectroscopy from Chandra? Continue reading ‘[MADS] Parallel Coordinates’ »
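
The within versus between question can be made concrete with a sum-of-squares decomposition for a single parameter. The sketch below is only an illustration: the numbers are made up for one hypothetical spectral parameter and are not XAtlas measurements, and `variance_decomposition` is a name introduced here.

```python
from statistics import mean

def variance_decomposition(groups):
    """Split the total sum of squares into within-group and between-group parts."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    within = 0.0
    between = 0.0
    for values in groups.values():
        group_mean = mean(values)
        # Scatter of repeated observations around their own star's mean.
        within += sum((x - group_mean) ** 2 for x in values)
        # Scatter of the star means around the overall mean.
        between += len(values) * (group_mean - grand_mean) ** 2
    return within, between

# Made-up values for one hypothetical parameter (NOT XAtlas data).
toy = {
    "Capella": [1.0, 1.1, 0.9, 1.0],
    "IM Peg": [2.0, 2.2, 1.8],
    "AR Lac": [3.1, 2.9, 3.0],
}
within, between = variance_decomposition(toy)
print(f"within = {within:.3f}, between = {between:.3f}")
```

If the between part dominates, the lines of each star in a parallel coordinates plot would tend to bundle together and separate from the other stars on that axis.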

Tags: Classification, clustering, display, EDA, eye catcher, GGobi, Inselberg, parallel coordinates, visualization

Category: Algorithms, arXiv, Cross-Cultural, Data Processing, High-Energy, Jargon, Methods, Spectral, X-ray | 3 Comments

Sep 18th, 2008| 07:48 pm | Posted by hlee

Another conclusion I have drawn from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse **classification and clustering** and to mix up their methodologies. They tend to think that any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithm look like a **black box**. By black box I mean something like a neural network, which is one of the classification algorithms. Continue reading ‘Classification and Clustering’ »
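
A minimal sketch of the distinction, under the assumption that a nearest-centroid rule stands in for classification and k-means for clustering (neither is taken from the post). The classifier needs labels to learn from; the clustering routine sees only the points and returns arbitrary group ids. The data and the label names are invented for illustration.

```python
from math import dist

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def fit_nearest_centroid(points, labels):
    """Classification: labels are given; learn one centroid per known class."""
    by_class = {}
    for p, y in zip(points, labels):
        by_class.setdefault(y, []).append(p)
    return {y: centroid(ps) for y, ps in by_class.items()}

def predict(model, point):
    """Assign a new point to the class with the closest centroid."""
    return min(model, key=lambda y: dist(model[y], point))

def kmeans(points, k, n_iter=20):
    """Clustering: no labels; groups are discovered and get arbitrary ids."""
    centers = list(points[:k])  # naive initialization: the first k points
    assign = []
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: dist(centers[j], p)) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = centroid(members)
    return assign

# Invented toy data: two well-separated groups in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]

model = fit_nearest_centroid(points, labels)  # supervised: uses labels
print(predict(model, (0.5, 0.5)))             # answers with a known class name

groups = kmeans(points, k=2)                  # unsupervised: labels never used
print(groups)                                 # group ids carry no meaning by themselves
```

The two routines answer different questions: the first asks which known class a new object belongs to, the second asks whether the objects fall into groups at all.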

Tags: black box, book, catalog, Classification, clustering, haste, outliers, R, Robert Serfling, semi-supervised learning, survey

Category: Algorithms, arXiv, Astro, Bad AstroStat, Cross-Cultural, Data Processing, Frequentist, Jargon, Methods, Stat | Comment

Jun 19th, 2008| 11:42 pm | Posted by hlee

Two attendees, whom I had known before the AAS meeting, asked me whether I could suggest clustering methods relevant to their projects. In the end, we spent quite some time clarifying the term **clustering.** Continue reading ‘my first AAS. IV. clustering’ »

May 26th, 2008| 02:59 pm | Posted by hlee

Tags: clustering, high dimension, LF, maximum likelihood, multivariate, Poisson, Schechter, zero count

Category: arXiv, Bayesian, Fitting, MCMC, Methods, Stat | Comment

Feb 13th, 2008| 03:41 pm | Posted by hlee

Last week, I was at a Tufts colloquium and happened to have a conversation with a computer scientist about **density based clustering**. I understood density as **probability density** and was recalling a paper by Fraley and Raftery (Model-Based Clustering, Discriminant Analysis, and Density Estimation, JASA, 2002, 97(458), pp. 611-631) and similar papers I had seen in engineering journals such as the IEEE Transactions. For a few moments I felt uncomfortable, until she explained that density meant “how dense observations are.” Density based clustering, in her sense, meant distance based clustering such as k-means or minimum spanning trees: most likely nonparametric approaches. Continue reading ‘language barrier’ »
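
To illustrate the “how dense observations are” reading, here is a minimal sketch of DBSCAN, one widely used algorithm that goes by the name density based clustering in computer science. Whether this was the method discussed in that conversation is an assumption. No probability model appears anywhere: a point is dense if at least `min_pts` points lie within distance `eps` of it. The toy points are invented.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels

# Invented toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
          (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
          (10, 0)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

The only inputs are a distance and a count threshold, which is why this sense of density reads as distance based and nonparametric to someone who expects a fitted probability density.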

Jul 25th, 2007| 03:22 am | Posted by hlee

From arxiv/astro-ph:0705.4020v2

**Statistical Evidence for Three classes of Gamma-ray Bursts** by T. Chattopadhyay et al.

In general, gamma-ray bursts (GRBs) are classified into two groups: long (>2 sec) and short (<2 sec) duration bursts. Nonetheless, some studies, including arxiv/astro-ph:0705.4020v2, have presented statistical evidence that three clusters are optimal. The pioneering work on GRB clustering, by Mukherjee et al. (Three Types of Gamma-Ray Bursts), was based on hierarchical clustering methods.
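
One common way to ask how many clusters the data support is to fit mixture models with different numbers of components and compare them with BIC. The sketch below is not the method of either paper: it assumes a one-dimensional Gaussian mixture fitted by EM, and the log-durations are synthetic values invented for illustration, not burst data. The names `fit_gmm_1d` and `bic` are introduced here.

```python
import math
import random

def fit_gmm_1d(xs, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return (params, loglik)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, common variance.
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    vars_ = [var_all] * k
    ws = [1.0 / k] * k

    def pdf(x, mu, var):
        return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * pdf(x, m, v) for w, m, v in zip(ws, mus, vars_)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: update weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            vars_[j] = max(var_j, 1e-6)  # floor to avoid a collapsing component
    loglik = sum(
        math.log(sum(w * pdf(x, m, v) for w, m, v in zip(ws, mus, vars_)))
        for x in xs
    )
    return (ws, mus, vars_), loglik

def bic(loglik, k, n):
    """BIC with the sign convention in which smaller is better."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2.0 * loglik + n_params * math.log(n)

# Synthetic log10 durations (invented, NOT real burst data): two groups.
random.seed(0)
xs = [random.gauss(-0.5, 0.4) for _ in range(100)]
xs += [random.gauss(1.5, 0.4) for _ in range(200)]

for k in (1, 2, 3):
    _, ll = fit_gmm_1d(xs, k)
    print(k, round(bic(ll, k, len(xs)), 1))
```

With this convention the preferred number of components is the one with the smallest BIC. Some software, including the R package mclust, reports BIC with the opposite sign, so that larger is better.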

Continue reading ‘[ArXiv] Three Classes of GRBs, July 21, 2007’ »