Quote of the Week, Aug 23, 2007

These are from two lively CHASC discussions on classification, or cluster analysis. The first took place on Feb 7, 2006, and the continuation on Dec 12, 2006, at the Harvard Statistics Department, as part of Stat 310.

David van Dyk:

Don’t demand too much of the classes. You’re not going to say that all events can be well-classified…. It’s more descriptive. It gives you places to look. Then you look at your classes.

Xiao Li Meng:

Then you’re saying the cluster analysis is more like -

David van Dyk:

It’s really like you have a proposal for classes. You then investigate the physical processes more thoroughly. You may have classes that divide it [up].


David van Dyk:

But where you see the clusters can make a difference, depending on your [parameter] transformation. You can squish the white spaces and stretch out the crowded spaces, so it can change where you think the clusters are.

Aneta Siemiginowska:

But that is interesting.

Andreas Zezas:

Yes, that is very interesting.

These are particularly in honor of Hyunsook Lee’s recent posting of Chattopadhyay et al.’s new work on possible intrinsic classes of gamma-ray bursts. Are they really physical classes, or do they only appear to be distinct clusters because we view them through the “squished” lens (parameter spaces) of our imperfect instruments?
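Van Dyk’s remark about transformations can be seen even in one dimension. The sketch below is not from the discussion: it uses a deliberately simple, hypothetical clustering rule (split the sorted data at its widest gap, here called `split_at_largest_gap`) and made-up values, and compares the split on a linear scale with the split after a base-10 log transform.

```python
import math
from typing import Callable, List, Tuple


def split_at_largest_gap(
    values: List[float],
    transform: Callable[[float], float] = lambda x: x,
) -> Tuple[List[float], List[float]]:
    """Hypothetical toy rule: split 1-D data into two groups at the widest
    gap between neighbours, measured after applying `transform`.
    Assumes `transform` is increasing, so sorting order is preserved."""
    xs = sorted(values)
    t = [transform(x) for x in xs]
    # Distance between each pair of neighbours in the transformed space.
    gaps = [t[i + 1] - t[i] for i in range(len(t) - 1)]
    # Index of the widest gap (first one wins on ties).
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return xs[: cut + 1], xs[cut + 1 :]


# Made-up illustrative values (all positive, so log10 is defined).
data = [0.01, 0.02, 1.0, 2.5, 4.0, 6.0, 9.0]

print(split_at_largest_gap(data))              # ([0.01, 0.02, 1.0, 2.5, 4.0, 6.0], [9.0])
print(split_at_largest_gap(data, math.log10))  # ([0.01, 0.02], [1.0, 2.5, 4.0, 6.0, 9.0])
```

On the linear scale the widest gap isolates the largest value. The log transform stretches out the crowded low end and squishes the sparse high end, so the widest gap instead separates the two smallest values: same data, different “clusters.”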

  1. hlee:

    Hilarious and witty, but it says a lot about clustering. When it comes to clustering, the eye is the best tool, but unfortunately it cannot do much with higher-dimensional data. Statistics and machine learning are tools to assist the eye, but, as David van Dyk pointed out, the choice of dimension-reduction or parameter-transformation method can make the results controversial. What should we follow: scientific expertise or statistical optimality?

    08-24-2007, 12:16 pm
  2. hlee:

    An excerpt from Cluster Analysis by Everitt, Landau, and Leese.

    It is generally impossible a priori to anticipate what combination of variables, similarity measures and clustering techniques is likely to lead to interesting and informative classification. Consequently, the analysis proceeds through several stages, with the researcher intervening if necessary to alter variables, choose a different similarity measure, concentrate on a particular subset of individuals, and so on. The final, extremely important, stage concerns the evaluation of the clustering solutions obtained. Are the clusters real or merely artifacts of the algorithms? Do other solutions exist which are better?

    08-24-2007, 1:16 pm
  3. aconnors:

    Very interesting quotes, Hyunsook!

    Speaking of questioning whether classifications and correlations are instrumental or intrinsic to the physics, I note this has recently appeared on astro-ph:

    I haven’t worked through their arguments myself, so I can’t speak for it; but the ideas are certainly worth discussing in some depth.

    08-24-2007, 4:47 pm