The AstroStat Slog » XAtlas

[MADS] Chernoff face

hlee — Thu, 02 Apr 2009 16:00:41 +0000

I cannot remember when I first met Chernoff face but it hooked me up instantly. I always hoped for confronting multivariate data from astronomy applicable to this charming EDA method. Then, somewhat such eager faded, without realizing what’s happening. Tragically, this was mainly due to my absent mind.

After meeting Prof. Herman Chernoff unexpectedly – I didn’t know he is Professor Emeritus at Harvard – the urge revived but I didn’t have data, still then. Alas, another absent mindedness: I don’t understand why I didn’t realize that I already have the data, XAtlas for trying Chernoff faces until today. Data and its full description is found from the XAtlas website (click). For Chernoff face, references suggested in Wiki:Chernoff face are good. I believe some folks are already familiar with Chernoff faces from a New York Times article last year, listed in Wiki (or a subset characterized by baseball lovers?).

Capella is a X-ray bright star observed multiple times for Chandra calibration. I listed 16 ObsIDs in the figures below at each face, among 18+ Capella observations (Last time when I checked Chandra Data Archive, 18 Capella observations were available). These 16 are high resolution observations from which various metrics like interesting line ratios and line to continuum ratios can be extracted. I was told that optically it’s hard to find any evidence that Capella experienced catastrophic changes during the Chandra mission (about 10 years in orbit) but the story in X-ray can’t be very different. In a dismally short time period (10 years for a star is a flash or less), Capella could have revealed short time scale high energy activities via Chandra. I just wanted to illustrate that Chernoff faces could help visualizing such changes or any peculiarities through interpretation friendly facial expressions (Studies have confirmed babies’ ability in facial expression recognitions). So, what do you think? Do faces look similar/different to you? Can you offer me astronomical reasons for why a certain face (ObsID) is different from the rest?

p.s. In order to draw these Chernoff faces, check descriptions of these R functions, faces() (yields the left figure) or faces2() (yields the right figure) by clicking on the function of your interest. There are other variations and other data analysis systems offer different fashioned tools for drawing Chernoff faces to explore multivariate data. Welcome any requests for plots in pdf. These jpeg files look too coarse on my screen.

p.p.s. Variables used for these faces are line ratios and line to continuum ratios, and the order of these input variables could change countenance but impressions from faces will not change (a face with distinctive shapes will look different than other faces even after the order of metrics/variables is scrambled or using different Chernoff face illustration tools). Mapping, say from an astronomical metric to the length of lips was not studied in this post.

p.p.p.s. Some data points are statistical outliers, not sure about how to explain strange numbers (unrealistic values for line ratios). I hope astronomers can help me to interpret those peculiar numbers in line/continuum ratios. My role is to show that statistics can motivate astronomers for new discoveries and to offer different graphics tools for enhancing visualization. I hope these faces motivate some astronomers to look into Capella in XAtlas (and beyond) in details with different spectacles, and find out the reasons for different facial expressions in Capella X-ray observations. Particularly, ObsID 1199 is most questionable to me.

read.table()

hlee — Mon, 27 Oct 2008 15:05:27 +0000

The first step of data analysis or applications is reading the data sets into a tool of choice. Recent years, I’ve been using R (see also Learning R) for that regard but I’ve enjoyed freedoms for the same purpose from these languages and tools: BASIC, fortran77/90/95, C/C++, IDL, IRAF, AIPS, mongo/supermongo, MATLAB, Maple, Mathematica, SAS, SPSS, Gauss, ARC, Minitab, and recently Python and ciao which I just began to learn. Many of them I lost the fluency of how to use it. Quick learning tends to be flash memory. Some will need brain defragmentation and recovering time for extensive scientific work. A few I don’t like to use at all. No matter what, I’m not a computer geek. I’m not good at new gadgets, new softwares, nor welcome new and allegedly versatile computing systems. But one must be if he/she want to handle data. Until recently I believed R has such versatility in the aspect of reading in data. Yet, there is nothing without exceptions.

From time to time, I talked about among many factors, FITS format data make it difficult statisticians and astronomers work together. Statisticians cannot read in FITS format unless astronomers convert it into ascii or jpeg format for them whereas astronomers do not want to wasted their busy time for doing a chore like file format conversion wasting computer resources as well. Only a peaceful reunion happens when the data analysis become intractable via traditional methodology described in Numerical Recipes or Bevington and Robinson. They realize statistical (new) theory need to be found and collaboration happens with involvement of graduate students from both fields who patiently do many tedious jobs while learning (I missed this part while I was graduate student, which sometimes I thank my advisor for).

Now, let’s get back to the title. read.table()^[1] is a commonly used command line in R when you read in data in ascii format. It’ reads in data intelligently. As I said, it has been versatile enough. Numerals are in numeric format, letters are character format, missings are stored as NA, etc. read.table() make it easy to jump into data analysis right away. Well, now you know why I write this. I confronted a case read.table() does not read things correctly with astronomical data “even in ascii format.,” which I never had since I began to use S-Plus/R.

Although I know how to fix this simple problem that I’ll describe later, I want to point out the lack of compatibility in data formats between two communities and the common tools used for accessing data sets, which, I believe, is one of the biggest factors that prohibit astronomically uneducated statisticians from participating collaborations. I’ve mixed up tools for consulting courses to assist clients of various disciplines (grad students from agriculture, horticulture, physiology, social science, psychology were my clients) and for executing projects in electrical engineering and computational physics (these heavily rely on MATLAB) but reading data was the most simplest and fundamental step that I don’t have to worry about across various data sets with R (probably, those graduate students and professors of engineering and physics provided well trimmed and proven data sets).

When you have a long way to complete your mission and when you stumbled with your first step, I think it’s easy to loose eagerness for the future unless there’s support from your colleagues. Instead, I mostly likely receive discouraging comments such as “Why using R?” “You won’t have such problems if you use other tools” (Although it takes a bit of extra time to manuever, I eventually get to there). Such frustrating comments also degrade eagerness furthermore. So, from 100% I normally begin with, only 25% eagerness is left after two discouraging moments occurred at the initial step of data analysis whose end is invisibly far away. I only hang on to this 25%, still big by the normal standard and I wish for this last long until the final step without exponential decays that happened at the beginning.

Ah, the example, I promised. Click here for one example (from XAtlas) and check if read.table() can do the job in an one shot when the 3rd column is your x and the 4th column is your y. It’ll produce a beautiful spectrum if the data points are read in properly as numerals. My trick was using awk to extract those two columns because of unequal row entries in columns and read that into R. Such two steps work unfortunately made read.table() of R recognized entries as categorical data. To remove the episode of R recognizing entries as categorical data, between two steps, you must to fix the cause that read.table() reads what looks like numerals into categorical. If you investigate the data set files carefully you’ll find why; however, it’s a bit of tedious job when one have thousand entries in each data file and there are numerous data files. Without information, this effort will be same as writing a line of scanf()/READ in C/Fortran by counting column by column to type correct floating point format. This manifest the differences of formatting tables between astronomers and statisticians including scientists from ecometrics, econometrics, psycometrics, biometrics, bioinformatics, and others that include statistics related suffix.

Except such artifact (or cultural difference), XAtlas is a great catalog for statisticians in functional data analysis, who look for examples to deal with non smooth curves. New strategies and statistical applications will help astronomers see such unprecedented data sets better. Perhaps, actually more certainty, your 25% will grow back to 100% once you see those spectra and other metrics on your own plotting windows.

click here for the explanation of the read.table() function and
click here for the reason why is read.table() so inefficient?

Summarizing Coronal Spectra

vlk — Wed, 11 Jul 2007 15:50:50 +0000

Hyunsook and I have preliminary findings (work done with the help of the X-Atlas group) on the efficacy of using spectral proxies to classify low-mass coronal sources, put up as a poster at the XGratings workshop. The workshop has a “poster haiku” session, where one may summarize a poster in a single transparency and speak on it for a couple of minutes. I cannot count syllables, so I wrote a limerick instead:

For simple models, hardness ratios make for a useful grid;
But to describe hi-res coronal spectra they’re quite horrid.: So we went to find, with line ratios as witness,; Patterns and trends in a high-dimensional mess;
And extract stellar subclasses from the morass where it is hid.

Update: The poster is at CHASC.