The AstroStat Slog » book
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy + Statistics + Computer Science + Engineering + Instrumentation, far beyond the growing borders.

[Book] The Elements of Statistical Learning, 2nd Ed.
http://hea-www.harvard.edu/AstroStat/slog/2010/book-the-elements-of-statistical-learning-2nd-ed/
Thu, 22 Jul 2010, by hlee

This was written more than a year ago, and I forgot to post it.

I've noticed rapidly growing interest in data mining and machine learning among astronomers, but the level of execution is still rudimentary or partial because there has been no comprehensive, tutorial-style book for them. I recently introduced a machine learning book written by an engineer. Although it is a very good book, it didn't convey the foundation of machine learning built by statisticians. In my quest for another good book to satisfy astronomers' pursuit of (machine) learning methodology with the proper amount of statistical theory, the first great book that came along is The Elements of Statistical Learning. I chose it for this writing not only because of its fame and its famous authors (Hastie, Tibshirani, and Friedman) but also because of a personal story. In addition, the 2nd edition, which contains the most up-to-date, state-of-the-art material, was released recently.

First, the book website:

The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman

You'll find examples, R code, relevant publications, and the plots used in the text.

Second, I want to tell how I learned about this book before its first edition was published. Everyone has a small moment of meeting someone very famous. Mine is shaking hands with President Clinton in 2000. I still remember the moment vividly because I really wanted to tell him that ice cream was dripping on his nice suit, but the bodyguards blocked my attempt to point at the dripping ice cream after the handshake. No matter the context, shaking hands with one of the great presidents is memorable. Yet it is not the moment I cherish most, thanks to the dripping ice cream and the scary bodyguards. My most cherished moment of meeting someone famous is a half-hour conversation with the late Prof. Leo Breiman (click for my two postings about him): author of a probability textbook, creator of CART, and one of the foremost pioneers in machine learning.

The conversation concluded with a book soon to be published: I explained my ideas of applying statistics to astronomical data, and he offered advice on each problem. I was not capable of understanding all the statistics involved, so his pointer to this forthcoming book was the most relevant and apt answer at the time.

This conversation happened during the 3rd Statistical Challenges in Modern Astronomy (SCMA) conference. Not long after I began my graduate study in statistics, I had an opportunity to assist the conference organizer, my advisor Dr. Babu, and to do some chores during the conference. By accident I had read the book by Murtagh on multivariate data analysis, so I wanted to speak to him. Beyond that, I had no desire to approach renowned speakers and attendees. Frankly, I had no idea who was who at the conference; only a few years later did I realize that it drew many famous people, at a density higher than any conference I have attended since. Who would have imagined, at that time, that I could have a personal conversation with Prof. Breiman? I have seen often enough that famous professors are mobbed at conferences: getting even a few seconds to chat is hard, and tall, strong people always push someone small like me away.

The story goes like this: on a perfect sunny early-summer afternoon, he was taking a break for a cigar and I had finished my errands for the session. With not much to do until the end of the session, I decided to get some fresh air, and I spotted him enjoying his cigar. The only catch was that I didn't know he was the man behind CART and a founder of statistical machine learning. From his talk in the previous session I knew only that he was a statistician who did data mining on galaxies. So I asked him if I could join him and raise some questions related to ideas I had. One topic I wanted to discuss was the classification of supernova light curves; by that time, according to astronomy textbooks, there were Types I and II, with Type I subdivided into Ia, Ib, and Ic. (Later I heard there is a Type III.) The challenge is that the observations were not taken at equal intervals. There were more data mining topics, and the conversation went on for a while. In the end, he recommended a book that would be published soon.

Having such a story, the privilege of talking to the late Prof. Breiman at a very unique meeting, SCMA, before knowing the book's fame, made this book one of my favorites. The book did indeed become popular; around that time it was almost the only book discussing statistical learning, and therefore an excellent textbook for introducing statistics to engineers and machine learning to statisticians. In the meantime, statistical learning gained popularity in many disciplines that hold data sets and urge learning with the aid of machines. Books and journals on machine learning, data mining, and knowledge discovery (KDD) have since prospered. I was delighted to see the 2nd edition on the market to bridge the gap over the years.

I thank him for sharing his cigar time, probably his short but precious free time for contemplation, with me. I thank him for his patience in spending time with such an ignorant girl with a foreign English accent. And I thank him for introducing a book that would become a bible of the statistical learning community within a couple of years (I felt proud of myself for having access to the book before people knew about it). Perhaps astronomers cannot draw from this book the many joys I experienced through how I encountered it, who introduced it, whether it was used in a course, how often it is cited, and so on. But I assure you that it will narrow the gap between how astronomers think about data mining (preprocessing, pipelining, and building catalogs) and how statisticians treat data mining. The newly released 2nd edition should narrow the gap further and help astronomers coin brilliant learning algorithms specific to astronomical data. [The END]

—————————– Below, I patch in my scribbles about the book.

What distinguishes this book from other machine learning books is not only that the authors are big figures in statistics, but also that the fundamentals of statistics and probability are discussed throughout the chapters. Most machine learning books introduce elementary statistics and probability only in chapter 2, and no statistical basics are discussed in later chapters; generally, empirical procedures, computer algorithms, and their results are presented without the underlying statistical theory.

You might want to check the book's website for data sets if you want to try some of the ideas described there:
The Elements of Statistical Learning
In addition to its historical footprint in the field of statistical learning, I'm sure some astronomers will want to check out topics in the book. It will help replace some data analysis methods in astronomy, celebrating their centennials sooner or later, with state-of-the-art methods that cope with modern data.

This new edition reflects some of the evolution of statistical learning, whereas the first edition was an excellent harbinger of the field. Page numbers below refer to the 2nd edition.

[p.28] Suppose in fact that our data arose from a statistical model Y = f(X) + e, where the random error e has E(e) = 0 and is independent of X. Note that for this model, f(x) = E(Y|X=x), and in fact the conditional distribution Pr(Y|X) depends on X only through the conditional mean f(x).
The additive error model is a useful approximation to the truth. For most systems the input-output pairs (X,Y) will not have a deterministic relationship Y = f(X). Generally there will be other unmeasured variables that also contribute to Y, including measurement error. The additive model assumes that we can capture all these departures from a deterministic relationship via the error e.

How statisticians envision "models" and "measurement errors" is quite different from astronomers' "models" and "measurement errors," although for the additive error model the two views coincide, thanks to the properties of the Gaussian/normal distribution. Still, the chicken-or-egg dilemma persists prior to any statistical analysis.
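The additive error model above is easy to probe numerically. Here is a minimal sketch (my own toy example, not from the book) checking that a local average of Y near a point x recovers f(x) = E(Y|X=x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive error model: Y = f(X) + e, with E(e) = 0 and e independent of X.
f = lambda x: 2.0 * x + 1.0          # the (here, known) regression function
x = rng.uniform(0.0, 1.0, 200_000)   # inputs
e = rng.normal(0.0, 0.5, x.size)     # zero-mean noise, independent of X
y = f(x) + e

# The conditional mean E(Y | X near 0.5) should recover f(0.5) = 2.0.
window = np.abs(x - 0.5) < 0.01
estimate = y[window].mean()
print(round(estimate, 2))  # close to 2.0
```

The measurement error here is folded into e, exactly as in the quoted passage; the local average washes it out because E(e) = 0.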

[p.30] Although somewhat less glamorous than the learning paradigm, treating supervised learning as a problem in function approximation encourages the geometrical concepts of Euclidean spaces and mathematical concepts of probabilistic inference to be applied to the problem. This is the approach taken in this book.

I strongly recommend reading chapter 3, Linear Methods for Regression. In astronomy there are many important coefficients obtained from regression models, from the Hubble constant to absorption corrections (temperature and magnitude conversions are another example). It often seems these relations are explained only via OLS (ordinary least squares) under a homogeneous-error assumption. Yet books on regression and linear models are generally not thin: as much diversity as exists among data sets, a corresponding amount of methodology, theory, and assumptions exists to reflect that diversity. One might study the statistical properties of these indicators based on mixture and hierarchical modeling; some inferences, say on population proportions, could indirectly verify hypotheses in cosmology. Understanding regression analysis, its assumptions, and how statisticians' efforts have made these methods more robust and interpretable would change the habit of forcing E(Y|X) = aX + b models onto data that show correlation (not causality).
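As a reminder of what OLS actually computes before one forces E(Y|X) = aX + b onto data, here is a small sketch (hypothetical simulated data, not a real astronomical calibration) recovering known coefficients under the homogeneous Gaussian error assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data with a known linear relation:
# y = a*x + b + noise, with homoscedastic Gaussian errors (the OLS assumption).
a_true, b_true = 3.0, -1.0
x = rng.uniform(0, 10, 5000)
y = a_true * x + b_true + rng.normal(0, 1.0, x.size)

# OLS via least squares on the design matrix [x, 1].
X = np.column_stack([x, np.ones_like(x)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat = beta
print(round(a_hat, 1), round(b_hat, 1))  # near 3.0 and -1.0
```

When the errors are heteroscedastic or non-Gaussian, as is common in astronomy, this estimator is no longer efficient, which is exactly why chapter 3 goes well beyond plain OLS.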

Quotes from Common Errors in Statistics
http://hea-www.harvard.edu/AstroStat/slog/2009/quotes-from-common-errors-in-statistics/
Fri, 13 Nov 2009, by hlee

by P.I. Good and J.W. Hardin. Publisher's website

My astronomer neighbor mentioned this book a while ago, and quite a bit later I found these intriguing quotes.

GIGO: Garbage in; garbage out. Fancy statistical methods will not rescue garbage data. (Course notes of Raymond J. Carroll, 2001)

I often see statements like "data were grouped/binned to improve statistics." This is hardly true unless the astronomer knows the true underlying population distribution from which those realizations (binned or unbinned) were randomly drawn. Nonetheless, smoothing/binning (modifying the sample) can help hypothesis testing to infer the population distribution; this validation step, though, is often ignored. For righteous application of statistics, I hope astronomers adopt concepts from the design of experiments to collect good quality data without wasting resources. By wasting resources I mean that, given instrumental and atmospheric limitations, indefinite exposure is not necessary to collect a good quality image; instead of human eye inspection, a machine can do the job. I suspect minimax-type optimal operating points exist for telescope scheduling, feature extraction/detection, and model/data quality assessment. Clarifying the sources of uncertainty and stratifying them for testing, sampling, and modeling purposes, as done in analysis of variance, is quite unexplored in astronomy. Instead, more effort goes to salvaging garbage, and so far many gems have been harvested at tremendous effort. But I'm afraid it could become as painful as the gold miners' experience during the mid-19th-century gold rush.

Interval Estimates (p.51)
A common error is to specify a confidence interval in the form (estimate - k*standard error, estimate + k*standard error). This form is applicable only when an interval estimate is desired for the mean of a normally distributed random variable. Even then, k should be determined from tables of the Student's t-distribution and not from tables of the normal distribution.

Getting the appropriate degrees of freedom seems most relevant to avoiding this error when the estimates are coefficients of complex curves or of the model itself. The t-distribution with large d.f. (>30) is hardly discernible from the z-distribution.
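The difference the quote warns about is easy to see numerically. A sketch with a toy sample (standard library only; the t quantile t_{0.975, df=9} = 2.262 is taken from standard tables):

```python
import math
from statistics import NormalDist

# 97.5th percentiles: standard normal vs. Student's t with 9 d.f.
z = NormalDist().inv_cdf(0.975)  # about 1.96
t9 = 2.262                       # t_{0.975, df=9}, from standard tables

sample = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.0, 4.7, 5.1]  # n = 10
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
se = sd / math.sqrt(n)

ci_z = (mean - z * se, mean + z * se)    # the common (wrong, for small n) interval
ci_t = (mean - t9 * se, mean + t9 * se)  # the correct Student's t interval
print(ci_t[1] - ci_t[0] > ci_z[1] - ci_z[0])  # True: the t interval is wider
```

With n = 10 the normal-based interval is about 13% too narrow, overstating the precision; by n > 30 the two quantiles nearly agree, as noted above.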

Desirable estimators are impartial, consistent, efficient, robust, and minimum loss. Interval estimates are to be preferred to point estimates; they are less open to challenge for they convey information about the estimate's precision.

Every Statistical Procedure Relies on Certain Assumptions for correctness.

What I often fail to find in the astronomy literature are these assumptions. Statistics is not an elixir for every problem; it works only under certain conditions.

Know your objectives in testing. Know your data’s origins. Know the assumptions you feel comfortable with. Never assign probabilities to the true state of nature, but only to the validity of your own predictions. Collecting more and better data may be your best alternative

Unfortunately, the last sentence is not an option for astronomers.

From Guidelines for a Meta-Analysis
Kepler was able to formulate his laws only because (1) Tycho Brahe had made over 30 years of precise (for the time) astronomical observations and (2) Kepler married Brahe's daughter and thus gained access to his data.

Not exactly the same, but it reflects some contemporary reality: without access to data there is not much one can do, and collecting data is painstaking and time-consuming.

From Estimating Coefficients
…Finally, if the error terms come from a distribution that is far from Gaussian, a distribution that is truncated, flattened or asymmetric, the p-values and precision estimates produced by the software may be far from correct.

Please double-check the numbers from your software.

To quote Green and Silverman (1994, p. 50), "There are two aims in curve estimation, which to some extent conflict with one another, to maximize goodness-of-fit and to minimize roughness."

Statistically significant findings should serve as a motivation for further corroborative and collateral research rather than as a basis for conclusions.

To be avoided are a recent spate of proprietary algorithms available solely in software form that guarantee to find a best-fitting solution. In the words of John von Neumann, "With four parameters I can fit an elephant and with five I can make him wiggle his trunk." Goodness of fit is no guarantee of predictive success, …

If the physics implies wiggles, then there is nothing wrong with an extra parameter. But it is possible that the best-fit parameters including these wiggles are not the ultimate answer to astronomers' exploration; they can simply be biased by introducing the additional wiggle parameter into the model. Various statistical tests are available, and caution is needed before reporting best-fit parameter values (estimates) and their error bars.

[Books] Bayesian Computations
http://hea-www.harvard.edu/AstroStat/slog/2009/books-bayesian-computations/
Fri, 11 Sep 2009, by hlee

A number of practical Bayesian data analysis books are available these days. Here I'd like to introduce two that were published relatively recently. I like that they are technical rather than theoretical, with practical examples closely related to astronomical data. They come with R code, so one can try the algorithms on the fly instead of slogging through probability theory.

Bayesian Computation with R
Author:Jim Albert
Publisher: Springer (2007)

As the title suggests, the accompanying R package LearnBayes is available (clicking the name will take you to the package download). Furthermore, the last chapter is about WinBUGS. (Please check the resources listed in the BUGS post for other variants of BUGS, Bayesian inference Using Gibbs Sampling.) Overall, it is quite practical and instructional. If a young astronomer wants to enter the competition posted below because of sophisticated data requiring non-traditional statistical modeling, this book can be a good starting point. (Here, traditional methods include brute-force Monte Carlo simulations, chi^2/weighted least squares fitting, and test statistics with rigid underlying assumptions.)

One quote is filtered in here because of a comment from an astronomer, "Bayesian is robust but frequentist is not," which I couldn't agree with at the time.

A Bayesian analysis is said to be robust to the choice of prior if the inference is insensitive to different priors that match the user’s beliefs.

Since there is no discussion of priors in frequentist methods, Bayesian robustness cannot be matched with and compared to frequentists' robustness. As in my discussion in Robust Statistics, I keep the notion that robust statistics is insensitive to outliers or to the iid Gaussian model assumption. The latter, in particular, is almost always assumed in astronomical data analysis unless other models and probability densities are explicitly stated, such as Poisson counts or the Pareto distribution. New Bayesian algorithms are being invented to achieve robustness not limited to the choice of prior, covering topics from frequentists' robust statistics as well.
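The book's notion of prior robustness can be illustrated with a conjugate toy model (my own sketch, not from the book): with ample data, a flat prior and a moderately informative Beta prior lead to nearly the same posterior mean.

```python
# Conjugate Beta-Binomial: with y successes in n trials and a Beta(a, b) prior,
# the posterior is Beta(a + y, b + n - y), with posterior mean (a + y)/(a + b + n).
def posterior_mean(a, b, y, n):
    return (a + y) / (a + b + n)

y, n = 70, 100  # plenty of data

flat = posterior_mean(1, 1, y, n)       # uniform Beta(1,1) prior
skeptical = posterior_mean(5, 5, y, n)  # Beta(5,5) prior pulling toward 0.5

# With ample data the two posterior means nearly agree: the inference is
# robust to this pair of priors in the sense quoted above.
print(abs(flat - skeptical) < 0.02)  # True
```

This is robustness to the choice of prior, a Bayesian-specific notion; it says nothing about robustness to outliers, which is the frequentist sense of the word, so the astronomer's comment conflates two different things.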

The introduction to Bayesian computation focuses on analytical and simple parametric models with well-known probability densities; these models and their Bayesian analysis produce interpretable results. The Gibbs sampler, Metropolis-Hastings algorithms, and a few hybrids can handle scientific problems as long as the scientific models and the uncertainties, both in observations and in parameters, are transcribed into well-known probability density functions. I think astronomers will like Chap. 6 (MCMC) and Chap. 9 (Regression Models). Often, to establish a strong correlation between two variables, astronomers adopt simple linear regression models and fit the data to them; a priori knowledge enhances the flexibility of the fitting analysis, in which Bayesian computation works robustly, unlike straightforward chi-square methods. The book offers neither sophisticated algorithms nor deep theory; it offers just the necessities and foundations for accommodating Bayesian computation to scientific needs.

The other book is

Bayesian Core: A Practical Approach to Computational Bayesian Statistics.
Authors: J.-M. Marin and C.P. Robert
Publisher: Springer (2007).

Although the book is written by statisticians, the very first real-data example is CMBdata (cosmic microwave background data; the book actually says "cosmological" rather than "cosmic." I'm not sure which is correct, but I'm used to CMB standing for cosmic microwave background). Surprisingly, the CMB becomes a very accessible topic in statistics, in terms of testing normality and extreme values. Seeing real astronomy data first in the book was my primary reason for introducing it. Also, it is a relatively small volume (about 250 pages) compared to other Bayesian textbooks, yet with broad coverage of topics in Bayesian computation. There are other practical real data sets illustrating Bayesian computation in the book, and these example data sets can be found on the book's website.

The book begins with R, then covers normal models, regression and variable selection, generalized linear models, capture-recapture experiments, mixture models, dynamic models, and image analysis.

I felt exuberant when I found that the book describes the law of large numbers (LLN), which justifies Monte Carlo methods. The LLN appears whenever an integral is approximated by a sum, which astronomers do a lot without citing the law by name. For more information, here is a link to the Wikipedia entry on the Law of Large Numbers.
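A minimal sketch of the LLN at work in Monte Carlo integration (a toy integral of my own choosing, not one of the book's examples): the sample mean of g(U) for U ~ Uniform(0, 1) converges to the integral of g over [0, 1].

```python
import math
import random

random.seed(42)

# Monte Carlo integration, justified by the LLN: the average of g(U_i)
# over iid uniform draws converges to the integral of g on [0, 1].
g = math.sin
n = 1_000_000
estimate = sum(g(random.random()) for _ in range(n)) / n

exact = 1.0 - math.cos(1.0)  # the integral of sin(x) from 0 to 1
print(round(estimate, 3), round(exact, 3))  # the two agree to about three decimals
```

The error shrinks like 1/sqrt(n), which is the central limit theorem's refinement of the LLN guarantee.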

Several MCMC algorithms can be mixed together within a single algorithm using either a circular or a random design. While this construction is often suboptimal (in that the inefficient algorithms in the mixture are still used on a regular basis), it almost always brings an improvement compared with its individual components. A special case where a mixed scenario is used is the Metropolis-within-Gibbs algorithm: When building a Gibbs sampler, it may happen that it is difficult or impossible to simulate from some of the conditional distributions. In that case, a single Metropolis step associated with this conditional distribution (as its target) can be used instead.

The description in Sec. 4.2, Metropolis-Hastings Algorithms, should be especially appreciated and well comprehended by astronomers, given the historical origins of these topics: the detailed balance equation and random walks.
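For concreteness, here is a minimal random-walk Metropolis-Hastings sketch (my own illustration, not the book's code) targeting a standard normal density known only up to a constant, which is all the algorithm needs:

```python
import math
import random

random.seed(0)

# Random-walk Metropolis-Hastings targeting the standard normal,
# using only the log of the unnormalized density.
def log_target(x):
    return -0.5 * x * x

x, chain = 0.0, []
for _ in range(50_000):
    proposal = x + random.gauss(0.0, 1.0)      # symmetric random-walk proposal
    log_alpha = log_target(proposal) - log_target(x)
    if math.log(random.random()) < log_alpha:  # accept with prob min(1, alpha)
        x = proposal
    chain.append(x)                            # on rejection, the old x repeats

burned = chain[5_000:]
mean = sum(burned) / len(burned)
var = sum((v - mean) ** 2 for v in burned) / len(burned)
print(round(mean, 2), round(var, 2))  # both should be near 0 and 1
```

The symmetric proposal makes the Hastings correction cancel, leaving the bare Metropolis ratio; detailed balance with respect to the target is what guarantees the chain's stationary distribution is the standard normal.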

My personal favorite is the chapter on mixture models. Astronomers handle data from multiple populations (multiple epochs of star formation, single or multiple break power laws, linear or quadratic models, metallicities from mergers or formation triggers, backgrounds + sources, environment-dependent point spread functions, and so on), and the book discusses the difficulties of the label-switching problem (an identifiability issue when codifying the data into an MCMC or EM algorithm).

A completely different approach to the interpretation and estimation of mixtures is the semiparametric perspective. To summarize this approach, consider that since very few phenomena obey probability laws corresponding to the most standard distributions, mixtures such as \sum_{i=1}^k p_i f(x|\theta_i) (*) can be seen as a good trade-off between fair representation of the phenomenon and efficient estimation of the underlying distribution. If k is large enough, there is theoretical support for the argument that (*) provides a good approximation (in some functional sense) to most distributions. Hence, a mixture distribution can be perceived as a type of basis approximation of unknown distributions, in a spirit similar to wavelets and splines, but with a more intuitive flavor (for a statistician at least). This chapter mostly focuses on the "parametric" case, when the partition of the sample into subsamples with different distributions f_j does make sense from the dataset point of view (even though the computational processing is the same in both cases).

We must point at this stage that mixture modeling is often used in image smoothing but not in feature recognition, which requires spatial coherence and thus more complicated models…
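To make the mixture \sum p_i f(x|\theta_i) concrete, here is a toy EM fit of a two-component Gaussian mixture (my own sketch, with variances fixed at one for brevity; the book treats the general case and its Bayesian analogues):

```python
import math
import random

random.seed(3)

# Two-component Gaussian mixture: p*N(mu1, 1) + (1-p)*N(mu2, 1),
# fitted by EM iterations updating the means and the mixing weight.
data = [random.gauss(-2.0, 1.0) for _ in range(1000)] + \
       [random.gauss(3.0, 1.0) for _ in range(1000)]

def phi(x, mu):  # unit-variance normal density
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

p, mu1, mu2 = 0.5, -1.0, 1.0  # crude starting values
for _ in range(50):
    # E-step: posterior probability that each point came from component 1
    r = [p * phi(x, mu1) / (p * phi(x, mu1) + (1 - p) * phi(x, mu2))
         for x in data]
    # M-step: responsibility-weighted means and mixing proportion
    s = sum(r)
    mu1 = sum(ri * x for ri, x in zip(r, data)) / s
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - s)
    p = s / len(data)

print(round(mu1), round(mu2), round(p, 1))  # roughly -2, 3, 0.5
```

Note the identifiability issue mentioned above: swapping the component labels (mu1, mu2) gives the same likelihood, which is precisely the label-switching problem in the MCMC treatment.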

My patience ran out before I could comprehend every detail of the book, but the sections on reversible jump MCMC, hidden Markov models (HMM), and Markov random fields (MRF) should be very useful. These topics appear often in image processing, a field in which astronomers have their own algorithms. Adaptation of, and comparison across, image analysis methods promises new directions for scientific imaging data analysis beyond subjective denoising, smoothing, and segmentation.

For readers considering more advanced Bayesian computation and a rigorous treatment of MCMC methodology, I'd like to point to a textbook frequently mentioned by Marin and Robert:

Monte Carlo Statistical Methods
Robert, C. and Casella, G. (2004), Springer-Verlag, New York, 2nd Ed.

There are a few more practical, introductory Bayesian analysis books recently published or soon to be published; some readers would prefer those. Perhaps there is, or will be, a Bayesian Computation with Python, IDL, Matlab, Java, or C/C++ for those who never intend to use R. By the way, Mathematica users may want to check out Phil Gregory's book, which I introduced in [books] a boring title. My point is that applied statistics has become friendlier to non-statisticians through these good introductory books and free online materials. I hope more astronomers apply statistical models in their data analysis without much trouble in executing Bayesian methods. Some might want to check BUGS, introduced in [BUGS]; that posting contains resources on how to use BUGS and the packages available in various languages.

[MADS] Kriging
http://hea-www.harvard.edu/AstroStat/slog/2009/mads-kriging/
Wed, 26 Aug 2009, by hlee

Kriging is the first thing one learns in a spatial statistics course. If an astronomer sees its definition and applications, almost every one will say, "Oh, I know this! It is like the 2-point correlation function!!" At least that was my first impression when I met kriging.

There are three distinct subjects in spatial statistics: geostatistics, lattice data analysis, and spatial point pattern analysis. Because of the resemblance between the spatial distribution of observations in coordinates and the notion of spatially random points, spatial statistics in astronomy has leaned more toward spatial point pattern analysis than the other subjects. In other fields, from immunology to forestry to geology, whose data are associated with the spatial coordinates of underlying geometric structures or were sampled from lattices, observations depend on these spatial structures, and scientists enjoy various applications of geostatistics and lattice data analysis. Kriging in particular is the fundamental notion in geostatistics, with applications found in many fields.

I had expected the term kriging to appear rather frequently in analyses of cosmic microwave background (CMB) data or of large extended sources, wide enough to warrant statistical models of the expected geometric structure and its uncertainty (or interpolating observations via the BLUP, best linear unbiased predictor). Against my anticipation, only one refereed paper emerged from ADS:

Topography of the Galactic disk – Z-structure and large-scale star formation
by Alfaro, E. J., Cabrera-Cano, J., and Delgado (1991)
in ApJ, 378, pp. 106-118

I attribute this shortage of kriging applications in astronomy to missing data and differential exposure times across the sky. Both require underlying modeling to fill the gaps, or convolution with the observed data, to compensate for the unequal sky coverage. Traditionally, kriging is applied to localized geological areas where missing values and unequal coverage are of no concern. As many surveys and probing missions cover wide areas of the sky, we always see gaps and selection biases in telescope pointing directions. Once these characteristics of missingness are understood and incorporated into spatial statistical models, I believe such methods could reveal more information about our Galaxy and the universe.
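For readers who have never seen kriging written down, here is a minimal simple-kriging sketch (my own toy example: zero-mean field, known exponential covariance, none of the missing-data complications discussed above). The BLUP at a new location is a covariance-weighted combination of the observations, and it comes with a prediction variance.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simple kriging: for a zero-mean random field Z with known covariance,
# the BLUP at s0 is z_hat = c0' C^{-1} z, where C is the covariance among
# the observed sites and c0 the covariance between the sites and s0.
def cov(h):
    return np.exp(-h)  # exponential covariance: sill 1, range parameter 1

sites = rng.uniform(0, 5, (30, 2))           # 30 observation locations
dists = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
C = cov(dists) + 1e-8 * np.eye(len(sites))   # small jitter for stability

# One realization of the field at the observed sites.
z = rng.multivariate_normal(np.zeros(len(sites)), C)

s0 = np.array([2.5, 2.5])                    # prediction location
c0 = cov(np.linalg.norm(sites - s0, axis=1))
weights = np.linalg.solve(C, c0)             # kriging weights
z_hat = weights @ z                          # the BLUP at s0
kriging_var = cov(0.0) - c0 @ weights        # its prediction variance
print(0.0 < kriging_var < 1.0)               # True: a valid prediction variance
```

The prediction variance, not just the point prediction, is what distinguishes kriging from plain interpolation, and it is exactly the kind of uncertainty measure mentioned for massive sparse images below.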

Good news for astronomers: nowadays more statisticians and geoscientists are working on spatial data, particularly from satellites. These data are not much different from traditional astronomical data except for the direction the satellite aims (inward or outward). Therefore, these scientists' data have the typical properties of astronomical data: missingness, unequal sky coverage or exposure, and sparse but gigantic images. Thanks to increased computational power and developments in hierarchical modeling, techniques in geostatistics are being developed to handle such massive but sparse images for statistical inference. Beyond denoising images, they also aim to produce a measure of uncertainty associated with complex spatial data.

For those who are interested in what spatial statistics does, there are a few books I’d like to recommend.

  • Cressie, N. (1993) Statistics for Spatial Data
    (the bible of spatial statistics)
  • Stein, M.L. (2002) Interpolation of Spatial Data: Some Theory for Kriging
    (it's about kriging and written by one of the scholarly pinnacles of spatial statistics)
  • Banerjee, Carlin, and Gelfand (2004) Hierarchical Modeling and Analysis for Spatial Data
    (Bayesian hierarchical modeling is explained; very pragmatic, but it could give the impression of being somewhat limited for applications in astronomy)
  • Illian et al. (2008) Statistical Analysis and Modelling of Spatial Point Patterns
    (Well, I still think spatial point pattern analysis is more dominant in astronomy than geostatistics, so I feel obliged to include a book for it; and then I must mention Peter Diggle's books too.)
  • Diggle (2004) Statistical Analysis of Spatial Point Patterns
    Diggle and Ribeiro (2007) Model-based Geostatistics
[Book] The Physicists
http://hea-www.harvard.edu/AstroStat/slog/2009/book-the-physicists/
Wed, 22 Apr 2009, by hlee

I was reading Lehmann's memoir about the friends and colleagues who greatly influenced the establishment of his career. I'm happy to know that his meetings with Landau, Courant, and Evans led him to become a statistician; otherwise we, astronomers included, would have had very different textbooks, and statistical thinking would have been different. On the other hand, I was surprised to learn that he chose statistics over physics because of his experience at Cambridge (UK). I had thought becoming a physicist was preferable to becoming a statistician in the first half of the 20th century. At least I felt that way, probably because popular science books in physics and physics-related historic events were so well exposed that I came to think physicists were cooler than other kinds of scientists.

The Physicists by Friedrich Durrenmatt

This short play (wiki link) is very charming and fun to read. Some statisticians would enjoy it, and a subset of readers might come to embrace physics instead of being repelled by it. At the least, it shows statisticians an aspect of non-statistical science beyond genetics, biology, medical science, economics, sociology, agricultural science, psychology, meteorology, and the other fields where interdisciplinary collaborations are relatively well established.

The links below for The Physicists and the book by Lehmann are from Amazon.

Reminiscences of a Statistician: The Company I Kept by Erich Lehmann

The following excerpt from Reminiscences…, however, was more interesting for how statistics appeared to the young Lehmann, because I felt the same before learning statistics and before watching how statistics is used in astronomical data analysis.

… I did not like it (statistics). It was lacking the element that had attracted me to mathematics as a boy: statistics did not possess the beauty that I had found in the integers and later in other parts of mathematics. Instead, ad hoc methods were used to solve problems that were messy and that were based on questionable assumptions that seemed quite arbitrary.

Aside: I have another post on his article, On the history and use of some standard statistical models.

I'd like to recommend another book, hoping someone can find its English translation (I have searched but kept failing):

Der Teil und das Ganze by Werner Heisenberg.

YES, the Heisenberg of the uncertainty principle! My understanding is that the notion of uncertainty differs among physicists, statisticians, and modern astronomers. I think it has evolved without communication among them.

Related to uncertainty, I also want to recommend again Professor Lindley’s insightful paper, discussed in another post, Statistics is the study of uncertainty.

Not many statisticians are exposed to (astro)physics and vice versa, which is probably the primary reason for time wasted explaining λ (Poisson rate parameter vs. wavelength), ν (nuisance parameter vs. frequency), π or φ (pdfs vs. particles), Ω (probability space vs. cosmological constant), and H0 (null hypothesis vs. Hubble constant), to name a few. I hope these general reading recommendations are useful for narrowing such gaps and reducing wasted time.

[Book] Elements of Information Theory http://hea-www.harvard.edu/AstroStat/slog/2009/book-elements-of-information-theory/ http://hea-www.harvard.edu/AstroStat/slog/2009/book-elements-of-information-theory/#comments Wed, 11 Mar 2009 17:04:26 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=1506 by T. Cover and J. Thomas website: http://www.elementsofinformationtheory.com/

Once, perhaps more than once, I mentioned this book in my post on the most celebrated paper by Shannon (see the posting). I have recommended the book a few more times in answer to offline inquiries, and it has always been on the list of favorite books I like to use for teaching. So I am not shy about recommending this book to astronomers for its modern perspective and practicality. Before advancing more praise, I must say that those admiring words do not imply that I understand every line and problem in the book. Like many fields, information theory has grown since Shannon’s monumental debut paper (1948) as fast as astronomers’ observation techniques. Without the contents of this book, most of which came after Shannon (1948), the internet, wireless communication, compression, etc. could not have been conceived. Since the notion of “entropy”, the core of information theory, is familiar to astronomers (physicists), the book may be received better among them than among statisticians; it should read more easily for astronomers.

My reason for recommending this book is that, in my personal view, some knowledge of information theory (data compression and channel capacity) would help resolve the limited-bandwidth problem in this era of unprecedented massive astronomical survey projects with satellites or ground-based telescopes.

The content can be viewed from the perspective of applied probability; the basics of probability theory, including distributions and uncertainties, can therefore become familiar to astronomers without their having to wade through dense probability textbooks.

Many of my [MADS] series are motivated by the contents of this book, from which I learned many practical data-processing ideas and objectives (data compression, data transmission, network information theory, ergodic theory, hypothesis testing, statistical mechanics, quantum mechanics, inference, probability theory, lossless coding/decoding, convex optimization, etc.), although those [MADS] postings are not visible on the slog yet (I hope I can get most of them through within several months; otherwise, someone should continue my [MADS] series and keep introducing modern statistics to astronomers). The ideas most commonly practiced in engineering could help accelerate data-processing procedures in astronomy and make astronomical inference more efficient and consistent, something that has been neglected because of many other demands. Here, I’d rather defer discussing details of particular topics from the book and describing how astronomers have applied them (there are quite a few statistical jewels hidden in ADS that are not well explored). Through [MADS], I will discuss further how information theory could help in processing astronomical data, from collecting, pipelining, storing, extracting, and exploring to summarizing, modeling, estimating, inference, and prediction. Instead of discussing the book’s topics, I’d like to quote some interesting statements from its introductory chapter to offer a flavor and tempt you to read it.

… it [information theory] has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam’s Razor: The simplest explanation is best), and to probability and statistics (error exponents for optimal hypothesis testing and estimation).

… information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory), and computer science (algorithmic complexity).

There is a pleasing complementary relationship between algorithmic complexity and computational complexity. One can think about computational complexity (time complexity) and Kolmogorov complexity (program length or descriptive complexity) as two axes corresponding to program running time and program length. Kolmogorov complexity focuses on minimizing along the second axis, and computational complexity focuses on minimizing along the first axis. Little work has been done on the simultaneous minimization of the two.

The concept of entropy in information theory is related to the concept of entropy in statistical mechanics.
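To give a small, concrete taste of the entropy concept these quotes revolve around, here is a minimal Python sketch (my own illustration, not code from the book or its website) of Shannon entropy for a discrete distribution:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum_i p_i log2(p_i), in bits.

    Terms with p_i = 0 contribute nothing (0 * log 0 := 0 by convention).
    """
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin is maximally uncertain: one full bit per toss.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, hence more compressible.
print(entropy([0.9, 0.1]))   # ~0.469
# A certain outcome carries no information at all.
print(entropy([1.0]))        # 0.0
```

The second case is exactly why biased sources compress well: on average fewer than half a bit per symbol is needed, which is the operational link between entropy and the data-compression and channel-capacity topics above.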

In addition to the book’s website, googling the title will show tons of links spanning from gambling and portfolio selection to computational complexity, with statistics, probability, statistical mechanics, communication theory, data compression, etc. in between, where the order does not imply relevance or importance. Such breadth is discussed in the intro chapter. If you have the book in hand, regardless of edition, you might first want to check Fig. 1.1, “Relationship of information theory to other fields,” a diagram explaining connections and similarities among these subjects.

Data analysis tools, methods, algorithms, and theories, including statistics (both exploratory data analysis and inference), should follow the direction of retrieving meaningful information from observations. Sometimes I feel that this priority is lost, a ship without its captain, treating statistics or information science as a black box without any interest in knowing what’s inside.

I don’t know how many astronomy departments offer classes in data analysis, data mining, information theory, machine learning, or statistics for graduate students. I saw none at my alma mater, although it has recently offered the famous summer school. The closest class I had was computational physics, focusing on how to solve differential equations (stochastic differential equations were not included) and on optimization (I learned game theory there, unexpectedly; overall, I am still fond of what I learned in that class). I haven’t seen astronomy graduate students in statistics classes, nor in EE/CS classes related to signal processing, information theory, or data mining (some departments offer statistics classes for their own students, like the course on experimental design for students of agricultural science). What I feel in astronomy is a lack of educational effort for the new information era and the big survey projects. Yet I’m very happy to see some apprenticeships coping with these new patterns in astronomical science. I only hope they grow beyond a few small guilds, and I wish they had more resources to make their work efficient as time goes on.

A book by David Freedman http://hea-www.harvard.edu/AstroStat/slog/2009/a-book-by-david-freedman/ http://hea-www.harvard.edu/AstroStat/slog/2009/a-book-by-david-freedman/#comments Tue, 10 Feb 2009 20:37:41 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=1603 A continuation from my posting, titled circumspect frequentist.

Title: Statistical Models: Theory and Practice (click for the publisher’s website)
My one-line review, rather a comment from several months ago, was

Bias in asymptotic standard errors is not a familiar topic for astronomers

and I don’t quite understand why I wrote it, but I think I came up with this comment owing to my pursuit of modeling the measurement errors that occur in astronomical research.

My overall impression of the book was that astronomers might not fancy it, because the cited examples and models are quite irrelevant to astronomy. On the contrary, I liked it because it reflects what statistics ought to be in real-world data analysis. This does not mean the book covers every bit of statistics. When you teach statistics, you don’t expect a student’s learning curve to be continuous. You only hope that they jump the discontinuity points successfully, and you make every effort to lower the steps at those discontinuities. The book looked to offer comfort to ease such efforts, or to promise an almost continuous learning curve. The perspective and scope of the book were very impressive to me at the time.

It is sad to learn of brilliant-minded people passing away before their insights reach those who need them. I admire the professors at Berkeley, not only for their research activities and contributions but also for their pedagogical contributions to statistics and its applications to many fields, including astronomy (J. Neyman and E. Scott, for example, are as familiar to astronomers as to statisticians; their papers about the spatial distribution of galaxies are, to my knowledge, well sought among astronomers).

Circumspect frequentist http://hea-www.harvard.edu/AstroStat/slog/2009/circumspect-frequentist/ http://hea-www.harvard.edu/AstroStat/slog/2009/circumspect-frequentist/#comments Mon, 02 Feb 2009 02:45:14 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=1544 The first issue of this year’s IMS bulletin has an obituary, from which the following is quoted.
Obituary: David A. Freedman (Click here for a direct view of this obituary)

He started his professional life as a probabilist and mathematical statistician with Bayesian leanings but became one of the world’s leading applied statisticians and a circumspect frequentist. In his words:

My own experience suggests that neither decision-makers nor their statisticians do in fact have prior probabilities. A large part of Bayesian statistics is about what you would do if you had a prior. For the rest, statisticians make up priors that are mathematically convenient or attractive. Once used, priors become familiar; therefore, they come to be accepted as ‘natural’ and are liable to be used again; such priors may eventually generate their own technical literature… Similarly, a large part of [frequentist] statistics is about what you would do if you had a model; and all of us spend enormous amounts of energy finding out what would happen if the data kept pouring in.

I have draft posts: one about his book, Statistical Models: Theory and Practice, and the other about his article, which appeared on arXiv:stat not many months ago and is now published in The American Statistician (TAS). In my opinion, both would help astronomers lower the barrier to theoretical statistics, Bayesian and frequentist methods alike. I blame myself for delaying these posts. Carrying on one’s legacy, I believe, is easier while the person is alive.

[Book] The Grammar of Graphics http://hea-www.harvard.edu/AstroStat/slog/2008/book-the-grammar-of-graphics/ http://hea-www.harvard.edu/AstroStat/slog/2008/book-the-grammar-of-graphics/#comments Wed, 08 Oct 2008 23:55:37 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=260 All of a sudden, partially owing to a thought-provoking talk about visualization by Felice Frankel at IIC, I recollected a book, The Grammar of Graphics by Leland Wilkinson (2nd ed.; I partially read the 1st ed. several years ago and found it of little use because there seemed to be no link to visualizing data from astronomy).

Both good and bad reviews exist, but I don’t believe there is another book this extensive in covering the grammar of graphics. Not many statisticians handle images compared to computer vision engineers, but at some point all engineers and scientists must present their work in graphs and tables. By the same token, tongues differ even though alphabets are common: often, plots from scientist A cannot talk to scientist B (A ≠ B). This communication discrepancy seems prevalent between astronomy and statistics.

Almost all chapters begin with the Greek or Latin origins of the chapter names, reflecting the common origins of the lexicon of graphics regardless of subject. Some chapters, on the other hand, illuminate different practices, perspectives, and interests in graphics between astronomers and statisticians:

  • Chap. 6 [Scale]: In statistics, scaling by log transformation is meant to stabilize variance (as in the Box-Cox transformation); in astronomy, by contrast, it is meant to impose a linear relationship between predictor and response, which manifests better in log scale.
  • Chap. 7 [Statistics]: Discussion of error bars, bins, and histograms; the graphical tools are the same, but the objectives seem different (statistics: optimal binning; astronomy: enhancing the signal in each bin).
  • Chap. 15 [Uncertainty]: Concepts of uncertainty; many words are associated with uncertainty, for example variability, noise, incompleteness, indeterminacy, bias, error, accuracy, precision, reliability, validity, quality, and integrity.
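The astronomers’ use of log scale mentioned for Chap. 6 can be made concrete with a small sketch of my own (hypothetical, noiseless data for clarity, not an example from the book): a power law y = c·x^b becomes a straight line in log-log space, where ordinary least squares recovers the exponent and normalization.

```python
import math

# Hypothetical power-law data y = 3 * x^2, noiseless for clarity:
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x**2 for x in xs]

# In log space the relation is linear: log10(y) = log10(3) + 2 * log10(x).
lx = [math.log10(x) for x in xs]
ly = [math.log10(y) for y in ys]

# Ordinary least squares for slope and intercept in log-log space:
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) \
        / sum((a - mx) ** 2 for a in lx)
intercept = my - slope * mx

print(slope)          # recovers the exponent, ~2.0
print(10**intercept)  # recovers the normalization, ~3.0
```

This is exactly the linear-relationship-in-log-scale use the book contrasts with variance stabilization; with real data the scatter in log space also reveals whether a single power law is adequate.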

Overall, I implore that these ideas be included, adaptively, in astronomical data analysis packages for visualizing analysis products. Perhaps it may inspire some astronomers to transform their ways of visualizing. For instance, instead of histograms, in my opinion, box plots, q-q plots, and scatter plots would convey improved information while maintaining simplicity; but except for scatter plots, such summary plots are not commonly used in astronomy. A benefit of the box plot and the q-q plot is checking Gaussianity without sacrificing information to binning. However, there is no golden rule for which type or grammar of graphics is correct and shall be used; only user preference exists.

Different disciplines maintain their own ways of presenting graphics and expect that their graphics can talk to viewers from other disciplines. Disappointingly, no one has fully reached that point. Extensive discussion and persuasion are required to deliver the stories behind graphics to others.

As Felice Frankel pointed out, the way of visualization could enhance recognition and understanding in the deliberate delivery of information. To that purpose, a few interesting quotes from the book replace the conclusion of this post.

  • The first ed. of this book, and Part 1 of the current ed., explicitly cautioned that the grammar of graphics is not a visualization system.
  • We are surprised, nevertheless, to discover how little some visualization researchers in various fields know about the origins of many of the techniques that are routinely applied in visualization.
  • The grammar of graphics determines how algebra, geometry, aesthetics, statistics, scales, and coordinates interact. In the world of statistical graphics, we cannot confuse aesthetics with geometry by picking a tree graphic to represent a continuous flow of migrating insects across a geographic field simply because we like the impression it conveys.
  • If we must choose a single word to characterize the focus of modern statistics, it would be uncertainty (Stigler, 1983)
  • … decision-makers need statistical tools to formalize the scenarios they encounter and they need graphical aids to keep them from making irrational decisions. … The use of graphics for decision-making under uncertainty is a relatively recent field. … We need to go beyond the use of error bars to incorporate other aesthetics in the representation of error. And we need research to assess the effectiveness of decision-making based on these graphics using a Bayesian yardstick.


Classification and Clustering http://hea-www.harvard.edu/AstroStat/slog/2008/classification-and-clusterin/ http://hea-www.harvard.edu/AstroStat/slog/2008/classification-and-clusterin/#comments Thu, 18 Sep 2008 23:48:43 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=747 Another conclusion deduced from reading preprints listed in arXiv/astro-ph is that astronomers tend to confuse classification with clustering and to mix up the methodologies. They tend to think that any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithms, no matter what, look like a black box. By “black box” I mean something like a neural network, which is one of the classification algorithms.

Simply put, classification is a regression problem and clustering is a mixture problem with an unknown number of components. Defining a classifier, a regression model, is the objective of classification; determining the number of clusters is the objective of clustering. In classification, predefined classes exist, such as galaxy types and star types, and one wishes to know which predictor variables, and what function of them, allow quasars to be separated from stars by relying on only a handful of variables from photometric data, without individual spectroscopic observations. In clustering analysis, there is no predefined class, but some plots visualize multiple populations, and one wishes to determine the number of clusters mathematically, so as not to be subjective in concluding that a plot shows two clusters after some subjective data cleaning. A good example: as photons from gamma-ray bursts accumulated, extracting features like F_{90} and F_{50} enabled scatter plots of many GRBs, which eventually led people to believe there are multiple populations of GRBs. Clustering algorithms back this hypothesis in a more objective manner, as opposed to the subjective manner of scatter plots with non-statistical outlier elimination.
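The clustering side of this distinction can be sketched with a toy version of k-means (Lloyd’s algorithm) in one dimension. The numbers below are made up, merely reminiscent of a short/long duration split like that of GRBs, and the naive min/max initialization is only adequate for a sketch:

```python
# Toy k-means (k=2) on a one-dimensional bimodal sample.
# No labels are given; the algorithm discovers the two groups itself.
data = [0.3, 0.5, 0.4, 0.6, 0.2, 20.0, 25.0, 22.0, 30.0, 18.0]

centers = [min(data), max(data)]   # naive initialization for this sketch
for _ in range(20):
    groups = [[], []]
    for x in data:
        # Assign each point to its nearest current center.
        groups[0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1].append(x)
    # Move each center to the mean of its assigned points.
    centers = [sum(g) / len(g) for g in groups]

print(sorted(centers))   # two cluster centers, near 0.4 and 23
```

The point is that no class labels enter anywhere; contrast this with classification, where the pairs (x, label) would be given and the task would be to fit the decision boundary instead.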

However, there are challenges in making a clean cut between classification and clustering, both in statistics and in astronomy. In statistics, missing data is the phrase people use to describe this challenge. Fortunately, there is a field called semi-supervised learning to tackle it. (Supervised learning is equivalent to classification, and unsupervised learning to clustering.) Semi-supervised learning algorithms are applicable to data in which a portion has known class types and the rest are missing; astronomical catalogs with unidentified objects are good candidates for semi-supervised learning algorithms.
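A minimal flavor of the semi-supervised idea, in a self-training style with a nearest-centroid rule: a few labeled seeds classify the unlabeled points, and the enlarged labeled set then refines the class centroids. The numbers and class names are hypothetical, and real algorithms (label propagation, EM on mixtures, etc.) are far more careful than this sketch:

```python
# Labeled seeds (value, class) plus unlabeled measurements,
# mimicking a catalog where only some objects are identified.
labeled = [(1.0, "A"), (1.2, "A"), (9.8, "B"), (10.1, "B")]
unlabeled = [1.1, 0.9, 10.0, 9.5]

def centroids(pairs):
    """Mean value per class."""
    sums, counts = {}, {}
    for v, c in pairs:
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# One self-training pass: label each unlabeled point by its nearest
# class centroid, then recompute centroids on the enlarged labeled set.
cents = centroids(labeled)
newly = [(v, min(cents, key=lambda c: abs(v - cents[c]))) for v in unlabeled]
cents = centroids(labeled + newly)

print(newly)   # [(1.1, 'A'), (0.9, 'A'), (10.0, 'B'), (9.5, 'B')]
```

The labeled portion plays the classification role and the unlabeled portion the clustering role, which is exactly the in-between situation of a partially identified catalog.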

From the astronomy side, the fact that classes are not well defined, or are subjective, is the main cause of this confusion between classification and clustering, and also the origin of the challenge. For example, would astronomers A and B produce the same results in classifying galaxies according to Hubble’s tuning fork?[1] We are not testing individual cognitive skills. Is there a consensus on where to make the cut between F9 stars and G0 stars? What makes a star F9.5 instead of G0? In the presence of error bars, how can one be sure that a star is F9 and not G0? I don’t see any decision-theoretic explanation in survey papers when those stellar spectral classes are presented. Classification is generally for data with categorical responses, but astronomers tend to turn something that used to be categorical into something continuous while still applying the same old classification algorithms designed for categorical responses.

From a clustering analysis perspective, the challenge is caused by outliers, peculiar objects that do not belong to the majority. The number of such peculiar objects can be large enough to make up a new, unprecedented class; or it can be so small that a strong belief prevails to discard these data points as observational mistakes. How much can we trim data with unavoidable and uncontrollable contamination (remember, we cannot control astronomical data, as opposed to earthly kinds)? What primarily determines the number of clusters: physics, statistics, astronomers’ experience in processing and cleaning data, …?

Once the ambiguity between classification and clustering and the complexity of the data sets are resolved, another challenge is still waiting: which black box? For most classification algorithms, Pattern Recognition and Machine Learning by C. Bishop offers a broad spectrum of black boxes. Yet the book does not include the various clustering algorithms that statisticians have developed, nor outlier detection. To be more rigorous in selecting a black box for clustering analysis and outlier detection, one is recommended to check,

To me, astronomers tend to be in haste, owing to the pressure to publish results immediately after a data release, and to overlook methodologies suitable for their survey data. It seems that there is no time for consulting machine learning specialists to verify the approaches they adopt. My personal prayer is that this haste does not settle in as a trend in astronomical surveys and large data analysis.

  1. Check out the project, GALAXY ZOO