The AstroStat Slog » Stat Weaving together Astronomy+Statistics+Computer Science+Engineering+Intrumentation, far beyond the growing borders Fri, 09 Sep 2011 17:05:33 +0000 en-US hourly 1 [AAS-HEAD 2011] Time Series in High Energy Astrophysics Fri, 09 Sep 2011 17:05:33 +0000 vlk We organized a Special Session on Time Series in High Energy Astrophysics: Techniques Applicable to Multi-Dimensional Analysis on Sep 7, 2011, at the AAS-HEAD conference at Newport, RI. The talks presented at the session are archived at

A tremendous amount of information is contained within the temporal variations of various measurable quantities, such as the energy distributions of the incident photons, the overall intensity of the source, and the spatial coherence of the variations. While the detection and interpretation of periodic variations is well studied, the same cannot be said for non-periodic behavior in a multi-dimensional domain. Methods to deal with such problems are still primitive, and any attempts at sophisticated analyses are carried out on a case-by-case basis. Some of the issues we seek to focus on are methods to deal with are:
* Stochastic variability
* Chaotic Quasi-periodic variability
* Irregular data gaps/unevenly sampled data
* Multi-dimensional analysis
* Transient classification

Our goal is to present some basic questions that require sophisticated temporal analysis in order for progress to be made. We plan to bring together astronomers and statisticians who are working in many different subfields so that an exchange of ideas can occur to motivate the development of sophisticated and generally applicable algorithms to astronomical time series data. We will review the problems and issues with current methodology from an algorithmic and statistical perspective and then look for improvements or for new methods and techniques.

]]> 0
[announce] upcoming workshops and conferences Wed, 09 Feb 2011 23:03:38 +0000 chasc Kirk Borne has compiled a list of interesting workshops and conferences coming up in the near future:

The Future of Scientific Knowledge Discovery in Open Networked Environments

New York Workshop on Computer, Earth, and Space Sciences 2011

Innovations in Data-Intensive Astronomy

Astrostatistics and Data Mining in Large Astronomical Databases

Statistical Challenges in Modern Astronomy V (including summer school & tutorials)

Very Wide Field Surveys in the Light of Astro2010

Statistical Methods for Very Large Datasets

23rd Scientific and Statistical Database Management Conference

International Statistical Institute (ISI) World Congress

NASA Conference on Intelligent Data Understanding

]]> 0
[announce] SCMA V Sat, 15 Jan 2011 03:35:03 +0000 chasc via David van Dyk, information about 3 events in astrostatistics hosted by Penn State’s Center for Astrostatistics:

  1. Summer School in Statistics for Astronomers VII (June 6-10, 2011)
  2. Pre-conference Tutorials (June 11-12, 2011)
  3. Statistical Challenges in Modern Astronomy V (June 13-17, 2011)*

*Web site:

Registration is now open until May 6
(Summer School registration may close earlier if the enrollment limit is reached)

Contributed papers for the SCMA V conference are welcome

Summer School in Statistics for Astronomers: The seventh summer school is an intensive week covering basic statistical inference, several fields of applied statistics, and hands-on experience with the R computing environment. Topics include: exploratory data analysis, hypothesis testing, parameter estimation, regression, bootstrap resampling, model selection & goodness-of-fit, maximum likelihood and Bayesian methods, nonparametrics, spatial processes, and times series. Instructors are mostly faculty members in statistics.

Pre-conference tutorials: Instruction in four areas of astrostatistical interest presented during the weekend between the Summer School and SCMA V conference. Topics are: Bayesian computation and MCMC; data mining; R for astronomers; and wavelets for image analysis. Instructors are members of the SCMA V Scientific Organizing Committee.

SCMA V conference: Held every five years, SCMA conferences are the premier cross-disciplinary forum for research statisticians and astronomers to discuss methodological issues of mutual interest. Session topics include: statistical modeling in astronomy, Bayesian analysis across astronomy; Bayesian cosmology; data mining and informatics; sparsity; interpreting astrophysical simulations; time domain astronomy; spatial and image analysis; and future directions for astrostatistics. Invited lectures will be followed by cross-disciplinary commentaries. The conference welcomes contributed papers from statisticians and astronomers.

Visit for more information and registration

Eric Feigelson, Dept. of Astronomy & Astrophysics, Penn State,
G. Jogesh Babu, Dept. of Statistics, Penn State,

]]> 0
coin toss with a twist Sun, 26 Dec 2010 22:27:50 +0000 vlk Here’s a cool illustration of how to use Bayesian analysis in the limit of very little data, when inferences are necessarily dominated by the prior. The question, via Tom Moertel, is: suppose I tell you that a coin always comes up heads, and you proceed to toss it and it does come up heads — how much more do you believe me now?

He also has the answer worked out in detail.

(h/t Doug Burke)

]]> 0
Yes, please Tue, 21 Dec 2010 18:36:49 +0000 vlk Andrew Gelman says,

Instead of “confidence interval,” let’s say “uncertainty interval”

]]> 1
CAS 2010 Sat, 14 Aug 2010 18:50:05 +0000 chasc The schedule for the mini-Workshop on Computational AstroStatistics is set:

]]> 0
The Perseid Project [Announcement] Mon, 02 Aug 2010 21:21:35 +0000 vlk There is an ambitious project afoot to build a 3D map of a meteor stream during the Perseids on Aug 11-12. I got this missive about it from the organizer, Chris Crawford:

This will be one of the better years for Perseids; the moon, which often interferes with the Perseids, will not be a problem this year. So I’m putting together something that’s never been done before: a spatial analysis of the Perseid meteor stream. We’ve had plenty of temporal analyses, but nobody has ever been able to get data over a wide area — because observations have always been localized to single observers. But what if we had hundreds or thousands of people all over North America and Europe observing Perseids and somebody collected and collated all their observations? This is crowd-sourcing applied to meteor astronomy. I’ve been working for some time on putting together just such a scheme. I’ve got a cute little Java applet that you can use on your laptop to record the times of fall of meteors you see, the spherical trig for analyzing the geometry (oh my aching head!) and a statistical scheme that I *think* will reveal the spatial patterns we’re most likely to see — IF such patterns exist. I’ve also got some web pages describing the whole shebang. They start here:

I think I’ve gotten all the technical, scientific, and mathematical problems solved, but there remains the big one: publicizing it. It won’t work unless I get hundreds of observers. That’s where you come in. I’m asking two things of you:

1. Any advice, criticism, or commentary on the project as presented in the web pages.
2. Publicizing it. If we can get that ol’ Web Magic going, we could get thousands of observers and end up with something truly remarkable. So, would you be willing to blog about this project on your blog?
3. I would be especially interested in your comments on the statistical technique I propose to use in analyzing the data. It is sketched out on the website here:

Given my primitive understanding of statistical analysis, I expect that your comments will be devastating, but if you’re willing to take the time to write them up, I’m certainly willing to grit my teeth and try hard to understand and implement them.

Thanks for any help you can find time to offer.

Chris Crawford

]]> 0
[Book] The Elements of Statistical Learning, 2nd Ed. Thu, 22 Jul 2010 13:25:44 +0000 hlee This was written more than a year ago, and I forgot to post it.

I’ve noticed that there are rapidly growing interests and attentions in data mining and machine learning among astronomers but the level of execution is yet rudimentary or partial because there has been no comprehensive tutorial style literature or book for them. I recently introduced a machine learning book written by an engineer. Although it’s a very good book, it didn’t convey the foundation of machine learning built by statisticians. In the quest of searching another good book so as to satisfy the astronomers’ pursuit of (machine) learning methodology with the proper amount of statistical theories, the first great book came along is The Elements of Statistical Learning. It was chosen for this writing not only because of its fame and its famous authors (Hastie, Tibshirani, and Friedman) but because of my personal story. In addition, the 2nd edition, which contains most up-to-date and state-of-the-art information, was released recently.

First, the book website:

The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman

You’ll find examples, R codes, relevant publications, and plots used in the text books.

Second, I want to tell how I learned about this book before its first edition was published. Everyone has a small moment of meeting very famous people. Mine is shaking hands with President Clinton, in 2000. I still remember the moment vividly because I really wanted to tell him that ice cream was dripping on his nice suit but the top of the line guards blocked my attempt of speaking/pointing icecream dripping with a finger afterward the hand shaking. No matter what context is, shaking hands with one of the greatest presidents is a memorable thing. Yet it was not my cherishing moment because of icecreaming dripping and scary bodyguards. My most cherishing moment of meeting famous people is the half an hour conversation with late Prof. Leo Breinman (click for my two postings about him), author of probability textbook, creator of CART, and the most forefront pioneer in machine learning.

The conclusion of that conversation was a book soon to be published after explaining him my ideas of applying statistics to astronomical data and his advices to each problems. I was not capable to understand every statistics so that his answer about this new coming book at that time was the most relevant and apt one.

This conversation happened during the 3rd Statistical Challenges in Modern Astronomy (SCMA). Not long passed since I began my graduate study in statistics but had an opportunity to assist the conference organizer, my advisor Dr. Babu and to do some chores during the conference. By accident, I read the book by Murtagh about multivariate data analysis, so I wanted to speak to him. Except that, I have no desire to speak renown speakers and attendees. Frankly, I didn’t have any idea who’s who at the conference and a few years later, I realized that the conference dragged many famous people and the density of such people was higher than any conference I attended. Who would have imagine that I could have a personal conversation with Prof. Breiman, at that time. I have seen enough that many famous professors train people during conferences. Getting a chance for chatting some seconds are really hard and tall/strong people push someone small like me away always.

The story goes like this: a sunny perfect early summer afternoon, he was taking a break for a cigar and I finished my errands for the session. Not much to do until the end of session, I decided to take some fresh air and I spotted him enjoying his cigar. Only the worst was that I didn’t know he was the person of CART and the founder of statistical machine learning. Only from his talk from the previous session, I learned he was a statistician, who did data mining on galaxies. So, I asked him if I can join him and ask some questions related to some ideas that I have. One topic I wanted to talk about classification of SN light curves, by that time from astronomical text books, there are Type I & II, and Type I has subcategories, Ia, Ib, and Ic. Later, I heard that there is Type III. But the challenge is observations didn’t happen with equal intervals. There were more data mining topics and the conversation went a while. In the end, he recommended me a book which will be published soon.

Having such a story, a privilege of talking to late Prof. Breiman through an very unique meeting, SCMA, before knowing the fame of the book, this book became one of my favorites. The book, indeed, become popular, around that time, almost only book discussing statistical learning; therefore, it was an excellent textbook for introducing statistics to engineerers and machine learning to statisticians. In the mean time, statistical learning enjoyed popularity in many disciplines that have data sets and urging for learning with the aid of machine. Now books and journals on machine learning, data mining, and knowledge discovery (KDD) became prosperous. I was so delighted to see the 2nd edition in the market to bridge the gap over the years.

I thank him for sharing his cigar time, probably his short free but precious time for contemplation, with me. I thank his patience of spending time with such an ignorant girl with a foreign english accent. And I thank him for introducing a book which will became a bible in the statistical learning community within a couple of years (I felt proud of myself that I access the book before people know about it). Perhaps, astronomers cannot have many joys from this book that I experienced from how I encounter the book, who introduced the book, whether the book was used in a course, how often book is referred, etc. But I assure that it’ll narrow the gap in the notions how astronomers think about data mining (preprocessing, pipelining, and bulding catalogs) and how statisticians treat data mining. The newly released 2nd edition would help narrowing the gap further and assist astronomers to coin brilliant learning algorithms specific for astronomical data. [The END]

—————————– Here, I patch my scribbles about the book.

What distinguish this book from other machine learning books is that not only authors are big figures in statistics but also fundamentals of statistics and probability are discussed in all chapters. Most of machine learning books only introduce elementary statistics and probability in chapter 2, and no basics in statistics is discussed in later chapters. Generally, empirical procedures, computer algorithms, and their results without presenting basic theories in statistics are presented.

You might want to check the book’s website for data sets if you want to try some ideas described there
The Elements of Statistical Learning
In addition to its historical footprint in the field of statistical learning, I’m sure that some astronomers want to check out topics in the book. It’ll help to replace some data analysis methods in astronomy celebrating their centennials sooner or later with state of the art methods to cope with modern data.

This new edition reflects some evolutions in statistical learning whereas the first edition has been an excellent harbinger of the field. Pages quoted from the 2nd edition.

[p.28] Suppose in fact that our data arose from a statistical model $Y=f(X)+e$ where the random error e has E(e)=0 and is independent of X. Note that for this model, f(x)=E(Y|X=x) and in fact the conditional distribution Pr(Y|X) depends on X only through the conditional mean f(x).
The additive error model is a useful approximation to the truth. For most systems the input-output pairs (X,Y) will not have deterministic relationship Y=f(X). Generally there will be other unmeasured variables that also contribute to Y, including measurement error. The additive model assumes that we can capture all these departures from a deterministic relationship via the error e.

How statisticians envision “model” and “measurement errors” quite different from astronomers’ “model” and “measurement errors” although in terms of “additive error model” they are matching due to the properties of Gaussian/normal distribution. Still, the dilemma of hen or eggs exists prior to any statistical analysis.

[p.30] Although somewhat less glamorous than the learning paradigm, treating supervised learning as a problem in function approximation encourages the geometrical concepts of Euclidean spaces and mathematical concepts of probabilistic inference to be applied to the problem. This is the approach taken in this book.

Strongly recommend to read chapter 3, Linear Methods for Regression: In astronomy, there are so many important coefficients from regression models, from Hubble constant to absorption correction (temperature and magnitude conversion is another example. It seems that these relations can be only explained via OLS (ordinary least square) with the homogeneous error assumption. Yet, books on regressions and linear models are not generally thin. As much diversity exists in datasets, more amount of methodology, theory and assumption exists in order to reflect that diversity. One might like to study the statistical properties of these indicators based on mixture and hierarchical modeling. Some inference, say population proportion can be drawn to verify some hypotheses in cosmology in an indirect way. Understanding what regression analysis and assumptions and how statistician efforts made these methods more robust and interpretable, and reflecting reality would change forcing E(Y|X)=aX+b models onto data showing correlations (not causality).

]]> 0
mini-Workshop on Computational AstroStatistics [announcement] Mon, 21 Jun 2010 16:25:31 +0000 chasc mini-Workshop on Computational Astro-statistics: Challenges and Methods for Massive Astronomical Data
Aug 24-25, 2010
Phillips Auditorium, CfA,
60 Garden St., Cambridge, MA 02138


The California-Boston-Smithsonian Astrostatistics Collaboration plans to host a mini-workshop on Computational Astro-statistics. With the advent of new missions like the Solar Dynamic Observatory (SDO), Panoramic Survey and Rapid Response (Pan-STARRS) and Large Synoptic Survey (LSST), astronomical data collection is fast outpacing our capacity to analyze them. Astrostatistical effort has generally focused on principled analysis of individual observations, on one or a few sources at a time. But the new era of data intensive observational astronomy forces us to consider combining multiple datasets and infer parameters that are common to entire populations. Many astronomers really want to use every data point and even non-detections, but this becomes problematic for many statistical techniques.

The goal of the Workshop is to explore new problems in Astronomical data analysis that arise from data complexity. Our focus is on problems that have generally been considered intractable due to insufficient computational power or inefficient algorithms, but are now becoming tractable. Examples of such problems include: accounting for uncertainties in instrument calibration; classification, regression, and density estimations of massive data sets that may be truncated and contaminated with measurement errors and outliers; and designing statistical emulators to efficiently approximate the output from complex astrophysical computer models and simulations, thus making statistical inference on them tractable. We aim to present some issues to the statisticians and clarify difficulties with the currently used methodologies, e.g. MCMC methods. The Workshop will consist of review talks on current Statistical methods by Statisticians, descriptions of data analysis issues by astronomers, and open discussions between Astronomers and Statisticians. We hope to define a path for development of new algorithms that target specific issues, designed to help with applications to SDO, Pan-STARRS, LSST, and other survey data.

We hope you will be able to attend the workshop and present a brief talk on the scope of the data analysis problem that you confront in your project. The workshop will have presentations in the morning sessions, followed by a discussion session in the afternoons of both days.

]]> 0
Everybody needs crampons Fri, 30 Apr 2010 16:12:36 +0000 vlk Sherpa is a fitting environment in which Chandra data (and really, X-ray data from any observatory) can be analyzed. It has just undergone a major update and now runs on python. Or allows python to run. Something like that. It is a very powerful tool, but I can never remember how to use it, and I have an amazing knack for not finding what I need in the documentation. So here is a little cheat sheet (which I will keep updating as and when if I learn more):

2010-apr-30: Aneta has setup a blogspot site to deal with simple Sherpa techniques and tactics:

On Help:

  • In general, to get help, use: ahelp "something" (note the quotes)
  • Even more useful, type: ? wildcard to get a list of all commands that include the wildcard
  • You can also do a form of autocomplete: type TAB after writing half a command to get a list of all possible completions.

Data I/O:

  • To read in your PHA file, use: load_pha()
  • Often for Chandra spectra, the background is included in that same file. In any case, to read it in separately, use: load_bkg()
    • Q: should it be loaded in to the same dataID as the source?
    • A: Yes.
    • A: When the background counts are present in the same file, they can be read in separately and assigned to the background via set_bkg('src',get_data('bkg')), so counts from a different file can be assigned as background to the current spectrum.
  • To read in the corresponding ARF, use: load_arf()
    • Q: load_bkg_arf() for the background — should it be done before or after load_bkg(), or does it matter?
    • A: does not matter
  • To read in the corresponding RMF, use: load_rmf()
    • Q: load_bkg_rmf() for the background, and same question as above
    • A: same answer as above; does not matter.
  • To see the structure of the data, type: print(get_data()) and print(get_bkg())
  • To select a subset of channels to analyze, use: notice_id()
  • To subtract background from source data, use: subtract()
  • To not subtract, to undo the subtraction, etc., use: unsubtract()
  • To plot useful stuff, use: plot_data(), plot_bkg(), plot_arf(), plot_model(), plot_fit(), etc.
  • (Q: how in god’s name does one avoid plotting those damned error bars? I know error bars are necessary, but when I have a spectrum with 8192 bins, I don’t want it washed out with silly 1-sigma Poisson bars. And while we are asking questions, how do I change the units on the y-axis to counts/bin? A: rumors say that plot_data(1,yerr=0) should do the trick, but it appears to be still in the development version.)


  • To fit model to the data, command it to: fit()
  • To get error bars on the fit parameters, use: projection() (or covar(), but why deliberately use a function that is guaranteed to underestimate your error bars?)
  • Defining models appears to be much easier now. You can use syntax like: set_source(ModelName.ModelID+AnotherModel.ModelID2) (where you can distinguish between different instances of the same type of model using the ModelID — e.g., set_source(xsphabs.abs1*powlaw1d.psrc+powlaw1d.pbkg))
  • To see what the model parameter values are, type: print(get_model())
  • To change statistic, use: set_stat() (options are various chisq types, cstat, and cash)
  • To change the optimization method, use: set_method() (options are levmar, moncar, neldermead, simann, simplex)


]]> 2