A tremendous amount of information is contained in the temporal variations of various measurable quantities, such as the energy distributions of the incident photons, the overall intensity of the source, and the spatial coherence of the variations. While the detection and interpretation of periodic variations is well studied, the same cannot be said for non-periodic behavior in a multi-dimensional domain. Methods to deal with such problems are still primitive, and any attempts at sophisticated analyses are carried out on a case-by-case basis. Some of the issues we seek to focus on are:

* Stochastic variability

* Chaotic Quasi-periodic variability

* Irregular data gaps/unevenly sampled data

* Multi-dimensional analysis

* Transient classification

Our goal is to present some basic questions that require sophisticated temporal analysis in order for progress to be made. We plan to bring together astronomers and statisticians who are working in many different subfields so that an exchange of ideas can occur to motivate the development of sophisticated and generally applicable algorithms for astronomical time series data. We will review the problems and issues with current methodology from an algorithmic and statistical perspective and then look for improvements or for new methods and techniques.
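As a concrete illustration of the unevenly-sampled-data problem listed above, here is a minimal, stdlib-only sketch of the classical Lomb-Scargle periodogram. The function and variable names are mine, not from any particular package, and this is a teaching sketch rather than production code:

```python
import math
import random

def lomb_scargle(t, y, freqs):
    """Classical Lomb-Scargle periodogram for unevenly sampled data.

    t, y  : observation times and values
    freqs : trial frequencies, in cycles per unit time
    """
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]          # mean-subtracted values
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the periodogram invariant to time shifts.
        s2 = sum(math.sin(2 * w, ) for w, in []) if False else sum(math.sin(2 * w * ti) for ti in t)
        c2 = sum(math.cos(2 * w * ti) for ti in t)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_t = [math.cos(w * (ti - tau)) for ti in t]
        sin_t = [math.sin(w * (ti - tau)) for ti in t]
        num_c = sum(yv * cv for yv, cv in zip(yc, cos_t)) ** 2
        num_s = sum(yv * sv for yv, sv in zip(yc, sin_t)) ** 2
        power.append(0.5 * (num_c / sum(c * c for c in cos_t)
                            + num_s / sum(s * s for s in sin_t)))
    return power

# Unevenly sampled, noiseless sinusoid at 0.5 cycles per unit time.
random.seed(42)
t = sorted(random.uniform(0, 40) for _ in range(120))
y = [math.sin(2 * math.pi * 0.5 * ti) for ti in t]
freqs = [0.05 * k for k in range(1, 30)]
p = lomb_scargle(t, y, freqs)
best = freqs[p.index(max(p))]
print(best)   # the peak should land near the true frequency, 0.5
```

Unlike the FFT, nothing here requires the times `t` to be evenly spaced, which is why this statistic (and its many refinements) is the usual starting point for gapped astronomical time series.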

The Future of Scientific Knowledge Discovery in Open Networked Environments
http://sites.nationalacademies.org/PGA/brdi/PGA_060422

New York Workshop on Computer, Earth, and Space Sciences 2011
http://www.giss.nasa.gov/meetings/cess2011/

Innovations in Data-Intensive Astronomy
http://www.nrao.edu/meetings/bigdata/

Astrostatistics and Data Mining in Large Astronomical Databases
http://www.iwinac.uned.es/Astrostatistics/

Statistical Challenges in Modern Astronomy V (including summer school & tutorials)
http://astrostatistics.psu.edu/su11scma5/

Very Wide Field Surveys in the Light of Astro2010
http://widefield2011.pha.jhu.edu/

Statistical Methods for Very Large Datasets
http://www.regonline.com/builder/site/Default.aspx?eventid=757633

23rd Scientific and Statistical Database Management Conference
http://ssdbm2011.ssdbm.org/

International Statistical Institute (ISI) World Congress
http://www.isi2011.ie/

NASA Conference on Intelligent Data Understanding
https://c3.ndc.nasa.gov/dashlink/projects/43/

- Summer School in Statistics for Astronomers VII (June 6-10, 2011)
- Pre-conference Tutorials (June 11-12, 2011)
- Statistical Challenges in Modern Astronomy V (June 13-17, 2011)
***

**Web site:** http://astrostatistics.psu.edu/su11scma5/

Registration is now open until May 6

(Summer School registration may close earlier if the enrollment limit is reached)

*Contributed papers for the SCMA V conference are welcome*

**Summer School in Statistics for Astronomers**: The seventh summer school is an intensive week covering basic statistical inference, several fields of applied statistics, and hands-on experience with the R computing environment. Topics include: exploratory data analysis, hypothesis testing, parameter estimation, regression, bootstrap resampling, model selection & goodness-of-fit, maximum likelihood and Bayesian methods, nonparametrics, spatial processes, and time series. Instructors are mostly faculty members in statistics.

**Pre-conference tutorials**: Instruction in four areas of astrostatistical interest presented during the weekend between the Summer School and SCMA V conference. Topics are: Bayesian computation and MCMC; data mining; R for astronomers; and wavelets for image analysis. Instructors are members of the SCMA V Scientific Organizing Committee.

**SCMA V conference**: Held every five years, SCMA conferences are the premier cross-disciplinary forum for research statisticians and astronomers to discuss methodological issues of mutual interest. Session topics include: statistical modeling in astronomy; Bayesian analysis across astronomy; Bayesian cosmology; data mining and informatics; sparsity; interpreting astrophysical simulations; time domain astronomy; spatial and image analysis; and future directions for astrostatistics. Invited lectures will be followed by cross-disciplinary commentaries. The conference welcomes contributed papers from statisticians and astronomers.

*Visit **http://astrostatistics.psu.edu/su11scma5/** for more information and registration*

Contacts:

Eric Feigelson, Dept. of Astronomy & Astrophysics, Penn State, edf@astro.psu.edu

G. Jogesh Babu, Dept. of Statistics, Penn State, babu@stat.psu.edu

He also has the answer worked out in detail.

(h/t Doug Burke)

Instead of “confidence interval,” let’s say “uncertainty interval”

This will be one of the better years for Perseids; the moon, which often interferes with the Perseids, will not be a problem this year. So I’m putting together something that’s never been done before: a spatial analysis of the Perseid meteor stream. We’ve had plenty of temporal analyses, but nobody has ever been able to get data over a wide area — because observations have always been localized to single observers. But what if we had hundreds or thousands of people all over North America and Europe observing Perseids and somebody collected and collated all their observations? This is crowd-sourcing applied to meteor astronomy. I’ve been working for some time on putting together just such a scheme. I’ve got a cute little Java applet that you can use on your laptop to record the times of fall of meteors you see, the spherical trig for analyzing the geometry (oh my aching head!) and a statistical scheme that I *think* will reveal the spatial patterns we’re most likely to see — IF such patterns exist. I’ve also got some web pages describing the whole shebang. They start here:

http://www.erasmatazz.com/page78/page128/PerseidProject/PerseidProject.html
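For a flavor of the spherical trigonometry involved, the basic building block is the great-circle separation between two points. This is my own illustrative stdlib sketch, not code from the applet described above:

```python
import math

def angular_separation(lon1, lat1, lon2, lat2):
    """Great-circle angle (degrees) between two points given in degrees.

    Works equally for (RA, Dec) on the sky or (longitude, latitude)
    on the Earth; uses the haversine form, which stays numerically
    stable for the small separations typical of nearby observers.
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

# Two observers one degree apart in latitude: the baseline between
# them is what makes a wide-area spatial analysis possible at all.
print(round(angular_separation(0.0, 45.0, 0.0, 46.0), 6))  # 1.0
```

Everything else in such an analysis (projecting meteor tracks, intersecting sight lines) builds on repeated application of relations like this one.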

I think I’ve gotten all the technical, scientific, and mathematical problems solved, but there remains the big one: publicizing it. It won’t work unless I get hundreds of observers. That’s where you come in. I’m asking three things of you:

1. Any advice, criticism, or commentary on the project as presented in the web pages.

2. Publicizing it. If we can get that ol’ Web Magic going, we could get thousands of observers and end up with something truly remarkable. So, would you be willing to blog about this project on your blog?

3. I would be especially interested in your comments on the statistical technique I propose to use in analyzing the data. It is sketched out on the website here: http://www.erasmatazz.com/page78/page128/PerseidProject/Statistics/Statistics.html

Given my primitive understanding of statistical analysis, I expect that your comments will be devastating, but if you’re willing to take the time to write them up, I’m certainly willing to grit my teeth and try hard to understand and implement them.

Thanks for any help you can find time to offer.

Chris Crawford

I’ve noticed rapidly growing interest in data mining and machine learning among astronomers, but the level of execution is still rudimentary or partial because there has been no comprehensive tutorial-style literature or book for them. I recently introduced a machine learning book written by an engineer. Although it’s a very good book, it didn’t convey the foundation of machine learning built by statisticians. In my quest for another good book to satisfy astronomers’ pursuit of (machine) learning methodology with the proper amount of statistical theory, the first great book that came along is **The Elements of Statistical Learning**. It was chosen for this writing not only because of its fame and its famous authors (Hastie, Tibshirani, and Friedman) but because of my personal story. In addition, the 2nd edition, which contains the most up-to-date and state-of-the-art information, was released recently.

First, the book website:

The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman

You’ll find examples, R code, relevant publications, and plots used in the textbook.

Second, I want to tell how I learned about this book before its first edition was published. Everyone has a small moment of meeting very famous people. Mine is shaking hands with President Clinton in 2000. I still remember the moment vividly because I really wanted to tell him that ice cream was dripping on his nice suit, but the top-of-the-line guards blocked my attempt to speak or point at the dripping ice cream after the handshake. No matter the context, shaking hands with one of the greatest presidents is a memorable thing. Yet it was not my most cherished moment, because of the dripping ice cream and scary bodyguards. My most cherished moment of meeting famous people is the half-hour conversation with the late Prof. Leo Breiman (click for my two postings about him), author of a probability textbook, creator of CART, and one of the foremost pioneers in machine learning.

The conclusion of that conversation, after I explained my ideas for applying statistics to astronomical data and he gave his advice on each problem, was a book soon to be published. I was not yet capable of understanding all the statistics, so his answer about this forthcoming book was, at that time, the most relevant and apt one.

This conversation happened during the 3rd Statistical Challenges in Modern Astronomy (SCMA) conference. Not long after I began my graduate study in statistics, I had an opportunity to assist the conference organizer, my advisor Dr. Babu, and to do some chores during the conference. By accident, I had read the book by Murtagh about multivariate data analysis, so I wanted to speak to him. Beyond that, I had no desire to speak to the renowned speakers and attendees. Frankly, I didn’t have any idea who was who at the conference; a few years later, I realized that the conference drew many famous people, and the density of such people was higher than at any conference I have attended. Who would have imagined, at that time, that I could have a personal conversation with Prof. Breiman? I have seen often enough that famous professors are mobbed by people at conferences: getting a chance to chat for even a few seconds is really hard, and tall, strong people always push someone small like me away.

The story goes like this: on a perfect sunny early-summer afternoon, he was taking a break for a cigar and I had finished my errands for the session. With not much to do until the end of the session, I decided to take some fresh air, and I spotted him enjoying his cigar. The only catch was that I didn’t know he was the person behind CART and a founder of statistical machine learning. From his talk in the previous session, I only knew he was a statistician who did data mining on galaxies. So I asked him if I could join him and ask some questions related to ideas I had. One topic I wanted to discuss was the classification of supernova light curves; according to the astronomy textbooks of that time, there are Types I and II, and Type I has the subcategories Ia, Ib, and Ic. (Later, I heard that there is a Type III.) But the challenge is that the observations were not taken at equal intervals. There were more data mining topics, and the conversation went on a while. In the end, he recommended a book that would be published soon.

Having such a story, the privilege of talking to the late Prof. Breiman at a very unique meeting, SCMA, before knowing the fame of the book, this book became one of my favorites. The book did indeed become popular; around that time it was almost the only book discussing statistical learning, and therefore an excellent textbook for introducing statistics to engineers and machine learning to statisticians. In the meantime, statistical learning has enjoyed popularity in many disciplines that have data sets and an urge for learning with the aid of machines. Books and journals on machine learning, data mining, and knowledge discovery (KDD) have since become prosperous. I was delighted to see the 2nd edition on the market, bridging the gap over the years.

I thank him for sharing his cigar time, probably his short but precious free time for contemplation, with me. I thank him for his patience in spending time with such an ignorant girl with a foreign English accent. And I thank him for introducing a book that would become a bible in the statistical learning community within a couple of years (I felt proud of myself for having access to the book before people knew about it). Perhaps astronomers cannot share the many joys this book gave me, which came from how I encountered it, who introduced it, whether it was used in a course, how often it is cited, and so on. But I assure you it will narrow the gap between how astronomers think about data mining (preprocessing, pipelining, and building catalogs) and how statisticians treat data mining. The newly released 2nd edition should narrow the gap further and help astronomers coin brilliant learning algorithms specific to astronomical data. [The END]

***

Here, I patch my scribbles about the book.

What distinguishes this book from other machine learning books is not only that the authors are big figures in statistics but also that the fundamentals of statistics and probability are discussed in all chapters. Most machine learning books introduce only elementary statistics and probability in chapter 2, and no statistical basics are discussed in later chapters; generally, empirical procedures, computer algorithms, and their results are presented without the underlying statistical theory.

You might want to check the book’s website for data sets if you want to try some of the ideas described there:

The Elements of Statistical Learning

In addition to its historical footprint in the field of statistical learning, I’m sure that some astronomers will want to check out topics in the book. It will help replace some data analysis methods in astronomy, soon celebrating their centennials, with state-of-the-art methods that cope with modern data.

This new edition reflects the evolution of statistical learning, whereas the first edition was an excellent harbinger of the field. Page numbers are quoted from the 2nd edition.

[p.28] Suppose in fact that our data arose from a statistical model $Y=f(X)+e$ where the random error $e$ has $E(e)=0$ and is independent of $X$. Note that for this model, $f(x)=E(Y|X=x)$ and in fact the conditional distribution $Pr(Y|X)$ depends on $X$ only through the conditional mean $f(x)$.

The additive error model is a useful approximation to the truth. For most systems the input-output pairs (X,Y) will not have a deterministic relationship Y=f(X). Generally there will be other unmeasured variables that also contribute to Y, including measurement error. The additive model assumes that we can capture all these departures from a deterministic relationship via the error e.
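The quoted passage can be checked numerically. In this minimal stdlib-Python sketch (the function and variable names are mine), we simulate the additive error model and verify that the conditional mean of Y near a fixed x approaches f(x):

```python
import random
import statistics

random.seed(1)
f = lambda x: 2.0 + 3.0 * x          # the "true" systematic component f(X)

# Draw (X, Y) pairs from the additive error model Y = f(X) + e, E(e) = 0.
xs = [random.uniform(0, 10) for _ in range(50_000)]
ys = [f(x) + random.gauss(0.0, 1.0) for x in xs]

# The conditional mean E(Y | X near 5) should approach f(5) = 17
# as the sample grows, whatever the shape of the error distribution.
near5 = [y for x, y in zip(xs, ys) if abs(x - 5.0) < 0.1]
print(round(statistics.mean(near5), 1))
```

This is exactly the sense in which $f(x)=E(Y|X=x)$: averaging away the error e at a fixed input recovers the systematic part of the model.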

How statisticians envision “models” and “measurement errors” is quite different from astronomers’ “models” and “measurement errors,” although for the additive error model the two views match, thanks to the properties of the Gaussian/normal distribution. Still, a chicken-or-egg dilemma exists prior to any statistical analysis.

[p.30] Although somewhat less glamorous than the learning paradigm, treating supervised learning as a problem in function approximation encourages the geometrical concepts of Euclidean spaces and mathematical concepts of probabilistic inference to be applied to the problem. This is the approach taken in this book.

I strongly recommend reading chapter 3, Linear Methods for Regression. In astronomy, many important coefficients come from regression models, from the Hubble constant to absorption corrections (temperature and magnitude conversions are another example). It often seems these relations can only be explained via OLS (ordinary least squares) with the homogeneous error assumption. Yet books on regression and linear models are generally not thin: as much diversity as exists in datasets, a corresponding amount of methodology, theory, and assumptions exists to reflect that diversity. One might study the statistical properties of these indicators based on mixture and hierarchical modeling; some inferences, say on population proportions, can be drawn to verify hypotheses in cosmology in an indirect way. Understanding regression analysis and its assumptions, and how statisticians’ efforts have made these methods more robust, interpretable, and reflective of reality, would change the habit of forcing $E(Y|X)=aX+b$ models onto data that show correlation (not causality).
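As one small step beyond the homogeneous-error assumption, here is a stdlib-only sketch of weighted least squares for a straight line, where each point carries its own known measurement error. The data and function names are invented for the example:

```python
# Weighted least squares for y = a*x + b when each y_i has its own
# known measurement error sigma_i -- the usual situation in astronomy,
# where plain OLS silently assumes all sigma_i are equal.

def wls_line(x, y, sigma):
    w = [1.0 / s ** 2 for s in sigma]
    S   = sum(w)
    Sx  = sum(wi * xi for wi, xi in zip(w, x))
    Sy  = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx * Sx
    a = (S * Sxy - Sx * Sy) / delta      # slope
    b = (Sxx * Sy - Sx * Sxy) / delta    # intercept
    # 1-sigma parameter uncertainties from the normal equations
    return a, b, (S / delta) ** 0.5, (Sxx / delta) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 5.9, 8.1, 9.9]        # roughly y = 2x
sigma = [0.1, 0.1, 0.1, 1.0, 1.0]    # the last two points are poorly measured
a, b, a_err, b_err = wls_line(x, y, sigma)
print(round(a, 2), round(b, 2))
```

The fit is dominated by the three well-measured points; with equal weights (plain OLS), the two noisy points would pull the slope noticeably. Mixture and hierarchical models go further still, but even this one-line weighting already drops the homogeneity assumption.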

Phillips Auditorium, CfA,

60 Garden St., Cambridge, MA 02138

URL: http://hea-www.harvard.edu/AstroStat/CAS2010

The California-Boston-Smithsonian Astrostatistics Collaboration plans to host a mini-workshop on Computational Astro-statistics. With the advent of new missions like the Solar Dynamic Observatory (SDO), Panoramic Survey and Rapid Response (Pan-STARRS) and Large Synoptic Survey (LSST), astronomical data collection is fast outpacing our capacity to analyze them. Astrostatistical effort has generally focused on principled analysis of individual observations, on one or a few sources at a time. But the new era of data intensive observational astronomy forces us to consider combining multiple datasets and infer parameters that are common to entire populations. Many astronomers really want to use every data point and even non-detections, but this becomes problematic for many statistical techniques.

The goal of the Workshop is to explore new problems in Astronomical data analysis that arise from data complexity. Our focus is on problems that have generally been considered intractable due to insufficient computational power or inefficient algorithms, but are now becoming tractable. Examples of such problems include: accounting for uncertainties in instrument calibration; classification, regression, and density estimations of massive data sets that may be truncated and contaminated with measurement errors and outliers; and designing statistical emulators to efficiently approximate the output from complex astrophysical computer models and simulations, thus making statistical inference on them tractable. We aim to present some issues to the statisticians and clarify difficulties with the currently used methodologies, e.g. MCMC methods. The Workshop will consist of review talks on current Statistical methods by Statisticians, descriptions of data analysis issues by astronomers, and open discussions between Astronomers and Statisticians. We hope to define a path for development of new algorithms that target specific issues, designed to help with applications to SDO, Pan-STARRS, LSST, and other survey data.
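Since the announcement singles out MCMC methods among the currently used machinery, a minimal random-walk Metropolis sampler may help fix ideas for readers new to it. This is a toy, stdlib-only sketch with invented names, sampling a one-dimensional Gaussian target:

```python
import math
import random

def metropolis(logp, x0, step, n, burn=1000):
    """Random-walk Metropolis: draw n samples from the density exp(logp)."""
    random.seed(7)
    x, lp = x0, logp(x0)
    out = []
    for i in range(n + burn):
        xp = x + random.gauss(0.0, step)     # symmetric proposal
        lpp = logp(xp)
        if math.log(random.random()) < lpp - lp:   # accept/reject step
            x, lp = xp, lpp
        if i >= burn:                        # discard the burn-in
            out.append(x)
    return out

# Toy target: a unit-variance Gaussian centered at 3.
samples = metropolis(lambda x: -0.5 * (x - 3.0) ** 2, 0.0, 1.0, 20_000)
mean = sum(samples) / len(samples)
print(round(mean, 2))   # close to 3.0
```

Real astrostatistical posteriors are high-dimensional and expensive to evaluate, which is precisely why the workshop's themes (emulators, better algorithms) matter; the accept/reject core, however, is no more than the few lines above.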

We hope you will be able to attend the workshop and present a brief talk on the scope of the data analysis problem that you confront in your project. The workshop will have presentations in the morning sessions, followed by a discussion session in the afternoons of both days.

**2010-apr-30:** Aneta has set up a blogspot site to deal with simple Sherpa techniques and tactics: http://pysherpa.blogspot.com/

On Help:

- In general, to get help, use: `ahelp "something"` (note the quotes)
- Even more useful, type: `? wildcard` to get a list of all commands that include the `wildcard`
- You can also do a form of autocomplete: type TAB after writing half a command to get a list of all possible completions.

Data I/O:

- To read in your PHA file, use: `load_pha()`
- Often for Chandra spectra, the background is included in that same file. In any case, to read it in separately, use: `load_bkg()`
  - Q: should it be loaded in to the same dataID as the source?
  - A: Yes.
  - A: When the background counts are present in the same file, they can be read in separately and assigned to the background via `set_bkg('src',get_data('bkg'))`, so counts from a different file can be assigned as background to the current spectrum.

- To read in the corresponding ARF, use: `load_arf()`
  - Q: should `load_bkg_arf()` for the background be done before or after `load_bkg()`, or does it matter?
  - A: does not matter.

- To read in the corresponding RMF, use: `load_rmf()`
  - Q: `load_bkg_rmf()` for the background, same question as above.
  - A: same answer as above; does not matter.

- To see the structure of the data, type: `print(get_data())` and `print(get_bkg())`
- To select a subset of channels to analyze, use: `notice_id()`
- To subtract background from source data, use: `subtract()`
- To not subtract, or to undo the subtraction, use: `unsubtract()`
- To plot useful stuff, use: `plot_data()`, `plot_bkg()`, `plot_arf()`, `plot_model()`, `plot_fit()`, etc.
  - Q: how in god’s name does one avoid plotting those damned error bars? I know error bars are necessary, but when I have a spectrum with 8192 bins, I don’t want it washed out with silly 1-sigma Poisson bars. And while we are asking questions, how do I change the units on the y-axis to counts/bin?
  - A: rumors say that `plot_data(1,yerr=0)` should do the trick, but it appears to be still in the development version.

Fitting:

- To fit a model to the data, command it to: `fit()`
- To get error bars on the fit parameters, use: `projection()` (or `covar()`, but why deliberately use a function that is guaranteed to underestimate your error bars?)
- Defining models appears to be much easier now. You can use syntax like: `set_source(xsphabs.abs1*powlaw1d.psrc+powlaw1d.pbkg)`, where you can distinguish between different instances of the same type of model using the ModelID (e.g., *ModelName.ModelID+AnotherModel.ModelID2*)
- To see what the model parameter values are, type: `print(get_model())`
- To change the statistic, use: `set_stat()` (options are various `chisq` types, `cstat`, and `cash`)
- To change the optimization method, use: `set_method()` (options are `levmar`, `moncar`, `neldermead`, `simann`, `simplex`)
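The commands above can be stitched into a single hypothetical session. The file names below are made up, the model and option choices are just one plausible combination, and the script assumes a standard Sherpa installation (`sherpa.astro.ui`):

```python
# A hypothetical end-to-end Sherpa session (file names are invented;
# adapt them to your own observation).
from sherpa.astro.ui import *

load_pha("src.pha")            # source spectrum
load_arf("src.arf")            # responses read explicitly for clarity
load_rmf("src.rmf")
load_bkg("src_bkg.pha")        # background from a separate file

notice_id(1, 0.5, 7.0)         # restrict the fit to 0.5-7 keV
subtract()                     # subtract the background

set_source(xsphabs.abs1 * powlaw1d.psrc)   # absorbed power law
set_stat("chi2gehrels")        # one of the chisq-type statistics
set_method("neldermead")       # simplex-style optimizer

fit()                          # fit the model to the noticed channels
projection()                   # error bars on the fitted parameters
plot_fit()                     # data with the best-fit model overlaid
```

This mirrors the order of the cheat-sheet: I/O first, then filtering and background handling, then model, statistic, and method choices before `fit()`.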

`v1:2007-dec-18`

`v2:2008-feb-20`

`v3:2010-apr-30`
