The AstroStat Slog » R http://hea-www.harvard.edu/AstroStat/slog Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders Fri, 09 Sep 2011 17:05:33 +0000 en-US hourly 1 http://wordpress.org/?v=3.4

[Books] Bayesian Computations http://hea-www.harvard.edu/AstroStat/slog/2009/books-bayesian-computations/ http://hea-www.harvard.edu/AstroStat/slog/2009/books-bayesian-computations/#comments Fri, 11 Sep 2009 20:40:23 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=707

A number of practical Bayesian data analysis books are available these days. Here, I’d like to introduce two that were published relatively recently. I like that they are technical rather than theoretical, that their practical examples are closely related to astronomical data, and that they include R code, so one can try the algorithms on the fly instead of cramming probability theory.

Bayesian Computation with R
Author: Jim Albert
Publisher: Springer (2007)

As the title says, an accompanying R package, LearnBayes, is available (clicking the name will take you to the package download). Furthermore, the last chapter is about WinBUGS. (Please check the resources listed in BUGS for other flavors of BUGS, Bayesian inference Using Gibbs Sampling.) Overall, it is quite practical and instructional. If a young astronomer wants to enter the competition posted below because of sophisticated data requiring nontraditional statistical modeling, this book can be a good starting point. (Here, traditional methods include brute-force Monte Carlo simulations, chi^2/weighted least squares fitting, and test statistics with rigid underlying assumptions.)

An interesting quote stood out because of a comment from an astronomer, “Bayesian is robust but frequentist is not,” which I couldn’t agree with at the time.

A Bayesian analysis is said to be robust to the choice of prior if the inference is insensitive to different priors that match the user’s beliefs.

Since there’s no discussion of priors in frequentist methods, Bayesian robustness cannot be matched against and compared with frequentist robustness. Similar to my discussion in Robust Statistics, I keep the notion that robust statistics is insensitive to outliers or to the iid Gaussian model assumption. The latter, in particular, is almost always assumed in astronomical data analysis unless other models and probability densities are explicitly stated, such as Poisson counts or a Pareto distribution. New Bayesian algorithms are being invented to achieve robustness, not limited to the choice of prior but also covering topics from frequentists’ robust statistics.

The introduction to Bayesian computation focuses on analytical, simple parametric models and well-known probability densities. These models and their Bayesian analysis produce interpretable results. The Gibbs sampler, Metropolis-Hastings algorithms, and a few hybrids of them can handle scientific problems as long as the scientific models, and the uncertainties in both observations and parameters, can be transcribed into well-known probability density functions. I think astronomers will want to check Chap. 6 (MCMC) and Chap. 9 (Regression Models). Oftentimes, in order to demonstrate a strong correlation between two variables, astronomers adopt simple linear regression models and fit the data to them. A priori knowledge enhances the flexibility of the fitting analysis, where Bayesian computation works robustly, unlike straightforward chi-square methods. The book has neither sophisticated algorithms nor deep theory; it offers only the bare necessities and foundations for accommodating Bayesian computation to scientific needs.
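Since the book’s MCMC chapter centers on the Metropolis-Hastings algorithm, here is a minimal random-walk Metropolis sketch in base R. It is not code from the book: the target log-posterior is stood in for by a standard normal log-density, and the proposal scale prop_sd is an arbitrary choice of mine.

```r
# log-posterior known up to a constant; a standard normal stands in here
log_post <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)

metropolis <- function(log_post, start, n_iter, prop_sd) {
  chain <- numeric(n_iter)
  current <- start
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = prop_sd)  # symmetric random-walk proposal
    # accept with probability min(1, p(proposal) / p(current))
    if (log(runif(1)) < log_post(proposal) - log_post(current)) {
      current <- proposal
    }
    chain[i] <- current
  }
  chain
}

set.seed(1)
chain <- metropolis(log_post, start = 5, n_iter = 5000, prop_sd = 1)
mean(chain[-(1:1000)])   # posterior mean estimate after discarding burn-in
```

Tuning prop_sd controls the acceptance rate; a rule of thumb is that acceptance somewhere around 20–50% tends to mix well for low-dimensional targets.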

The other book is

Bayesian Core: A Practical Approach to Computational Bayesian Statistics.
Authors: J.-M. Marin and C.P. Robert
Publisher: Springer (2007).

Although the book is written by statisticians, the very first real data example is CMBdata (cosmic microwave background data; instead of cosmic, the book uses cosmological. I’m not sure which is correct, but I’m used to CMB as cosmic microwave background). Surprisingly, the CMB becomes a very accessible topic in statistics, in terms of testing normality and extreme values. Seeing real astronomy data first in the book was the primary reason for introducing it here. Also, it is a relatively small volume (about 250 pages) compared with other Bayesian textbooks, yet with broad coverage of topics in Bayesian computation. There are other practical real data sets illustrating Bayesian computations in the book, and these example data sets can be found on the book’s website.

The book begins with R, then covers normal models, regression and variable selection, generalized linear models, capture-recapture experiments, mixture models, dynamic models, and image analysis.

I felt exuberant when I found that the book describes the law of large numbers (LLN), which justifies Monte Carlo methods. The LLN appears whenever an integral is approximated by a sum, something astronomers do a lot without referring to the law by name. For more information, I’ll simply give a Wikipedia link: Law of Large Numbers.
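The LLN-justifies-Monte-Carlo point fits in a few lines of R: an integral E[g(X)] is approximated by a sample mean, and the CLT even supplies the error bar. The choice g(x) = x² under a standard normal (true value 1) is my illustration, not an example from the book.

```r
set.seed(42)
n <- 1e5
x <- rnorm(n)                  # draws from the standard normal
estimate <- mean(x^2)          # LLN: the sample mean converges to E[X^2] = 1
std_err  <- sd(x^2) / sqrt(n)  # CLT: error bar on the Monte Carlo estimate
c(estimate, std_err)
```

The same pattern, replacing x^2 with any integrand and rnorm with draws from the relevant density, is the skeleton of every Monte Carlo integration.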

Several MCMC algorithms can be mixed together within a single algorithm using either a circular or a random design. While this construction is often suboptimal (in that the inefficient algorithms in the mixture are still used on a regular basis), it almost always brings an improvement compared with its individual components. A special case where a mixed scenario is used is the Metropolis-within-Gibbs algorithm: When building a Gibbs sample, it may happen that it is difficult or impossible to simulate from some of the conditional distributions. In that case, a single Metropolis step associated with this conditional distribution (as its target) can be used instead.
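The Metropolis-within-Gibbs construction quoted above can be sketched in base R. The toy target, a bivariate normal with an assumed correlation rho = 0.8, is mine, not the book’s; the point is only that one coordinate gets an exact Gibbs draw while the other gets a single Metropolis step, as if its conditional could not be simulated directly.

```r
rho <- 0.8   # assumed correlation of the toy bivariate normal target
log_cond2 <- function(t2, t1) dnorm(t2, rho * t1, sqrt(1 - rho^2), log = TRUE)

set.seed(2)
n_iter <- 5000
th <- matrix(0, n_iter, 2)
for (i in 2:n_iter) {
  # coordinate 1: exact Gibbs draw from its (normal) full conditional
  t1 <- rnorm(1, rho * th[i - 1, 2], sqrt(1 - rho^2))
  # coordinate 2: one Metropolis step targeting its full conditional,
  # as if that conditional were intractable
  t2 <- th[i - 1, 2]
  prop <- t2 + rnorm(1, sd = 0.5)
  if (log(runif(1)) < log_cond2(prop, t1) - log_cond2(t2, t1)) t2 <- prop
  th[i, ] <- c(t1, t2)
}
cor(th[, 1], th[, 2])   # should approach rho
```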

The description in Sec. 4.2, Metropolis-Hastings Algorithms, should be especially appreciated and comprehended by astronomers because of the historical origins of its topics: the detailed balance equation and the random walk.

My personal favorite is Chapter 6 on mixture models. Astronomers handle data from multiple populations (multiple epochs of star formation, single or multiple broken power laws, linear or quadratic models, metallicities from mergers or formation triggers, background+sources, environment-dependent point spread functions, and so on), and the chapter discusses the difficulties of the label switching problem (an identifiability issue in codifying data into MCMC or EM algorithms).

A completely different approach to the interpretation and estimation of mixtures is the semiparametric perspective. To summarize this approach, consider that since very few phenomena obey probability laws corresponding to the most standard distributions, mixtures such as \sum_{i=1}^k p_i f(x|\theta_i) (*) can be seen as a good trade-off between fair representation of the phenomenon and efficient estimation of the underlying distribution. If k is large enough, there is theoretical support for the argument that (*) provides a good approximation (in some functional sense) to most distributions. Hence, a mixture distribution can be perceived as a type of basis approximation of unknown distributions, in a spirit similar to wavelets and splines, but with a more intuitive flavor (for a statistician at least). This chapter mostly focuses on the “parametric” case, when the partition of the sample into subsamples with different distributions f_j does make sense from the dataset point of view (even though the computational processing is the same in both cases).
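The mixture \sum_{i=1}^k p_i f(x|\theta_i) from the quoted passage can be written out directly in base R; the helper dmix below and its Gaussian components are my illustration, not code from the book.

```r
dmix <- function(x, p, mu, sigma) {
  # weighted sum of component densities: sum_i p_i * Normal(x | mu_i, sigma_i)
  dens <- vapply(seq_along(p),
                 function(k) p[k] * dnorm(x, mu[k], sigma[k]),
                 numeric(length(x)))
  if (is.matrix(dens)) rowSums(dens) else sum(dens)
}

# a two-population example: 70% of the mass at mu = 0, 30% at mu = 4
x <- seq(-4, 8, length.out = 5)
dmix(x, p = c(0.7, 0.3), mu = c(0, 4), sigma = c(1, 1))
```

With k large, and the mu_k placed on a grid, the same function becomes the “basis approximation” of an unknown density that the passage describes.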

We must point out at this stage that mixture modeling is often used in image smoothing but not in feature recognition, which requires spatial coherence and thus more complicated models…

My patience ran out before I could comprehend every detail of the book, but the sections on reversible jump MCMC, hidden Markov models (HMM), and Markov random fields (MRF) would be very useful. These topics appear often in image processing, a field where astronomers have their own algorithms. Adapting and comparing methods across image analysis promises new directions for scientific imaging data analysis, beyond subjective denoising, smoothing, and segmentation.

For readers considering more advanced Bayesian computation and a rigorous treatment of MCMC methodology, I’d like to point to a textbook frequently mentioned by Marin and Robert.

Monte Carlo Statistical Methods Robert, C. and Casella, G. (2004)
Springer-Verlag, New York, 2nd Ed.

There are a few more practical and introductory Bayesian analysis books recently published or soon to be published. Some readers would prefer these books with the ink still fresh. Perhaps there is, or will be, a Bayesian Computation with Python, IDL, Matlab, Java, or C/C++ for those who never intend to use R. By the way, Mathematica users may want to check out Phil Gregory’s book, which I introduced in [books] a boring title. My point is that applied statistics has become friendlier to non-statisticians through these good introductory books and free online materials. I hope more astronomers apply statistical models in their data analysis without much trouble in executing Bayesian methods. Some might want to check BUGS, introduced in [BUGS]; that posting contains resources on how to use BUGS and the packages available in various languages.

Where is ciao X ? http://hea-www.harvard.edu/AstroStat/slog/2009/where-is-ciao-x/ http://hea-www.harvard.edu/AstroStat/slog/2009/where-is-ciao-x/#comments Thu, 30 Jul 2009 06:57:00 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=3260

X={ primer, tutorial, cookbook, Introduction, guidebook, 101, for dummies, …}

I’ve heard many times about the lack of documentation for this extensive data analysis system, ciao. I’ve seen people still using ciao 3.4 although the new version 4 has been available for many months. Although ciao is not the only tool for Chandra data analysis, it was specifically designed for it; therefore, I expected it to be used frequently and popularly. But the reality is against my expectation. Whatever (fierce) discussions I’ve heard have been irrelevant to me, because ciao is not intended for statistical analysis. Then, all of a sudden, after many months, a realization hit me: ciao is different from other data analysis systems and software. This difference has been a factor hampering the introduction of ciao outside the Chandra scientist community, and its gaining popularity. This difference was the reason I often got lost in finding suitable documentation.

http://cxc.harvard.edu/ciao/ is the website to refer to when you start using ciao, and manuals are listed there under manuals and memos. The aforementioned difference is that I’m used to seeing an Introduction, Primer, Tutorial, or Guide for Beginners on the front page or the manual websites, but not on the ciao websites. From such introductory documentation, I can stretch out to other specific topics, modules, toolboxes, packages, libraries, plug-ins, add-ons, applications, etc. Tutorials provide the inertia for learning and utilizing data analysis systems. However, the layout of the ciao manual websites seems not intended for beginners. It was hard to find the basics when some specific task with ciao and its tools got stuck. The site might be useful as a reference for Chandra scientists who have been using ciao for a long time, but not beyond that. It could also be handy for experts instructing novices side by side, so that they can give better hands-on instruction.

I’ll contrast this with other popular data analysis systems and software.

  • When I began to use R, I started with the R manual page containing this pdf file, Introduction to R. Based on this introductory documentation, I could learn specific task-oriented packages easily and build more of my own data analysis tools.
  • When I began to use Matlab, I was told to get the Matlab primer. Although the current edition is commercial, free copies of old editions are available via search engines or course websites. Other tutorials are available as well. After cracking the basics of Matlab, it was not difficult to get the right toolboxes for topic-specific data analysis and to script for particular needs.
  • When I began to use SAS (Statistical Analysis System), people in the business said to get the little SAS book, which gives the basics of this gigantic system, from which I was able to expand its usage for particular statistical projects.
  • Recently, I began to learn Python to use the many astronomical and statistical data analysis modules developed by various scientists. Python has its tutorials, which take me from the basics to fully utilizing those task-specific modules and my own scripting.
  • Commercial software often comes with its own beginners’ guide and demos that a user can follow easily. By acquiring the basics from these tutorials, expanding into applications can be well directed. On the other hand, non-commercial software may lack extensive but centralized tutorials, unlike Python and R. Nonetheless, acquiring tutorials for teaching is easy, and these unlicensed materials are very handy whenever problems are confronted in various task-specific projects.
  • I used to have IDL tutorials on which I relied a lot to use some astronomy user libraries and CHIANTI (an atomic database). I guess the resources for tutorials have changed dramatically since then.

Even though I’ve navigated the ciao website and its high-volume threads so many times, I only now realize that there is no beginner’s guide, nothing called a ciao cookbook, ciao tutorial, ciao primer, ciao for dummies, or introduction to ciao, in a visible location.

This is a cultural difference. My personal thought is that this tradition prevents non-Chandra scientists from using data in the Chandra archive. The good news is that there have been ciao workshops, and materials from those workshops are still available. I believe compiling these materials in the fashion of other beginners’ guides to data analysis systems can be a good starting point for writing up a front-page-worthy tutorial. The existence of such introductory material could draw more people to use and explore Chandra X-ray data. I hope the tutorials for other software and data analysis systems (primer, cookbook, introduction, tutorial, or for dummies) can serve as good guidelines for composing a full ciao primer.

read.table() http://hea-www.harvard.edu/AstroStat/slog/2008/readtable/ http://hea-www.harvard.edu/AstroStat/slog/2008/readtable/#comments Mon, 27 Oct 2008 15:05:27 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=1099

The first step of data analysis or applications is reading the data sets into a tool of choice. In recent years, I’ve been using R (see also Learning R) for that purpose, but I’ve enjoyed the same freedom with these languages and tools: BASIC, Fortran 77/90/95, C/C++, IDL, IRAF, AIPS, mongo/supermongo, MATLAB, Maple, Mathematica, SAS, SPSS, Gauss, ARC, Minitab, and recently Python and ciao, which I have just begun to learn. With many of them I have lost fluency; quick learning tends to be flash memory. Some would need brain defragmentation and recovery time before extensive scientific work. A few I don’t like to use at all. No matter what, I’m not a computer geek: I’m not good with new gadgets or new software, nor do I welcome new and allegedly versatile computing systems. But one must be, if one wants to handle data. Until recently I believed R had such versatility in the matter of reading in data. Yet there is nothing without exceptions.

From time to time, I’ve talked about how, among many factors, the FITS data format makes it difficult for statisticians and astronomers to work together. Statisticians cannot read the FITS format unless astronomers convert it into ascii or jpeg format for them, whereas astronomers do not want to waste their busy time on a chore like file format conversion, wasting computer resources as well. A peaceful reunion happens only when the data analysis becomes intractable via the traditional methodology described in Numerical Recipes or Bevington and Robinson. They realize a (new) statistical theory needs to be found, and collaboration happens with the involvement of graduate students from both fields, who patiently do many tedious jobs while learning (I missed this part while I was a graduate student, for which I sometimes thank my advisor).

Now, let’s get back to the title. read.table()[1] is a commonly used command in R for reading in data in ascii format. It reads data intelligently; as I said, it has been versatile enough. Numerals end up in numeric format, letters in character format, missing values are stored as NA, etc. read.table() makes it easy to jump into data analysis right away. Well, now you know why I write this: I confronted a case where read.table() does not read things correctly with astronomical data, “even in ascii format,” which had never happened since I began to use S-Plus/R.

Although I know how to fix this simple problem, which I’ll describe later, I want to point out the lack of compatibility in data formats between the two communities and in the common tools used for accessing data sets, which I believe is one of the biggest factors prohibiting astronomically uneducated statisticians from participating in collaborations. I’ve mixed tools in consulting courses assisting clients of various disciplines (grad students from agriculture, horticulture, physiology, social science, and psychology were my clients) and in executing projects in electrical engineering and computational physics (which rely heavily on MATLAB), but reading data was the simplest and most fundamental step, one I never had to worry about across various data sets with R (probably because those graduate students and professors of engineering and physics provided well-trimmed and proven data sets).

When you have a long way to go to complete your mission and you stumble at your first step, I think it’s easy to lose eagerness for the future unless there’s support from your colleagues. Instead, I most likely receive discouraging comments such as “Why use R?” and “You won’t have such problems if you use other tools” (although it takes a bit of extra time to maneuver, I eventually get there). Such frustrating comments degrade eagerness further. So, from the 100% I normally begin with, only 25% eagerness is left after two discouraging moments at the initial step of a data analysis whose end is invisibly far away. I hang on to this 25%, still big by normal standards, and I wish for it to last until the final step without the exponential decay that happened at the beginning.

Ah, the example I promised. Click here for one example (from XAtlas) and check whether read.table() can do the job in one shot when the 3rd column is your x and the 4th column is your y. It will produce a beautiful spectrum if the data points are read in properly as numerals. My trick was to use awk to extract those two columns, because of the unequal numbers of row entries across columns, and then read the result into R. Unfortunately, these two steps made R’s read.table() recognize the entries as categorical data. To avoid read.table() treating what look like numerals as categorical, you must fix the cause between the two steps. If you investigate the data files carefully, you’ll find why; however, that is a tedious job when there are thousands of entries in each data file and numerous data files. Without that information, the effort is the same as writing a line of scanf()/READ in C/Fortran, counting column by column to type the correct floating-point format. This manifests the difference in table formatting between astronomers and statisticians, including scientists from econometrics, psychometrics, biometrics, bioinformatics, and other fields with a statistics-related suffix.
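Since I cannot reproduce the XAtlas files here, a small fabricated table illustrates the same failure mode and the fix: one stray non-numeric token (“n/a” in this toy file) makes read.table() treat the whole column as categorical (a factor in older R; a character column since R 4.0), and the remedy is either converting through character or declaring the missing-value marker up front.

```r
tmp <- tempfile()
writeLines(c("1.0 10.5",
             "2.0 11.2",
             "3.0 n/a",   # one stray token poisons the whole column
             "4.0 12.9"), tmp)

d <- read.table(tmp)   # V2 is read as categorical/character, not numeric
# WRONG in older R: as.numeric(d$V2) on a factor returns internal codes
# RIGHT: convert through character, letting "n/a" become NA
y <- suppressWarnings(as.numeric(as.character(d$V2)))

# or declare the missing-value marker up front:
d2 <- read.table(tmp, na.strings = "n/a")   # V2 is numeric with an NA
y
```

For large files whose stray tokens are known, na.strings (or an explicit colClasses) is the cleaner route, since it avoids the silent factor/character conversion entirely.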

Such artifacts (or cultural differences) aside, XAtlas is a great catalog for statisticians in functional data analysis who look for examples of dealing with non-smooth curves. New strategies and statistical applications will help astronomers see such unprecedented data sets better. Perhaps, actually more certainly, your 25% will grow back to 100% once you see those spectra and other metrics in your own plotting windows.

  1. Click here for the explanation of the read.table() function, and
    click here for the reason why read.table() is so inefficient.
Classification and Clustering http://hea-www.harvard.edu/AstroStat/slog/2008/classification-and-clusterin/ http://hea-www.harvard.edu/AstroStat/slog/2008/classification-and-clusterin/#comments Thu, 18 Sep 2008 23:48:43 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=747

Another conclusion deduced from reading preprints listed on arxiv/astro-ph is that astronomers tend to confuse classification and clustering and to mix up the methodologies. They tend to think that any algorithm from classification or clustering analysis serves their purpose, since both kinds of algorithms, no matter what, look like black boxes. By a black box I mean something like a neural network, which is one of the classification algorithms.

Simply put, classification is a regression problem and clustering is a mixture problem with an unknown number of components. Defining a classifier, i.e., a regression model, is the objective of classification, and determining the number of clusters is the objective of clustering. In classification, predefined classes exist, such as galaxy types and star types, and one wishes to know which predictor variables, and what function of them, allow quasars to be separated from stars without individual spectroscopic observations, relying only on a handful of variables from photometric data. In clustering analysis, there is no predefined class, but some plots visualize multiple populations, and one wishes to determine the number of clusters mathematically, so as not to be subjective in concluding that a plot shows two clusters after some subjective data cleaning. A good example: as photons from gamma-ray bursts accumulated, extracting features like the durations T_{90} and T_{50} enabled scatter plots of many GRBs, which eventually led people to believe there are multiple populations of GRBs. Clustering algorithms back the hypothesis in a more objective manner, as opposed to the subjective manner of scatter plots with non-statistical outlier elimination.
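The two-population idea can be demonstrated in base R with simulated data standing in for the GRB features: kmeans() recovers the group structure without any labels, which is exactly what distinguishes clustering from classification (where labels would be needed to train a classifier). The population means 0 and 5 below are arbitrary choices of mine.

```r
set.seed(3)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))  # two simulated populations
km <- kmeans(x, centers = 2)   # clustering: no labels supplied
sort(km$centers)               # centers land near the true means 0 and 5
table(km$cluster[1:100])       # the first population falls almost wholly in one cluster
```

Choosing the number of clusters objectively (here fixed at 2) is the harder statistical question the post alludes to; model-based criteria such as BIC over mixture fits are one standard route.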

However, there are challenges in making a clean cut between classification and clustering, both in statistics and in astronomy. In statistics, “missing data” is the phrase people use to describe this challenge. Fortunately, there is a field called semi-supervised learning that tackles it. (Supervised learning is equivalent to classification; unsupervised learning, to clustering.) Semi-supervised learning algorithms apply to data in which a portion has known class types and the rest are missing; astronomical catalogs with unidentified objects are a good candidate for applying semi-supervised learning algorithms.
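A toy flavor of the semi-supervised setting, sketched in base R under my own assumptions (two Gaussian populations, 10 labeled objects out of 100): the few labeled points seed class centroids, and the unidentified objects are assigned to the nearest centroid. Real semi-supervised algorithms are far more sophisticated, but the division of labeled and unlabeled data is the same.

```r
set.seed(4)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 4))  # two true populations
labels <- rep(NA, 100)
labels[c(1:5, 51:55)] <- rep(1:2, each = 5)       # only 10 objects identified

# seed class centroids from the labeled objects only
centroids <- tapply(x[!is.na(labels)], labels[!is.na(labels)], mean)
# propagate: assign every object, labeled or not, to the nearest centroid
assigned <- ifelse(abs(x - centroids[1]) < abs(x - centroids[2]), 1, 2)
mean(assigned[1:50] == 1)   # fraction of population 1 recovered
```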

On the astronomy side, the fact that classes are not well defined, or are subjective, is the main cause of this confusion between classification and clustering, and also the origin of the challenge. For example, will astronomers A and B produce the same results when classifying galaxies according to Hubble’s tuning fork?[1] We are not testing individual cognitive skills. Is there a consensus on where to cut between F9 stars and G0 stars? What makes a star F9.5 instead of G0? In the presence of error bars, how is one sure that a star is F9, not G0? I don’t see any decision-theoretic explanation in survey papers when those stellar spectral classes are presented. Classification is generally for data with categorical responses, but astronomers tend to make something that used to be categorical continuous, and still apply the same old classification algorithms designed for categorical responses.

From a clustering analysis perspective, the challenge is caused by outliers, peculiar objects that do not belong to the majority. The number of these peculiar objects may be enough to make up a new, unprecedented class. Or the number is so small that a strong belief prevails to discard these data points as observational mistakes. How much can we trim data with unavoidable and uncontrollable contamination (remember, we cannot control astronomical data, as opposed to earthly kinds)? What is the primary factor defining the number of clusters: physics, statistics, astronomers’ experience in processing and cleaning data, …?

Once the ambiguity between classification and clustering and the complexity of the data sets are sorted out, another challenge is still waiting: which black box? For most classification algorithms, Pattern Recognition and Machine Learning by C. Bishop offers a broad spectrum of black boxes. Yet the book does not include the various clustering algorithms that statisticians have developed, nor outlier detection. To be more rigorous in selecting a black box for clustering analysis and outlier detection, one is recommended to check:

To me, astronomers tend to be in haste, owing to the pressure to publish results immediately after a data release, and to overlook suitable methodologies for their survey data. It seems there is no time to consult machine learning specialists to verify the approaches they adopt. My personal prayer is that this haste does not settle in as a trend in astronomical surveys and large data analysis.

  1. Check out the project, GALAXY ZOO
BUGS http://hea-www.harvard.edu/AstroStat/slog/2008/bugs/ http://hea-www.harvard.edu/AstroStat/slog/2008/bugs/#comments Tue, 16 Sep 2008 20:34:23 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=485

Astronomers tend to think in a Bayesian way, but their Bayesian implementations are very limited. OpenBUGS, WinBUGS, GeoBUGS (BUGS for geostatistics, e.g. modeling spatial distributions), R2WinBUGS (an R wrapper for BUGS), or PyBUGS (a Python wrapper for BUGS) could boost their Bayesian eagerness. Oh, by the way, BUGS stands for Bayesian inference Using Gibbs Sampling.

Disclaimer: I have never done serious Bayesian computations, so the information I provide here tends to be very shallow. Statisticians and astronomers of Bayesian orientation are very welcome to add more advanced pieces of information.

Bayesian statistics is much preferred in astronomy, at least here at the Harvard-Smithsonian Center for Astrophysics. Yet I do not understand why astronomy data analysis packages do not include libraries, modules, or toolboxes for MCMC similar to WinBUGS or OpenBUGS (porting scripts from Numerical Recipes or IMSL, or using Python, does not count here, since these are also used by engineers and scientists of other disciplines; my view is also restricted by my limited experience with astronomical data analysis packages such as ciao, XSPEC, IDL, IRAF, and AIPS). Most Bayesian analysis in astronomy has to be done from scratch, which drives off simple-minded people like me (I prefer analytic forms and estimators to posterior chains). I hope easily implementable Bayesian data analysis modules soon come to current astronomical data analysis systems, for any astronomer who has only had a lecture on Bayes’ theorem and Gibbs sampling. Perhaps BUGS can be a role model for developing such modules.

As listed, one does not need R to use BUGS. WinBUGS is both standalone and R-implementable. PyBUGS can be handy, since Python is popular among astronomers. I’ve heard that MATLAB (and its open source counterpart, Octave) has its own tools for maneuvering through Bayesian data analysis relatively easily. There are many small MCMC modules solving particular problems in astronomy, but none of them is reported to be robust enough to be applied to other types of data sets. Not many offer freedom in choosing models and priors.

Hopefully, knowledgeable Bayesians will contribute modules for Bayesian data analysis in astronomy. I don’t like to see contour plots, obtained from brute-force and blind χ2 fitting, claimed to be bivariate probability density profiles. I’d like module development in astronomical data analysis packages, with various Bayesian libraries, to proceed the way BUGS was developed. Here are some web links about BUGS:
The BUGS Project
WinBUGS
OpenBUGS
Calling WinBUGS 1.4 from other programs

[ArXiv] 2nd week, June 2008 http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-2nd-week-june-2008/ http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-2nd-week-june-2008/#comments Mon, 16 Jun 2008 14:47:42 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=335

As Prof. Speed said, PCA is prevalent in astronomy, particularly this week. Furthermore, a paper explicitly discusses R, a popular statistics package.

  • [astro-ph:0806.1140] N.Bonhomme, H.M.Courtois, R.B.Tully
        Derivation of Distances with the Tully-Fisher Relation: The Antlia Cluster
    (The Tully-Fisher relation is well known and one of many occasions where statistics could help. On the other hand, astronomical biases as well as measurement errors hinder the collaboration.)
  • [astro-ph:0806.1222] S. Dye
        Star formation histories from multi-band photometry: A new approach (Bayesian evidence)
  • [astro-ph:0806.1232] M. Cara and M. Lister
        Avoiding spurious breaks in binned luminosity functions
    (I think binning is not always necessary and is often overdone, while alternatives exist.)
  • [astro-ph:0806.1326] J.C. Ramirez Velez, A. Lopez Ariste and M. Semel
        Strength distribution of solar magnetic fields in photospheric quiet Sun regions (PCA was utilized)
  • [astro-ph:0806.1487] M.D.Schneider et al.
        Simulations and cosmological inference: A statistical model for power spectra means and covariances
    (They used R and its Latin hypercube sampling package, lhs.)
  • [astro-ph:0806.1558] Ivan L. Andronov et al.
        Idling Magnetic White Dwarf in the Synchronizing Polar BY Cam. The Noah-2 Project (PCA is applied)
  • [astro-ph:0806.1880] R. G. Arendt et al.
        Comparison of 3.6 – 8.0 Micron Spitzer/IRAC Galactic Center Survey Point Sources with Chandra X-Ray Point Sources in the Central 40×40 Parsecs (K-S test)
R-[{Perl,Python}] Interface http://hea-www.harvard.edu/AstroStat/slog/2008/r-interface/ http://hea-www.harvard.edu/AstroStat/slog/2008/r-interface/#comments Tue, 13 May 2008 19:47:49 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=307

The brackets could be filled with other languages, but two are introduced today: Perl (perl.org) and Python (python.org). These two are widely used among astronomers and can be empowered by R (r-project.org).

R/SPlus – Perl Interface
R/SPlus – Python Interface

Posts on R and Python from the slog
Learning R
Learning Python

Learning R http://hea-www.harvard.edu/AstroStat/slog/2007/learning-r/ http://hea-www.harvard.edu/AstroStat/slog/2007/learning-r/#comments Mon, 29 Jan 2007 15:48:07 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/2007/learning-r/

R is a programming language and software environment for statistical computing and graphics. It is the most popular tool among statisticians and widely used software for statistical data analysis, thanks to the fact that its source code is freely available and that it is fairly easy to approach, from installation to theoretical application.

Most information about R can be found at the R Project website, including the software itself and many add-on packages. These individually contributed packages serve the particular statistical interests of their users. The documentation menu on the website, and each package, contains extensive how-to documentation. Some large packages include demos, so following the scripts in a demo makes learning R easy.

For astronomers, the R tutorial from the Penn State Summer School for Astronomers will be useful. This tutorial illustrates R with astronomical data sets. Copying and pasting command lines will be a good starting point until the data structures and programming logic become internalized. R is a fairly simple language to learn if one has a little experience with other programming languages.
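As a taste of the copy-and-paste stage, here is a purely illustrative first R session of my own (simulated data, arbitrary coefficients) touching vectors, summaries, a least-squares fit, and a plot:

```r
set.seed(10)
x <- rnorm(50, mean = 10, sd = 2)       # simulated measurements
y <- 3 + 0.5 * x + rnorm(50, sd = 0.5)  # linear response with noise
summary(x)                              # quartiles, mean, etc.
fit <- lm(y ~ x)                        # ordinary least-squares fit
coef(fit)                               # intercept and slope estimates
plot(x, y); abline(fit)                 # scatter plot with the fitted line
```

Each of these one-liners has a help page (?lm, ?plot) reachable from inside the session, which is how the self-guided learning described above usually proceeds.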

A good online tutorial providing an overview of R is found at this link. Many tutorials geared to particular users’ interests are available online. Here are sample images from Taeyoung’s Tutorial (click for pdf).


Among the many available textbooks, the following provide general R usage. More books are available for specific needs.

Also, Introduction to the R Project for Statistical Computing for use at ITC (click for pdf) by D.G. Rossiter provides a short but extensive overview of R. Unfortunately, a FITS reader is not available in R. We hope a skillful astronomer will contribute a FITS reader among other packages.

AstroStatistics Summer School at PSU http://hea-www.harvard.edu/AstroStat/slog/2007/astrostat-summer-school-at-psu/ http://hea-www.harvard.edu/AstroStat/slog/2007/astrostat-summer-school-at-psu/#comments Mon, 29 Jan 2007 04:34:20 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/2007/astrostat-summer-school-at-psu/

Since Summer 2005, G. Jogesh Babu (Statistics) and Eric Feigelson (Astronomy) have organized lectures and lab sessions on statistics for astronomers and physicists. The lecturers are professors from the Penn State statistics department and invited renowned scientists from different countries. The students show diverse demographics as well. Within a week or so, students hear everything from Statistics 101 to recently published statistical theories applied particularly to astronomical data. They also learn how to use R, a statistical software and scripting language, to perform the statistics they learn in the lectures. Over the past two years, this summer school has proven its uniqueness and usefulness. More information on the upcoming school can be found at http://astrostatistics.psu.edu/su07/index.html, and other topics regarding astrostatistics at the Center for AstroStatistics at Penn State.
