The AstroStat Slog » Tutorial
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

data analysis system and its documentation
(http://hea-www.harvard.edu/AstroStat/slog/2009/data-analysis-system-and-its-documentation/, Fri, 02 Oct 2009, by hlee)

So far, I have not complained much about my "statistician learning astronomy" experience. Instead, I have been trying to emphasize how fascinating it is. I hope more statisticians will join this adventure now that statisticians' insights are in demand more than ever. However, this positive approach does not seem to be working. In the two years of this slog's life, there has been no posting by a statistician, except one about BEHR. Either statisticians are busy and distracted by other fields with more tangible data sets, or, compared to other fields, astronomy presents too many obstacles and too high barriers for statisticians to participate. I'd like to talk about these challenges from my end.[1]

The biggest challenge for a statistician using astronomical data is the lack of mercy toward nonspecialists in accessing data, including format, quantification, and qualification,[2] and in the data analysis systems. IDL is costly, although it is used in many disciplines, and other tools in astronomy are hardly reusable across different projects.[3] In that regard, I welcome astronomers using python to break such exclusiveness in astronomical data analysis systems.

Even if data and software issues are resolved, there is another barrier to climb: validation. If you have a catalog, you will see variables of measures and their errors, typically reflecting the size of the PSF and its convolution into those metrics. When a Gaussian model is assumed in order to tabulate a power law index, King's, Petrosian's, or de Vaucouleurs' profile index, or numerous other metrics, I often fail to find any validation of the Gaussian assumptions, Gaussian residuals, spectral and profile models, outliers, and optimal binning. Even if a data set is publicly available, I also fail to find how to read in the raw data, what factors must be considered, and what can be discarded because of unexpected contamination such as cosmic rays and charge overflows. How would I validate that raw data read into a data analysis system are correctly processed to match the values in catalogs? How would I know that all entries in a catalog are ready for further scientific data analysis? Are those sources real? Are p-values appropriately computed?

I posted an article about Chernoff faces applied to Capella observations from Chandra. Astronomers had already processed the raw data and published a catalog of X-ray spectra, so I assumed that the information in the catalog was validated and ready for scientific data analysis. I heard that the repeated Capella observations are for calibration. Generally speaking, in other fields, calibration targets are almost time invariant and exhibit consistency. If Capella has been the same star over those ten years, the faces in my post should look almost the same, within measurement error; but as you saw, they were not consistent at all. Those faces look as if the observations were made of different objects. So far I have failed to find any validation effort explaining why certain ObsIDs of Capella look different from the rest. Are they really Capella? Can I use this inconsistent facial expression as evidence that the Chandra calibration at that time was inappropriate? Or can I conclude that Capella was a wrong choice for calibration?

Because of the lack of description of the quantification procedure from raw data to catalog, I decided to access the raw data and do the processing on my own, to crosscheck the validity of the catalog entries. The benefit of this effort is that I can easily manipulate the data for further statistical inference. Although reading and processing raw data may sound easy, I ran into another problem: the lack of documentation for nonspecialists to perform the task.

A while ago, I talked about read.table() in R. There are slightly different commands and options, but without much hurdle one can read in ascii data in various styles with read.table() for exploratory and confirmatory data analysis in R. From my understanding, statisticians do not spend much time reading in data, nor collecting them; we are interested in methodology for extracting information about the population from a sample. While the focus is methodology, all the frustrations with astronomical data analysis software occur before investigating the best method. The level of frustration reached the point of terminating my eagerness to investigate inference tools further.
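To illustrate how little ceremony a statistician expects at this step, here is a minimal sketch of reading an ascii catalog in python, the language welcomed above. The file name and column layout are hypothetical; numpy's genfromtxt plays roughly the role that read.table() plays in R:

    import numpy as np

    # Hypothetical ascii catalog: a header row, then columns ra, dec, flux, flux_err.
    # names=True picks up column names from the header, like read.table(header=TRUE) in R.
    catalog = np.genfromtxt("capella_catalog.txt", names=True)

    print(catalog["flux"].mean())           # quick exploratory summary
    print(np.median(catalog["flux_err"]))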

In order to assess those Capella observations, thanks to on-site help, I invoked ciao. Beforehand, I'd like to make a disclaimer: I use ciao only as an example, to illustrate the culture difference I experienced as a statistician. It serves here to discuss why I think astronomical data analysis systems are short on documentation and why astronomical data processing procedures lack validation. I confronted very similar problems when I tried to learn other astronomical packages such as IRAF and AIPS; ciao simply happened to be at hand when writing this post.

In order to understand X-ray data, one needs not only the image data files but also the effective area (arf), the redistribution matrix (rmf), and the point spread function (psf). These are called calibration data files. If the package were developed for general users, I would expect, as with read.table(), a homogenized/centralized data reading function, covering calibration data, with options. Instead, there were various functions one could use to read in data, but the descriptions were not enough to know which one does what. What is the functionality of these commands? Which one only stores the name of the data file? Which one reconfigures the raw data to reflect the up-to-date calibration files? Not knowing the complete data structures and classes within ciao, and not getting the exact functionality of these data reading functions from ahelp, I was not sure whether the log likelihood I computed was appropriate or not.

For example, ciao offers five different ways to associate an arf: read_arf(), load_arf(), set_arf(), get_arf(), and unpack_arf(). Except for unpack_arf(), I could not understand the differences among these functions for accessing an arf.[4] Other software that I use, including XSPEC, generally has a single function with options to execute the different levels of reading in data. Ciao has extensive web documentation but no tutorial (see my post). So I read all the ahelp "commands" a few times, but I still could not decide which ones to use to read in my arfs and rmfs (I happened to have many calibration data files). A sketch of how these commands appear to fit together follows the table below.

          arf          rmf          psf          pha          data
get       get_arf      get_rmf      get_psf      get_pha      get_data
set       set_arf      set_rmf      set_psf      set_pha      set_data
unpack    unpack_arf   unpack_rmf   unpack_psf   unpack_pha   unpack_data
load      load_arf     load_rmf     load_psf     load_pha     load_data
read      read_arf     read_rmf     read_psf     read_pha     read_data

[Note that the above links may not work since the ciao documentation website evolves quickly. Some may be routed to different pages, so please check this website for other data reading commands: cxc.harvard.edu/sherpa/ahelp/index_alphabet.html]
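For concreteness, here is a minimal sketch of the reading pattern I eventually settled on in Sherpa's python layer. The file names are hypothetical, and the division of labor I describe (load_* attaches a file to a data set, get_* returns the attached object, unpack_* reads a file into an object without attaching it) is my own reconstruction from ahelp, not an authoritative description:

    from sherpa.astro.ui import load_pha, load_arf, load_rmf, get_arf, unpack_arf

    # Attach the spectrum and its calibration files to the default data set (id=1).
    load_pha("capella_obsid.pha")   # hypothetical file names
    load_arf("capella_obsid.arf")
    load_rmf("capella_obsid.rmf")

    # get_arf() returns the DataARF object now associated with the data set,
    # so its contents can be inspected directly.
    arf = get_arf()
    print(arf.energ_lo[:5], arf.specresp[:5])

    # unpack_arf() reads a file into a standalone DataARF object without
    # attaching it, convenient when looping over many calibration files.
    standalone = unpack_arf("another.arf")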

So I decided to seek help through the cxc help desk several months back. Their answers are reliable and prompt. My question was, "what are the differences among read_xxx(), load_xxx(), set_xxx(), get_xxx(), and unpack_xxx(), where xxx can be data, arf, rmf, and psf?" The answer to this question was:

You can find detailed explanations for these Sherpa commands in the “ahelp” pages of the Sherpa website:

http://cxc.harvard.edu/sherpa/ahelp/index_alphabet.html

This is a good answer, but a big cultural shock to a statistician. It is like pointing an IDL user to http://www.r-project.org/search.html and http://cran.r-project.org/doc/FAQ/R-FAQ.html to find out the difference between read.table() and scan(). Probably, to astronomers, all the various data reading commands above are self explanatory, just as R's read.table(), read.csv(), and scan() are to statisticians. Disappointingly, this answer was not what I was looking for.

Well, thanks to this bewilderment, hesitation, and some skepticism, I could not move on to the next step of implementing fitting methods. At the beginning, I was optimistic when I found out that ciao 4.0 and up is python compatible. I thought I could do things in more statistically rigorous ways, since I could fake spectra to validate my fitting methods. I was thinking about modifying the indispensable chi-square method, which is used twice, for point estimation and for hypothesis testing, in a way that introduces bias (a link was made to a posting). My goal was to make it less biased and more robust, less sensitive to the iid Gaussian residual assumptions. Against my high expectations, I became frustrated at the very first step: reading in and playing with data to get a better sense of them and to develop a quick intuition. I could not make even a baby step toward my goal. I am not sure whether it is a good thing or not, but I have not been completely discouraged. Also, time gradually helps in overcoming this culture difference, the lack of documentation.
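The spectrum-faking step I had in mind does exist in Sherpa. A minimal sketch, with hypothetical file names and no claim that this is the recommended recipe: fake_pha folds an assumed source model through the given responses and draws Poisson counts, which is exactly what one needs to test a fitting method against a known truth.

    from sherpa.astro.ui import set_source, fake_pha, unpack_arf, unpack_rmf

    # Hypothetical calibration files; the "true" model is known by construction.
    arf = unpack_arf("capella_obsid.arf")
    rmf = unpack_rmf("capella_obsid.rmf")
    set_source("powlaw1d.p1")                # a simple power law as the truth
    fake_pha(1, arf, rmf, exposure=50000.0)  # Poisson-draw a 50 ks spectrum

    # Refitting the faked spectrum with the same model should recover p1's
    # parameters within statistical error; a systematic offset over many
    # fakes would flag a biased fitting method.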

What happens in general is that if a predecessor says, "use set_arf()," then the apprentice uses set_arf() without doubt. If you begin learning on your own, relying purely on the documentation, at some point you have to make a choice. One can make a lucky guess and move forward quickly. Sometimes one lands in a miserable situation, because one is not sure about the choice and cannot trust the features that appear after the processing. I think it is natural to be curious about what each of these commands does to your data and what information is carried over to the next command in an analysis procedure. It seems reasonable to want to know which command is best for a particular data processing step and for statistical inference given the data. What I found is that such comparisons across commands are missing from the documentation. This is why I said astronomical data analysis systems are short of mercy for nonspecialists.

Another thing I observed is that there seems to be no documentation or standard procedure for producing repeatable data analysis results. My observation of astronomers says that, with the same raw data, the results of scientist A and scientist B differ, even beyond statistical margins. There are experts who have the knowledge to explain why results differ on the same raw data, but not everyone has the luxury of consulting those few experts. I cannot understand such exclusiveness instead of standardizing the procedures through validation. I have even seen that the data A analyzed some years back can differ from this year's version when he or she writes a new proposal. The time spent recreating the data processing and inference procedures, to explain/justify/validate the different results or to cover or narrow the gap, would not be wasted if there were standard procedures and documentation of them. This is purely a statistician's thought; as a comment in where is ciao X?[5] says, not every data analysis system has to have similar design and goals.

Getting lost while figuring out the basics (data handling, arf, rmf, psf, and numerous case-by-case corrections) before applying any simple statistics has been my biggest obstacle in learning astronomy. The lack of documented validation methods often frustrates me. I wonder whether there are astronomers who got lost learning statistics via R, minitab, SAS, MATLAB, python, etc. As discussed in where is ciao X?, I wish there were a centralized tutorial that offers the basics: how to read in data; how to manipulate data vectors and matrices; how to do arithmetic and error propagation adequately without violating statistical assumptions (I do not like that the point estimate of the background level is subtracted from the observed counts, a random variable, when the distribution does not permit such a location shift; see the sketch right after this paragraph); how to handle and manipulate fits format files from Chandra for various statistical analyses; how to do basic image analysis; how to do basic spectral analysis; and so on, with references.[6]
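Here is a tiny numeric sketch of that background complaint, with made-up rates and assuming only that counts are Poisson: subtracting a background point estimate from observed counts yields "net counts" that can be negative and whose variance no longer matches their mean, so any later step that treats them as Poisson rests on a violated assumption.

    import numpy as np

    rng = np.random.default_rng(42)
    src_rate, bkg_rate = 2.0, 5.0   # made-up source and background rates

    total = rng.poisson(src_rate + bkg_rate, size=10000)  # observed counts
    bkg_hat = rng.poisson(bkg_rate, size=10000)           # background estimate

    net = total - bkg_hat    # the common "background-subtracted" counts
    print((net < 0).mean())  # a sizable fraction is negative: impossible for Poisson
    print(net.mean(), net.var())  # mean near 2 but variance near 12: not Poisson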

  1. This is quite an overdue posting; links and associated content may be outdated.
  2. For classification purposes, data with a clear distinction between response and predictor variables, a so-called training data set, must be given. However, I often fail to get processed data sets for statistical analysis. I first spend time reading the data and asking what is an outlier, a bias, or garbage. I am not sure how to clean and extract numbers for statistical analysis, and every sub-field in astronomy has its own way of cleaning data before they are fed into statistics and scatter plots. For example, image processing is still executed case by case via the trained eyes of astronomers. In medical imaging, on the other hand, diagnosis specialists offer training sets with which computer vision scientists develop classification algorithms. Such collaboration yields accelerated, automatic, but preliminary diagnosis tools. A small fraction of the results from these preliminary methods can still be ambiguous, i.e. false positive or false negative. Yet when such ambiguous cancerous cell images at the decision boundaries occur, specialists, like trained astronomers, scrutinize those images to make a final decision. As medical imaging and its classification algorithms resolve the shortage of experts under an overflow of images, I wish astronomers would adopt these strategies to confront massive streaming images and to assist the sparse population of trained astronomers.
  3. Something I would like to see is statistical handling of background in high energy astrophysics. When simulating a source, the background could be simulated as well, via Markov random fields, kriging, and other spatial statistics methods. In reality, the background is subtracted once in measurement space, and its random nature is never propagated. Whatever statistical methodology is available for reflecting the nature of the background, it is difficult to implement for trial and validation because these tools are not designed for adding statistical modules and packages.
  4. A Sherpa expert told me there is an FAQ (which I had previously failed to locate) on this matter. However, from a data analysis perspective, like the distinction among data.frame, vector, matrix, list, and other data types in R, the description is not sufficient for someone who wants to learn ciao and to perform scientific (deterministic or stochastic) data analysis via scripting, i.e. handling objects appropriately. You may want to read comparing commands in Sherpa from the Sherpa FAQ.
  5. I know there is ciaox. Apart from the space between ciao and X, there is another difference that astronomers care about less than statisticians do: the difference between X and x. Typically, a capital letter denotes a random variable and a lower case letter an observed value.
  6. By the way, there are ciao workshop materials available that could function as tutorials. Please locate them if needed.
Where is ciao X ?
(http://hea-www.harvard.edu/AstroStat/slog/2009/where-is-ciao-x/, Thu, 30 Jul 2009, by hlee)

X = { primer, tutorial, cookbook, Introduction, guidebook, 101, for dummies, … }

I have heard many times about the lack of documentation for this extensive data analysis system, ciao. I have seen people still using ciao 3.4 although version 4 has been available for many months. Although ciao is not the only tool for Chandra data analysis, it was specifically designed for it, so I expected it to be used frequently and to be popular. The reality is against my expectation. Whatever (fierce) discussions I had heard were irrelevant to me, because ciao is not intended for statistical analysis. Then, all of a sudden, after many months, a realization hit me: ciao is different from other data analysis systems and software. This difference has hampered the introduction of ciao outside the Chandra scientist community and its gain in popularity. This difference was the reason I often got lost looking for suitable documentation.

http://cxc.harvard.edu/ciao/ is the website to visit when you start using ciao, and manuals are listed there under manuals and memos. The aforementioned difference is that I am used to seeing an Introduction, Primer, Tutorial, or Guide for Beginners on the front page or the manual pages, but not on the ciao websites. From such introductory documentation I can stretch out to other specific topics, modules, tool boxes, packages, libraries, plug-ins, add-ons, applications, etc. Tutorials provide the inertia for learning and utilizing a data analysis system. However, the layout of the ciao manual websites does not seem intended for beginners. It was hard to find the basics when some specific task with ciao and its tools got stuck. The site may be useful as a reference for Chandra scientists who have used ciao for a long time, but not beyond; it could also be handy for experts instructing novices side by side, where they can give better hands-on instruction.

I will contrast this with other popular data analysis systems and software.

  • When I began to use R, I started with the R manual page containing the pdf file Introduction to R. Based on this introductory documentation, I could easily learn specific task oriented packages and build more of my own data analysis tools.
  • When I began to use Matlab, I was told to get the Matlab Primer. Although the current edition is commercial, free copies of old editions are available via search engines or course websites, and other tutorials are available as well. After cracking the basics of Matlab, it was not difficult to get the right tool boxes for topic specific data analysis and to script for particular needs.
  • When I began to use SAS (Statistical Analysis System), people in the business said to get The Little SAS Book, which gives the basics of this gigantic system, and from it I was able to expand my usage for particular statistical projects.
  • Recently, I began to learn Python, in order to use the many astronomical and statistical data analysis modules developed by various scientists. Python has its own tutorials, which give me the basics needed to fully utilize those task specific modules and to do my own scripting.
  • Commercial software often comes with its own beginners' guides and demos that a user can follow easily. By acquiring the basics from these tutorials, expanding into applications can be well directed. On the other hand, non-commercial software may lack extensive, centralized tutorials, unlike python and R. Nonetheless, acquiring tutorials for teaching is easy, and these unlicensed materials are very handy whenever problems are confronted in various task specific projects.
  • I used to have IDL tutorials on which I relied a lot to use some astronomy user libraries and CHIANTI (an atomic database). I guess the resources for tutorials have changed dramatically since then.

Even though I have navigated the voluminous ciao website and its threads many times, I only now realize that there is no beginner's guide, something that could be called a ciao cookbook, ciao tutorial, ciao primer, ciao for dummies, or introduction to ciao, in a visible location.

This is a cultural difference. My personal thought is that this tradition prevents non-Chandra scientists from using data in the Chandra archive. The good news is that there have been ciao workshops, and materials from the workshops are still available. I believe that compiling these materials in the fashion of other beginners' guides to data analysis systems would be a good starting point for writing a front-page-worthy tutorial. The existence of such introductory material could bring more people to use and explore Chandra X-ray data. I hope the tutorials from other software and data analysis systems (primer, cookbook, introduction, tutorial, or for dummies) can serve as good guidelines for composing a ciao primer.

[tutorial] multispectral imaging, a case study
(http://hea-www.harvard.edu/AstroStat/slog/2008/multispectral-imaging-a-case-study/, Thu, 09 Oct 2008, by hlee)

Even without signal processing courses, the following equation should be awfully familiar to astronomers who do photometry and handle such data:
$$c_k=\int_\Lambda l(\lambda)\, r(\lambda)\, f_k(\lambda)\, \alpha(\lambda)\, d\lambda + n_k$$

The terms are, in order: the k-th camera channel response ($c_k$), the light source ($l$), the spectral radiance ($r$), the k-th filter ($f_k$), the detector sensitivity ($\alpha$), and noise ($n_k$), where $\Lambda$ is the spectral range over which the camera is sensitive. This simplifies to

$$c_k=\int_\Lambda \phi_k(\lambda)\, r(\lambda)\, d\lambda + n_k$$

where $\phi_k$ denotes the combined illuminant and spectral sensitivity of the k-th channel, which goes by the name augmented spectral sensitivity. (We can skip the spectral radiance $r$, though. Unfortunately, in astronomical photometry the sensitivity $\alpha$ has multiple layers and is not a simple closed-form function of $\lambda$.) Discretizing and stacking the channels gives the matrix form

$$c=\Theta r + n.$$

Finding a reconstruction operator that inverts $\Theta$, so that $\hat{r}=\mathrm{inv}(\Theta)\,c$, leads to direct spectral reconstruction, although $\Theta$ is in general not a square matrix; otherwise one approaches the problem via indirect reconstruction.
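As a toy illustration of the direct reconstruction above (not from the article; the camera, the spectrum, and the noise level are all made up), with fewer channels than wavelength bins the Moore-Penrose pseudo-inverse stands in for inv(Θ) and returns the minimum-norm least-squares reconstruction:

    import numpy as np

    rng = np.random.default_rng(0)
    n_channels, n_bins = 8, 31   # made-up camera: 8 filters, 31 wavelength bins

    Theta = rng.random((n_channels, n_bins))  # augmented spectral sensitivities
    r_true = np.exp(-0.5 * ((np.arange(n_bins) - 15) / 5.0) ** 2)  # toy spectrum

    c = Theta @ r_true + rng.normal(0.0, 0.01, n_channels)  # channel responses + noise

    # Theta is not square, so invert it in the pseudo-inverse sense.
    r_hat = np.linalg.pinv(Theta) @ c
    print(np.round(r_hat[:5], 3))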

Studying that Smile (subscription needed)
A tutorial on multispectral imaging of paintings using the Mona Lisa as a case study
by Ribes, Pillay, Schmitt, and Lahanier
IEEE Sig. Proc. Mag., Jul. 2008, pp. 14-26
Conclusions: In this article, we have presented a tutorial description of the multispectral acquisition of images from a signal processing point of view.

  • From the section Camera Sensitivity: "From a signal processing point of view, the filters of a multispectral camera can be conceived as sampling functions, the other elements of φ being understood as a perturbation."
  • From the section Understanding Noise Sources: "The noise is present in the spectral, temporal, and spatial dimensions of the image signal." … (check out the equation and the explanation of each term) … "the quantization operator represents the analog-to-digital (A/D) conversion performed before stocking the signal in digital form. This conversion introduces the so-called quantization error, a theoretically predictable noise." (This quantization error is well understood in astronomical photometry.)
  • Understanding the sampling function φ is common to both imaging and photometry, but the strategies and modeling (including uncertainties via error models) differ. Figures 3, 7, and 8 say a lot about the usefulness and connectivity of engineers' spectral imaging and astronomers' calibration.
  • The Hessian matrix in regression suffers challenges similar to the issues related to Θ, which means spectral imaging can be converted into statistical problems, and likewise astronomical photometry can be put into the shape of statistical research.
  • The discussion of noise is, personally, the most worthwhile part.

I wonder whether there is literature in astronomy matching this tutorial, from which we might expand and improve current astronomical photometry processes by adopting strategies developed by the more populous community of signal/image processing engineers and statisticians. (Considering the good textbooks on statistical signal processing, and the many fundamental algorithms born thanks to them, I must include statisticians. Although not discussed in this tutorial, the Hidden Markov Model (HMM) is often used in signal processing; yet a search on ADS with such keywords turns up no astronomical publication aware of HMM. Please confirm my finding that HMM is not used among astronomers, as my search scheme is likely imperfect.)

Signal Processing and Bootstrap
(http://hea-www.harvard.edu/AstroStat/slog/2008/signal-processing-and-bootstrap/, Wed, 30 Jan 2008, by hlee)

Astronomers have developed their ways of processing signals almost independently of, but sometimes in collaboration with, engineers, although the fundamental goal of signal processing is the same: extracting information. Doubtless, these two parallel roads of astronomers and engineers have pointed in opposite directions, one toward the sky and the other toward the earth. Nevertheless, without starting an intensive argument, we could say that statistics has served as the medium of signal processing for both scientists and engineers. This particular issue of the IEEE Signal Processing Magazine may shed light for astronomers interested in signal processing and statistics outside the astronomical society.

IEEE Signal Processing Magazine Jul. 2007 Vol 24 Issue 4: Bootstrap methods in signal processing

This link shows the table of contents and provides links to the articles; however, access to the papers requires an IEEE Xplore subscription, via a library or an individual IEEE membership. Here, I'd like to introduce some of the articles and tutorials.

Special topic on bootstrap:
The guest editors (A.M. Zoubir & D.R. Iskander)[1] open the issue with their editorial, Bootstrap Methods in Signal Processing, providing the rationale: the occasionally invalid Gaussian noise assumption and the complex modeling that follows from it. A practical workaround has been Monte Carlo simulation, but the cost of repeating experiments is problematic. The suggested alternative is the bootstrap, which provides tools for designing detectors for various signals subject to noise or interference from unknown distributions. The bootstrap is described as a computer-intensive tool for answering inferential questions, and this issue serves as a set of tutorials introducing this computationally intensive statistical method to the signal processing community.

The first tutorial is written by the two guest editors: Bootstrap Methods and Applications. It begins with a list of bootstrap methods and emphasizes their resilience; it discusses the number of bootstrap samples needed to keep the simulation (Monte Carlo) error small relative to the statistical error, and it covers sampling methods for dependent data, with real examples. The flowchart in Fig. 9 provides a summary guideline for how to use the bootstrap methods.
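For readers new to the idea, here is a minimal sketch of the plain percentile bootstrap, a generic illustration of the resampling logic rather than anything taken from the article:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=200)  # toy sample; the estimator is the median

    B = 2000
    boot_medians = np.array([
        np.median(rng.choice(x, size=x.size, replace=True))  # resample with replacement
        for _ in range(B)
    ])

    # Percentile-method 95% confidence interval for the median.
    lo, hi = np.percentile(boot_medians, [2.5, 97.5])
    print(f"median = {np.median(x):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")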

The second tutorial is Jackknifing Multitaper Spectrum Estimates (D.J. Thomson), which introduces the jackknife and multitaper estimates of spectra, and applies the former to the latter with real data sets. The author adds his reasons for preferring the jackknife to the bootstrap and discusses the underlying assumptions of the resampling methods.
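Again as a generic illustration rather than the article's procedure, the delete-one jackknife standard error of an arbitrary statistic looks like this:

    import numpy as np

    def jackknife_se(x, stat):
        """Delete-one jackknife standard error of stat(x)."""
        n = x.size
        reps = np.array([stat(np.delete(x, i)) for i in range(n)])  # leave-one-out replicates
        return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

    rng = np.random.default_rng(2)
    x = rng.normal(size=100)
    print(jackknife_se(x, np.mean))  # close to x.std(ddof=1) / 10 for the mean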

Instead of listing all the articles in the special issue, I have chosen a few astrostatistically notable ones:

  • Bootstrap-Inspired Techniques in Computational Intelligence (R. Polikar) explains the bootstrap for estimating errors; the bagging, boosting, and AdaBoost algorithms; and other bootstrap-inspired techniques in ensemble systems, with a discussion of missing data.
  • Bootstrap for Empirical Multifractal Analysis (H. Wendt, P. Abry & S. Jaffard) explains block bootstrap methods for dependent data, bootstrap confidence limits, and bootstrap hypothesis testing, in addition to multifractal analysis. Owing to my personal lack of familiarity with wavelet leaders, instead of paraphrasing I quote the article's conclusion:

    First, besides being mathematically well-grounded with respect to multifractal analysis, wavelet leaders exhibit significantly enhanced statistical performance compared to wavelet coefficients. … Second, bootstrap procedures provide practitioners with satisfactory confidence limits and hypothesis test p-values for multifractal parameters. Third, the computationally cheap percentile method achieves already excellent performance for both confidence limits and tests.

  • Wild Bootstrap Tests (J. Franke & S. Halim) discusses residual-based nonparametric tests and the wild bootstrap for regression models, applicable to signal/image analysis. Their test checks the differences between two irregular signals/images.
  • Nonparametric Estimates of Biological Transducer Functions (D.H. Foster & K. Zychaluk): I like the part where they discuss the generalized linear model (GLM), which is useful for extending model fitting/estimation techniques in astronomy beyond Gaussian and least squares. They also mention that the bootstrap is the simpler route to confidence intervals.
  • Bootstrap Particle Filtering (J.V. Candy) is a very pleasant read on Bayesian signal processing and particle filters. It overviews MCMC and state space models, and explains resampling as a remedy for the shortcomings of importance sampling in signal processing.
  • Compressive Sensing (R.G. Baraniuk):

    A lecture note presents a new method to capture and represent compressible signals at a rate significantly below the Nyquist rate. This method employs nonadaptive linear projections that preserve the structure of the signal.
I do wish this brief summary assists you in selecting a few interesting articles.

  1. They wrote a book, the bootstrap and its application in signal processing.
[ArXiv] 3rd week, Dec. 2007
(http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-3rd-week-dec-2007/, Fri, 21 Dec 2007, by hlee)

The paper about the Banff challenge [0712.2708] and the statistics tutorial for cosmologists [0712.3028] are my personal recommendations from this week's [arXiv] list. In particular, I'd like to quote from Licia Verde's [astro-ph:0712.3028]:

In general, Cosmologists are Bayesians and High Energy Physicists are Frequentists.

I thought it was the opposite. By the way, if you crave more papers, click:

  • [astro-ph:0712.2544]
RHESSI Microflare Statistics II. X-ray Imaging, Spectroscopy & Energy Distributions, I. G. Hannah et al.

  • [stat.AP:0712.2708]
    The Banff Challenge: Statistical Detection of a Noisy Signal A. C. Davison & N. Sartori

  • [astro-ph:0712.2898]
    A study of supervised classification of Hipparcos variable stars using PCA and Support Vector Machines P.G. Willemsen & L. Eyer

  • [astro-ph:0712.2961]
    The frequency distribution of the height above the Galactic plane for the novae M. Burlak

  • [astro-ph:0712.3028]
    A practical guide to Basic Statistical Techniques for Data Analysis in Cosmology L. Verde

  • [astro-ph:0712.3049]
    ZOBOV: a parameter-free void-finding algorithm M. C. Neyrinck

  • [stat.CO:0712.3056]
    Gibbs Sampling for a Bayesian Hierarchical Version of the General Linear Mixed Model A. A. Johnson & G L. Jones

Learning R
(http://hea-www.harvard.edu/AstroStat/slog/2007/learning-r/, Mon, 29 Jan 2007, by hlee)

R is a programming language and software environment for statistical computing and graphics. It is the most popular tool among statisticians and is widely used for statistical data analysis, thanks to its freely available source code and its approachability, from installation to theoretical application.

Most information about R can be found at the R Project website, including the software itself and many add-on packages. These individually contributed packages serve the particular statistical interests of their users. The documentation menu on the website, as well as each package, contains extensive how-to documentation. Some large packages include demos, and following the scripts in a demo makes learning R easy.

For astronomers, the R tutorial from the Penn State Summer School for Astronomers will be useful; it illustrates R with astronomical data sets. Copying and pasting command lines is a good starting point until data structures and programming logic become internalized. R is a fairly simple language to learn if one has a little experience with other programming languages.

A good online tutorial providing an overview of R can be found at this link. Many tutorials catering to particular user interests are available online. Here are sample images from Taeyoung's tutorial (click for pdf).

[Six sample thumbnail pages from Taeyoung's R tutorial pdf]

Among the many available textbooks, the following provide general R usage. More books are available for specific needs.

Also, Introduction to the R Project for Statistical Computing for use at ITC (click for pdf) by D.G. Rossiter provides a short but extensive overview of R. Unfortunately, a FITS reader is not available in R. We hope a skillful astronomer will contribute a FITS reader among other packages.

Learning Python
(http://hea-www.harvard.edu/AstroStat/slog/2007/learning-python-2/, Mon, 22 Jan 2007, by hlee)

Both in astronomy and statistics, python is recognized as a versatile programming language. I asked Alanna for python tutorials. The following is her answer, which looks very useful for those who wish to learn python.

————————————————————————
1/ Python basics:
My favorite Intro to Python website is the Tutorial by Guido van Rossum (python founder):
http://docs.python.org/tut/

I also find other references at this site to be useful: http://docs.python.org

For more complicated questions I often search: http://www.python.org/

2/ Scientific python: numarray, numpy. These modules allow one to use APL/IDL-like syntax with matrices (i.e. implicit loops over indices when doing many common operations). They also have some handy scientific functions. Eventually "numarray" will be replaced by "numpy", but it hasn't happened yet (because "pyfits", for fits files, still depends on numarray). It should happen this year (November?).

Numarray home page: http://www.stsci.edu/resources/software_hardware/numarray

Numpy home pages: http://sourceforge.net/projects/numpy/ for code; and http://numpy.scipy.org/ for an overview.

3/ For fits files, the astronomical programmers at the Space Telescope Science Institute also wrote "pyfits": http://www.stsci.edu/resources/software_hardware/pyfits
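A minimal sketch of reading a fits image with pyfits, under a hypothetical file name (the open/.data pattern is pyfits' basic interface):

    import pyfits  # in later years distributed as astropy.io.fits

    # Open a hypothetical fits file and pull the primary HDU's image array.
    hdulist = pyfits.open("image.fits")
    hdulist.info()             # summary of the HDUs in the file
    data = hdulist[0].data     # array of pixel values
    header = hdulist[0].header
    print(header.get("OBJECT"), data.shape)
    hdulist.close()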
