Archive for the ‘Cross-Cultural’ Category.

An excerpt from …

I’ve been complaining, and asking, how one can do machine learning on solar images without a training set (see my comment at the big picture). On the other hand, I’m also aware of a challenge in astronomy: data (images) cannot be transformed freely and fed into standard machine learning algorithms. Tailoring data pipelining, cleaning, and processing to currently existing vision algorithms may not be achievable. The hope of automating the detection/identification of interesting features (e.g. flares and loops) and forecasting events on the surface of the Sun is, for now, only a dream. Even though the image data stream is at the level of a tsunami, we might have to depend on human eyes to comb out interesting features on the Sun until a new paradigm arrives: automated feature identification algorithms based on a single image, i.e. without a training set. The good news is that human eyes have done a superb job! Continue reading ‘An excerpt from …’ »

[ArXiv] Particle Physics

[stat.AP:0811.1663]
Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he was very kind and a good listener. I was delighted to find his introductory article about particle physics and its statistical challenges through an [arxiv:stat] email subscription. Continue reading ‘[ArXiv] Particle Physics’ »

A book by David Freedman

A continuation of my posting titled circumspect frequentist.

Title: Statistical Models: Theory and Practice (click for the publisher’s website)
My one-line review, or rather a comment, from several months ago was

Bias in asymptotic standard errors is not a familiar topic for astronomers

and I don’t understand why I wrote it, but I think I came up with this comment owing to my pursuit of modeling measurement errors occurring in astronomical research. Continue reading ‘A book by David Freedman’ »

[MADS] Semiparametric

There were (only) four articles from ADS whose abstracts contain the word semiparametric (none in titles). Therefore, semiparametric is not exactly [MADS] but almost [MADS]. One would like to say it is virtually [MADS] or quasi [MADS]. By introducing the term and providing rare examples in astronomy, I hope that this scarce term, semiparametric, will be used properly rather than misleading astronomers into inappropriate usage in statistical inference with their data. Continue reading ‘[MADS] Semiparametric’ »

[ArXiv] Special Issue from Annals of Applied Statistics

When I was studying astronomy (during which time I once became a subject of a social science survey about life in a department where gender bias is extreme; I was the only female), people often asked me how to forecast the weather or how to predict the future (boys often get questions about becoming astronauts, in addition to weathermen and astrologers). Relating astronomy to earth science still happens. Statisticians whom I met at conferences often tried to associate my efforts on astronomical data with those of geologists and meteorologists, who often use stochastic models and spatial temporal models, which are dimensional extensions of time series models. Because of this confusion between astronomy and meteorology/geology/oceanology, and because of the longer history of wide statistical applications in the latter subjects (a good counterexample is the least squares method by Gauss, but I cannot think of more examples to contradict my statement that statistics has a rich history of wide use among earth scientists), from time to time I have paid attention to various applications and models in those subjects so as to find a thread toward similar applications in astronomy. Although I don’t like the misconception that astronomy equals meteorology or geoscience, those scientific fields nonetheless share at least one commonality: statistical methods are applied to analyzing satellite data. Continue reading ‘[ArXiv] Special Issue from Annals of Applied Statistics’ »

accessing data, easier than before but…

Someone emailed me asking for the globular cluster data sets I used in a proceedings paper, which was about how to determine multi-modality (multiple populations) based on well known and new information criteria without binning the luminosity functions. I spent quite some time trying to understand the data sets with suspicious numbers of globular cluster populations. On the other hand, obtaining globular cluster data sets was easy because of available data archives such as VizieR. I acquire most data sets in charts/tables from VizieR. In order to understand the science behind those data sets, I check ADS. Well, actually it happens the other way around: check the scientific background first to assess whether there is room for statistics, then search for available data sets. Continue reading ‘accessing data, easier than before but…’ »

Likelihood Ratio Technique

I wonder what Fisher, Neyman, and Pearson would say if they saw “Technique” after “Likelihood Ratio” instead of “Test.” When a presenter said “Likelihood Ratio Technique” for source identification, I couldn’t resist checking it out, so as not to offend the founding fathers of the likelihood principle in statistics, since “Technique” sounded derogatory to my ears when attached to “Likelihood.” I thank, above all, the speaker who kindly gave me the reference about this likelihood ratio technique. Continue reading ‘Likelihood Ratio Technique’ »

Lost in Translation: Measurement Error

You would think that something like “measurement error” is a well-defined concept, and everyone knows what it means. Not so. I have so far counted at least 3 different interpretations of what it means.

Suppose you have measurements X={Xi, i=1..N} of a quantity whose true value is, say, X0. One can then compute the mean and standard deviation of the measurements, E(X) and σX. One can also infer the value of a parameter θ(X), derive the posterior probability density p(θ|X), and obtain confidence intervals on it.

So here are the different interpretations:

  1. Measurement error is σX, or the spread in the measurements. Astronomers tend to use the term in this manner.
  2. Measurement error is X0-E(X), or the “error made when you make the measurement”, essentially what is left over beyond mere statistical variations. This is how statisticians seem to use it, essentially as the bias term. To quote David van Dyk:

    For us it is just English. If your measurement is different from the real value. So this is not the Poisson variability of the source for effects or ARF, RMF, etc. It would disappear if you had a perfect measuring device (e.g., telescope).

  3. Measurement error is the width of p(θ|X), i.e., the measurement error of the first type propagated through the analysis. Astronomers use this too to refer to measurement error.

Who am I to say which is right? But be aware of who you may be speaking with and be sure to clarify what you mean when you use the term!
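To make the three readings concrete, here is a minimal sketch in Python that computes each one from the same simulated measurements. The instrument (a systematic offset of 0.5 and a scatter of 2.0), the function name, and the choice of θ are all hypothetical. For the third reading I assume that θ is the mean of a Gaussian model with a flat prior and the scatter treated as known, so that the posterior width is simply σX/√N; other models would give other widths.

```python
import random
import statistics


def measurement_error_summaries(x, x0):
    """Return the three 'measurement errors' discussed above for data x.

    x  : list of measurements X = {Xi}
    x0 : the true value X0 (known here only because the data are simulated)
    """
    n = len(x)
    mean = statistics.fmean(x)
    # Interpretation 1: the spread of the measurements, sigma_X.
    spread = statistics.stdev(x)
    # Interpretation 2: X0 - E(X), estimated with the sample mean.
    offset = x0 - mean
    # Interpretation 3: width of p(theta|X). ASSUMPTION: theta is the mean of
    # a Gaussian model, flat prior, scatter fixed at the sample value, so the
    # posterior standard deviation is sigma_X / sqrt(N).
    posterior_width = spread / n ** 0.5
    return {"spread": spread, "offset": offset, "posterior_width": posterior_width}


random.seed(0)
X0 = 10.0
# Hypothetical instrument: reads 0.5 too high on average, with scatter 2.0.
X = [random.gauss(X0 + 0.5, 2.0) for _ in range(100)]

summary = measurement_error_summaries(X, X0)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With more data the third number shrinks like 1/√N while the first two do not, which is one reason the three usages should not be quoted interchangeably.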

[MADS] multiscale modeling

A few scientists in our group work on estimating the intensities of gamma ray observations from sky surveys. This work is distinct from typical image processing, which mostly concerns the point estimation of intensity at each pixel location and the size of an overall white-noise-type error. In image processing you will often notice assumptions of orthogonality between errors and sources, and of white noise. These assumptions are typical features of image processing utilities and modules. On the other hand, CHASC scientists address more general and broader statistical inference problems in estimating the intensity map, such as intensity uncertainties at each point and the scientifically informative display of the intensity map with its uncertainty, according to the Poisson count model and constraints from physics and the instrument. This is where the field of multiscale modeling comes in. Continue reading ‘[MADS] multiscale modeling’ »

[MADS] HMM

MADS stands for “Missing in ADS.” Every astronomer, I believe, knows what ADS is. As we have [EotW] series and used to have [ArXiv] series, creating a new series for semi-periodic postings under the well known name ADS seems interesting. Continue reading ‘[MADS] HMM’ »

Borel Cantelli Lemma for the Gaussian World

Almost two years of scrutinizing publications by astronomers has given me a strong impression that astronomers live in the Gaussian world. You are likely to object to this statement by saying that astronomers know and use Poisson, binomial, Pareto (power laws), Weibull, exponential, Laplace, Cauchy, Gamma, and some other distributions.[1] This is true. I can attest that these distributions are referred to in many publications; however, when it comes to obtaining “BEST FIT estimates for the parameters of interest” and “their ERROR (BARS)”, suddenly everything goes back to the Gaussian world.[2]

Borel Cantelli Lemma (from Planet Math): because of the mathematical symbols, I link to it instead of typing it out, but any probability textbook has the lemma with proofs and descriptions.

Continue reading ‘Borel Cantelli Lemma for the Gaussian World’ »

  1. It is a bit disappointing that not many mention the t distribution, even when fewer than 30 observations are available.
  2. To stay out of this Gaussian world, some astronomers rely on Bayesian statistics and explicitly say that it is the only escape, which is sometimes true and sometimes not; I personally lean toward the view that Bayesian methods are not always more robust than frequentist methods, contrary to astronomers’ discussions of robust methods.
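As a small illustration of the first footnote, the following sketch compares the half-width of a 95% interval for a mean computed with the Gaussian critical value against the one computed with the t distribution. The ten data values and the function name are made up for illustration, and the t critical value for 9 degrees of freedom (2.262) is taken from a standard table rather than computed, since the Python standard library has no t quantile function.

```python
import statistics


def interval_half_width(sample, critical_value):
    """Half-width of a two-sided interval for the mean: c * s / sqrt(n)."""
    return critical_value * statistics.stdev(sample) / len(sample) ** 0.5


# Gaussian two-sided 95% critical value (about 1.96).
z_crit = statistics.NormalDist().inv_cdf(0.975)
# Tabulated two-sided 95% critical value of the t distribution with 9 d.o.f.
T_CRIT_DF9 = 2.262

# Hypothetical sample of 10 measurements (so 9 degrees of freedom).
sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 4.8, 5.4, 5.2, 4.6]

gauss_half = interval_half_width(sample, z_crit)
t_half = interval_half_width(sample, T_CRIT_DF9)
print(f"Gaussian half-width: {gauss_half:.3f}")
print(f"t half-width:        {t_half:.3f}")
```

With only ten observations the t-based interval is roughly 15% wider than the Gaussian one, so error bars quoted from the Gaussian value are too optimistic in this regime.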

[SPS] Testing Completeness

There will be a special session at the 213th AAS meeting on meaning from surveys and population studies (SPS). Until then, it might be useful to pull out some interesting and relevant papers and questions/challenges as a preliminary to the meeting. I will not merely list astronomical catalogs and surveys, which are literally countless these days, but will bring out some, with a description of the catalog, if they have changed the way science is performed (the best example, to my knowledge, would be SDSS, the Sloan Digital Sky Survey). Continue reading ‘[SPS] Testing Completeness’ »

It bothers me.

The full description of “bayes” under sherpa/ciao[1] is given at http://cxc.harvard.edu/ciao3.4/ahelp/bayes.html. Some sentences kept bothering me, and here’s my account of the reasons, given outside of the quotes. Continue reading ‘It bothers me.’ »

  1. Note that the current sherpa is a beta under ciao 4.0, not under ciao 3.4, and a description of “bayes” from the most recent sherpa is not available yet, which means this post will need updating once a new release is available.

Astroart Survey

Astronomy is known for its pretty pictures, but as Joe the Astronomer would say, those pretty pictures don’t make themselves. A lot of thought goes into maximizing scientific content while conveying just the right information, all discernible at a single glance. So the hardworkin folks at Chandra want your help in figuring out what works and how well, and they have set up a survey at http://astroart.cfa.harvard.edu/. Take the survey, it is both interesting and challenging!

read.table()

The first step of data analysis or applications is reading the data sets into a tool of choice. In recent years, I’ve been using R (see also Learning R) for that purpose, but I’ve enjoyed the same freedom with these languages and tools: BASIC, fortran77/90/95, C/C++, IDL, IRAF, AIPS, mongo/supermongo, MATLAB, Maple, Mathematica, SAS, SPSS, Gauss, ARC, Minitab, and recently Python and ciao, which I have just begun to learn. I have lost fluency in many of them. Quick learning tends to be flash memory. Some will need brain defragmentation and recovery time before extensive scientific work. A few I don’t like to use at all. No matter what, I’m not a computer geek. I’m not good with new gadgets or new software, nor do I welcome new and allegedly versatile computing systems. But one must be if he/she wants to handle data. Until recently I believed R had such versatility when it comes to reading in data. Yet, there is nothing without exceptions. Continue reading ‘read.table()’ »