Archive for the ‘Data Processing’ Category.

Spurious Sources

[arXiv:0709.2358] Cleaning the USNO-B Catalog through automatic detection of optical artifacts, by Barron et al.

Statistically speaking, “false sources” are generally in the domain of Type I errors, defined by the probability of detecting a signal where there is none. But what if there is a clear signal, but it is not real?
Continue reading ‘Spurious Sources’ »

VOConvert (ConVOT)

VOConvert, or ConVOT, is a small Java script that converts file formats from FITS to ASCII or the other way around. It might be useful for statisticians who want to quickly convert FITS, the astronomers’ data format, into ASCII for a statistical analysis. Additionally, VOConvert creates an interim output for VOStat, which is designed for statistical analysis of data from the Virtual Observatory. The software and the list of Virtual Observatories around the world can be found at Virtual Observatory India. Please see the VOStat post (http://hea-www.harvard.edu/AstroStat/slog/2007/vostat) for more information about VOStat.

[ArXiv] Swift and XMM measurement errors, Sep. 8, 2007

From arxiv/astro-ph:0708.1208v1:
The measurement errors in the Swift-UVOT and XMM-OM by N.P.M. Kuin and S.R. Rosen

The photon counts from the Optical Monitor on the XMM-Newton satellite (XMM-OM) and from the UVOT on the Swift satellite follow a binomial distribution because of the detector characteristics. The incident count rate is derived as a function of the measured count rate, and the measured counts are shown to follow a binomial distribution.
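As a rough illustration of how an incident rate can be recovered from a measured one, here is a minimal sketch. It assumes Poisson photon arrivals and a detector that records at most one count per readout frame, so the number of frames containing a count is binomial. Dead time and other instrument details are ignored, and the function names and the frame time are illustrative choices, not values taken from the paper.

```python
import math

def incident_rate(counted_frames, total_frames, frame_time):
    """Estimate the incident rate (counts/s) from the fraction of frames with a count.

    Assumption: at most one count is registered per frame, so with Poisson
    arrivals at rate lam the per-frame detection probability is
    p = 1 - exp(-lam * frame_time), which inverts to lam = -ln(1 - p) / frame_time.
    """
    p = counted_frames / total_frames
    if not 0.0 <= p < 1.0:
        raise ValueError("fraction of counted frames must be in [0, 1)")
    return -math.log1p(-p) / frame_time

def incident_rate_error(counted_frames, total_frames, frame_time):
    """Propagate the binomial error on p to the incident rate (first order)."""
    p = counted_frames / total_frames
    # sd(p) = sqrt(p(1-p)/N) and d(lam)/dp = 1 / ((1-p) * frame_time)
    return math.sqrt(p / ((1.0 - p) * total_frames)) / frame_time

# Hypothetical numbers: 1000 frames of 0.01 s each, 500 of them with a count.
raw_rate = 500 / (1000 * 0.01)               # naive measured rate
corrected = incident_rate(500, 1000, 0.01)   # corrected for coincidence loss
sigma = incident_rate_error(500, 1000, 0.01)
print(raw_rate, corrected, sigma)
```

The correction grows quickly as the fraction of occupied frames approaches one, which is why the binomial rather than the Poisson error matters for bright sources.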
Continue reading ‘[ArXiv] Swift and XMM measurement errors, Sep. 8, 2007’ »

[ArXiv] Google Sky, Sept. 05, 2007

Ah, Sky in Google Earth has made an arXiv appearance [arxiv/astro-ph:0709.0752]: Sky in Google Earth: The Next Frontier in Astronomical Data Discovery and Visualization by R. Scranton et al.

[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007

From arxiv/astro-ph:0708.4030v1
Deep ACS Imaging in the Globular Cluster NGC 6397: The Cluster Color Magnitude Diagram and Luminosity Function by H.B. Richer et al.

This paper presents an observational study of the globular cluster NGC 6397 that is more informative than previous observations in the sense that 1) a truncation in the white dwarf cooling sequence occurs at 28th magnitude, 2) the cluster main sequence seems to terminate approximately at the hydrogen-burning limit predicted by two independent stellar evolution models, and 3) the luminosity functions (LFs) and mass functions (MFs) are well defined. There is nothing statistical here, but the way the paper defines color magnitude diagrams (CMDs) and LFs, together with the improved measurements (ACS imaging) of stars in NGC 6397, should help in developing suitable statistics for CMD and LF fitting problems.
Continue reading ‘[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007’ »

[ArXiv] Decision Tree, Aug. 31, 2007

From arxiv/astro-ph:0708.4274v1
Comparison of decision tree methods for finding active objects by Y. Zhao and Y. Zhang

The authors (astronomers) introduce and summarize various decision tree methods (REPTree, Random Tree, Decision Stump, Random Forest, J48, NBTree, and ADTree) for the astronomical community.
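Of the methods listed, the decision stump is the simplest: a one-level tree that splits on a single threshold. The following is a minimal sketch of that idea in plain Python, not the implementation the authors used. The feature values and the "normal"/"active" labels are made up for illustration.

```python
def fit_stump(xs, labels):
    """Fit a one-feature decision stump by exhaustive search.

    Returns (threshold, label_below, label_above, errors) with the fewest
    misclassified training points.
    """
    ordered = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    classes = sorted(set(labels))
    best = None
    for t in candidates:
        for below in classes:
            for above in classes:
                errors = sum(
                    (below if x <= t else above) != y
                    for x, y in zip(xs, labels)
                )
                if best is None or errors < best[3]:
                    best = (t, below, above, errors)
    return best

def predict(stump, x):
    """Classify a single value with a fitted stump."""
    threshold, below, above, _ = stump
    return below if x <= threshold else above

# Made-up one-dimensional feature (for example, a color index) and labels.
feature = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 1.1]
label = ["normal"] * 4 + ["active"] * 4
stump = fit_stump(feature, label)
print(stump, predict(stump, 0.25), predict(stump, 1.05))
```

The tree methods in the list build on this kind of split, applying it recursively or combining many such trees.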
Continue reading ‘[ArXiv] Decision Tree, Aug. 31, 2007’ »

Quote of the Week, Aug 31, 2007

Once again, from the middle of a recent (Aug 30-31, 2007) argument within CHASC on why physicists and astronomers view “3 sigma” results with suspicion and expect (roughly) > 5 sigma, while statisticians and biologists typically assume 95% is OK:

David van Dyk (representing statistics culture):

Can’t you look at it again? Collect more data?

Vinay Kashyap (representing astronomy and physics culture):

…I can confidently answer this question: no, alas, we usually cannot look at it again!!

Ah. Hmm. To rephrase [the question]: if you have a “7.5 sigma” feature, with a day-long [imaging Markov Chain Monte Carlo] run you can only show that it is “>3sigma”, but is it possible, even with that day-long run, to tell that the feature is really at 7.5sigma — is that the question? Well that would be nice, but I don’t understand how observing again will help?

David van Dyk:

No one believes any realistic test is properly calibrated that far into the tail. Using 5-sigma is really just a high bar, but the precise calibration will never be done. (This is a reason not to sweat the computation TOO much.)

Most other scientific areas set the bar lower (2 or 3 sigma) BUT don’t really believe the results unless they are replicated.

My assertion is that I find replicated results more convincing than extreme p-values. And the controversial part: Astronomers should aim for replication rather than worry about 5-sigma.
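For readers who want numbers behind the thresholds in this exchange, here is a small sketch that converts “n sigma” into Gaussian tail probabilities. It assumes an exactly normal test statistic, which is precisely the calibration the discussion above doubts in the far tail. The function names are my own.

```python
import math

def two_sided_p(n_sigma):
    """Probability that a standard normal deviates by more than n_sigma in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))

def one_sided_p(n_sigma):
    """Probability that a standard normal exceeds n_sigma (upper tail only)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

for n in (1.96, 2, 3, 5):
    print(f"{n:>4} sigma: two-sided p = {two_sided_p(n):.3g}, one-sided p = {one_sided_p(n):.3g}")
```

Under this assumption, the familiar 95% level corresponds to about 1.96 sigma two-sided, while 5 sigma corresponds to a two-sided probability below one in a million.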

Quote of the Week, Aug 23, 2007

These are from two lively CHASC discussions on classification, or cluster analysis. The first was on Feb 7, 2006; the continuation was on Dec 12, 2006, at the Harvard Statistics Department, as part of Stat 310.

David van Dyk:

Don’t demand too much of the classes. You’re not going to say that all events can be well-classified…. It’s more descriptive. It gives you places to look. Then you look at your classes.

Xiao-Li Meng:

Then you’re saying the cluster analysis is more like -

David van Dyk:

It’s really like you have a proposal for classes. You then investigate the physical processes more thoroughly. You may have classes that divide it [up]

……

David van Dyk:

But it can make a difference, where you see the clusters, depending on your [parameter] transformation. You can squish the white spaces, and stretch out the crowded spaces; so it can change where you think the clusters are.

Aneta Siemiginowska:

But that is interesting.

Andreas Zezas:

Yes, that is very interesting.

These are particularly in honor of Hyunsook Lee’s recent posting of Chattopadhyay et al.’s new work about possible intrinsic classes of gamma-ray bursts. Are they really physical classes, or do they only appear to be distinct clusters because we view them through the “squished” lens (parameter spaces) of our imperfect instruments?
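The point about transformations can be seen with a toy example. The sketch below splits one-dimensional data into two groups at the largest gap, once on a linear scale and once on a log scale. The data values are invented and the “largest gap” rule is a deliberately crude stand-in for a real clustering method.

```python
import math

def split_at_largest_gap(values, transform=lambda x: x):
    """Split sorted values into two groups at the largest gap in transformed space."""
    ordered = sorted(values)
    coords = [transform(v) for v in ordered]
    gaps = [b - a for a, b in zip(coords, coords[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

# Invented measurements spanning a few orders of magnitude.
data = [1, 2, 4, 8, 100, 120, 140, 400]

linear_groups = split_at_largest_gap(data)
log_groups = split_at_largest_gap(data, transform=math.log)

print("linear scale:", linear_groups)
print("log scale:   ", log_groups)
```

On the linear scale the value 400 is isolated as its own group; on the log scale the split falls between 8 and 100 instead. The data are identical, and only the choice of scale moved the apparent boundary.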

[ArXiv] GRB host galaxies, Aug. 10, 2007

From arxiv/astro-ph:0708.1510v1
Connecting GRBs and galaxies: the probability of chance coincidence by Cobb and Bailyn

Without an optical afterglow, a galaxy within the 2 arcsecond error region of a GRB X-ray afterglow is identified as the host galaxy; however, confusion can arise because 1. the edge of a galaxy is diffuse, 2. multiple sources could exist within the 2 arcsecond error region, 3. the distance between the galaxy and the X-ray afterglow is measured in projection, and 4. lensing increases brightness and shifts positions. In this paper, the authors “investigated the fields of 72 GRBs in order to examine the general issue of associations between GRBs and host galaxies.”
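To give a feel for what a chance coincidence probability looks like, here is a back-of-the-envelope sketch. It assumes galaxies are scattered uniformly at random (a Poisson process) with a known surface density, so the probability of finding at least one unrelated galaxy within a radius r is 1 - exp(-pi r^2 Sigma). This is a common simplification and not necessarily the calculation used by Cobb and Bailyn; the density value is hypothetical.

```python
import math

def chance_coincidence(radius_arcsec, density_per_arcmin2):
    """Probability of at least one unrelated galaxy inside a circle of the given radius.

    Assumes a uniform Poisson distribution of galaxies on the sky with the
    given surface density (galaxies per square arcminute).
    """
    density_per_arcsec2 = density_per_arcmin2 / 3600.0
    expected = math.pi * radius_arcsec ** 2 * density_per_arcsec2
    return 1.0 - math.exp(-expected)

# Hypothetical surface density of 36 galaxies per square arcminute
# down to some limiting magnitude, and the 2 arcsecond error radius.
p = chance_coincidence(2.0, 36.0)
print(f"chance coincidence probability: {p:.3f}")
```

Because the expected number of interlopers scales with the area of the error region and with the depth of the imaging, even a small error circle can carry a non-negligible chance of a false association in deep fields.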
Continue reading ‘[ArXiv] GRB host galaxies, Aug. 10, 2007’ »

Change Point Problem

The X-ray summer school is ongoing. Numerous interesting topics were presented, but not much about statistics (the only advice so far: “use implemented statistics in x-ray data reduction/analysis tools” and “it’s just a tool”). Nevertheless, I happened to talk extensively with two students about their research topics, which involve finding features in light curves. One approach was very empirical, comparing gamma ray burst trigger times to 24kHz observations, and the other was statistical and algorithmic, using Bayesian Blocks. Sadly, I could not give them answers, but the latter caught my attention.
Continue reading ‘Change Point Problem’ »

[ArXiv] SDSS DR6, July 23, 2007

From arxiv/astro-ph:0707.3413
The Sixth Data Release of the Sloan Digital Sky Survey by … many people …

The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at http://www.sdss.org/dr6. Additionally, the Catalog Archive Service (CAS) and the SQL interface for accessing the catalog would be useful to statisticians searching for data. Simple SQL commands, which are well documented, can narrow down the size of the data and the spatial coverage.
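As a hedged illustration of the kind of simple SQL command meant here, the sketch below builds a query string that limits both the number of rows and the sky region. The table name PhotoObj and the columns objID, ra, dec, u, g, r, i, z are what I would expect in the SDSS photometric catalog, but they should be checked against the schema documentation of the release being queried. The helper function itself is hypothetical and only assembles text; it does not contact any server.

```python
def build_box_query(ra_min, ra_max, dec_min, dec_max, max_rows=100):
    """Build an SQL string selecting objects inside a rectangular sky region.

    Coordinates are in degrees. Table and column names are assumptions to be
    verified against the survey's schema documentation.
    """
    return (
        f"SELECT TOP {int(max_rows)} objID, ra, dec, u, g, r, i, z\n"
        "FROM PhotoObj\n"
        f"WHERE ra BETWEEN {float(ra_min)} AND {float(ra_max)}\n"
        f"  AND dec BETWEEN {float(dec_min)} AND {float(dec_max)}"
    )

# A small, arbitrary patch of sky and a cap of 10 rows.
query = build_box_query(180.0, 180.5, 0.0, 0.5, max_rows=10)
print(query)
```

The row cap and the coordinate box correspond to the two kinds of narrowing mentioned above: the size of the returned data and its spatial coverage.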
Continue reading ‘[ArXiv] SDSS DR6, July 23, 2007’ »

Photometric Redshifts

Since I began subscribing to arxiv/astro-ph abstracts, one of the most frequent topics, from an astrostatistical point of view, has been photometric redshifts. The topic has grown in popularity as catalogs of remote photometric objects multiply in volume and as multi-band sky survey projects lead to virtual observatories (VO; to be discussed in a later posting). Just searching for photometric redshifts in Google Scholar and arxiv.org returns more than 2000 articles since 2000.
Continue reading ‘Photometric Redshifts’ »

[ArXiv] Data Visualization, July 17, 2007

From arxiv/astro-ph:0707.2474,
Visualization, Exploration and Data Analysis of Complex Astrophysical Data by Comparato, Becciani, Costa, Larsson, Garilli, Gheller, and Taylor

This paper introduces a novel advanced visualization tool, VisIVO,[1] the advantages it gains from incorporating a protocol called PLASTIC (Platform for Astronomy Tool Interconnection) for displaying and extracting information from astrophysical data, its enhanced connection to the VO (Virtual Observatory), and its usage in several scientific cases.
Continue reading ‘[ArXiv] Data Visualization, July 17, 2007’ »

  1. Available at http://visivo.cineca.it

[ArXiv] Complete Catalog of GRBs from BeppoSAX, July 13, 2007

From arxiv/astro-ph:0707.1900v1
The complete catalogue of gamma-ray bursts observed by the Wide Field Cameras on board BeppoSAX by Vetere et al.

This paper intends to publicize the largest data set of Gamma Ray Burst (GRB) X-ray afterglows (light curves after the event), which is available from http://www.asdc.asi.it. It is claimed to be a complete on-line catalog of GRBs observed by the two Wide Field Cameras on board BeppoSAX in the period 1996-2002. It comprises 77 bursts and 56 GRBs with X-ray light curves, covering the energy range 40-700 keV. A brief introduction to the instrument, data reduction, and catalog description is given.

[ArXiv] Matching Sources, July 11, 2007

From arxiv/astro-ph:0707.1611
Probabilistic Cross-Identification of Astronomical Sources by Budavari and Szalay

As multi-wavelength studies become more popular, various source matching methodologies have been discussed. One such method, built on Bayesian ideas, was introduced by Budavari and Szalay in response to the demand for symmetric algorithms in a unified framework.
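To make the Bayesian idea concrete, here is a minimal sketch of a Bayes factor for matching one source in each of two catalogs. It uses the small-angle, Gaussian-error form B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2))), with all angles in radians, which is how I understand the two-catalog case of this approach; please check it against the paper before relying on it. The function name and the example numbers are my own.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def match_bayes_factor(separation_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Bayes factor for 'same source' versus 'unrelated sources'.

    Assumes circular Gaussian positional errors and small angles; values much
    greater than 1 favor a match, values much less than 1 favor no match.
    """
    psi = separation_arcsec * ARCSEC_TO_RAD
    var_sum = (sigma1_arcsec * ARCSEC_TO_RAD) ** 2 + (sigma2_arcsec * ARCSEC_TO_RAD) ** 2
    return 2.0 / var_sum * math.exp(-psi ** 2 / (2.0 * var_sum))

# Hypothetical detections with 0.1 arcsecond positional errors in both catalogs.
for sep in (0.0, 0.2, 1.0, 5.0):
    print(f"separation {sep:>3} arcsec: B = {match_bayes_factor(sep, 0.1, 0.1):.3g}")
```

Note that the expression is symmetric in the two catalogs, so neither one is treated as the reference. That symmetry is the property the post highlights.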
Continue reading ‘[ArXiv] Matching Sources, July 11, 2007’ »