Archive for the ‘Uncertainty’ Category.

Jun 3rd, 2008| 02:53 am | Posted by vlk

It is somewhat surprising that astronomers haven’t cottoned on to Lowess curves yet. That’s probably a good thing because I think people already indulge in smoothing far too much for their own good, and Lowess makes for a very powerful hammer. But the fact that it is semi-parametric and is based on polynomial least-squares fitting does make it rather attractive.

And, of course, sometimes it is unavoidable, or so I told Brad W. When one has too many points for a regular polynomial fit, and they are too scattered for a spline, and too few to try a wavelet “denoising”, and no real theoretical expectation of any particular model function, and all one wants is “a smooth curve, damnit”, then Lowess is just the ticket.

Well, almost.

There is one major problem — *how does one figure what the error bounds are on the “best-fit” Lowess curve?* Clearly, each fit at each point can produce an estimate of the error, but simply collecting the separate errors is not the right thing to do because they would all be correlated. I know how to propagate Gaussian errors in boxcar smoothing a histogram, but this is a whole new level of complexity. Does anyone know if there is software that can calculate reliable error bands on the smooth curve? We will take any kind of error model — Gaussian, Poisson, even the (local) variances in the data themselves.
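In the meantime, here is a minimal sketch of one pragmatic workaround: resample the data, refit the Lowess curve each time, and read off pointwise percentile bands. It does not resolve the correlation between neighboring points (a simultaneous band would need more care), and everything below (the toy data, the smoothing fraction, statsmodels’ `lowess`) is an assumption of the sketch, not a recommendation.

```python
# A minimal sketch, not a definitive answer: pointwise error bands on a Lowess
# curve via a pairs bootstrap. The toy data and the smoothing fraction `frac`
# are assumptions of this example.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(scale=0.4, size=x.size)      # noisy toy "observations"

grid = np.linspace(x.min(), x.max(), 100)                # common grid for all fits
frac = 0.3                                               # Lowess span (assumed)

best_fit = lowess(y, x, frac=frac)                       # sorted (x, smoothed y) pairs
best = np.interp(grid, best_fit[:, 0], best_fit[:, 1])

# Bootstrap: resample (x, y) pairs, refit, and collect the smoothed curves
nboot = 500
curves = np.empty((nboot, grid.size))
for b in range(nboot):
    idx = rng.integers(0, x.size, x.size)
    fit = lowess(y[idx], x[idx], frac=frac)
    curves[b] = np.interp(grid, fit[:, 0], fit[:, 1])

lo, hi = np.percentile(curves, [2.5, 97.5], axis=0)      # pointwise 95% band

plt.plot(x, y, ".", alpha=0.3)
plt.plot(grid, best, "k-", label="Lowess")
plt.fill_between(grid, lo, hi, alpha=0.3, label="bootstrap 95% band")
plt.legend()
plt.show()
```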

Tags: Brad Wargelin, error bands, error bars, Fitting, least-squares, Loess, Lowess, polynomial, question for statisticians, smoothing
Category: Algorithms, Fitting, Methods, Stat, Uncertainty | 11 Comments

May 20th, 2008| 12:10 am | Posted by vlk

Earlier this year, Peter Edmonds showed me a press release that the Chandra folks were, at the time, considering putting out describing the possible identification of a Type Ia Supernova progenitor. What appeared to be an accreting white dwarf binary system could be discerned in 4-year-old observations, coincident with the location of a supernova that went off in November 2007 (SN2007on). An amazing discovery, but there is a hitch.

And it is a statistical hitch, one that involves two otherwise highly reliable and oft-used methods giving contradictory answers at nearly the same significance level! Does this mean that the chances are actually 50-50? Really, we need a bona fide statistician to take a look and point out the error of our ways. Continue reading ‘Did they, or didn’t they?’ »

Tags: arXiv, Chandra, CXC, Optical, Peter Edmonds, positional coincidence, positional error, Power, progenitor, question for statisticians, significance, Supernova, Type Ia, White Dwarf, White Dwarf binary, X-ray
Category: arXiv, Astro, Data Processing, News, Objects, Optical, Stat, Uncertainty | 5 Comments

May 19th, 2008| 10:42 am | Posted by hlee

There’s no particular opening remark this week, only my profound curiosity about the jackknife tests in [astro-ph:0805.1994]. This paper, along with a few others, deserves a separate discussion from a statistical point of view, which shall be posted later. Continue reading ‘[ArXiv] 2nd week, May 2008’ »
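For readers who haven’t run into the term, here is a purely illustrative delete-one jackknife estimate of a statistic’s standard error; the data and the chosen statistic are invented and have nothing to do with the paper above.

```python
# Illustrative only: a leave-one-out (delete-1) jackknife standard error for a
# generic statistic; the data and the statistic (the mean) are arbitrary choices,
# not those of astro-ph:0805.1994.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=50)

def statistic(sample):
    return np.mean(sample)

n = data.size
theta_hat = statistic(data)
# Recompute the statistic with each point removed in turn
theta_i = np.array([statistic(np.delete(data, i)) for i in range(n)])
theta_dot = theta_i.mean()

jackknife_var = (n - 1) / n * np.sum((theta_i - theta_dot) ** 2)
jackknife_se = np.sqrt(jackknife_var)
print(f"estimate = {theta_hat:.3f}, jackknife s.e. = {jackknife_se:.3f}")
```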

Tags: bimodality, bootstrap, calibration uncertainty, CF, Classification, CMB, dip, exoplanet, Fisher matrix, flare, GL, jackknife, KS test, marked point, maximum likelihood, MLE, poisson point process, spatial data, XLF
Category: arXiv, Frequentist, Uncertainty, X-ray | Comment

May 11th, 2008| 10:42 pm | Posted by hlee

I think I have to review spatial statistics in astronomy, focusing on tessellations (void structure), point processes (generalizing the two- and three-point correlation functions), and marked point processes (the spatial distribution of hardness ratios of distant X-ray sources, or of different types of galaxies, with marks that are not only morphological types but also other properties such as absolute magnitudes and the presence of particular features). When? Someday…

In addition to Bayesian methodologies, as in this week’s astro-ph listings, studies characterizing the empirical spatial distributions of voids and galaxies appear frequently, and I believe they can be enriched further with ideas from stochastic geometry and spatial statistics. Click through for what appeared on arXiv this week. Continue reading ‘[ArXiv] 1st week, May 2008’ »

Tags: Classification, covariance, FARIMA, Fisher information, GL, GRB, Levy, light curve, limb darkening, ML, Pareto distribution, quasars, solar flare, standard candle, tessellation, time series, VO, void
Category: arXiv, MCMC, Uncertainty | 1 Comment

May 1st, 2008| 02:00 pm | Posted by vlk

*Why is it that detection of emission lines is more reliable than that of absorption lines?*

That was one of the questions that came up during the recent AstroStat Special Session at HEAD2008. Take a look at the iconic Figure 1 from Protassov et al. (2002), which shows the null distribution of the Likelihood Ratio Test (LRT) and how it holds up when testing for the existence of emission and absorption lines. The thin vertical lines are the nominal F-test cutoffs for a 5% false positive rate. The nominal F-test is too conservative in the former case (figures a and b; i.e., actually existing lines will not be recognized as such), and too anti-conservative in the latter case (figure c; i.e., non-existent lines will be flagged as real). Continue reading ‘The Flip Test’ »
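The practical cure Protassov et al. advocate is to calibrate the test by simulation rather than trusting the nominal reference distribution. Below is a minimal frequentist-flavored sketch of that idea (a parametric-bootstrap stand-in for their posterior predictive calibration): simulate spectra under the no-line null, recompute the likelihood ratio statistic each time, and compare the simulated cutoff to the textbook one. The flat toy spectrum, fixed line profile, and count levels are all invented.

```python
# A minimal sketch of calibrating the LRT by simulation (a parametric-bootstrap
# stand-in for the posterior predictive calibration of Protassov et al. 2002).
# The flat "spectrum", line profile, and count levels are all invented.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, poisson

rng = np.random.default_rng(1)
channels = np.arange(100)
profile = np.exp(-0.5 * ((channels - 50.0) / 2.0) ** 2)   # fixed line shape
mu_cont = 5.0                                             # true null: flat continuum

def neg_loglike_alt(amp, counts, cont):
    mu = cont + amp * profile
    return -np.sum(poisson.logpmf(counts, mu))

def lrt_stat(counts):
    # Null: flat continuum only (the MLE of a Poisson mean is the sample mean)
    cont = counts.mean()
    ll_null = np.sum(poisson.logpmf(counts, cont))
    # Alternative: continuum plus a non-negative line amplitude (continuum held
    # at the null MLE for simplicity -- a deliberate shortcut in this sketch)
    res = minimize_scalar(neg_loglike_alt, bounds=(0.0, 50.0),
                          args=(counts, cont), method="bounded")
    ll_alt = -res.fun
    return max(0.0, 2.0 * (ll_alt - ll_null))

# Simulate the null distribution of the LRT statistic
sims = np.array([lrt_stat(rng.poisson(mu_cont, channels.size)) for _ in range(2000)])

nominal_cut = chi2.ppf(0.95, df=1)          # the "textbook" 5% cutoff
print("nominal chi2_1 cutoff:", round(nominal_cut, 2))
print("simulated 95% cutoff :", round(np.quantile(sims, 0.95), 2))
print("false-positive rate at the nominal cutoff:", np.mean(sims > nominal_cut))
```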

Apr 17th, 2008| 12:17 pm | Posted by chasc

*Astronomers write literally thousands of proposals each year to observe their favorite targets with their favorite telescopes. Every proposal must be accompanied by a technical justification, where the proposers demonstrate that their goal is achievable, usually via a simulation. Surprisingly, a large number of these justifications are statistically unsound. Guest Slogger* **Simon Vaughan** *describes the problem and shows what you can do to make reviewers happy (and you definitely want to keep reviewers happy).*

Continue reading ‘The Burden of Reviewers’ »

Apr 8th, 2008| 07:49 pm | Posted by hlee

The breakdown point of the mean is asymptotically zero, whereas the breakdown point of the median is 1/2. The breakdown point is a measure of the robustness of an estimator, and its value can be at most 1/2. In the presence of outliers, the mean cannot be a good measure of the central location of the data distribution, whereas the median is likely to locate the center. Common plug-in estimators like the mean and the root mean square error may not provide the best fits and uncertainties precisely because of this zero breakdown point of the mean. The mean’s efficiency (under ideal, e.g. Gaussian, conditions) does not protect it against contamination by outliers; therefore, a bit of care is needed before plugging the data into these estimators to get the best fit and uncertainty. There was a preprint from [arXiv] about the use of the median last week. Continue reading ‘[ArXiv] use of the median’ »
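A toy numerical illustration of the contrast (all numbers invented): one wild outlier is enough to drag the mean far away, while the median barely notices.

```python
# Toy illustration of the breakdown-point contrast: a single wild outlier drags
# the mean arbitrarily far, while the median barely moves. Numbers are invented.
import numpy as np

rng = np.random.default_rng(3)
clean = rng.normal(loc=10.0, scale=1.0, size=99)        # well-behaved data
contaminated = np.append(clean, 1.0e4)                   # one catastrophic outlier

print("clean        : mean = %.2f, median = %.2f" % (clean.mean(), np.median(clean)))
print("contaminated : mean = %.2f, median = %.2f"
      % (contaminated.mean(), np.median(contaminated)))
# The mean jumps by roughly 100 (its breakdown point tends to 0 as n grows);
# the median is essentially unchanged (breakdown point 1/2).
```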

Mar 30th, 2008| 11:16 pm | Posted by hlee

I began to study statistics with the notion that statistics is the study of information (retrieval), and that a part of information is uncertainty, which is taken for granted in our random world. Probably it is the other way around: information is a part of uncertainty. Could this be the difference between Bayesian and frequentist?

__The statistician’s task is to articulate the scientist’s uncertainties in the language of probability, and then to compute with the numbers found__: cited from Continue reading ‘Statistics is the study of uncertainty’ »

Mar 12th, 2008| 03:32 pm | Posted by hlee

Astrometry.net, a cool website I heard about in Harvard Astronomy Professor Doug Finkbeiner’s class (Principles of Astronomical Measurements), does the complex job of matching your images of unknown location or coordinates to sources in catalogs. Provide your images in any of various formats, and it returns astrometric calibration meta-data and lists of known objects falling inside the field of view. Continue reading ‘Astrometry.net’ »

Jan 30th, 2008| 02:33 am | Posted by hlee

Astronomers have developed their own ways of processing signals, almost independently of, but sometimes in collaboration with, engineers, although the fundamental goal of signal processing is the same: extracting information. These two parallel roads have doubtless pointed in opposite directions, one toward the sky and the other toward the earth. Nevertheless, without belaboring the argument, one could say that statistics has served as the common medium of signal processing for both scientists and engineers. This particular issue of the IEEE Signal Processing Magazine may shed light for astronomers interested in signal processing and statistics outside the astronomical community.

IEEE Signal Processing Magazine Jul. 2007 Vol 24 Issue 4: Bootstrap methods in signal processing

This link shows the table of contents and provides links to the articles; however, access to the papers requires an IEEE Xplore subscription via a library or an individual IEEE membership. Here, I’d like to introduce some of the articles and tutorials.

Continue reading ‘Signal Processing and Bootstrap’ »
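For orientation only, here is a generic nonparametric bootstrap confidence interval; the toy “signal” and the rms statistic are invented and are not drawn from any of the magazine articles.

```python
# A generic nonparametric bootstrap confidence interval, for orientation only;
# none of this is drawn from the magazine articles themselves. The toy "signal"
# and the rms statistic are invented.
import numpy as np

rng = np.random.default_rng(7)
signal = rng.normal(scale=1.5, size=300)                 # toy zero-mean signal

def rms(x):
    return np.sqrt(np.mean(x ** 2))

estimate = rms(signal)
boot = np.array([rms(rng.choice(signal, size=signal.size, replace=True))
                 for _ in range(5000)])
lo, hi = np.percentile(boot, [2.5, 97.5])                # percentile bootstrap CI
print(f"rms = {estimate:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```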

Tags: bootstrap, compressive sensing, confidence interval, GLM, IEEE, jacknife, machine learning, multitaper estimate, particle filter, signal processing, statistical inference, Tutorial, wavelet
Category: Algorithms, arXiv, Bayesian, Cross-Cultural, Fitting, Frequentist, MC, MCMC, Methods, Misc, Spectral, Stat, Uncertainty | Comment

Jan 21st, 2008| 03:33 pm | Posted by vlk

One of the big problems that has come up in recent years is how to represent the uncertainty in certain estimates. Astronomers usually present errors as *±stddev* on the quantities of interest, but that presupposes that the errors are uncorrelated. But suppose you are estimating a multi-dimensional set of parameters that may have large correlations amongst themselves? One such case is that of Differential Emission Measures (DEM), where the “quantity of emission” from a plasma (loosely, how much stuff there is available to emit; it is the product of the volume and the densities of electrons and H) is estimated for different temperatures. See the plots at the PoA DEM tutorial for examples of how we are currently trying to visualize the error bars. Another example is the correlated systematic uncertainties in effective areas (Drake et al., 2005, Chandra Cal Workshop). This is not dissimilar to the problem of determining the significance of a “feature” in an image (Connors, A. & van Dyk, D.A., 2007, SCMA IV). Continue reading ‘Dance of the Errors’ »
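One way to sidestep the problem, at least for visualization, is to stop quoting per-bin *±stddev* altogether and instead overplot many draws from the joint (e.g., MCMC) distribution, so the correlations show up as coherent wiggles of whole curves. The sketch below is schematic: the “DEM-like” curve and its covariance matrix are fabricated stand-ins for real samples.

```python
# Schematic only: visualizing correlated uncertainties by overplotting draws from
# the joint distribution rather than quoting per-bin +/- stddev. The "DEM-like"
# curve and its covariance matrix are fabricated stand-ins for real MCMC samples.
import numpy as np
import matplotlib.pyplot as plt

logT = np.linspace(6.0, 7.5, 16)                          # temperature grid (log K)
mean_curve = np.exp(-0.5 * ((logT - 6.6) / 0.3) ** 2)      # made-up "DEM" shape

# Fabricated covariance with strong correlation between neighboring bins
sigma = 0.15 * mean_curve + 0.02
corr = np.exp(-np.abs(np.subtract.outer(logT, logT)) / 0.2)
cov = np.outer(sigma, sigma) * corr

rng = np.random.default_rng(11)
draws = rng.multivariate_normal(mean_curve, cov, size=200)

# The spread of whole curves carries the correlation information that
# independent per-bin error bars would throw away.
plt.plot(logT, draws.T, color="gray", alpha=0.1)
plt.plot(logT, mean_curve, "k-", lw=2)
plt.errorbar(logT, mean_curve, yerr=sigma, fmt="none", ecolor="C0", capsize=2)
plt.xlabel("log T")
plt.ylabel("DEM (arbitrary units)")
plt.show()
```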

Tags: animated, David Garcia-Alvarez, DEM, error bands, error bars, flux, MCMC, O VII, O VIII, PINTofALE, question for statisticians
Category: Algorithms, Astro, Data Processing, Jargon, MCMC, Spectral, Stars, Uncertainty | 2 Comments

Oct 10th, 2007| 12:26 pm | Posted by vlk

Hyunsook drew attention to this paper (arXiv:0709.4531v1) by Brad Schaefer on the underdispersed measurements of the distance to the LMC. He makes a compelling case that since 2002 the published numbers in the literature have been hewing to an “acceptable number”, possibly in an unconscious effort to pass muster with referees. Essentially, the distribution of the best-fit distances is much more tightly clustered than you would expect from the quoted sizes of the error bars. Continue reading ‘“*you* are biased, *I* have an informative prior”’ »
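The arithmetic behind such a claim is essentially a reduced chi-square of the published values about their weighted mean coming out well below 1. Here is a toy version of that check with invented numbers (not Schaefer’s actual LMC compilation).

```python
# A toy version of the underdispersion check (not Schaefer's actual LMC data):
# if published values scatter less than their quoted errors imply, the reduced
# chi-square about the weighted mean falls well below 1.
import numpy as np
from scipy.stats import chi2

# Invented "published" distance moduli and quoted 1-sigma errors
mu = np.array([18.50, 18.51, 18.49, 18.50, 18.52, 18.48, 18.50, 18.51])
err = np.array([0.10, 0.08, 0.12, 0.09, 0.10, 0.11, 0.10, 0.09])

w = 1.0 / err ** 2
wmean = np.sum(w * mu) / np.sum(w)
chisq = np.sum(((mu - wmean) / err) ** 2)
dof = mu.size - 1

print(f"reduced chi-square = {chisq / dof:.2f}  (expected ~1 if errors are honest)")
# Probability of seeing this little scatter by chance if the quoted errors were right:
print(f"P(chi2 <= observed) = {chi2.cdf(chisq, dof):.3f}")
```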

Oct 3rd, 2007| 04:08 pm | Posted by aconnors

This is a long comment on October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.

My thoughts:

*Model* has two meanings here. A physicist or astronomer would automatically read it as pertaining to a *model of the source, or the physics, or the sky*. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the *full statistical model*.

For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first; it was taken as a description of *physics* model mismatch. Of course what he (and others) meant was *statistical* model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.
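A concrete numerical illustration of the kind of mismatch described above (invented numbers, not any of the PhyStat analyses): for very low counts, a Gauss-Normal approximation can put part of the interval below zero, whereas an interval built from the Poisson likelihood itself cannot.

```python
# Numerical illustration only (not the PhyStat analyses): for low counts, a
# Gauss-Normal approximation can put part of the "confidence interval" below
# zero, while an exact Poisson (Garwood) central interval cannot.
import numpy as np
from scipy.stats import chi2, norm

n = 2                                   # observed counts (invented)
z = norm.ppf(0.975)

# Naive Gaussian approximation: n +/- z*sqrt(n)
gauss_lo, gauss_hi = n - z * np.sqrt(n), n + z * np.sqrt(n)

# Exact central 95% interval for a Poisson mean given n observed counts
pois_lo = 0.0 if n == 0 else chi2.ppf(0.025, 2 * n) / 2.0
pois_hi = chi2.ppf(0.975, 2 * (n + 1)) / 2.0

print(f"Gaussian approx : ({gauss_lo:.2f}, {gauss_hi:.2f})   <- dips below zero")
print(f"Exact Poisson   : ({pois_lo:.2f}, {pois_hi:.2f})")
```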

This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theorem:

When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.

One can see this (I claim) in any method that allows one to get a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one based on some kind of semi-parametric bootstrapping) for uncertainty estimates. This can be (not always!) a particular pitfall of “ad-hoc” methods, which may at first appear very speedy and/or visually compelling, but then may not have a statistics/probability structure through which to synthesize the significance of the results in an efficient way.

Sep 19th, 2007| 02:21 pm | Posted by vlk

[arXiv:0709.2358] Cleaning the USNO-B Catalog through automatic detection of optical artifacts, by Barron et al.

Statistically speaking, “false sources” generally fall in the domain of Type I errors, defined by the probability of detecting a signal where there is none. But what if there is a clear signal, but it is not real? Continue reading ‘Spurious Sources’ »

Tags: arXiv, catalog, diffraction spikes, false sources, instrumental features, Stars, USNO
Category: arXiv, Astro, Data Processing, Imaging, Optical, Stars, Uncertainty | 2 Comments

Sep 11th, 2007| 01:12 am | Posted by hlee

From arxiv/astro-ph:0708.1208v1:

**The measurement errors in the Swift-UVOT and XMM-OM**, by N.P.M. Kuin and S.R. Rosen

Due to detector characteristics, the probability distribution of photon counts from the Optical Monitor on the XMM-Newton satellite (XMM-OM) and the UVOT on the Swift satellite was shown to follow a binomial distribution, and the incident count rate was derived as a function of the measured count rate.

Continue reading ‘[ArXiv] Swift and XMM measurement errors, Sep. 8, 2007’ »
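Purely for orientation (this is not the paper’s own derivation): if a detector can register at most one count per frame in a pixel, the detected counts over N frames are binomial(N, p), and their scatter is smaller than the Poisson value at the same mean. The frame picture, N, and p below are assumptions of this sketch.

```python
# Generic comparison, not the paper's derivation: if at most one count can be
# registered per frame, detected counts over N frames are binomial(N, p), whose
# scatter is smaller than the Poisson value at the same mean. N and p invented.
import numpy as np

N = 10000            # number of frames (invented)
p = 0.2              # probability of a detected count per frame (invented)

mean = N * p
binom_sd = np.sqrt(N * p * (1.0 - p))
poisson_sd = np.sqrt(mean)

print(f"mean counts = {mean:.0f}")
print(f"binomial sd = {binom_sd:.1f}, Poisson sd = {poisson_sd:.1f}")
# At high count rates (p not small), assuming Poisson errors overstates the
# measurement error relative to the binomial treatment described in the post.
```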

Tags: binomial, measurement error, photon count, Swift, UVOT, XMM
Category: arXiv, Astro, Data Processing, Fitting, Methods, Uncertainty | 1 Comment