Oct 28th, 2009| 09:29 am | Posted by hlee

While exploring the spatial distribution of particles/objects, I wanted neither to approximate it with a Poisson or Gaussian process (parametric) nor to impose hypotheses such as homogeneity, isotropy, or uniformity, so various **nonparametric** methods drew my attention for data exploration and preliminary analysis. Among these nonparametric methods, the one I fell in love with is tessellation (state space approaches are excluded here). In terms of computational speed, I believe tessellation is faster than kernel density estimation for estimating level sets of multivariate data. Furthermore, constructing polygons from a tessellation is conceptually simple. However, coding and improving the algorithms is beyond statistical research (check books with **computational geometry** in their titles or keywords). The good news is that, for computing and getting results, freely available software, packages, and modules exist in various forms. Continue reading ‘[ArXiv] Voronoi Tessellations’ »
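To make the idea of a tessellation-based density estimate concrete, here is a minimal sketch in Python for the one-dimensional case, where the Voronoi cell of a point is simply the interval between the midpoints to its two neighbours and the density at the point is taken as 1 / (n × cell length). The function name `voronoi_density_1d` and the sample values are made up for illustration, and the points are assumed to be distinct; in two or more dimensions one would rely on a computational geometry library rather than code like this.

```python
def voronoi_density_1d(points):
    """Estimate the density at each point as 1 / (n * length of its Voronoi cell).

    In one dimension the Voronoi cell of a point is the interval between the
    midpoints to its left and right neighbours. The two extreme points have
    unbounded cells, so they get None (an illustrative choice, not a standard).
    Assumes all points are distinct.
    """
    xs = sorted(points)
    n = len(xs)
    densities = [None] * n
    for i in range(1, n - 1):
        left_edge = (xs[i - 1] + xs[i]) / 2.0
        right_edge = (xs[i] + xs[i + 1]) / 2.0
        densities[i] = 1.0 / (n * (right_edge - left_edge))
    return xs, densities


# Hypothetical positions: crowded near 1.5, sparse toward 4.0.
xs, dens = voronoi_density_1d([0.0, 1.0, 1.5, 2.0, 4.0])
for x, d in zip(xs, dens):
    print(x, d)
```

Points in crowded regions get small cells and therefore high density estimates, with no bandwidth to choose.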

Tags: data compression, delaunay tessellation, density estimation, image processing, nonparametric, spatial statistics, van de Weygaert, van Lieshout, voronoi tessellation

Category: Algorithms, arXiv, Galaxies, Methods | Comment
Oct 15th, 2009| 06:46 pm | Posted by hlee

Astronomers rely on scatter plots to illustrate correlations and trends among many pairs of variables more than any other scientists^{[1]}. One often finds pages of scatter plots with regression lines, in which the slope of the regression line and the error bars serve as indicators of the degree of correlation. Sometimes, seeing too many such scatter plots makes me think that, overall, the resources for drawing nice scatter plots, and the papers in which those plots are printed, are wasted. Why not just compute the correlation coefficients and their errors and publish the processed data used for computing the correlations, not the full data, so that others can verify the results for the sake of validation? A couple of scatter plots are fine, but when I see dozens of them, I lose my focus. This is another cultural difference. Continue reading ‘Scatter plots and ANCOVA’ »
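As a rough illustration of "compute the correlation coefficient and its error", the sketch below computes Pearson's r and an approximate 95% interval via Fisher's z-transformation. This is only one of several possible choices: it assumes roughly bivariate normal data and more than three points, and the function names and data values are invented for the example.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for r using Fisher's z-transformation.

    Assumes bivariate normal data and n > 3 (an assumption of this sketch).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical measurements of two variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
r = pearson_r(x, y)
lower, upper = fisher_interval(r, len(x))
print(r, lower, upper)
```

A table of such triples for many variable pairs takes far less space than the corresponding pages of scatter plots, although it cannot reveal nonlinear structure the way a plot can.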

Tags: ANCOVA, ANOVA, approximation, correlation, Gaussianity, graphics, MADS, modeling, nonparametric, parallel coordinates, PCA, quality, quantity, regression, scatter plots

Category: arXiv, Cross-Cultural, Fitting, Jargon, Methods, Stat, Uncertainty | Comment
Oct 6th, 2009| 01:49 pm | Posted by hlee

When it comes to applying statistics for measuring goodness-of-fit, the Pearson χ^{2} test is the dominant player in the race and the Kolmogorov-Smirnov test statistic trails far behind. Although they seem almost invisible in this race, there are many other nonparametric statistics for testing goodness-of-fit and for comparing the sampling distribution to a reference distribution, legitimate race participants trained by many statisticians. Listing their names is probably useful to astronomers who find that the underlying assumptions of the χ^{2} test do not match their data. Perhaps some astronomers want to try nonparametric test statistics other than the K-S test. I’ve seen other test statistics in astronomical journals from time to time. Depending on the data and their statistical properties, one test statistic could work better than another; therefore, it’s worthwhile to keep in mind that there are other tests beyond the χ^{2} goodness-of-fit test statistic. Continue reading ‘Goodness-of-fit tests’ »
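For readers who have only met the K-S test by name, the following minimal sketch computes the one-sample K-S statistic, the largest gap between the empirical distribution function of a sample and a reference distribution function. The function names, the choice of a uniform reference distribution, and the sample values are assumptions made for illustration; turning the statistic into a p-value is left out.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as the reference."""
    return min(1.0, max(0.0, x))


# Hypothetical sample, compared against Uniform(0, 1).
sample = [0.1, 0.4, 0.7]
d_stat = ks_statistic(sample, uniform_cdf)
print(d_stat)
```

Unlike the χ^{2} statistic, nothing here requires binning the data, which is one reason such statistics are attractive for small samples.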

Jun 1st, 2009| 09:51 pm | Posted by hlee

How would you assign an order to multivariate data? If you have a strategy for this ordering task, I’d like to ask, “is your strategy **affine invariant**?”, meaning invariant under shifts and rotations (and, more generally, under any nonsingular linear transformation). Continue reading ‘[MADS] data depth’ »
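One way to see what affine invariance buys is to check it numerically. The sketch below orders two-dimensional points by their squared Mahalanobis distance from the sample mean and confirms that the distances are unchanged after a shift plus a nonsingular linear map. The Mahalanobis distance is used here only as a convenient illustrative ordering, not necessarily the depth discussed in the post, and the points, the transformation, and the function names are all made up.

```python
def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each point from the sample mean."""
    (mx, my), (sxx, sxy, syy) = mean_cov(points)
    det = sxx * syy - sxy * sxy
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        out.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def affine(points, a, b, c, d, tx, ty):
    """Apply (x, y) -> (a*x + b*y + tx, c*x + d*y + ty) to every point."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Hypothetical data and an arbitrary nonsingular map (determinant 2*3 - 1*(-1) = 7).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 4.0), (5.0, 5.0)]
moved = affine(pts, 2.0, 1.0, -1.0, 3.0, 10.0, -4.0)

before = mahalanobis_sq(pts)
after = mahalanobis_sq(moved)
print(max(abs(u - v) for u, v in zip(before, after)))  # tiny, rounding error only
```

By contrast, an ordering such as sorting on the first coordinate alone generally changes under the same transformation, so it would fail the question posed above.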

Tags: break points, data depth, MADS, mean, median, multivariate, nonparametric, order, parasite, quantile, robust, sort, vertebrate

Category: Algorithms, arXiv, Cross-Cultural, Jargon, Stat | Comment
May 18th, 2009| 12:18 pm | Posted by hlee

The understandings of **“robustness”** that I gained from my education in statistics and from communicating with astronomers hardly find any common ground. Can anyone help me build a robust bridge over this abyss? Continue reading ‘Robust Statistics’ »

Tags: break point, Huber, nonparametric, robust, Rousseeuw, Tukey

Category: Bayesian, Frequentist, Jargon, MCMC, Methods, Quotes, Stat, Uncertainty | Comment
Feb 9th, 2009| 03:16 pm | Posted by hlee

There were (only) four articles from ADS whose abstracts contain the word **semiparametric** (none in titles). Therefore, **semiparametric** is not exactly [MADS] but almost [MADS]. One would like to say it is virtually [MADS] or quasi [MADS]. By introducing the term and providing rare examples in astronomy, I hope this scarce term **semiparametric** will be used adequately, rather than misleading astronomers into inappropriate statistical inference with their data. Continue reading ‘[MADS] Semiparametric’ »

Oct 27th, 2008| 09:24 am | Posted by hlee

The notions of **missing data** differ overall between the two communities. I tend to think missing data carry as much information as observed data. Astronomers…I’m not sure how they think, but my impression so far is that when a value is missing in one attribute/variable of an object/observation/informant, all other attributes of that object become useless because the object is excluded from scientific data analysis or model evaluation. For example, it is hard to find any discussion of **imputation** in astronomical publications, or any statistical justification of how missing data are handled with respect to inference strategies. Instead, they talk about **incompleteness** within different variables. To make this vague argument concrete, consider a catalog of multiple magnitudes. To draw a color magnitude diagram, one needs both color and magnitude. If one attribute is missing, that star will not appear in the color magnitude diagram, and any inference from that diagram will not include that star. Nonetheless, one will try to understand how different proportions of stars are observed according to different colors and magnitudes. Continue reading ‘missing data’ »
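The catalog example can be sketched in a few lines of Python. The toy catalog below, its band names `B` and `V`, and the helper functions are all hypothetical; the point is only to show how a star with one missing magnitude silently drops out of the color magnitude diagram, while the per-band observed fractions (the "incompleteness") remain computable.

```python
# Hypothetical four-star catalog; None marks a missing magnitude.
catalog = [
    {"name": "star1", "B": 12.1, "V": 11.5},
    {"name": "star2", "B": None, "V": 13.0},
    {"name": "star3", "B": 15.2, "V": 14.1},
    {"name": "star4", "B": 16.0, "V": None},
]


def cmd_points(stars):
    """Return (color, magnitude) pairs for stars with both bands observed."""
    return [
        (s["B"] - s["V"], s["V"])
        for s in stars
        if s["B"] is not None and s["V"] is not None
    ]


def observed_fraction(stars, band):
    """Fraction of stars that have a measured value in the given band."""
    return sum(s[band] is not None for s in stars) / len(stars)


points = cmd_points(catalog)
print(len(points), "of", len(catalog), "stars reach the diagram")
print(observed_fraction(catalog, "B"), observed_fraction(catalog, "V"))
```

Here each band is 75% complete, yet only half of the stars survive into the diagram, and the information in the surviving magnitudes of the dropped stars is never used.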

Tags: bootstrap, catalog, Efron, estimator, ignorable, imputation, incompleteness, Little, MAR, MCAR, missing data, nonparametric, Rubin, Schafer, survey

Category: Astro, Cross-Cultural, Data Processing, Stat | 2 Comments
Nov 9th, 2007| 12:45 pm | Posted by hlee

There should be at least one paper that draws your attention. Various statistics topics appeared in astro-ph this week.

Continue reading ‘[ArXiv] 2nd week, Nov. 2007’ »

Oct 30th, 2007| 03:37 am | Posted by hlee

From arxiv/astro-ph:0705.4199v1

**In search of an unbiased temperature estimator for statistically poor X-ray spectra**

A. Leccardi and S. Molendi

I was late in writing about this paper, which by accident was lying under a pile of papers irrelevant to astrostatistics. (It has been quite overwhelming to track papers with various statistical applications, and papers with room left for statistical improvements, from arxiv:astro-ph.) Although there is already a posting about this paper (see Vinay’s posting), I’d like to give it a shot. I was very excited because I hadn’t seen any astronomical paper devoted solely to **unbiased estimators**.

Continue reading ‘[ArXiv] An unbiased estimator, May 29, 2007’ »

Tags: chi-square, maximum likelihood, mixing distribution, mixture, nonparametric, robust, subsampling, transformation, unbiased, Uncertainty

Category: arXiv, Frequentist, Stat | Comment
Aug 14th, 2007| 10:17 pm | Posted by hlee

During the International X-ray Summer School, as a project presentation, I tried to explain the inappropriate use of χ^2 statistics in astronomy. *If your best fit is biased (any misidentification of a model easily causes such bias), do not use χ^2 statistics to get the 1σ error for the 68% chance of capturing the true parameter.*

Later, I decided to investigate that subject further, and this paper came along: Astrostatistics: Goodness-of-Fit and All That! by Babu and Feigelson.

Continue reading ‘Astrostatistics: Goodness-of-Fit and All That!’ »

Tags: Anderson-Darling, Babu, best-fit, bias, bootstrap, chi-square, Cramer-von Mises, Feigelson, Kolmogorov-Smirnoff, Kullback-Leibler distance, nonparametric, parametric, resampling

Category: Algorithms, arXiv, Astro, Fitting, High-Energy, Methods, Spectral, Stat | 7 Comments
Jul 25th, 2007| 01:46 pm | Posted by hlee

From arxiv/astro-ph:0707.3413

**The Sixth Data Release of the Sloan Digital Sky Survey** by … many people …

The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at http://www.sdss.org/dr6. Additionally, the Catalog Archive Server (CAS) and the SQL interface for accessing the catalog would be useful to statisticians searching for data. Simple SQL commands, which are well documented, can narrow down the size of the data and the spatial coverage.

Continue reading ‘[ArXiv] SDSS DR6, July 23, 2007’ »

Tags: catalog, convex hull peeling, density estimation, DR6, massive data, multivariate analysis, nonparametric, SDSS, SQL, voronoi tessellation

Category: Algorithms, arXiv, Astro, Data Processing, Misc, Optical | 1 Comment