Oct 28th, 2009| 09:29 am | Posted by hlee
While exploring the spatial distribution of particles/objects without approximating it by a Poisson or Gaussian process (parametric) and without imposing hypotheses such as homogeneity, isotropy, or uniformity, I was drawn to various nonparametric methods for data exploration and preliminary analysis. Among them, the one that I fell in love with is tessellation (state space approaches are excluded here). In terms of computational speed, I believe tessellation is faster than kernel density estimation for estimating level sets of multivariate data. Furthermore, constructing polygons from a tessellation is conceptually and intuitively simple. However, coding and improving the algorithms is beyond statistical research (check books with computational geometry in their titles or keywords). The good news is that, for computing and getting results, there are freely available software, packages, and modules in various forms. Continue reading ‘[ArXiv] Voronoi Tessellations’ »
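To make the tessellation idea concrete, here is a minimal sketch of a one-dimensional analogue of Voronoi density estimation. It is only an illustration under simplifying assumptions: the points are distinct, the data are 1-D (so each Voronoi cell is just the interval between neighbouring midpoints), and the function name `voronoi_density_1d` is made up for this example. Real multivariate work would rely on a computational geometry package instead.

```python
def voronoi_density_1d(points):
    """Density estimate at each interior point: 1 / (n * length of its Voronoi cell).

    Assumes the points are distinct; the two end points are skipped
    because their cells are unbounded.
    """
    xs = sorted(points)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points")
    # In 1-D the Voronoi cell boundaries are the midpoints between neighbours.
    mids = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    densities = []
    for i in range(1, n - 1):
        cell_length = mids[i] - mids[i - 1]
        densities.append((xs[i], 1.0 / (n * cell_length)))
    return densities


if __name__ == "__main__":
    # Points crowded around 1.1 get small cells and therefore high density.
    for x, dens in voronoi_density_1d([0.0, 1.0, 1.1, 1.2, 3.0]):
        print(x, round(dens, 3))
```

Small cells mean high local density, which is why no bandwidth has to be chosen, unlike in kernel density estimation.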
Tags:
data compression,
delaunay tessellation,
density estimation,
image processing,
nonparametric,
spatial statistics,
van de Weygaert,
van Lieshout,
voronoi tessellation
Category:
Algorithms,
arXiv,
Galaxies,
Methods |
Comment
Oct 15th, 2009| 06:46 pm | Posted by hlee
Astronomers rely on scatter plots to illustrate correlations and trends among many pairs of variables more than any other scientists. Pages of scatter plots with regression lines are often found, in which the slope of the regression line and the error bars serve as indicators of the degree of correlation. Sometimes, too many of such scatter plots make me think that, overall, the resources for drawing nice scatter plots and the paper on which those plots are printed are wasted. Why not just compute correlation coefficients and their errors, and publicize the processed data used for computing the correlations, not the full data, so that others can verify the computation results for the sake of validation? A couple of scatter plots are fine, but when I see dozens of them, I lose my focus. This is another cultural difference. Continue reading ‘Scatter plots and ANCOVA’ »
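As a sketch of what "compute correlation coefficients and their errors" could look like, the following computes the Pearson correlation and an approximate confidence interval through the Fisher z transformation. The interval assumes roughly bivariate normal data and |r| < 1; the function names and the toy numbers are made up for illustration, and a rank-based coefficient may suit non-Gaussian data better.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for r (z_crit=1.96 gives about 95%).

    Uses atanh(r) with standard error 1/sqrt(n - 3); requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]          # toy data, not from any catalog
    y = [2, 4, 5, 4, 5]
    r = pearson_r(x, y)
    print(r, fisher_interval(r, len(x)))
```

A table of such coefficients with intervals takes far less space than dozens of panels, at the price of hiding nonlinear structure that a plot would reveal.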
Tags:
ANCOVA,
ANOVA,
approximation,
correlation,
Gaussianity,
graphics,
MADS,
modeling,
nonparametric,
parallel coordinates,
PCA,
quality,
quantity,
regression,
scatter plots
Category:
arXiv,
Cross-Cultural,
Fitting,
Jargon,
Methods,
Stat,
Uncertainty |
Comment
Oct 6th, 2009| 01:49 pm | Posted by hlee
When it comes to applying statistics for measuring goodness-of-fit, the Pearson χ² test is the dominant player in the race and the Kolmogorov-Smirnov test statistic trails far behind. Although they seem almost invisible in this race, there are many other nonparametric statistics for testing goodness-of-fit and for comparing the sampling distribution to a reference distribution; they are legitimate race participants trained by many statisticians. Listing their names is probably useful to some astronomers when they find that the underlying assumptions of the χ² test do not match their data. Perhaps some astronomers want to try nonparametric test statistics other than the K-S test. I’ve seen other test statistics in astronomical journals from time to time. Depending on the data and their statistical properties, one test statistic could work better than another; therefore, it’s worthwhile to keep in mind that there are other tests beyond the χ² goodness-of-fit test. Continue reading ‘Goodness-of-fit tests’ »
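For readers who have only met the χ² test, a minimal sketch of the one-sample Kolmogorov-Smirnov statistic may help: it is the largest vertical gap between the empirical distribution function and a fully specified reference CDF. The helper names and the Gaussian reference are illustrative choices, and the sketch stops at the statistic itself; p-values are left to a statistics package.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample K-S statistic: sup |F_n(x) - F(x)| for a reference CDF F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]   # made-up numbers
    print(ks_statistic(data, normal_cdf))
```

Note that the usual K-S critical values assume the reference distribution is fixed in advance, not fitted to the same data.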
Jun 1st, 2009| 09:51 pm | Posted by hlee
How would you order multivariate data? If you have a strategy for this ordering task, I’d like to ask, “Is your strategy affine invariant?”, meaning invariant under shifts and rotations (and, more generally, under any nonsingular linear transformation). Continue reading ‘[MADS] data depth’ »
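One way to see what affine invariance asks for is to order points by Mahalanobis distance from the sample mean, which is among the simplest depth-like orderings, and check that the ordering survives an affine map. This is only a sketch for 2-D data: the function names and the toy points are invented here, the ordering is not robust to outliers, and it is not necessarily the depth discussed in the full post.

```python
def mahalanobis_rank(points):
    """Indices of 2-D points ordered from most central to most outlying."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be nonzero (points not collinear)

    def dist2(p):
        dx, dy = p[0] - mx, p[1] - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    return sorted(range(n), key=lambda i: dist2(points[i]))


def affine(points, a, b, c, d, tx, ty):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty (needs a*d - b*c != 0)."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0.5), (2, 3), (4, 1), (5, 5), (3, 2.5)]  # toy data
    moved = affine(pts, 2, 1, -1, 3, 10, -4)
    print(mahalanobis_rank(pts) == mahalanobis_rank(moved))
```

By contrast, sorting on the first coordinate alone generally changes order once the data are rotated, so it fails this test.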
Tags:
break points,
data depth,
MADS,
mean,
median,
multivariate,
nonparametric,
order,
parasite,
quantile,
robust,
sort,
vertebrate
Category:
Algorithms,
arXiv,
Cross-Cultural,
Jargon,
Stat |
Comment
May 18th, 2009| 12:18 pm | Posted by hlee
My understanding of “robustness” from my education in statistics and the one I get from communicating with astronomers hardly share any common ground. Can anyone help me to build a robust bridge to get over this abyss? Continue reading ‘Robust Statistics’ »
Tags:
break point,
Huber,
nonparametric,
robust,
Rousseeuw,
Tukey
Category:
Bayesian,
Frequentist,
Jargon,
MCMC,
Methods,
Quotes,
Stat,
Uncertainty |
Comment
Feb 9th, 2009| 03:16 pm | Posted by hlee
There were (only) four articles from ADS whose abstracts contain the word semiparametric (none in titles). Therefore, semiparametric is not exactly [MADS] but almost [MADS]. One would like to say it is virtually [MADS] or quasi [MADS]. By introducing the term and providing rare examples in astronomy, I hope that this scarce term, semiparametric, will be used adequately, rather than misleading astronomers into inappropriate statistical inference with their data. Continue reading ‘[MADS] Semiparametric’ »
Oct 27th, 2008| 09:24 am | Posted by hlee
The notions of missing data are, overall, different between the two communities. I tend to think that missing data carry as much information as observed data. Astronomers…I’m not sure how they think, but my impression so far is that when a value is missing in one attribute/variable of an object/observation/informant, all other attributes related to that object become useless, because the object is excluded from the scientific data analysis or model evaluation process. For example, it is hard to find any discussion of imputation in astronomical publications, or any statistical justification of missing data with respect to inference strategies. Instead, they talk about incompleteness within different variables. To make this vague argument concrete, consider a catalog of multiple magnitudes. To draw a color magnitude diagram, one needs both color and magnitude. If one attribute is missing, that star will not appear in the color magnitude diagram, and any inference method based on that diagram will not include that star. Nonetheless, one will try to understand how different proportions of stars are observed according to different colors and magnitudes. Continue reading ‘missing data’ »
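The catalog example can be sketched in a few lines. The five-star catalog below is entirely made up, and the band names `B` and `V` are only illustrative; the point is to show how complete-case deletion silently removes stars from a color magnitude diagram while the per-band incompleteness stays a separate bookkeeping number.

```python
# Hypothetical toy catalog; None marks a magnitude that was not measured.
catalog = [
    {"id": 1, "B": 15.2, "V": 14.6},
    {"id": 2, "B": None, "V": 16.1},
    {"id": 3, "B": 17.8, "V": 17.0},
    {"id": 4, "B": 18.4, "V": None},
    {"id": 5, "B": 16.3, "V": 15.9},
]


def cmd_points(stars):
    """(color, magnitude) pairs for stars with both bands measured."""
    return [
        (s["B"] - s["V"], s["V"])
        for s in stars
        if s["B"] is not None and s["V"] is not None
    ]


def missing_fraction(stars, band):
    """Fraction of stars lacking a measurement in the given band."""
    return sum(1 for s in stars if s[band] is None) / len(stars)


if __name__ == "__main__":
    print(len(cmd_points(catalog)), "of", len(catalog), "stars reach the diagram")
    print("missing in B:", missing_fraction(catalog, "B"))
    print("missing in V:", missing_fraction(catalog, "V"))
```

Stars 2 and 4 each still carry one measured magnitude, which is exactly the information that imputation-based approaches try not to throw away.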
Tags:
bootstrap,
catalog,
Efron,
estimator,
ignorable,
imputation,
incompleteness,
Little,
MAR,
MCAR,
missing data,
nonparametric,
Rubin,
Schafer,
survey
Category:
Astro,
Cross-Cultural,
Data Processing,
Stat |
2 Comments
Nov 9th, 2007| 12:45 pm | Posted by hlee
There should be at least one paper that draws your attention. Various statistics topics appeared in astro-ph this week.
Continue reading ‘[ArXiv] 2nd week, Nov. 2007’ »
Oct 30th, 2007| 03:37 am | Posted by hlee
From arxiv/astro-ph:0705.4199v1
In search of an unbiased temperature estimator for statistically poor X-ray spectra
A. Leccardi and S. Molendi
I was late in writing about this paper, which by accident was lying under a pile of papers irrelevant to astrostatistics. (It has been quite overwhelming to track papers with various statistical applications, and papers with room left for statistical improvement, from arxiv:astro-ph.) Although there is a posting about this paper (see Vinay’s posting), I’d like to give it a shot. I was very excited because I hadn’t seen any astronomical papers solely discussing unbiased estimators.
Continue reading ‘[ArXiv] An unbiased estimator, May 29, 2007’ »
Tags:
chi-square,
maximum likelihood,
mixing distribution,
mixture,
nonparametric,
robust,
subsampling,
transformation,
unbiased,
Uncertainty
Category:
arXiv,
Frequentist,
Stat |
Comment
Aug 14th, 2007| 10:17 pm | Posted by hlee
During the International X-ray Summer School, as a project presentation, I tried to explain the inadequate use of χ^2 statistics in astronomy. If your best fit is biased (any misspecification of a model easily causes such bias), do not use χ^2 statistics to get the 1σ error that is supposed to give a 68% chance of capturing the true parameter.
Later, I decided to do further investigation on that subject and this paper came along: Astrostatistics: Goodness-of-Fit and All That! by Babu and Feigelson.
Continue reading ‘Astrostatistics: Goodness-of-Fit and All That!’ »
Tags:
Anderson-Darling,
Babu,
best-fit,
bias,
bootstrap,
chi-square,
Cramer-von Mises,
Feigelson,
Kolmogorov-Smirnov,
Kullback-Leibler distance,
nonparametric,
parametric,
resampling
Category:
Algorithms,
arXiv,
Astro,
Fitting,
High-Energy,
Methods,
Spectral,
Stat |
7 Comments
Jul 25th, 2007| 01:46 pm | Posted by hlee
From arxiv/astro-ph:0707.3413
The Sixth Data Release of the Sloan Digital Sky Survey by … many people …
The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at http://www.sdss.org/dr6. Additionally, the Catalog Archive Service (CAS) and its SQL interface to the catalog would be useful to statisticians searching for data. Simple SQL commands, which are well documented, can narrow down the size of the data and the spatial coverage.
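To give a flavor of how a simple SQL command narrows down both the size and the spatial coverage, here is a self-contained sketch that runs against an in-memory SQLite table. Everything in it is a stand-in: the table `photo`, the columns `obj_id`, `ra_deg`, `dec_deg`, and `r_mag`, and the four rows are hypothetical and do not reproduce the actual SDSS schema or the CAS interface, whose documentation should be consulted for real table and column names.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory table standing in for a photometric catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo ("
        "obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
    )
    rows = [  # made-up objects
        (1, 180.1, 0.2, 17.5),
        (2, 180.4, 0.9, 21.3),
        (3, 200.0, 30.0, 16.0),
        (4, 180.8, 0.5, 18.9),
    ]
    conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?)", rows)
    return conn


def box_query(conn, ra_min, ra_max, dec_min, dec_max, r_max):
    """Select objects inside a sky box and brighter than a magnitude limit."""
    sql = (
        "SELECT obj_id, ra_deg, dec_deg, r_mag FROM photo "
        "WHERE ra_deg BETWEEN ? AND ? "
        "AND dec_deg BETWEEN ? AND ? "
        "AND r_mag < ? "
        "ORDER BY obj_id"
    )
    return conn.execute(sql, (ra_min, ra_max, dec_min, dec_max, r_max)).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    for row in box_query(conn, 180.0, 181.0, 0.0, 1.0, 20.0):
        print(row)
```

The `WHERE` clause does the narrowing on the server side, so only the rows of interest need to be downloaded for statistical analysis.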
Continue reading ‘[ArXiv] SDSS DR6, July 23, 2007’ »
Tags:
catalog,
convex hull peeling,
density estimation,
DR6,
massive data,
multivariate analysis,
nonparametric,
SDSS,
SQL,
voronoi tessellation
Category:
Algorithms,
arXiv,
Astro,
Data Processing,
Misc,
Optical |
1 Comment