Archive for the ‘arXiv’ Category.

Bend it like Poisson

I don’t know why astro-ph thought this article on the statistics of football dynamics (Mendes, Malacarne, Anteneodo 2007; physics/0706.1758) was relevant to me and emailed the abstract, but I’m glad they did, because it deals with a question I have wrestled with for a long time: how to figure out the underlying distribution that controls a stochastic process. In 2002ApJ…580.1118K, we modeled photon arrival time differences as being due to flares occurring at random times but with a power-law intensity distribution with index alpha. physics/0706.1758 deals with time-between-touches and tries to characterize that distribution itself in terms of a number of “phases” beta. From a quick reading, it appears that their beta are our flares, and they restrict all flares to have the same intensity. Despite the restriction, this is interesting because it is an analytical estimate, and it points a way towards speeding up our flare distribution fitting process, which is currently based on a Monte Carlo grid search, not the fastest way to do things.
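
To make the "phases" idea concrete, here is a minimal sketch. It assumes (my reading, not something checked against the paper) that a waiting time built from beta phases behaves like a sum of beta independent exponential stages, i.e. an Erlang/gamma variable. The function names and the method-of-moments estimator are illustrative only and are not the authors' fitting procedure.

```python
import random
import statistics


def simulate_waits(n_phases, rate, n_samples, seed=0):
    """Draw waiting times as sums of n_phases exponential stages (Erlang)."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(rate) for _ in range(n_phases))
        for _ in range(n_samples)
    ]


def estimate_phases(waits):
    """Method-of-moments estimate of the gamma shape (number of phases) and rate.

    For a gamma variable, mean = shape / rate and variance = shape / rate**2,
    so shape = mean**2 / variance and rate = mean / variance.
    """
    mean = statistics.fmean(waits)
    var = statistics.variance(waits)
    return mean * mean / var, mean / var


# Hypothetical example: 3 phases, each with rate 2 per unit time.
waits = simulate_waits(n_phases=3, rate=2.0, n_samples=20000)
shape_hat, rate_hat = estimate_phases(waits)
print(f"estimated phases = {shape_hat:.2f}, estimated rate = {rate_hat:.2f}")
```

The point of the sketch is only that a closed-form estimate like this is cheap compared with a Monte Carlo grid search; it does not handle a power-law spread of flare intensities.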

Everything you wanted to know about power-laws but were afraid to ask

Clauset, Shalizi, & Newman (2007, arXiv/0706.1062) have a very detailed description of what power-law distributions are, how to recognize them, how to fit them, etc. They are also making available the MATLAB and R code that they use to do the fitting.

It looks like a very handy reference text, though I am a bit uncertain about their use of the K-S test to check whether a dataset can be described by a power-law or not. It is probably fine; perhaps some statisticians would care to comment?
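
For readers who want to see the moving parts, here is a minimal sketch of fitting a continuous power-law tail and computing a K-S distance. It uses the standard maximum-likelihood estimator for p(x) ∝ x^(-alpha) on x >= xmin, namely alpha = 1 + n / sum(ln(x_i / xmin)). This is not the authors' released code: the function names are made up for illustration, and xmin is assumed known rather than estimated.

```python
import math
import random


def sample_power_law(alpha, xmin, n, seed=0):
    """Inverse-transform samples from p(x) ~ x**(-alpha) for x >= xmin."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which keeps the power well defined.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, xmin):
    """Maximum-likelihood index and its approximate standard error."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    return alpha, (alpha - 1.0) / math.sqrt(n)


def ks_distance(data, alpha, xmin):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    dist = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        dist = max(dist, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return dist


# Hypothetical example: true index 2.5, lower cutoff 1.0.
data = sample_power_law(alpha=2.5, xmin=1.0, n=5000)
alpha_hat, alpha_err = fit_alpha(data, xmin=1.0)
d_ks = ks_distance(data, alpha_hat, xmin=1.0)
print(f"alpha = {alpha_hat:.3f} +/- {alpha_err:.3f}, KS distance = {d_ks:.4f}")
```

One caveat that bears on the worry above: because alpha is estimated from the same data, the K-S distance no longer follows the usual tabulated reference distribution, so a p-value generally has to be calibrated by simulating from the fitted model.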

All your bias are belong to us

Leccardi & Molendi (2007) have a paper in A&A (astro-ph/0705.4199) discussing the biases in parameter estimation when spectral fitting is confronted with low-count data. Not surprisingly, they find that the bias is higher for lower counts, for standard chisq compared to C-stat, and for grouped data compared to ungrouped. Peter Freeman talked about something like this at the 2003 X-ray Astronomy School at Wallops Island (pdf1, pdf2), and no doubt part of the problem also has to do with the (un)reliability of the fitting process when the chisq surface gets complicated.

Anyway, they propose an empirical method to reduce the bias by computing the probability distribution functions (pdfs) for various simulations, and then averaging the pdfs in groups of 3. Seems to work, for reasons that escape me completely.

[Update: links to Peter's slides corrected]
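
The chisq versus C-stat bias mentioned above can be seen in a toy setting far simpler than real spectral fitting. The sketch below fits a constant rate to Poisson counts in two ways: chisq with data-based variances (floored at 1 to avoid dividing by zero, which is an assumption of this sketch) and the Cash statistic C = 2 sum(m - d ln m), whose minimum for a constant model is the sample mean. This is not the setup or the correction method of the paper; all names and numbers are illustrative.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_constant_chisq(counts):
    """Minimize sum((d - m)**2 / max(d, 1)): a weighted mean with weights 1/max(d, 1)."""
    weights = [1.0 / max(d, 1) for d in counts]
    return sum(w * d for w, d in zip(weights, counts)) / sum(weights)


def fit_constant_cstat(counts):
    """Minimize C = 2 * sum(m - d * ln m): dC/dm = 0 gives m = mean(d)."""
    return sum(counts) / len(counts)


def mean_bias(true_rate, n_bins=100, n_sims=200, seed=0):
    """Average (estimate - truth) over simulated datasets, for both statistics."""
    rng = random.Random(seed)
    chisq_sum = 0.0
    cstat_sum = 0.0
    for _ in range(n_sims):
        counts = [poisson_draw(rng, true_rate) for _ in range(n_bins)]
        chisq_sum += fit_constant_chisq(counts)
        cstat_sum += fit_constant_cstat(counts)
    return chisq_sum / n_sims - true_rate, cstat_sum / n_sims - true_rate


for rate in (5.0, 50.0):
    bias_chisq, bias_cstat = mean_bias(rate)
    print(f"rate {rate:5.1f}: chisq fractional bias {bias_chisq / rate:+.3f}, "
          f"C-stat fractional bias {bias_cstat / rate:+.3f}")
```

In this toy case the chisq estimate is pulled low because bins that fluctuate downward get smaller assumed variances and hence larger weights, and the fractional effect grows as the counts per bin drop.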

An excerpt from “A Conversation with Leo Breiman”

Leo Breiman (1928-2005) was one of the most influential statisticians of the 20th century. He was well known for his textbook on probability theory as well as for his contributions to machine learning, such as CART (Classification and Regression Trees), bagging (bootstrap aggregation), and Random Forests. He was a founding father of statistical machine learning. His works can be found at http://www.stat.berkeley.edu/~breiman/

An excerpt from “A Conversation with Leo Breiman,” by Richard Olshen, Statistical Science (2001), 16(2), pp. 184–198, prompts second thoughts about the direction of statistical research.