Everything you wanted to know about power-laws but were afraid to ask

Clauset, Shalizi, & Newman (2007, arXiv/0706.1062) give a very detailed description of what power-law distributions are, how to recognize them, how to fit them, etc. They are also making available the MATLAB and R code that they use to do the fitting.

Looks like a very handy reference text, though I am a bit uncertain about their use of the K-S test to check whether a dataset can be described with a power-law or not. It is probably fine; perhaps some statisticians would care to comment?

One Comment
  1. hlee:

    I completely forgot to comment on this post, but my recent post on the Pareto distribution brought me back. Though it may not be a complete answer, I wanted to say that I first learned to apply the K-S test in a spatial statistics class, where it was used to test for a homogeneous Poisson process in much the same manner as this paper describes. Running 99 simulations lets you rank the K-S test statistic from the data among the simulated ones to determine a p-value. For better accuracy, the paper suggests 2500 synthetic data sets instead of 99.
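
The ranking procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code released with the paper: it assumes a continuous power law with a known lower cutoff `xmin` (the paper also estimates the cutoff), and the function names `fit_alpha`, `ks_statistic`, `sample_power_law` and `monte_carlo_p_value` are made up for this example. The p-value uses the rank convention (count + 1) / (n_sims + 1).

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law, xmin assumed known."""
    return 1.0 + len(data) / sum(math.log(x / xmin) for x in data)


def ks_statistic(data, alpha, xmin):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare the model CDF with the empirical step just after and just before x.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d


def sample_power_law(n, alpha, xmin, rng):
    """Inverse-transform sampling from the continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def monte_carlo_p_value(data, xmin, n_sims=99, seed=0):
    """Rank the observed K-S statistic among statistics from synthetic data sets."""
    rng = random.Random(seed)
    alpha = fit_alpha(data, xmin)
    d_obs = ks_statistic(data, alpha, xmin)
    exceed = 0
    for _ in range(n_sims):
        synth = sample_power_law(len(data), alpha, xmin, rng)
        # Refit each synthetic set so it is treated exactly like the real data.
        d_sim = ks_statistic(synth, fit_alpha(synth, xmin), xmin)
        if d_sim >= d_obs:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)
```

With `n_sims=99` the smallest attainable p-value is 0.01, which is one way to see why a larger number of synthetic data sets gives a finer answer.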

    I’d like to second you that this is a very handy reference; in particular, it may resolve concerns about non-nested hypothesis testing in astronomy (the slog post has the same reference, Vuong (1989), and other relevant ones).

    I dare to quote a line from their conclusion: “The common practice of identifying and quantifying power-law distributions by the approximately straight-line behavior of a histogram on a doubly logarithmic plot is known to give biased results and should not be trusted.” The appendix explains why, and it should give one second thoughts about fitting a straight line, or even two straight lines connected at a break point (broken power laws).
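
To see the two approaches side by side, here is a small sketch that estimates the exponent both ways on the same sample: by maximum likelihood and by a least-squares line through a log-log histogram. It is only an illustration under assumptions of my own (a continuous power law with known `xmin`, linear bins, empty bins dropped); the names `mle_alpha` and `loglog_slope_alpha` are hypothetical, and how far the slope lands from the true value depends on the binning and range chosen.

```python
import math
import random


def mle_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law, xmin assumed known."""
    return 1.0 + len(data) / sum(math.log(x / xmin) for x in data)


def loglog_slope_alpha(data, xmin, xmax, n_bins=50):
    """Exponent from a least-squares line through log(count) vs log(bin center)."""
    width = (xmax - xmin) / n_bins
    counts = [0] * n_bins
    for x in data:
        if xmin <= x < xmax:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((x - xmin) / width), n_bins - 1)] += 1
    # Empty bins have no logarithm, so they are silently dropped.
    pts = [(math.log(xmin + (k + 0.5) * width), math.log(c))
           for k, c in enumerate(counts) if c > 0]
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    slope = (sum((px - mx) * (py - my) for px, py in pts)
             / sum((px - mx) ** 2 for px, _ in pts))
    return -slope


if __name__ == "__main__":
    rng = random.Random(0)
    true_alpha = 2.5
    # Inverse-transform samples from a power law with xmin = 1.
    sample = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(5000)]
    print("MLE estimate:          ", mle_alpha(sample, 1.0))
    print("log-log slope estimate:", loglog_slope_alpha(sample, 1.0, 50.0))
```

Changing `n_bins` or `xmax` and rerunning shows how sensitive the straight-line estimate is to choices that the maximum-likelihood estimate does not need.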

    04-08-2008, 3:52 am