All models are wrong, but some are useful

All models are wrong, but some are useful. –George Box

One of the most frequently cited quotes appeared in an article titled The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, which I liked very much because it cited the updated maxim by Peter Norvig, Google’s research director:

All models are wrong, and increasingly you can succeed without them.

The article offers perspectives on the new era of petabyte-scale data analysis, in which traditional modeling and testing are unlikely to be feasible.

I’d like to thank the person who forwarded this article. I have no intention, however, of advertising the company featured in the article through your clicks and reading. At the least, I’d like to urge that we need more innovative thinking than what we normally apply to small data sets, an approach the author, Chris Anderson, describes as follows:

The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

I cannot put it elegantly, but simply put, data analysis should be directed by listening to the data and letting the data talk to you, instead of imposing models on the data. This holds particularly when the data set is large or humongous. Good a priori knowledge might be an exception, but we never have enough of it, which is where disputes about errors come in.

Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

  1. vlk:

    Allow me to disagree vehemently with the premise of that article. Correlation is never enough. Without theory and modeling, all you get is stamp collecting.

    07-01-2008, 2:19 pm
  2. hlee:

    Nice quote, and I agree, but stamp collecting is very elaborate work once your collection goes beyond one binding. If stamp collecting is confined to a chronological assembly, I have nothing to say. Conversely, consider association rules for decision making. We need some relationships to lay out products in a store by looking at purchase patterns. I think correlation here has a broader sense than astronomers’ perception (laying a straight line on a scatter plot with error bars to draw a coherent relationship). The quote belongs to the past, not the future that asks clusters of computers to handle data.

    07-01-2008, 8:27 pm
  3. brianISU:

    I find it interesting that the guy is mentioning that models are becoming obsolete, yet he keeps mentioning the use of statistical methods. How are statistical methods model free? Even nonparametric methods must make some assumptions. It seems to me that this essay is more about how models aren’t becoming obsolete, but are evolving to a new stage.

    07-08-2008, 9:32 am