The AstroStat Slog » petabytes
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

All models are wrong, but some are useful
http://hea-www.harvard.edu/AstroStat/slog/2008/useful-wrong-model/
Tue, 01 Jul 2008 03:12:23 +0000, hlee

All models are wrong, but some are useful. –George Box


One of the most frequently cited quotes appeared in an article titled The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, which I liked very much because it cited the updated maxim by Peter Norvig, Google’s research director:

All models are wrong, and increasingly you can succeed without them.

The article addressed perspectives on the new era of petabyte data analysis, in which traditional modeling and testing are unlikely to be feasible.

I’d like to thank the person who forwarded this article. However, I have no intention of advertising the company in the article through your clicks and reading. At the least, I’d like to urge that we need more innovative thinking than what we normally do with small data sets, as described by the author, Chris Anderson:

The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

I cannot put it elegantly, but simply put, data analysis should be directed by listening to the data and letting the data talk to you, instead of imposing models on the data (particularly when the data set is large or humongous; good a priori knowledge might be an exception, but we never have enough of it, which is where disputes over errors come in).

Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

working together to tackle hard problems in astronomy
http://hea-www.harvard.edu/AstroStat/slog/2008/working-together-to-tackle-hard-problems-in-astronomy/
Fri, 01 Feb 2008 17:45:04 +0000, hlee

This is an edited copy of an emailed colloquium announcement from Tufts University, MA. A must-go for those who live in Medford and Somerville, where Tufts Univ. is located, and the vicinity.

Subject : Special Joint CS and Physics Colloquium
Title : How Astronomers, Computer Scientists and Statisticians are working together to tackle hard problems in astronomy
Speaker: Pavlos Protopapas
Date : Thursday February 7
Time : 3:15 pm
Place : Nelson Auditorium, Anderson Hall (200 College Ave, Medford, MA, I think)
Abstract:
New astronomical surveys such as Pan-STARRS and LSST are under development and will collect petabytes of data. These surveys will image large areas of sky repeatedly to great depth, and will detect vast numbers of moving, variably bright, and transient objects. The data product of these surveys is a series of observations taken over time, or light-curves.

The IIC has established an inter-disciplinary Center for Time Series with an immediate focus on astronomy. I will present three research topics currently being pursued at the IIC that require expertise from astronomy, computer science and statistics. These are: identifying novel astronomical phenomena in large light-curve datasets, searching for rare phenomena such as extra-solar planets, and efficiently searching for significant events such as occultations of stars by small objects in the outer reaches of our solar system.

Pavlos Protopapas is a senior scientist at the IIC and Harvard-Smithsonian Center for Astrophysics. His research interests span the outer solar system, extra-solar planets and gravitational lensing. He specializes in analyzing large collections of astronomical data, with a toolbox drawn from data-mining, computer science and statistics.
