Reproducible Research and Data-Intensive Scientific Discovery - Tony Hey
A sea change is under way in academic research: a transformation caused by a data deluge that is affecting all disciplines. Modern science increasingly relies on integrated information technologies and computation to collect, process, and analyze complex data. It was Ken Wilson, Nobel Prize winner in physics, who coined the phrase “Third Paradigm” to refer to computational science and the need for computational researchers to know about algorithms, numerical methods, and parallel architectures. However, the skills needed for manipulating, visualizing, managing, and, finally, conserving and archiving scientific data are very different. “The Fourth Paradigm” is about the computational systems needed to manipulate, visualize, and manage large amounts of scientific data. A wide variety of scientists (biologists, chemists, physicists, astronomers, engineers) require tools, technologies, and platforms that integrate seamlessly into standard scientific methodologies and processes. One disturbing emerging trend is how difficult it is for scientists other than a paper's authors to replicate the often complex analysis steps required to reach the paper's scientific conclusions. The talk will illustrate a possible partial solution to the problem of reproducible research based on a joint research project between Microsoft Research and the MIT Broad Institute.