Computational and Policy Tools for Reproducible Research - Roger Peng
The ability to make scientific findings reproducible is increasingly important in areas where substantive results are the product of complex statistical computations. Reproducibility allows others to verify published findings and to conduct alternative analyses of the same data. A natural question is how one can conduct and distribute reproducible research. I describe a simple framework in which reproducible research can be conducted and distributed via cached computations, and I present tools for both authors and readers. As a prototype implementation, I describe a software package written in the R language. The "cacher" package provides tools for caching computational results in a key-value database that can be published to a public repository for readers to download. As a case study, I demonstrate the use of the package on a study of ambient air pollution exposure and mortality in the United States. I will also discuss the role that journals can play in encouraging reproducible research and review the recent reproducibility policy at the journal Biostatistics.
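The idea of caching computations in a key-value database can be illustrated with a minimal, language-agnostic sketch. It is written in Python rather than R and is not the cacher package's API: the class `ComputationCache` and its `evaluate` method are hypothetical names chosen for illustration. Each expression's source text is hashed to form a key, and the value the expression produces is serialized to a file named by that key. The cache directory therefore acts as a simple key-value store that an author can publish and a reader can query without rerunning the original computation.

```python
import hashlib
import os
import pickle
import tempfile


class ComputationCache:
    """Hypothetical key-value store mapping the hash of an expression's
    source text to the pickled value that the expression produced."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, source):
        # Key: SHA-1 digest of the expression's source text.
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, key + ".pkl")

    def evaluate(self, source, env):
        """Return the cached value for `source` if one exists; otherwise
        evaluate it in `env`, store the result, and return it."""
        path = self._path(source)
        if os.path.exists(path):
            # Only load caches from trusted sources: unpickling can run code.
            with open(path, "rb") as fh:
                return pickle.load(fh)
        value = eval(source, env)
        with open(path, "wb") as fh:
            pickle.dump(value, fh)
        return value


# Author: run the analysis with access to the (possibly large) data.
cache_dir = tempfile.mkdtemp()
author_result = ComputationCache(cache_dir).evaluate("sum(data)", {"data": [1, 2, 3]})

# Reader: given only the published cache directory, retrieve the same
# result without the data, because the value is looked up by its key.
reader_result = ComputationCache(cache_dir).evaluate("sum(data)", {})
```

Keying on the source text alone is a simplification made for this sketch: a fuller system would also need to record which objects each expression depends on, so that changing an input invalidates the downstream cached results.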