A Universal Identifier for Computational Results - Matan Gavish
When we read online scientific publications, the hyperlink and the infrastructure of the web let us click on a citation and browse the cited work, then continue to the work that it cites in turn, and so on.
What if we could click on any image or table in a scientific publication, and go to a detailed, structured description of its generating computation, and even land precisely on the instruction that created the figure? From there we could stroll up and down the computation tree and browse other parts of the same computation and other figures created by it. We could read the code, examine intermediate variables, understand the steps that took place, re-execute some parts, and retrieve the original dataset fed into the computation. In fact, what if we could move on to browse the computations that created the original dataset, and continue to tour the world of computation, all through an entry point provided by one figure of interest in a publication?
Verifiable Computational Research (VCR) is a discipline for computational research. It introduces the notion of a Verifiable Result Identifier (VRI), which together with today's advanced web infrastructure turns the above fantasy into a reality. The discipline allows researchers, publishers and publication readers to use the same tools they are already using, and requires only minor changes to these tools. While everyone follows their familiar workflow, a VCR software system works quietly in the background to make it all happen. For example, it automatically brands each publishable result produced by the computation with its own unique VRI. The VRI is at the same time a web URL and a secure digital signature. In any publication or presentation, the interested reader can click on the result, direct a web browser to the URL, or scan a barcode, and thereby gain entry into the computation that created the result, with the implications described above.
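To make the idea of an identifier that is both a URL and a signature concrete, here is a minimal sketch in Python. It is not the VCR system's actual scheme, which this abstract does not specify. The repository address, the function names make_vri and verify_vri, and the use of HMAC-SHA256 as a stand-in for the digital signature are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical repository address; the real VCR URL layout is not given here.
BASE_URL = "https://vcr.example.org/vri/"


def make_vri(result: bytes, source_code: str, key: bytes) -> str:
    """Build an illustrative VRI: a URL ending in a keyed tag over code and result.

    HMAC-SHA256 stands in for the signature scheme (an assumption).
    """
    # Hash the generating code first so the tag binds the result to its computation.
    code_digest = hashlib.sha256(source_code.encode("utf-8")).digest()
    tag = hmac.new(key, code_digest + result, hashlib.sha256).hexdigest()
    return BASE_URL + tag


def verify_vri(vri: str, result: bytes, source_code: str, key: bytes) -> bool:
    """Recompute the VRI and compare it in constant time."""
    expected = make_vri(result, source_code, key)
    return hmac.compare_digest(vri, expected)


# Example: brand a result, then check it against the original and a tampered copy.
demo_key = b"demo-key"  # hypothetical signing key
demo_vri = make_vri(b"figure bytes", "plot(x, y)", demo_key)
demo_ok = verify_vri(demo_vri, b"figure bytes", "plot(x, y)", demo_key)
demo_tampered = verify_vri(demo_vri, b"edited figure", "plot(x, y)", demo_key)
```

In this sketch, demo_ok is true and demo_tampered is false: changing either the result or the generating code changes the tag, so a reader holding the VRI can check that a figure matches the computation the URL points to.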
For the individual researcher, VCR is an online, self-filling lab journal for computational experiments. For computational science communities, VCR is a disciplined, standard, simple and automatic way to work reproducibly. It imposes simple rules, requires minimal effort (the software does all the work), and needs no personal dedication to the reproducible research cause. As such, it might be a big step toward the long-anticipated promise of widely practiced, fully reproducible research.
I'll show an existing implementation of the VCR system, currently in use in the Stanford Statistics Department.
Joint work with D. Donoho.