UBC Reproducible Research Workshop
July 14th and 15th, 2011 Event
John Wilbanks, Vice President of the Science Project - Creative Commons
Freedom (to reproduce) - John Wilbanks
Much of the debate over data availability in science has been dominated by the frame of "open or closed" that comes to us from free and open source software, free culture, and other sharing cultures created by the reaction to overly broad copyright. But in data, the problems are far more complex, and framing the debate in terms of reproducibility is far more likely to actually get scientific data online than appealing to scientists to "share openly". There are emerging best practices based on the public domain for data, as well as infrastructure for reuse of data, but the greatest emerging threat is not enclosure of the digital commons by copyright or database rights, but the impact of privacy regulations that bar gathering of data in the first place without significant oversight and in the second place ban its redistribution for reproduction and replication.
  • John Wilbanks, Vice President of the Science Project - Creative Commons
On Demand
Thursday, July 14, 2011, 8:30 AM PDT
39 Minutes 3 Seconds
Andrew P. Davison, Research Scientist - Unite de Neuroscience, Information et Complexite (UNIC) - CNRS - France
Automated tracking of scientific computations - Andrew Davison
Reproducibility of experiments is one of the foundation stones of science. A related concept is provenance, being able to track a given scientific result, such as a figure in an article, back through all the analysis steps (verifying the correctness of each) to the original raw data, and the experimental protocol used to obtain it. In computational, simulation- or numerical analysis-based science, reproduction of previous experiments, and establishing the provenance of results, ought to be easy, given that computers are deterministic, not suffering from the problems of inter-subject and trial-to-trial variability that make reproduction of biological experiments, for example, more challenging. (see abstract for more details: http://www.stodden.net/AMP2011/)

To ensure reproducibility of a computational experiment we need to record: (i) the code that was run, (ii) any parameter files and command line options, (iii) the platform on which the code was run, (iv) the outputs. To keep track of a research project with many hundreds or thousands of simulations and/or analyses, it is also useful to record (i) the reason for which the simulation/analysis was run and (ii) a summary of the outcome of the simulation/analysis. Recording the code might mean storing a copy of the executable, or the source code (including that of any libraries used), the compiler used (including version) and the compilation procedure (e.g. the Makefile). See the abstract for more details.
In this talk I will present the solution we are developing to the challenges outlined above. Sumatra consists of a core library, implemented in Python, on which is built a command line interface for launching simulations/analyses with automated recording of provenance information and a web interface for managing a computational project: browsing, viewing, and annotating simulations/analyses.

Sumatra (i) interacts with version control systems, such as Subversion, Git, Mercurial, or Bazaar, (ii) supports launching serial or distributed (via MPI) computations, (iii) links to data generated by the computation, (iv) aims to support all and any command-line drivable simulation or analysis program, (v) supports both local and networked storage of information, (vi) aims to be extensible, so that components can easily be added for new version control systems, etc., (vii) aims to be very easy to use, otherwise it will only be used by the very conscientious.
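
The four items listed in the abstract (code, parameters, platform, outputs), plus the reason and outcome notes, could be captured in a single record along the following lines. This is only a minimal sketch using the Python standard library: it is not Sumatra's API, and the function `build_record`, its arguments, and the field names are all hypothetical, chosen purely for illustration.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_record(code_text, parameters, outputs, reason="", outcome=""):
    """Assemble a provenance record (hypothetical layout, not Sumatra's).

    code_text  -- source code that was run, as a string
    parameters -- dict of parameter names to values
    outputs    -- dict mapping output names to their raw bytes
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # (i) the code that was run, identified by its hash
        "code_sha256": sha256_of_bytes(code_text.encode("utf-8")),
        # (ii) parameters and options
        "parameters": parameters,
        # (iii) the platform on which the code was run
        "platform": {
            "python": sys.version.split()[0],
            "system": platform.system(),
            "machine": platform.machine(),
        },
        # (iv) the outputs, identified by their hashes
        "outputs_sha256": {
            name: sha256_of_bytes(blob) for name, blob in outputs.items()
        },
        # free-text notes for managing a large project
        "reason": reason,
        "outcome": outcome,
    }


record = build_record(
    code_text="print(2 + 2)",
    parameters={"seed": 42},
    outputs={"result.txt": b"4\n"},
    reason="smoke test",
    outcome="ran cleanly",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

A real tool would more likely take the code identity from a version control revision than from a hash of the source text; the hash is used here only to keep the sketch self-contained.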
  • Andrew P. Davison, Research Scientist - Unite de Neuroscience, Information et Complexite (UNIC) - CNRS - France
On Demand
Thursday, July 14, 2011, 9:15 AM PDT
40 Minutes 35 Seconds
Roger D. Peng, Associate Professor - Dept. of Biostatistics - Johns Hopkins Bloomberg School of Public Health
Computational and Policy Tools for Reproducible Research - Roger Peng
The ability to make scientific findings reproducible is increasingly important in areas where substantive results are the product of complex statistical computations. Reproducibility can allow others to verify the published findings and conduct alternate analyses of the same data. A question that arises naturally is how can one conduct and distribute reproducible research? I describe a simple framework in which reproducible research can be conducted and distributed via cached computations and describe tools for both authors and readers. As a prototype implementation I describe a software package written in the R language. The 'cacher' package provides tools for caching computational results in a key-value style database which can be published to a public repository for readers to download. As a case study I demonstrate the use of the package on a study of ambient air pollution exposure and mortality in the United States. I will also discuss the role that journals can play in encouraging reproducible research and will review the recent reproducibility policy at the journal Biostatistics.
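
The idea of caching computational results in a key-value store can be illustrated with a short sketch. The package described in the abstract is written in R; the Python below is not its API and makes no claim about how it derives keys. The class `ComputationCache`, its method names, and the choice of a SHA-256 hash over the label and arguments as the key are all assumptions made for illustration.

```python
import hashlib
import pickle


class ComputationCache:
    """Toy key-value cache for computation results (hypothetical design)."""

    def __init__(self):
        self.store = {}        # key -> pickled result
        self.evaluations = 0   # how many times a function was actually run

    @staticmethod
    def key_for(label, args):
        """Derive a key from the computation's label and its arguments."""
        payload = repr((label, args)).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def run(self, label, func, *args):
        """Return the cached result if present, otherwise compute and store it."""
        key = self.key_for(label, args)
        if key not in self.store:
            self.evaluations += 1
            self.store[key] = pickle.dumps(func(*args))
        return pickle.loads(self.store[key])


def mean(values):
    return sum(values) / len(values)


cache = ComputationCache()
first = cache.run("mean", mean, (1.0, 2.0, 3.0))   # computed
second = cache.run("mean", mean, (1.0, 2.0, 3.0))  # served from the cache
print(first, second, cache.evaluations)
```

Because the store holds only keys and serialized results, it could in principle be written to disk and handed to a reader, who could then inspect results without re-running the expensive steps.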
  • Roger D. Peng, Associate Professor - Dept. of Biostatistics - Johns Hopkins Bloomberg School of Public Health
On Demand
Thursday, July 14, 2011, 10:30 AM PDT
40 Minutes 46 Seconds
Juliana Freire, Associate Professor - Scientific Computing and Imaging Institute and School of Computing - U of Utah
A Provenance-Based Infrastructure for Creating Reproducible Papers - Juliana Freire and Claudio Silva
While computational experiments have become an integral part of the scientific method, it is still a challenge to repeat such experiments, because often, computational experiments require specific hardware, non-trivial software installation, and complex manipulations to obtain results. Generating and sharing repeatable results takes a lot of work with current tools. Thus, a crucial technical challenge is to make this easier for (i) the author of the paper, (ii) the reviewer of the paper, and, if the author is willing to disseminate code to the community, (iii) the eventual readers of the paper. While a number of tools have been developed that attack sub-problems related to the creation of reproducible papers, no end-to-end solution is available. Besides giving authors the ability to link results to their provenance, such a solution should enable reviewers to assess the correctness and the relevance of the experimental results described in a submitted paper. Furthermore, upon publication, readers should be able to repeat and utilize the computations embedded in the papers. But even when the provenance associated with a result is available and contains a precise and executable specification of the computational process (i.e., a workflow), shipping the specification to be run in an environment different from the one it was designed in raises many challenges. From hard-coded locations for input data, to dependencies on specific versions of software libraries and hardware, adapting a workflow to run on a new environment can be challenging and sometimes impossible.
We posit that integrating data acquisition, derivation, analysis, and visualization as executable components throughout the publication process will make it easier to generate and share repeatable results. To this end, we have built an infrastructure to support the life-cycle of 'reproducible publications': their creation, review and re-use. In particular, in our design we have considered the following desiderata: Lower Barrier for Adoption (it should help authors in the process of assembling their submissions); Flexibility (it should support multiple mechanisms that give authors different choices as to how to package their work); Support for the Reviewing Process (reviewers should be able to unpack and reproduce the experiments, as well as validate them). We have used VisTrails, a provenance-enabled, workflow-based data exploration tool, as a key component of our infrastructure. We leverage the VisTrails provenance infrastructure to systematically capture useful meta-data, including workflow provenance, source code, and library versions. We have also taken advantage of the extensibility of the system to integrate components and tools that address issues required to support reproducible papers, including: linking results to their provenance; the ability to repeat results, explore parameter spaces, and interact with results through a Web-based interface; the ability to upgrade the specification of computational experiments to work in different environments and with newer versions of software. In this talk, we outline challenges we have encountered and present some of the components we have developed to address them. We also present a demo where we show real-world uses of our infrastructure.
  • Juliana Freire, Associate Professor - Scientific Computing and Imaging Institute and School of Computing - U of Utah
  • Claudio T Silva BSc, Ph.D Math., Professor - Scientific Computing and Imaging Institute - U of Utah
On Demand
Thursday, July 14, 2011, 11:15 AM PDT
47 Minutes 3 Seconds
Philip J. Guo, Ph.D Student - Computer Science - Stanford University
CDE: A tool for automatically creating reproducible experimental software packages - Philip Guo
Although there are many social, cultural, and political barriers to reproducible research, the main technical barrier to reproducibility is that it is hard to distribute scientific code in a form that other researchers can easily execute on their own machines. Before your colleagues can run your computational experiments, they must first obtain, install, and configure compatible versions of the appropriate software and their myriad of dependent libraries, which is a frustrating and error-prone process.
To eliminate this technical barrier to reproducibility, I have created a tool called CDE that automatically packages up all of the software dependencies required to reproduce your computational experiments on another machine. CDE is easy to use: All you need to do is execute the commands for your experiment under its supervision, and CDE automatically packages up all of the Code, Data, and Environment that your commands accessed. When you send that self-contained package to your colleagues, they can re-run those exact commands on their machines without first installing or configuring anything. Moreover, they can even adjust the parameters in your code and re-run to explore related hypotheses, or run your code on their own datasets to see how well it generalizes.

CDE currently only works on Linux, but the ideas it embodies can be implemented for any operating system. You can download CDE for free at http://www.stanford.edu/~pgbovine/cde.html.
  • Philip J. Guo, Ph.D Student - Computer Science - Stanford University
On Demand
Thursday, July 14, 2011, 1:30 PM PDT
44 Minutes 20 Seconds
Patrick Vandewalle, Researcher - Author - Signal Image Processing
Reproducible Research in Signal Processing: How to Increase Impact - Patrick Vandewalle
Worries about the reproducibility of research results are centuries old, and date back to Descartes' work Discourse on (Scientific) Method. However, in the recently developed computational sciences, new approaches to reproducibility are required. In this presentation, I give an overview of our personal experiences with reproducible research in the field of signal and image processing. I will also present results from the reproducibility study that we did on image processing papers. Next, I discuss some of the typical issues we ran into when making our work reproducible. Finally, I give some indications of the increased impact of research results when they are made reproducible.
  • Patrick Vandewalle, Researcher - Author - Signal Image Processing
On Demand
Thursday, July 14, 2011, 2:15 PM PDT
41 Minutes 29 Seconds
Tiffani L. Williams Ph.D C.Sc., Assistant Professor - Computer Science and Engineering - Texas A&M Engineering
Paper Mâché: A Novel System for Executing Scientific Papers - Tiffani L Williams
The increased use of computer software in science makes reproducing scientific results increasingly difficult. The research paper in its current state is no longer sufficient to fully reproduce, validate, or review a paper's experimental results and conclusions. We introduce Paper Mâché, a new system for creating dynamic, executable research papers. The key novelty of our system is the use of virtual machines, which allows scientists to view and interact with a paper and reproduce key experimental results. Thus, our system provides a bridge that allows everyone to actively participate in the scientific process.
  • Tiffani L. Williams Ph.D C.Sc., Assistant Professor - Computer Science and Engineering - Texas A&M Engineering
On Demand
Thursday, July 14, 2011, 3:30 PM PDT
37 Minutes 31 Seconds
Tony Hey, Corporate Vice President of Microsoft Research Connections
Reproducible Research and Data-Intensive Scientific Discovery - Tony Hey
There is a sea change happening in academic research -- a transformation caused by a data deluge that is affecting all disciplines. Modern science increasingly relies on integrated information technologies and computation to collect, process, and analyze complex data. It was Ken Wilson, Nobel Prize winner in physics, who first coined the phrase “Third Paradigm” to refer to computational science and the need for computational researchers to know about algorithms, numerical methods, and parallel architectures. However, the skills needed for manipulating, visualizing, managing, and, finally, conserving and archiving scientific data are very different. “The Fourth Paradigm” is about the computational systems needed to manipulate, visualize, and manage large amounts of scientific data. A wide variety of scientists (biologists, chemists, physicists, astronomers, engineers) require tools, technologies, and platforms that seamlessly integrate into standard scientific methodologies and processes. One disturbing emerging trend is the difficulty in enabling scientists other than the authors of scientific papers to be able to replicate the often complex analysis steps required to reach the scientific conclusions of the papers. The talk will illustrate a possible partial solution to the problem of reproducible research based on a joint research project between Microsoft Research and the MIT Broad Institute.
  • Tony Hey, Corporate Vice President of Microsoft Research Connections
On Demand
Friday, July 15, 2011, 8:30 AM PDT
51 Minutes 3 Seconds
Matan Gavish, Stanford University - Department of Statistics / Yale University Applied Math Program
A Universal Identifier for Computational Results - Matan Gavish
When we read online scientific publications, thanks to the notion of hyperlink and the infrastructure of the web, we can click on a citation and browse the cited work, and continue to the work that it itself cites, and so on.
What if we could click on any image or table in a scientific publication, and go to a detailed, structured description of its generating computation, and even land precisely on the instruction that created the figure? From there we could stroll up and down the computation tree and browse other parts of the same computation and other figures created by it. We could read the code, examine intermediate variables, understand the steps that took place, re-execute some parts, and retrieve the original dataset fed into the computation. In fact, what if we could move on to browse the original dataset's own creating computations, and continue to tour computation world, all through an entry point provided by one figure of interest in a publication?

Verifiable Computational Research (VCR) is a discipline for computational research. It introduces the notion of a Verifiable Result Identifier (VRI), which together with today's advanced web infrastructure turns the above fantasy into a reality. The discipline allows researchers, publishers and publication readers to use the same tools they are already using, and requires only minor changes to these tools. While everyone is following their familiar workflow, a VCR software system is working quietly in the background to make it all happen. For example, it automatically brands each publishable result produced by the computation with its own unique VRI. The VRI is at the same time a web URL and a secure digital signature. In any publication or presentation, the interested reader can click on the result, direct a web browser to the URL, or scan a barcode -- and gain entry into the computation that created the result, with implications as above.
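
One way an identifier could be both a URL and a signature is sketched below. The abstract does not say how a VRI is actually constructed, so this is purely an illustrative assumption: the base URL, the secret key, the use of SHA-256 with HMAC, and the function names `make_identifier` and `verify_identifier` are all hypothetical and are not taken from the VCR system.

```python
import hashlib
import hmac

# Hypothetical values, for illustration only.
BASE_URL = "https://example.org/vcr/"
SECRET_KEY = b"demo-key-not-for-real-use"


def make_identifier(result_bytes: bytes) -> str:
    """Build a URL that embeds a content digest and a keyed signature of it."""
    digest = hashlib.sha256(result_bytes).hexdigest()
    tag = hmac.new(SECRET_KEY, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{BASE_URL}{digest}/{tag}"


def verify_identifier(url: str, result_bytes: bytes) -> bool:
    """Check that the URL matches the given result and carries a valid signature."""
    if not url.startswith(BASE_URL):
        return False
    parts = url[len(BASE_URL):].split("/")
    if len(parts) != 2:
        return False
    digest, tag = parts
    expected_digest = hashlib.sha256(result_bytes).hexdigest()
    expected_tag = hmac.new(
        SECRET_KEY, expected_digest.encode("ascii"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(digest, expected_digest) and hmac.compare_digest(
        tag, expected_tag
    )


figure = b"figure-1 pixel data"
identifier = make_identifier(figure)
print(identifier)
print(verify_identifier(identifier, figure))
print(verify_identifier(identifier, b"tampered data"))
```

In this sketch the digest ties the identifier to one specific result, while the keyed tag lets whoever holds the key confirm that the identifier was issued by them and has not been altered.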

For the individual researcher, VCR is an online, self-filling lab journal for computational experiments. For computational science communities, VCR is a disciplined, standard, simple and automatic way to work reproducibly. It imposes simple rules, requires very minimal effort (the software is doing all the work), and needs absolutely no personal dedication to the reproducible research cause. As such, it might be a big step toward the long-anticipated promises of widely practiced, fully reproducible research.

I'll show an existing implementation of the VCR system, currently in use in the Stanford Statistics Department.

Joint work with D. Donoho.
  • Matan Gavish, Stanford University - Department of Statistics / Yale University Applied Math Program
On Demand
Friday, July 15, 2011, 9:25 AM PDT
46 Minutes 35 Seconds
Bill Howe, Senior Scientist - eScience Institute / Affiliate Assistant Professor - Dept. of Computer Science and Eng. U of Washington
Virtual Appliances, Cloud Computing, and Reproducible Research - Bill Howe
Science in every discipline is becoming data-intensive, requiring researchers to interact with their data solely through computational and statistical methods as opposed to direct manipulation. Perhaps paradoxically, these in silico experiments are often more difficult to reproduce than traditional "manual" laboratory techniques. Software pipelines used to acquire and process data have complex version-sensitive interdependencies, datasets are too large to efficiently transport from place to place, and interfaces are often complex and underdocumented.
At the UW eScience Institute, we are exploring the use of virtual machines and cloud computing to mitigate these challenges. A virtual machine can capture a researcher's entire working environment as a snapshot, including the data, software, dependencies, intermediate results, logs and other usage history information, operating system and file system context, convenience scripts, and more. These virtual machines can then be saved, made publicly available, and referenced in a publication. This approach not only facilitates reproducibility, but incurs essentially zero overhead for the researcher. Coupled with cloud computing, this approach offers additional benefits: experimenters need not allocate local resources to host the virtual machine, large datasets and long-running computations can be managed efficiently, and resource costs are more easily shared between producer and consumer.

In this talk, I motivate this approach with case studies from our experience and consider some of the implications and future directions.
  • Bill Howe, Senior Scientist - eScience Institute / Affiliate Assistant Professor - Dept. of Computer Science and Eng. U of Washington
On Demand
Friday, July 15, 2011, 10:45 AM PDT
49 Minutes 36 Seconds
James J Quirk, AMRITA -Computational Fluid Dynamics
In Search of Computational Scholarship: Reproducible Research and Cotton Nero A.X. - James Quirk
December 2010, the publishing behemoth Elsevier issued an Executable Paper Grand Challenge: "a contest created to improve the way scientific information is communicated and used." This contest, along with the reproducible research movement, represents a growing realization that computational science is ill-served by traditional journal-articles, with their static typeset text. In this talk I will present some of my executable-paper exploits.
The talk's title is borrowed from a television documentary by my near namesake, James Burke, the noted science historian. He argues that "you see, what your knowledge tells you, you're seeing." And that when your knowledge changes, so your view of the universe changes. Thus my take on executable papers stems from many small dawnings, rather than a one-off Eureka!

December 1984, for instance, while working in the design department of a manufacturer of steam turbines, I received a severe dressing down for a slipshod calculation I had performed. As a result, I view executable-papers through a prism of accountability. One which forces me to discuss my exploits through the very framework I use to create executable papers. That way you can examine the associated software details, first-hand, and you do not need to take my word on trust.
Here is an executable PDF version of this abstract: http://www.amrita-ebook.org/doc/amp/2011
  • James J Quirk, AMRITA -Computational Fluid Dynamics
On Demand
Friday, July 15, 2011, 11:30 AM PDT
38 Minutes 24 Seconds
Sorin Mitran, Department of Mathematics, University of North Carolina.
Archiving Computational Research in Virtual Machines - Sorin Mitran
Several approaches have been taken by computational scientists to ensure open access to their research codes: providing source codes, using a purpose-built archival system, and using literate programming tools. These procedures reflect standard practices in experimental sciences where laboratory techniques, supplies and equipment are documented in a research paper. Computational research has one advantage with respect to experimental science: our entire laboratory can be packaged and sent to independent parties for validation of research results. Virtualization has advanced to a stage in which direct access to graphics processing hardware and multiple CPU parallel processing can be included in virtual machines. The entire panoply of open-source tools for scripting and documentation can be included with the virtual machine.

This talk will present experience with this approach in the context of interdisciplinary research that uses two of the author's codes (BEARCLAW and Diapason). Particular attention is paid to documentation and use of the TeXmacs environment to present both theory and implementation of algorithmic ideas.
  • Sorin Mitran, Department of Mathematics, University of North Carolina.
On Demand
Friday, July 15, 2011, 1:30 PM PDT
43 Minutes 22 Seconds
Jarrod Millman, Researcher - U. of California, Berkeley's Brain Imaging Centre
The challenge of reproducible research in the computer age - Jarrod Millman
Computing is increasingly central to the practice of mathematical and scientific research. This has provided many new opportunities as well as new challenges. In particular, modern scientific computing has strained the ability of researchers to reproduce their own (as well as their colleagues') work. In this talk, I will outline some of the obstacles to reproducible research as well as some potential solutions and opportunities.
  • Jarrod Millman, Researcher - U. of California, Berkeley's Brain Imaging Centre
On Demand
Friday, July 15, 2011, 2:30 PM PDT
33 Minutes 15 Seconds
Victoria Stodden, Assistant Professor - Dept. of Statistics - Columbia University
What is Reproducible Research? The Practice of Science Today and the Scientific Method - Victoria Stodden
Scientific computation is emerging as absolutely central to the scientific method, but the prevalence of very relaxed practices is leading to a credibility crisis in many scientific fields. It is impossible to verify most of the results that computational scientists present at conferences and in papers today. Computational science is error-prone and traditional scientific publication is incapable of finding and rooting out errors in scientific computation.
A necessary response to this crisis is reproducible research -- where all code and data underlying the published results are made openly available. In this talk I discuss the evolution of the practice of science and necessary corresponding changes in the scientific method, such as reproducibility. I also discuss an open licensing approach, called the Reproducible Research Standard, that facilitates code and data sharing and reuse.
  • Victoria Stodden, Assistant Professor - Dept. of Statistics - Columbia University
On Demand
Friday, July 15, 2011, 3:30 PM PDT
58 Minutes 44 Seconds