In a paper published last week in PLoS Computational Biology, Sandve, Nekrutenko, Taylor and Hovig highlight the issue of replication across the computational sciences. Dependence on software libraries, APIs and toolchains, coupled with massive amounts of data, interdisciplinary approaches and the increasing complexity of the questions being asked, is complicating replication efforts.
To address this, they present ten simple rules for reproducible computational research:
Rule 1: For Every Result, Keep Track of How It Was Produced
Rule 2: Avoid Manual Data Manipulation Steps
Rule 3: Archive the Exact Versions of All External Programs Used
Rule 4: Version Control All Custom Scripts
Rule 5: Record All Intermediate Results, When Possible in Standardized Formats
Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds
Rule 7: Always Store Raw Data behind Plots
Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
Rule 9: Connect Textual Statements to Underlying Results
Rule 10: Provide Public Access to Scripts, Runs, and Results
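As a minimal illustration of how Rules 6 and 7 might look in practice (this sketch is mine, not from the paper), a script can pin its random seed and archive the raw numbers behind a summary alongside the provenance needed to regenerate them. The seed value and file name here are hypothetical; only the Python standard library is used:

```python
import json
import random

# Rule 6: fix and record the random seed so the run can be repeated exactly.
SEED = 20131024  # hypothetical seed; any fixed value works
random.seed(SEED)

# A toy "analysis": draw samples and summarise them.
samples = [random.gauss(0, 1) for _ in range(1000)]
mean = sum(samples) / len(samples)

# Rule 7: store the raw data behind any plot or summary,
# together with the seed that produced it (supporting Rule 1).
record = {
    "seed": SEED,
    "n_samples": len(samples),
    "mean": mean,
    "raw_data": samples,
}
with open("analysis_run.json", "w") as fh:
    json.dump(record, fh)

# Re-seeding with the recorded value reproduces the samples exactly.
random.seed(SEED)
assert [random.gauss(0, 1) for _ in range(1000)] == samples
```

The same idea extends to any source of randomness (NumPy generators, shuffling, train/test splits): record the seed with the results, and a reader can rerun the analysis and obtain bit-identical numbers.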
The rationale underpinning these rules clearly resonates with the work of the Software Sustainability Institute: better science through superior software. Based at the universities of Edinburgh, Manchester, Oxford and Southampton, the Institute is a national facility for cultivating world-class research through software, for example through its support of Software Carpentry. An article that caught my eye in July was the Recomputation Manifesto, which argues that computational experiments should be recomputable for all time. In light of the wider open data and open science agenda, should we also be thinking about open software and open computation?