New paper: “Top Tips to Make Your Research Irreproducible”

It is an unfortunate convention of science that research should pretend to be reproducible; we have noticed (and contributed to) a number of manifestos, guides and top tips on how to make research reproducible, but we have seen very little published on how to make research irreproducible.

Irreproducibility is the default setting for all of science, and irreproducible research is particularly common across the computational sciences (for example, here and here). The study of making your work irreproducible without reviewers complaining is a much neglected area; we feel therefore that by encapsulating our top tips on irreproducibility, we will be filling a much-needed gap in the domain literature. By following our tips, you can ensure that if your work is wrong, nobody will be able to check it; if it is correct, you can make everyone else do disproportionately more work than you to build upon it. Our top tips will also help you salve the conscience of certain reviewers still bound by the fussy conventionality of reproducibility, enabling them to enthusiastically recommend acceptance of your irreproducible work. In either case you are the beneficiary.

  1. Think “Big Picture”. People are interested in the science, not the experimental setup, so don’t describe it.
  2. Be abstract. Pseudo-code is a great way of communicating ideas quickly and clearly while giving readers no chance to understand the subtle implementation details that actually make it work.
  3. Short and sweet. Any limitations of your methods or proofs will be obvious to the careful reader, so there is no need to waste space on making them explicit.
  4. The deficit model. You’re the expert in the domain; only you can define what algorithms and data to run experiments with.
  5. Don’t share. Doing so only makes it easier for other people to scoop your research ideas, understand how your code actually works instead of why you say it does, or worst of all to understand that your code doesn’t work at all.

Read the full version of our high-impact paper on arXiv.

Paper submitted to CAV 2015: “Dear CAV, We Need to Talk About Reproducibility”

Today, Ben Hall (Cambridge), Samin Ishtiaq (Microsoft Research) and I submitted a paper to CAV 2015, the 27th International Conference on Computer Aided Verification, to be held in San Francisco in July. CAV is dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems; the conference covers the spectrum from theoretical results to concrete applications, with an emphasis on practical verification tools and the algorithms and techniques that are needed for their implementation.

In this paper we build upon our recent work, highlighting a number of key issues relating to reproducibility and how they impact on the CAV (and wider computer science) research community, proposing a new model and workflow to encourage, enable and enforce reproducibility in future instances of CAV. We applaud the CAV Artifact Evaluation process, but we need to do more. You can download our arXiv pre-print; the abstract is as follows:

How many times have you tried to re-implement a past CAV tool paper, and failed?

Reliably reproducing published scientific discoveries has been acknowledged as a barrier to scientific progress for some time but there remains only a small subset of software available to support the specific needs of the research community (i.e. beyond generic tools such as source code repositories). In this paper we propose an infrastructure for enabling reproducibility in our community, by automating the build, unit testing and benchmarking of research software.

(also see: GitHub repo)

The many Rs of e-Research


The many Rs of e-Research…what else could/should we add to this (especially in the context of research objects and supporting reproducible research)?

Reproducibility-as-a-service: can the cloud make it real?

Kenji Takeda, Solutions Architect and Technical Manager with Microsoft Research, has written a blog post on Recomputability 2014, as well as discussing some of the issues (and potential opportunities) for reproducibility in computational science we have outlined in our joint paper (including a quote from me):

This is an exciting area of research and one that could have a profound impact on the way that computational science is performed. By rethinking how we develop, use, benchmark, and share algorithms, software, and models, alongside the development of integrated and automated e-infrastructure to support recomputability and reproducibility, we will be able to improve the efficiency of scientific exploration as well as promoting open and verifiable scientific research.

Read Kenji’s full post on the Microsoft Research Connections Blog.

Accepted papers and programme for Recomputability 2014

Next week I am co-chairing Recomputability 2014, an affiliated workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014). The final workshop programme is now available; the workshop will take place on Thursday 11 December in the Hobart Room at the Hilton London Paddington hotel.

I will also be presenting our paper on sharing and publishing scientific models (arXiv), as well as chairing a panel session on the next steps for recomputability and reproducibility; I look forward to sharing some of the outcomes of this workshop over the next few weeks.

The workshop Twitter hashtag is #recomp14; you can also follow the workshop co-chairs: @DrTomCrick and @npch, as well as the main UCC account: @UCC2014_London.

Call for Papers: Recomputability 2014

I am co-chairing Recomputability 2014, the first workshop to focus explicitly on recomputability and reproducibility in the context of utility and cloud computing; it is open to all members of the cloud, big data, grid, cluster computing and open science communities. Recomputability 2014 is an affiliated workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014), to be held in London in December 2014.

Recomputability 2014 will provide an interdisciplinary forum for academic and industrial researchers, practitioners and developers to discuss challenges, ideas, policy and practical experience in reproducibility, recomputation, reusability and reliability across utility and cloud computing. It will provide an opportunity to share and showcase best practice, as well as to provide a platform to further develop policy, initiatives and practical techniques for researchers in this domain. Participation by early career researchers is strongly encouraged.

Proposed topics of interest include (but are not limited to):

  • infrastructure, tools and environments for recomputability and reproducibility in the cloud;
  • recomputability for virtual machines;
  • virtual machines as self-contained research objects or demonstrators;
  • describing and cataloging cloud setups;
  • the role of community/open access experimental frameworks and repositories for virtual machines and data, their operation and sustainability;
  • validation and verification of experimental results by the community;
  • sharing and publication issues;
  • recommending policy changes for recomputability and reproducibility;
  • improving education and training: best practice, novel uses, case studies;
  • encouraging industry’s role in recomputability and reproducibility.

Please see the full call for papers; the deadline for submissions (online via EasyChair) has been extended from 10 August 2014 to 17 August 2014.

2014 Software Sustainability Institute Fellowship


I’m delighted to have been named today as one of the sixteen Software Sustainability Institute Fellows for 2014.

The Software Sustainability Institute (SSI) is an EPSRC-funded project based at the universities of Edinburgh, Manchester, Oxford and Southampton, and draws on a team of experts with a breadth of experience in software development, project and programme management, research facilitation, publicity and community engagement. It’s a national facility for cultivating world-class research through software, whose goal is to make it easier to rely on software as a foundation of research; see their manifesto. The SSI works with researchers, developers, funders and infrastructure providers to identify the key issues and best practice surrounding scientific software.

During my fellowship, I’m particularly keen to work closely with Software Carpentry and Mozilla Science Lab to highlight the importance of software skills across the STEM disciplines. I’m also interested in a broader open science/open computation agenda; see the Recomputation Manifesto and the recently established recomputation.org project.

More to follow in 2014!

Ten Simple Rules for Reproducible Computational Research

In a paper published last week in PLoS Computational Biology, Sandve, Nekrutenko, Taylor and Hovig highlight the issue of replication across the computational sciences. The dependence on software libraries, APIs and toolchains, coupled with massive amounts of data, interdisciplinary approaches and the increasing complexity of the questions being asked, is complicating replication efforts.

To address this, they present ten simple rules for reproducibility of computational research:

Rule 1: For Every Result, Keep Track of How It Was Produced

Rule 2: Avoid Manual Data Manipulation Steps

Rule 3: Archive the Exact Versions of All External Programs Used

Rule 4: Version Control All Custom Scripts

Rule 5: Record All Intermediate Results, When Possible in Standardized Formats

Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds

Rule 7: Always Store Raw Data behind Plots

Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected

Rule 9: Connect Textual Statements to Underlying Results

Rule 10: Provide Public Access to Scripts, Runs, and Results
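As a rough illustration of how Rules 1, 3 and 6 might look in practice (this is not taken from the paper), here is a minimal Python sketch that stores a result alongside the random seed and the interpreter and platform details that produced it. The function name `run_with_provenance`, the toy analysis and the record layout are all assumptions made for this example.

```python
import json
import platform
import random
import sys


def run_with_provenance(seed, n_samples=1000):
    """Run a toy stochastic analysis and return the result with its provenance.

    Hypothetical helper for illustration only; the field names are assumptions.
    """
    # Rule 6: use an explicit seed so the random draws can be repeated.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mean = sum(samples) / n_samples

    return {
        "result": {"mean": mean},
        # Rule 1: keep track of how the result was produced.
        "provenance": {
            "seed": seed,
            "n_samples": n_samples,
            "command": list(sys.argv),
            # Rule 3 (in part): note the exact interpreter and platform used.
            "python_version": platform.python_version(),
            "platform": platform.platform(),
        },
    }


if __name__ == "__main__":
    record = run_with_provenance(seed=42)
    # Writing the record as sorted JSON makes it easy to diff between runs.
    print(json.dumps(record, indent=2, sort_keys=True))
```

Calling the function twice with the same seed in the same environment should give the same result, which is the property Rule 6 is after; a real analysis would also need to record the versions of any third-party libraries it depends on.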

The rationale underpinning these rules clearly resonates with the work of the Software Sustainability Institute: better science through superior software. Based at the universities of Edinburgh, Manchester, Oxford and Southampton, it is a national facility for cultivating world-class research through software (for example, Software Carpentry). An article that caught my eye in July was the Recomputation Manifesto: computational experiments should be recomputable for all time. In light of the wider open data and open science agenda, should we also be thinking about open software and open computation?

