Tag Archives: Computational science

Dagstuhl Perspectives Workshop on Artifact Evaluation for Publications

I’m pleased to have been invited to a Dagstuhl Perspectives Workshop in November on “Artifact Evaluation for Publications”, in recognition of my work (with colleagues) on computational reproducibility and software sustainability.

Schloss Dagstuhl, Leibniz-Zentrum für Informatik GmbH (Schloss Dagstuhl, Leibniz Center for Informatics) is the world’s premier venue for informatics; the center promotes fundamental and applied research, continuing and advanced academic education, and the transfer of knowledge between those involved in the research side and application side of informatics. The aim of their Seminar and Perspectives Workshop series is to bring together internationally renowned leading scientists for the purpose of exploring a cutting-edge informatics topic; in this case how we can define a roadmap for artifact evaluation in computer systems research (with application more widely across computational science and engineering), defining an actionable research roadmap for increased accountability, rethinking how we evaluate research outputs (particularly software) and document research processes and associated e-infrastructure, as well as how best to change culture and behaviour — and perhaps more importantly, incentivisation structures — for researchers, institutions and governments:

The computer systems research (CSR) community has developed numerous artifacts that encompass a rich and diverse collection of compilers, simulators, analyzers, benchmarks, data sets and other software and data. These artifacts are used to implement research innovations, evaluate trade-offs and analyze implications. Unfortunately, the evaluation methods used for computing systems innovation can be at odds with sound science and engineering practice. In particular, ever-increasing competitiveness and expediency to publish more results poses an impediment to accountability, which is key to the scientific and engineering process. Experimental results are not typically distributed with enough information for repeatability and/or reproducibility to enable comparisons and building on the innovation. Efforts in programming languages/compilers and software engineering, computer architecture, and high-performance computing are underway to address this challenge.

This Dagstuhl Perspectives Workshop brings together leaders of these efforts and senior stakeholders of CSR sub-communities to determine synergies and to identify the promising directions and mechanisms to move the broader community toward accountability. The workshop assesses current efforts, shares what does and doesn’t work, identifies additional processes, incentives and mechanisms, and determines how to coordinate and sustain the efforts. The workshop’s outcome is a roadmap of actionable strategies and steps to improving accountability, leveraging investment of multiple groups, educating the community on accountability, and sharing artifacts and experiments.

Organised by Bruce R. Childers (University of Pittsburgh, USA), Grigori Fursin (cTuning, France), Shriram Krishnamurthi (Brown University, USA) and Andreas Zeller (Universität des Saarlandes, Germany), Dagstuhl Perspectives Workshop 15452 takes place from 1-4 November 2015 (see the full list of invited attendees); looking forward to reporting back in November.

Tagged , , , ,

New paper: “Top Tips to Make Your Research Irreproducible”

It is an unfortunate convention of science that research should pretend to be reproducible; we have noticed (and contributed to) a number of manifestos, guides and top tips on how to make research reproducible, but we have seen very little published on how to make research irreproducible.

Irreproducibility is the default setting for all of science, and irreproducible research is particularly common across the computational sciences (for example, here and here). The study of making your work irreproducible without reviewers complaining is a much neglected area; we feel therefore that by encapsulating our top tips on irreproducibility, we will be filling a much-needed gap in the domain literature. By following our tips, you can ensure that if your work is wrong, nobody will be able to check it; if it is correct, you can make everyone else do disproportionately more work than you to build upon it. Our top tips will also help you salve the conscience of certain reviewers still bound by the fussy conventionality of reproducibility, enabling them to enthusiastically recommend acceptance of your irreproducible work. In either case you are the beneficiary.

  1. Think “Big Picture”. People are interested in the science, not the experimental setup, so don’t describe it.
  2. Be abstract. Pseudo-code is a great way of communicating ideas quickly and clearly while giving readers no chance to understand the subtle implementation details that actually make it work.
  3. Short and sweet. Any limitations of your methods or proofs will be obvious to the careful reader, so there is no need to waste space on making them explicit.
  4. The deficit model. You’re the expert in the domain, only you can define what algorithms and data to run experiments with.
  5. Don’t share. Doing so only makes it easier for other people to scoop your research ideas, understand how your code actually works instead of why you say it does, or worst of all to understand that your code doesn’t work at all.

Read the full version of our high-impact paper on arXiv.

Tagged , , , ,

The many Rs of e-Research


The 6 12 many Rs of e-Research…what else could/should we add to this (especially in the context of research objects and supporting reproducible research)?

Tagged , , , ,

Reproducibility-as-a-service: can the cloud make it real?

Kenji Takeda, Solutions Architect and Technical Manager with Microsoft Research, has written a blog post on Recomputability 2014, as well as discussing some of the issues (and potential opportunities) for reproducibility in computational science we have outlined in our joint paper (including a quote from me):

This is an exciting area of research and one that could have a profound impact on the way that computational science is performed. By rethinking how we develop, use, benchmark, and share algorithms, software, and models, alongside the development of integrated and automated e-infrastructure to support recomputability and reproducibility, we will be able to improve the efficiency of scientific exploration as well as promoting open and verifiable scientific research.

Read Kenji’s full post on the Microsoft Research Connections Blog.

Tagged , , , , , ,

It’s impossible to conduct research without software

No one knows how much software is used in research. Look around any lab and you’ll see software — both standard and bespoke — being used by all disciplines and seniorities of researchers. Software is clearly fundamental to research, but we can’t prove this without evidence. And this lack of evidence is the reason why we ran a survey of researchers at 15 Russell Group universities to find out about their software use and background.

The Software Sustainability Institute‘s recent survey of researchers at research-intensive UK universities is out. Headlines figures:

  • 92% of academics use research software;
  • 69% say that their research would not be practical without it;
  • 56% develop their own software (worryingly, 21% have no training in software development);
  • 70% of male researchers develop their own software, and only 30% of female researchers do.

For the full story, see the SSI blog post; the survey results described there are based on the responses of 417 researchers selected at random from 15 Russell Group universities, with good representation from across the disciplines, genders and career grades. It represents a statistically significant number of responses that can be used to represent, at the very least, the views of people in research-intensive universities in the UK (the data collected from the survey is available for download and is licensed under a Creative Commons by Attribution licence).

(you may also like to sign this petition and join the UK Community of Research Software Engineers)

Tagged , ,

Accepted papers and programme for Recomputability 2014

I am co-chairing Recomputability 2014 next week, an affiliated workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014). The final workshop programme is now available and it will take place on Thursday 11 December in the Hobart Room at the Hilton London Paddington hotel.

I will also be presenting our paper on sharing and publishing scientific models (arXiv), as well as chairing a panel session on the next steps for recomputability and reproducibility; I look forward to sharing some of the outcomes of this workshop over the next few weeks.

The workshop Twitter hashtag is #recomp14; you can also follow the workshop co-chairs: @DrTomCrick and @npch, as well as the main UCC account: @UCC2014_London.

Tagged , , , , ,

Paper submitted to Recomputability 2014: “Share and Enjoy”: Publishing Useful and Usable Scientific Models

Last month, me, Ben Hall, Samin Ishtiaq and Kenji Takeda (all Microsoft Research) submitted a paper to Recomputability 2014, to be held in conjunction with the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014) in London in December. This workshop is an interdisciplinary forum for academic and industrial researchers, practitioners and developers to discuss challenges, ideas, policy and practical experience in reproducibility, recomputation, reusability and reliability across utility and cloud computing. It aims to provide an opportunity to share and showcase best practice, as well as to offering a platform to further develop policy, initiatives and practical techniques for researchers in this domain.

In our paper, we discuss a number of issues in this space, proposing a new open platform for the sharing and reuse of scientific models and benchmarks. You can download our arXiv pre-print; the abstract is as follows:

The reproduction and replication of reported scientific results is a hot topic within the academic community. The retraction of numerous studies from a wide range of disciplines, from climate science to bioscience, has drawn the focus of many commentators, but there exists a wider socio-cultural problem that pervades the scientific community. Sharing data and models often requires extra effort, and this is currently seen as a significant overhead that may not be worth the time investment.

Automated systems, which allow easy reproduction of results, offer the potential to incentivise a culture change and drive the adoption of new techniques to improve the efficiency of scientific exploration. In this paper, we discuss the value of improved access and sharing of the two key types of results arising from work done in the computational sciences: models and algorithms. We propose the development of an integrated cloud-based system underpinning computational science, linking together software and data repositories, toolchains, workflows and outputs, providing a seamless automated infrastructure for the verification and validation of scientific models and in particular, performance benchmarks.

(see GitHub repo)

Tagged , , , , , ,

Paper submitted to WSSSPE2: “Can I Implement Your Algorithm?”: A Model for Reproducible Research Software

Yesterday, me, Ben Hall and Samin Ishtiaq (both Microsoft Research Cambridge) submitted a paper to WSSSPE2, the 2nd Workshop on Sustainable Software for Science: Practice and Experiences to be held in conjunction with SC14 in New Orleans in November. As per the aims of the workshop: progress in scientific research is dependent on the quality and accessibility of software at all levels and it is critical to address challenges related to the development, deployment and maintenance of reusable software as well as education around software practices.

As discussed in our paper, we feel this multitude of research software engineering problems are not just manifest in computer science, but also across the computational science and engineering domains (particularly with regards to benchmarking and availability of code). We highlight a number of recommendations to address these issues, as well as proposing a new open platform for scientific software development. You can download our arXiv pre-print; the abstract is as follows:

The reproduction and replication of novel scientific results has become a major issue for a number of disciplines. In computer science and related disciplines such as systems biology, the issues closely revolve around the ability to implement novel algorithms and approaches. Taking an approach from the literature and applying it in a new codebase frequently requires local knowledge missing from the published manuscripts and project websites. Alongside this issue, benchmarking, and the development of fair, and widely available benchmark sets present another barrier. In this paper, we outline several suggestions to address these issues, driven by specific examples from a range of scientific domains. Finally, based on these suggestions, we propose a new open platform for scientific software development which effectively isolates specific dependencies from the individual researcher and their workstation and allows faster, more powerful sharing of the results of scientific software engineering.

(see GitHub repo)

Tagged , , , , , ,

Call for Papers: Recomputability 2014

I am co-chairing Recomputability 2014, the first workshop to focus explicitly on recomputability and reproducibility in the context of utility and cloud computing and is open to all members of the cloud, big data, grid, cluster computing and open science communities. Recomputability 2014 is an affiliated workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014), to be held in London in December 2014.

Recomputability 2014 will provide an interdisciplinary forum for academic and industrial researchers, practitioners and developers to discuss challenges, ideas, policy and practical experience in reproducibility, recomputation, reusability and reliability across utility and cloud computing. It will provide an opportunity to share and showcase best practice, as well as to provide a platform to further develop policy, initiatives and practical techniques for researchers in this domain. Participation by early career researchers is strongly encouraged.

Proposed topics of interest include (but are not limited to):

  • infrastructure, tools and environments for recomputabilty and reproducibility in the cloud;
  • recomputability for virtual machines;
  • virtual machines as self-contained research objects or demonstrators;
  • describing and cataloging cloud setups;
  • the role of community/open access experimental frameworks and repositories for virtual machines and data, their operation and sustainability;
  • validation and verification of experimental results by the community;
  • sharing and publication issues;
  • recommending policy changes for recomputability and reproducibility;
  • improving education and training: best practice, novel uses, case studies;
  • encouraging industry’s role in recomputability and reproducibility.

Please see the full call for papers; deadline for submissions (online via EasyChair) is 10 August 2014 17 August 2014.

Tagged , , , , , , ,

Ten Simple Rules for Reproducible Computational Research

In a paper published last week in PLoS Computational Biology, Sandve, Nekrutenko, Taylor and Hovig highlight the issue of replication across the computational sciences. The dependence on software libraries, APIs and toolchains, coupled with massive amounts of data, interdisciplinary approaches and the increasing complexity of the questions being asked are complicating replication efforts.

To address this, they present ten simple rules for reproducibility of computational research:

Rule 1: For Every Result, Keep Track of How It Was Produced

Rule 2: Avoid Manual Data Manipulation Steps

Rule 3: Archive the Exact Versions of All External Programs Used

Rule 4: Version Control All Custom Scripts

Rule 5: Record All Intermediate Results, When Possible in Standardized Formats

Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds

Rule 7: Always Store Raw Data behind Plots

Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected

Rule 9: Connect Textual Statements to Underlying Results

Rule 10: Provide Public Access to Scripts, Runs, and Results

The rationale underpinning these rules clearly resonates with the work of the Software Sustainability Institute: better science through superior software. Based at the universities of Edinburgh, Manchester, Oxford and Southampton, it is a national facility for cultivating world-class research through software (for example, Software Carpentry). An article that caught my eye in July was the Recomputation Manifesto: computational experiments should be recomputable for all time. In light of the wider open data and open science agenda, should we also be thinking about open software and open computation?

Tagged , , , , , , , ,