
Dagstuhl Perspectives Workshop on Artifact Evaluation for Publications

I’m pleased to have been invited to a Dagstuhl Perspectives Workshop in November on “Artifact Evaluation for Publications”, in recognition of my work (with colleagues) on computational reproducibility and software sustainability.

Schloss Dagstuhl, Leibniz-Zentrum für Informatik GmbH (Schloss Dagstuhl, Leibniz Center for Informatics) is the world’s premier venue for informatics; the center promotes fundamental and applied research, continuing and advanced academic education, and the transfer of knowledge between the research and application sides of informatics. The aim of its Seminar and Perspectives Workshop series is to bring together internationally renowned leading scientists to explore a cutting-edge informatics topic; in this case, how we can define a roadmap for artifact evaluation in computer systems research (with wider application across computational science and engineering). That means setting out actionable steps towards increased accountability, rethinking how we evaluate research outputs (particularly software) and document research processes and their associated e-infrastructure, and working out how best to change culture and behaviour (and, perhaps more importantly, incentive structures) for researchers, institutions and governments:

The computer systems research (CSR) community has developed numerous artifacts that encompass a rich and diverse collection of compilers, simulators, analyzers, benchmarks, data sets and other software and data. These artifacts are used to implement research innovations, evaluate trade-offs and analyze implications. Unfortunately, the evaluation methods used for computing systems innovation can be at odds with sound science and engineering practice. In particular, ever-increasing competitiveness and expediency to publish more results poses an impediment to accountability, which is key to the scientific and engineering process. Experimental results are not typically distributed with enough information for repeatability and/or reproducibility to enable comparisons and building on the innovation. Efforts in programming languages/compilers and software engineering, computer architecture, and high-performance computing are underway to address this challenge.


This Dagstuhl Perspectives Workshop brings together leaders of these efforts and senior stakeholders of CSR sub-communities to determine synergies and to identify the promising directions and mechanisms to move the broader community toward accountability. The workshop assesses current efforts, shares what does and doesn’t work, identifies additional processes, incentives and mechanisms, and determines how to coordinate and sustain the efforts. The workshop’s outcome is a roadmap of actionable strategies and steps to improving accountability, leveraging investment of multiple groups, educating the community on accountability, and sharing artifacts and experiments.

Organised by Bruce R. Childers (University of Pittsburgh, USA), Grigori Fursin (cTuning, France), Shriram Krishnamurthi (Brown University, USA) and Andreas Zeller (Universität des Saarlandes, Germany), Dagstuhl Perspectives Workshop 15452 takes place from 1 to 4 November 2015 (see the full list of invited attendees); I’m looking forward to reporting back in November.


Paper submitted to CAV 2015: “Dear CAV, We Need to Talk About Reproducibility”

Today, Ben Hall (Cambridge), Samin Ishtiaq (Microsoft Research) and I submitted a paper to CAV 2015, the 27th International Conference on Computer Aided Verification, to be held in San Francisco in July. CAV is dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems; the conference covers the spectrum from theoretical results to concrete applications, with an emphasis on practical verification tools and the algorithms and techniques needed for their implementation.

In this paper we build on our recent work, highlighting a number of key issues relating to reproducibility and how they affect the CAV (and wider computer science) research community, and propose a new model and workflow to encourage, enable and enforce reproducibility in future instances of CAV. We applaud the CAV Artifact Evaluation process, but we need to do more. You can download our arXiv pre-print; the abstract is as follows:

How many times have you tried to re-implement a past CAV tool paper, and failed?

Reliably reproducing published scientific discoveries has been acknowledged as a barrier to scientific progress for some time but there remains only a small subset of software available to support the specific needs of the research community (i.e. beyond generic tools such as source code repositories). In this paper we propose an infrastructure for enabling reproducibility in our community, by automating the build, unit testing and benchmarking of research software.

(also see: GitHub repo)
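
To give a flavour of what that automation might look like, here is a minimal sketch (in Python) of the build, test and benchmark loop the abstract describes. It is not the infrastructure from the paper; the repository URL, build command, test runner and benchmark input below are all hypothetical placeholders:

```python
import json
import subprocess
import time
from pathlib import Path

# Hypothetical example: the repository URL, build and test commands are
# placeholders, not the actual infrastructure described in the paper.
REPO_URL = "https://github.com/example/verification-tool.git"
WORKDIR = Path("artifact")


def run(cmd, cwd=None):
    """Run a command, echoing it first, and fail loudly on error."""
    print(f"$ {' '.join(cmd)}")
    subprocess.run(cmd, cwd=cwd, check=True)


def main():
    # 1. Fetch the artifact (a shallow clone here; a real pipeline would pin
    #    an exact revision so every run starts from identical sources).
    if not WORKDIR.exists():
        run(["git", "clone", "--depth", "1", REPO_URL, str(WORKDIR)])

    # 2. Build: assumes the artifact ships a Makefile; swap in the real build step.
    run(["make"], cwd=WORKDIR)

    # 3. Unit tests: run via pytest here, purely as an illustration.
    run(["python", "-m", "pytest", "tests"], cwd=WORKDIR)

    # 4. Benchmark: time a representative workload and record the result so
    #    successive runs (or other machines) can be compared.
    start = time.perf_counter()
    run(["./tool", "benchmarks/example.smt2"], cwd=WORKDIR)
    elapsed = time.perf_counter() - start

    results = {"benchmark": "benchmarks/example.smt2", "seconds": elapsed}
    Path("results.json").write_text(json.dumps(results, indent=2))
    print(f"Recorded benchmark result: {results}")


if __name__ == "__main__":
    main()
```

In practice you would wire something like this into a continuous integration service, so that every change to the artifact is rebuilt, retested and re-benchmarked automatically and the recorded results can be compared across revisions and machines.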
