Panel session on curating and archiving linked data datasets from the humanities at DH2019

This week, together with a group of international colleagues, we will be presenting our related work on linked data in the humanities in an invited panel at DH2019, the 2019 Digital Humanities Conference in Utrecht. This builds on recent work led by my colleague Christophe Guéret, who previously worked on linked data platforms at the BBC, but is now based at Accenture Labs in Ireland.

This work is being presented as part of a panel session entitled “Curating and Archiving Linked Data Datasets from the Humanities — From Data of the Present to Data of the Future”, which focuses on semantic web technologies and the adoption of linked (open) data principles in humanities research and, more particularly, on the data curation practices around them. Digital Humanities are inseparable from digital (digitised) collections. In research infrastructures, whether at national or international level, we see networks between established and new repositories, digital libraries, archives and other content providers and research institutions. This panel addresses new challenges for the alliance of research communities and service institutions which emerge from new data formats, vocabulary standardisation efforts, and collaborative practices. The papers in this panel address the following three research questions:

  • Q1: How to take care of linked data curation and management during research, and how to organise effective collaboration between semantic web technology pioneers and pioneers in Digital Humanities application of this technology?
  • Q2: How to exploit the possibilities for efficiency and synergy of linked (open) data in distributed networks between research, collections and overarching service providers?
  • Q3: How to best bridge conflicting priorities between enabling research (building, enriching, analysing) based on linked (open) data technology and the long-term preservation of Linked Data datasets (data of the present and data of the future)?

As you can see from our abstract below, we focus on indexing cultural heritage resources for research and education, based on some of the outcomes of the BBC’s Research and Education Space (RES) initiative:

Indexing Cultural Heritage Resources for Research and Education

Christophe Guéret and Tom Crick
 
In an increasingly digital world, e-infrastructure has become a key component of the daily life of researchers, underpinning a variety of academic activities. Over the period 2014-2020, the development of this varied e-infrastructure is supported by €850M of funding from the European Commission. Those systems at the core of data-driven science are expected to not only contribute to scientific discovery, but to also play a larger role in society. This paper reports on the results of a specific national project, which directly engaged with cultural heritage content in the UK based on Linked Data principles. At its core is an open platform which indexes and organises the digital collections of libraries, museums, broadcasters and galleries to make their collections and content more discoverable, accessible and usable to those in UK education and research. This includes culturally important images, TV and radio programmes, documents and text from world-renowned organisations such as the British Museum, British Library, National Archives, Europeana, Wellcome Trust and the BBC.
 
In addition to linking content, the project also supported developers to create digital educational products that would inspire learners, teaching practitioners and researchers by using applications powered by the Research and Education Space (RES) platform. This paper discusses the challenges faced by the project, the architecture developed for tackling them, and the lessons learned. In particular, we address challenges for consuming Web data; the problem of co-referencing (how to deal with the fact that several URIs can be created to refer to the same thing); and most prominently the problem of licensing. In particular, we discuss how the lack of unambiguous declarations of copyright and license as metadata hinders the re-use of existing published data, and which methods have been tested so far to circumvent this problem. The paper closes with an inspection of other existing collections and platforms and a discussion on how they solve the above listed problems.

The other papers in this panel session are:

  • When to store what and how? Data curation challenges during research projects (Marieke van Erp)
  • Automated Vocabulary Suggestion, Domain-Specific Linked Entity Recognition and Visually Faceted Verification (Vyacheslav Tykhonov)
  • 404 Error — Resource Not Found: Why we Need to Rescue Endangered Knowledge Organization Systems (Gerard Coen, Richard P. Smiraglia, Peter Doorn)
  • Archiving Linked Data Datasets — Experiences from a Humanities Research Infrastructure (Henk van den Berg, Jerry de Vries, Andrea Scharnhorst)
  • Dutch Historical Censuses — Preserving Data for Research, the Wider Public and Future Generations (Andrea Scharnhorst, Albert Meroño-Peñuela, Christophe Guéret, Ashkan Ashkpour)

(also see: Publications)
