As in previous years, the meeting will be accompanied by pre-conference workshops on TEI- and DH-related topics. Workshops are free, but please register for the tutorials as early as possible, as places are limited. Registration for pre-conference workshops and tutorials is on the same page as conference registration, so if you are interested in one or more of these events, please register for them there as well.
Academic programme workshops
Perspectives on querying TEI-annotated data
Workshop date: Tuesday, 1 October 2013, 9-17
Workshop coordinators: Banski, Piotr; Kupietz, Marc; Witt, Andreas
The TEI provides mechanisms to richly annotate a variety of digital resources used in the Humanities. Many Humanities scholars typically use annotations as processing instructions, for the purpose of visualisation or of transformation into other formats. However, a major aim of TEI annotation is to enrich the data with the results of scholarly effort. It is therefore essential to be able to retrieve the various pieces of information efficiently and in a structured way. This, in turn, requires query languages that are accessible and user-friendly, but at the same time reasonably powerful.
Naturally, XQuery and XSLT provide access to all the information expressed in annotations. However, it should be borne in mind that, despite the warm feeling of power that a good command of XQuery or XSLT offers the researcher, not everyone is able to exploit their full capacity. Learning either of these Turing-complete programming languages requires an amount of time and devotion that not every scholar or student can spare. As with natural languages, one benefits greatly from long-term exposure and repetition. These conditions, however, characterise the work of programmers or IT personnel rather than that of most literary scholars or students, who may benefit greatly from more specialised query languages that sit at least one level of abstraction above XSLT or XQuery and that offer user-friendliness instead of ultimate power and versatility.
The world of Digital Humanities, arguably the central focus of the TEI, long ago expanded beyond simply working with electronic text in the word processor of the day. DH specialists gather, curate, and query various sorts of textual data, from plain text via semi-structured XML to records in relational databases. The objects of research vary as well: among others, they include single texts with sometimes very complex internal structure, bundles of base documents with hierarchies of annotations and all kinds of interrelationships among them, parallel multilingual data (e.g. original works and their translations), and scattered prosopographic fragments. Much of this can nowadays be wrapped in a TEI envelope.
Given the above issues, it is natural to ask whether the strategy typically advocated in the work of the TEI Council and often voiced on TEI-L, namely that TEI data are best handled by general-purpose XML-oriented tools such as XQuery and XSLT, should carry over to the task of retrieval from richly annotated data, especially if that retrieval is to be made available to the average scholar or student. More precisely, the question is whether it would be better to offer scholars and students a query language tied more tightly to the TEI data model, and whether such a language could address the entire universe of TEI objects in a uniform manner.
Within the last decade, considerable effort has gone into creating efficient and user-friendly query systems within corpus linguistics, but knowledge of these systems has spread only slowly outside the field. Conversely, corpus linguists are often unaware of the specific issues and needs involved in querying digital texts used outside linguistics.
The workshop therefore aims to build common ground for sharing experiences among researchers dealing with various aspects and forms of TEI-annotated digital text. The presentations will address how experience in querying richly annotated linguistic corpora bears on other fields within Digital Humanities, and will discuss specific TEI-related problems in dealing with queries.
The invited contributions as well as the panel discussion are expected to address, among others, the following range of issues:
- query languages and query environments;
- queries dealing with a variety of text objects in a variety of TEI-annotated structures;
- enhancement of user-friendliness by, e.g., hiding the potential complexity under a simple set of agreed symbols or by the use of a graphical user interface;
- a common query language to extend over the range of objects defined by the TEI data model.
This workshop is meant to bring together, on the one hand, corpus linguists and computer scientists, who will present their suggestions for, and reflections on, the possibility of creating a Corpus Query Lingua Franca for Humanists, and, on the other, TEI practitioners themselves, who will present both concrete tasks that combine textual and non-textual data in novel ways and theoretical challenges that a modern query system for Digital Humanists should tackle.
List of presentations
- Peter Bouda (Centro Interdisciplinar de Documentação Linguística e Social) “Querying GrAF data in linguistic analysis”
- Øyvind Eide, Vemund Olstad (Unit for Digital Documentation, University of Oslo) “TEI for Interactive Concordances: The New Menota Search System”
- Serge Heiden (ICAR Research Lab, Lyon University and CNRS, France) “Exploiting TEI-annotated data with TXM”
- Thomas Krause, Carolin Odebrecht, Amir Zeldes, Florian Zipser (Humboldt-Universität zu Berlin) “Unary TEI Elements and the Token Based Corpus”
- Piotr Pęzik (University of Lodz) “Indexed graph databases for querying rich TEI annotation”
- Laurent Romary (INRIA & HUB-IDSL) “Data models and the (blind ?) query of lexical resources”
- Dirk Roorda (DANS) “System for HEBrew Text: ANnotations for Queries and Markup”
- Thomas Schmidt (IDS Mannheim) “Querying Spoken Language Corpora”
The workshop homepage is at http://corpora.ids-mannheim.de/queryTEI.html
Workshops proposed by sponsors
AthenaPlus workshop: Innovative tools and pilots for access to digital cultural heritage in the framework of Europeana and national systems
Workshop date: Wednesday, 2 October 2013, 9-13
ICCU (www.iccu.sbn.it) is the Central Institute for the Union Catalogue of Italian Libraries. Its mission is to develop programmes, studies and scientific initiatives in cataloguing, inventory creation and the digitisation of the bibliographic and documentary heritage held in State libraries and other Italian public and private bodies.
AthenaPlus (http://www.athenaplus.eu) is a CIP Best Practice Network of 40 partners from 21 Member States. Its main goals are to contribute more than 3.6 million metadata records to Europeana, focusing mainly on museum content; to improve search, retrieval and re-use of Europeana’s content; and to experiment with enriched metadata and its re-use, adapted for users with different needs (tourists, schools, scholars), through a set of tools that support the development of virtual exhibitions and tourist and didactic applications.
The aim of this workshop is to illustrate, in five presentations, some of the activities carried out by several European and national projects that feed content into Europeana:
- How to manage multilingual terminologies for better access to the network: the Terminology Management Platform (TMP) developed in the framework of Athena, Linked Heritage and AthenaPlus
- Multilingual terminologies: the experiences of EuropeanaCollection 1914-1918, PartagePlus and other European projects
- MOVIO: a tool for building online virtual exhibitions in an innovative way, using the Semantic content management system (ontology builder)
- National aggregators and Linked Open Data: the case of CulturaItalia
- Experiments with Linked Open Data in the framework of the National Union Catalogue (SBN)