The Linked TEI: Text Encoding in the Web

TEI Conference and Members Meeting 2013: October 2-5, Rome (Italy)


As in previous years, the meeting will be accompanied by pre-conference workshops on TEI and DH-related topics. Workshops are free, but places are limited, so please register for tutorials as early as possible. Registration for pre-conference workshops and tutorials is on the same page as conference registration, so if you are interested in one or more of these events, please register for them as well.

Academic programme workshops

Perspectives on querying TEI-annotated data

Workshop date: Tuesday, 1 October 2013, 9-17

Workshop coordinators: Banski, Piotr; Kupietz, Marc; Witt, Andreas

The TEI provides mechanisms to richly annotate a variety of digital resources used in the Humanities. The typical way in which many Humanities scholars use annotations is as instructions for processing them for the purpose of visualisation or transformation into other formats. However, a major aim of TEI annotation is to enrich the data with the results of scholarly effort. It is therefore essential to be able to efficiently retrieve the various pieces of information in a structured way. This, in turn, requires accessible and user-friendly — but at the same time reasonably powerful — query languages.

Naturally, XQuery and XSLT provide access to all the information expressed in annotations. However, it should be borne in mind that, despite the warm feeling of power that a good command of XQuery or XSLT offers to the researcher, not everyone is able to exploit their full capacity. Learning either of these Turing-complete programming languages requires an amount of time and devotion that not every scholar or student is able to allocate for this purpose. As with natural languages, one benefits greatly from long-time exposure and repetition, but these conditions characterise the work of programmers or IT personnel rather than that of most literary scholars or students. The latter may benefit greatly from more specialized query languages which are at least one level of abstraction above XSLT or XQuery, and which offer user-friendliness instead of ultimate power and versatility.
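To make the point about general-purpose XML tools concrete, the following minimal sketch retrieves name annotations from a TEI-encoded fragment, using the limited XPath subset in Python's standard library. The sample fragment, the `ref` values and the helper `names_of` are invented for illustration and are not taken from any workshop material; the fragment is also not a complete, valid TEI document.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is only a local label for queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A made-up fragment (hypothetical content, no teiHeader).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName ref="#dante">Dante</persName> was born in
       <placeName ref="#florence">Florence</placeName>.</p>
    <p><persName ref="#petrarch">Petrarch</persName> admired
       <persName ref="#dante">Dante</persName>.</p>
  </body></text>
</TEI>"""


def names_of(root, element):
    """Return (ref, text) pairs for every TEI element with the given name."""
    return [
        (el.get("ref"), "".join(el.itertext()).strip())
        for el in root.iterfind(f".//tei:{element}", TEI_NS)
    ]


root = ET.fromstring(SAMPLE)
persons = names_of(root, "persName")
places = names_of(root, "placeName")
print(persons)
print(places)
```

Even this small query requires knowledge of namespaces, path expressions and a host programming language, which is the kind of overhead a more specialized query language would aim to hide.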

The world of Digital Humanities – arguably the central focus of the TEI – has long since expanded beyond simply working with electronic text in the word processor of the day. DH specialists gather, curate, and query various sorts of textual data, from plain text via semi-structured XML to records in relational databases. The nature of the objects of research varies as well: they come, among others, as single texts with sometimes very complex internal structure, bundles of base documents with hierarchies of annotations and all kinds of interrelationships among them, parallel multilingual data (e.g. original works and their translations) or scattered prosopographic fragments. Much of that can nowadays be wrapped in a TEI envelope.

Given the above issues, it is natural to wonder whether the strategy typically advocated in the work of the TEI Council and often voiced on TEI-L – to stress that the TEI is best handled by general-purpose XML-oriented tools (to which XQuery and XSLT belong) – should carry over to the task of retrieval from richly annotated data, especially if said retrieval is to be made available to an average scholar or student. Or, more precisely, whether it would be better to offer scholars and students a language tied more tightly to the TEI data model and whether it is possible for such a query language to address the entire TEI universe of objects in a uniform manner.

Within the last decade, a lot of effort to create efficient and user-friendly query systems has been undertaken within corpus linguistics, but the knowledge about them spreads very slowly outside this field. On the other hand, corpus linguists are often not aware of specific issues and needs of querying digital texts used outside linguistics.

Therefore, the workshop aims at building a common ground for the sharing of experiences among researchers dealing with various aspects and forms of TEI-annotated digital text. The presentations will address the impact of experiences of querying richly annotated linguistic corpora on other fields within Digital Humanities and discuss specific TEI-related problems when dealing with queries.

The invited contributions as well as the panel discussion are expected to address, among others, the following range of issues:

  • query languages and query environments;
  • queries dealing with a variety of text objects in a variety of TEI-annotated structures;
  • enhancement of user-friendliness by, e.g., hiding the potential complexity under a simple set of agreed symbols or by the use of a graphical user interface;
  • a common query language to extend over the range of objects defined by the TEI data model.

This workshop is meant to bring together, on the one hand, corpus linguists and computer scientists, who will present their suggestions and reflections on the possibility of creating a Corpus Query Lingua Franca for Humanists, and, on the other, TEI practitioners themselves, who will present both concrete tasks that combine textual and non-textual data in a novel manner and theoretical challenges that a modern query system for Digital Humanists should tackle.

List of presentations

The workshop homepage is at http://corpora.ids-mannheim.de/queryTEI.html

CLARIN, Standards and the TEI

Workshop date: Monday, 30 September 2013, 10-17

Workshop coordinator: Martin Wynne

CLARIN is a pan-European initiative which aims to build a research infrastructure for language resources which will integrate numerous tools and resources in a distributed architecture, and which will respond to the needs of researchers across the humanities and social sciences. CLARIN is being built on open standards, but also with a recognition that standards and guidelines are only one part of a complex jigsaw which needs to be assembled to create reliable, durable and high quality services.

A keynote speech will be given by Alexander Geyken of the Berlin-Brandenburg Academy of Sciences (BBAW) on the topic of the use of TEI in the development of the Deutsches Textarchiv.

There will be a number of presentations on the application of the TEI Guidelines to language resources and tools, and on the role of the TEI in emerging CLARIN services and standards. Presenters will not simply present an overview of their work, but will focus on precisely how and why (or why not) TEI formats, guidelines and technologies are being deployed, and will go into some technical detail on these topics.

It is hoped that this will be only the start of promoting dialogue and collaboration between CLARIN and the TEI at many levels. One result would be an improved dialogue about the use of the TEI in higher-level initiatives to develop standards for the CLARIN architecture, but another would be enhanced engagement directly with the TEI community of developers and researchers in the many centres and institutions related to CLARIN.

This workshop is aimed at:

  • CLARIN developers
  • researchers in the humanities and social sciences already working
    with text encoding and with CLARIN demonstrator projects
  • digital humanists interested in working towards integration of
    their resources with the CLARIN infrastructure
  • TEI members interested in developing guidelines for linguistic
    resources (e.g. the Linguistic SIG)

Workshops proposed by sponsors

Athena plus workshop: Innovative tools and pilots for access to digital cultural heritage in the framework of Europeana and national systems

Workshop date: Wednesday, 2 October 2013, 9-13

Workshop coordinator: Istituto centrale per il catalogo unico delle biblioteche italiane (ICCU)

ICCU (www.iccu.sbn.it) is the Central Institute for the Union Catalogue of Italian Libraries, whose mission is to develop programmes, studies and scientific initiatives within cataloguing, creation of inventories and digitisation of the bibliographic and documentary heritage held in State libraries and other Italian public and private bodies.

AthenaPlus (http://www.athenaplus.eu) is a CIP best practice network composed of 40 partners from 21 Member States. Its main goals are to contribute more than 3.6 million metadata records to Europeana, focusing mainly on museum content; to improve search, retrieval and re-use of Europeana’s content; and to experiment with enriched metadata and its re-use, adapted for users with different needs (tourists, schools, scholars), through a set of tools that support the development of virtual exhibitions and tourist and didactic applications.

The aim of this workshop is to illustrate in five presentations some activities carried out by several European and national projects feeding Europeana.

  • How to manage multilingual terminologies for better access to the network: the Terminology Management Platform (TMP) developed in the framework of Athena, Linked Heritage, and AthenaPlus

  • Multilingual terminologies: the experiences of EuropeanaCollection 1914-1918, PartagePlus and other European projects

  • MOVIO: a tool for building online virtual exhibitions in an innovative way, using the Semantic content management system (ontology builder)

  • National aggregators and Linked Open Data: the case of CulturaItalia

  • Experiments with Linked Open Data in the framework of the National Union Catalogue (SBN).

Automatic semantic annotation

Workshop date: Tuesday, 1 October 2013, 9-17

Workshop coordinator: CINECA

Automatic semantic annotation, integration of different sources (Linked Data), organisation of information in formal structures and deployment of semantic services are key elements in delivering services that let users access information efficiently.

The workshop will address these topics:

  • Semantic annotation of digital text based on natural language processing (NLP)

  • Storing of meta-information in RDF format in a knowledge base

  • Linked Data integration

  • Web Services to search, navigate and analyze the semantic content

CINECA will present its tools for semantic annotation and for navigation of semantic content.

The workshop will be open to everyone with an interest in the topics being addressed. We especially welcome representatives of relevant projects to give presentations and take part in the discussion.


  • 9.00 Concept Mapper: a tool for semantic annotation – Roberta Turra
  • 9.45 Web Services to search, navigate and analyze the semantic content – Alessandro Paderno
  • 10.30 Named Entity Recognition from digital text – Giorgio Pedrazzi
  • 11.15 Main topic identification through correspondence analysis – Marco Scarnò
  • 12.00 Open discussion