Allen Renear: Text encoding, ontologies, and the future
Wednesday, 2 October 2013
SGML/XML text encoding has played an important role in the development of the global networked information system that now dominates almost all aspects of our daily lives: commercial, scientific, political, social, and cultural. The TEI community in particular has made impressive contributions. Today the information organization strategies that provide the foundation for contemporary information technologies are undergoing a new phase of intense and ambitious development. There has, of course, been a period of skepticism, just as there was with SGML in the 1980s. But that period is now behind us, or should be. Ontologies, “linked open data”, and semantic web languages like OWL and RDF have proven their value and are beginning to yield practical applications. These developments are not radically new strategies in information organization; rather, they continue a long-standing trajectory towards increased abstraction, declarative formalization, and standardization, strategies with a solid track record of success. Over the last thirty years the text encoding community has helped sustain and advance the evolution of these information organization strategies, and it is now well positioned to contribute further to, and exploit, recent developments.
I will discuss the significance of all this not only for libraries, publishing, data curation, and the digital humanities, but also for the global networked information system more generally. Without a doubt, advances in formalization will continue to bring us many new advantages, so there is much to look forward to. At the same time, however, the low-hanging fruit has been picked, and the problems we encounter in the next decade or two will prove quite challenging.
Allen Renear is a professor and interim dean at GSLIS (the Graduate School of Library and Information Science, University of Illinois, USA), where he teaches courses and leads research in information modeling, data curation, and digital publishing.
Prior to coming to GSLIS, Allen Renear was Director of the Brown University Scholarly Technology Group. He received an AB from Bowdoin College and an MA and PhD from Brown University.
Recently, Renear’s work has focused on fundamental issues in the curation of scientific datasets and on conceptual models for data management and preservation. This work addresses topics such as levels of abstraction and encoding, identity, and ontology, and includes projects in several related areas:
- A Formal Framework for Data Concepts
- Ontologies to Support Strategic Reading
- Collection/Item Metadata Relationships
- Ontologies for Digital Objects
These projects are all affiliated with the GSLIS Center for Informatics Research in Science and Scholarship (CIRSS) and are funded by the National Science Foundation (NSF), the Institute of Museum and Library Services (IMLS), and the Mellon Foundation.
Recommended recent publications are:
- Strategic Reading, Ontologies, and the Future of Scientific Publishing. Allen H. Renear, Carole L. Palmer. Science 325(5942), p. 828 (2009).
- When Digital Objects Change — Exactly What Changes? Allen H. Renear, David Dubin, Karen M. Wickett. Proceedings of the American Society for Information Science and Technology 45(1) (2008).
Further works selected by Allen Renear for interested readers are grouped into three categories:
- Ontology of Scientific and Cultural Objects
- Metadata and Logic
- Semantic Approaches to Digital Publishing
Saturday, 5 October 2013
The BVH (Bibliothèques Virtuelles Humanistes) team of the CESR (University of Tours) began using the TEI encoding scheme to annotate and publish French Renaissance texts in 2006. In 2011, the “Corpus” research network was set up with the aim of developing the field of digital humanities and the European research infrastructures roadmap. Supported by the French Ministry of Research, this network comprises several national consortia, among them “CAHIER” (Corpus d’Auteurs pour les Humanités: Informatisation, Edition, Recherche – Authors’ Corpora for the Humanities: Digitisation, Edition, Research), which is coordinated by the CESR. The corpora assembled by this consortium (over 25) centre mainly on literary figures, but also concern the work of philosophers and the history of science: Polish philosophers, d’Alembert, Machiavelli, Montaigne, Flaubert, Montesquieu, etc. CAHIER regularly organises training sessions and workshops aimed at developing linguistic, thematic and philological approaches to online editing using the TEI Guidelines. It also runs workshops on project management and tutorials on specific tools (CMS, OAI-PMH, TXM, PhiloLogic). Collaboration with the two linguistic consortia (oral and written corpora) of the “Corpus” research network is already under way: a jointly organised advanced TEI workshop has been scheduled, and joint reflection on the definition(s) of “corpora” has begun. A special interest group bringing together linguists, medievalists and anthropologists is working on a series of recommendations concerning copyright. Thanks to long-term collaborations with numerous libraries (the French National Library, university and public libraries, the Europeana Libraries consortium), established well before the creation of the “Corpus” network, we are able to benefit from their staff’s expertise in metadata, records, iconographic thesauri and bibliographical databases.
The aim of the CAHIER consortium is not just to provide online facsimiles of the assembled corpora, but to offer full-text documents that are searchable, retrievable and shareable in XML and other standard formats.
Online editing of collections of fragments is a preoccupation shared by many a scholar. A special interest group dedicated to correspondence is currently being set up; it will be able to use TEI schemas already employed by several projects. What I particularly wish to draw attention to, though, is the potential added value of the TEI Guidelines in the domain of corpus publication as a means of furthering typological and taxonomic approaches to text processing.
Many users of the TEI Guidelines (and non-users) regret the lack of search tools and browsers capable of returning relevant results through a “genre tree”. At present, libraries all have their own thesauri, generally unknown to scholars; there seem to be no initiatives aimed at developing folksonomies for texts, nor does there appear to be any coordination between libraries and booksellers, who have their own ways of classifying their products. The situation is not, however, a stalemate: one relevant starting point would be, for example, to combine the dichotomy between fiction and non-fiction with the formal schemas found in the TEI Guidelines: prose, verse and drama. A new schema would not be needed, only a well-organised thesaurus of genres and sub-genres embedded using TEI and RDF, which could be searched through the headers with a faceted browser and adapted to a wide range of languages. Two objections obviously spring to mind. The first is the difference between national traditions: providing a multilingual tool would in itself require a major research project. The second is the difficulty scholars have in agreeing on the definition of genres and their ontologies. I fully acknowledge that developing an interdisciplinary and interlingual thesaurus presents a considerable challenge, but it seems to me one well worth rising to.
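As a rough illustration of the faceted approach described in the preceding paragraph, the following sketch assumes that each text's header classifies it with a TEI `<catRef>` pointing at two categories from a hypothetical `#genres` taxonomy: one from the fiction/non-fiction axis and one from the prose/verse/drama axis. The helper names (`make_header`, `header_facets`, `browse`, `facet_counts`), the category ids and the sample classifications are all illustrative assumptions, not part of any existing project, and the generated headers are trimmed fragments rather than schema-valid TEI.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def make_header(*categories):
    """Build a trimmed (not schema-valid) teiHeader whose <catRef>
    points at the given category ids in a hypothetical #genres taxonomy."""
    targets = " ".join("#" + c for c in categories)
    return (
        '<teiHeader xmlns="http://www.tei-c.org/ns/1.0">'
        "<profileDesc><textClass>"
        f'<catRef scheme="#genres" target="{targets}"/>'
        "</textClass></profileDesc></teiHeader>"
    )


def header_facets(header_xml):
    """Return the set of category ids referenced by <catRef> elements."""
    root = ET.fromstring(header_xml)
    cats = set()
    for ref in root.iterfind(".//tei:catRef", TEI_NS):
        # @target holds space-separated pointers such as "#fiction #prose"
        cats.update(t.lstrip("#") for t in ref.get("target", "").split())
    return cats


def browse(index, selected):
    """Return the texts whose facets include every selected category."""
    return sorted(doc for doc, cats in index.items() if selected <= cats)


def facet_counts(index, selected):
    """Count the remaining categories among matching texts (drill-down)."""
    counts = Counter()
    for cats in index.values():
        if selected <= cats:
            counts.update(cats - selected)
    return counts


# Hypothetical sample classifications, for illustration only
HEADERS = {
    "essais": make_header("nonfiction", "prose"),
    "pantagruel": make_header("fiction", "prose"),
    "example_play": make_header("fiction", "drama"),
}
INDEX = {doc: header_facets(header) for doc, header in HEADERS.items()}

print(browse(INDEX, {"prose"}))  # → ['essais', 'pantagruel']
print(facet_counts(INDEX, {"fiction"}))
```

Selecting one facet and then seeing how the remaining matches divide along the other axis is the kind of drill-down a faceted browser offers. A real genre thesaurus would also need a sub-genre hierarchy (for instance nested TEI `<category>` elements) and multilingual labels (for instance `<catDesc>` with `xml:lang`), both of which this sketch leaves out.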
Marie-Luce Demonet is a professor of French Renaissance literature and director of the Maison des Sciences de l’Homme Val de Loire (The Loire Valley Institute for Social Sciences and Humanities).
A specialist in the relationship between literature and language, Marie-Luce Demonet has written works on French authors and humanists such as Rabelais, Montaigne and Pasquier (critical and electronic editions, conference proceedings, monographs), as well as on issues in literary theory (the novel, fiction) and semiotics. Demonet is the creator of two websites that host original Renaissance texts (e.g. http://www.bvh.univ-tours.fr/Epistemon) and head of the project “Bibliothèques virtuelles des humanistes” (Humanists’ Virtual Libraries). She has also published several articles on the application of new technologies to French Renaissance literature and, since 1990, has taken part in various events on the same topic.
Her main areas of research include:
- History of Linguistic Theories
- Literary Genres
- Electronic Editions
- Philosophy of Language and Literature
Recommended publications are:
- Michel de Montaigne, Les Essais. Marie-Luce Demonet. Presses Universitaires de France (2002).
- Montaigne et la Question de l’Homme. Marie-Luce Demonet. Presses Universitaires de France (1999).
- Les Voix du Signe: Nature et Origine du Langage à la Renaissance, 1480-1580. Marie-Luce Demonet. H. Champion (1992).
- Les Grands Jours de Rabelais en Poitou: Actes du colloque international de Poitiers des 30 août et 1er septembre 2001. Marie-Luce Demonet. Droz (2006).