Thinking Creative Industries

Transforming Scholarship In The Archives

Transkribus, a spinout formed by a team including Professor of Digital Cultural Heritage Melissa Terras, is using AI-powered handwritten text recognition (HTR) to give researchers, institutions and the public unprecedented access to written records of global cultural importance.

Libraries and archives around the world house a treasure trove of handwritten texts ranging from literary works and political essays to census results, medical files, and meteorological reports. These texts of historical and cultural importance are at risk of languishing on the shelves if they are not transcribed – a process that has typically been laborious, time-consuming and expensive. But bringing our written heritage into today’s digital world is exactly what Transkribus does, and with ease.

Origins

Transkribus grew out of the EU-funded Recognition and Enrichment of Archival Documents (READ) research project, run by a consortium of leading research groups from across Europe and coordinated by Dr Günter Mühlberger of the University of Innsbruck, Austria. The convergence of several computational developments, combined with the availability of large datasets of scanned images, has vastly improved the recognition of handwritten historical documents. Transkribus has seized upon, and contributed to, the opportunities afforded by these developments to create a unique platform and user community, maintained and developed by READ-COOP SCE. The cooperative is chaired by Dr Mühlberger, his Innsbruck colleague Dr Andy Stauder, who acts as CEO, and Melissa Terras, Professor of Digital Cultural Heritage at the University of Edinburgh.

Community

Transkribus allows users to create “ground truth” data that is suitable for machine learning. From submitted images and transcripts, the HTR engines learn to decipher handwritten or printed text in digital images and can then automatically generate transcripts of similar material. Memory institutions, humanities scholars and the public provide digitised images and transcripts as ground truth for HTR training, whilst computer scientists deliver the research and implementation work needed to sustain and develop the technology. Every contribution improves Transkribus, making it a more accurate and powerful tool.

Access was a key consideration for Professor Terras, Dr Mühlberger and Dr Stauder, who were keen to provide a platform that would remain affordable and open to all once innovation funding came to an end. They established Transkribus as a cooperative (READ-COOP SCE). This means the company can keep the infrastructure operational, develop further tools and services, and provide a high standard of service to its users for a reasonable fee. It can also give back to the community through discounts and through free assistance and support for students and early career researchers. Most importantly, Transkribus connects its community and facilitates data sharing, which forms the basis of the datasets that are crucial for AI training.

Where past and future meet

Today, Transkribus is a comprehensive platform for the digitisation, AI-powered text recognition, transcription and searching of historical documents – from any place, at any time, and in any language. Smart search technology can find words in a collection, and can even recognise and retrieve results for words with historical or personal variations in spelling. In 2020, Transkribus was named winner of the Horizon Impact Award, which honours EU-funded projects that have had a societal impact across Europe and beyond. As of January 2023, 43 million images of handwritten texts had been uploaded to the system for transcription. The platform now employs over 20 people and has over 100,000 users, and over 130 members worldwide have joined the cooperative, including the British Library, the National Library of Scotland and many international libraries and archives, which together have transformed access to our historical past.
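To give a flavour of what spelling-tolerant search involves, here is a minimal sketch in Python. It is not how Transkribus implements its smart search, which this article does not describe. It simply uses the standard library's `difflib` to find words that closely resemble a query. The function name `find_spelling_variants`, the sample word list and the similarity cutoff of 0.75 are all assumptions made for illustration.

```python
from difflib import get_close_matches


def find_spelling_variants(query, vocabulary, cutoff=0.75):
    """Return words from `vocabulary` that closely resemble `query`.

    Illustrative only: matching ignores case, and `cutoff` (0 to 1)
    controls how similar a word must be to count as a variant.
    """
    # Map each lowercased form back to the first original spelling seen.
    lowered = {}
    for word in vocabulary:
        lowered.setdefault(word.lower(), word)

    matches = get_close_matches(query.lower(), list(lowered), n=10, cutoff=cutoff)
    return [lowered[match] for match in matches]


# Hypothetical words taken from a transcribed collection.
transcribed_words = ["Parliament", "parlement", "Parlyament", "payment", "garden"]

variants = find_spelling_variants("parliament", transcribed_words)
print(variants)
```

With this sample list, the search picks up "parlement" and "Parlyament" alongside the modern spelling, while leaving out unrelated words such as "payment". Raising the cutoff makes the search stricter, and lowering it admits looser matches.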

“Handwritten Text Recognition is a mature AI technology in the library and archive space, and our ongoing research is helping understand the benefits that AI can bring to the cultural heritage sector. I’m grateful to be part of this team effort, and hugely proud of what the cooperative has achieved together.”

– Melissa Terras, Professor of Digital Cultural Heritage and Scholarly Director of READ-COOP SCE, the provider of Transkribus
