Researchers in the digital humanities are investigating how computational methods can be applied to literary studies – enabling us to understand literature from a brand new angle.
In the field of digital humanities, researchers are delving into how literary studies can adapt to the digital age. Dr Jessica Witte, Edinburgh Futures Institute Postdoctoral Research Fellow in Text and Data Mining, believes that applying computational methods to literary studies will bridge the gap between STEM and the humanities.
The value of the digital humanities
According to Dr Witte, the digital humanities bring great value to literary studies. By applying digital methods to literary studies, researchers can benefit from the respective strengths of both STEM and the arts. In her own research, Dr Witte uses digital tools to explore various interdisciplinary topics in literature, such as medical ethics and feminist ethics in archival work. She primarily uses text mining methods, such as sentiment analysis and topic modelling.
“By offering a new way to engage with data, digitalisation makes the field more accessible and promotes diversity,” she states.
Dr Witte thinks it is important for humanities scholars to develop digital tools that analyse literature from a more humanities-oriented approach. Digital tools such as Natural Language Processing (NLP) and text mining models were not originally designed for the types of texts commonly used in the humanities, such as historical and literary texts. As a result, these text mining models struggle with linguistic nuances, such as sarcasm and context-specific phrases.
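To make the point about sarcasm concrete, here is a minimal sketch of a word-list sentiment scorer. It is not Dr Witte's method or any particular library's API: the tiny lexicon, the function name `lexicon_score`, and both example sentences are invented for illustration. The scorer simply adds up the polarity of each word it recognises, so it rates a sarcastic complaint as even more positive than a sincere compliment.

```python
import re

# Toy sentiment lexicon, invented for this illustration only.
LEXICON = {
    "great": 1, "love": 1, "wonderful": 1,
    "terrible": -1, "hate": -1, "awful": -1,
}

def lexicon_score(text: str) -> int:
    """Sum the polarity of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

# Hypothetical example sentences.
sincere = "What a wonderful evening, I love this play."
sarcastic = "Oh great, another wonderful three-hour sermon. I just love those."

print(lexicon_score(sincere))    # 2
print(lexicon_score(sarcastic))  # 3
```

Because the scorer sees only individual words, the ironic "great", "wonderful" and "love" all count as praise. Handling such cases requires attention to context, which is the kind of humanities-informed understanding of language the passage above calls for.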
“There are differences across the research methods of arts and technology,” explains Dr Witte. “While art uncovers abstract connections between language and power, technology favours numerical, binary information. To make digital tools such as NLP and AI models more useful for interdisciplinary research, they need to understand language from the humanities’ standpoint.”
Analysing online literary archives
Dr Witte is currently working on a project that involves analysing texts from the Internet Archive, an online database for literature and websites. In March 2023, the Internet Archive lost a copyright lawsuit brought in June 2020 by four leading U.S. publishers, a case that called into question the website’s status as a library. In light of this lawsuit, Dr Witte is investigating the definition, role, and place of libraries in the digital era. For instance, large-scale digital libraries (such as Google Books) and traditional libraries rely on institutional mass digitisation projects to build their collections. In comparison, the Internet Archive is open to contributions from anyone who signs up with the website.
Dr Witte is also investigating whether the Internet Archive represents a wider range of works in different languages from around the world than Google Books and traditional libraries do. Examining the Internet Archive’s metadata, Dr Witte finds that the answer is likely to be no. English texts are still overrepresented on the Internet Archive. Although texts in over 400 languages are present on the Internet Archive, for about 10% of those languages only a few works (mostly translations of the Bible) are available for the public to access and read.
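As a rough illustration of what examining language metadata can involve, the sketch below tallies a `language` field across catalogue records. Everything in it is an assumption made for illustration: the records are invented toy data, the field name and language codes are hypothetical, and the function `language_summary` is not the Internet Archive's API or Dr Witte's actual analysis.

```python
from collections import Counter

# Invented toy records standing in for catalogue metadata.
# The "language" field name and the codes are hypothetical.
records = [
    {"language": code}
    for code in ["eng"] * 6 + ["fra"] * 2 + ["gla", "haw"]
]

def language_summary(records, sparse_threshold=1):
    """Report the dominant language and the languages with very few works."""
    counts = Counter(record["language"] for record in records)
    total = sum(counts.values())
    top_language, top_count = counts.most_common(1)[0]
    # Languages represented by no more than `sparse_threshold` works.
    sparse = sorted(lang for lang, n in counts.items() if n <= sparse_threshold)
    return {
        "languages": len(counts),
        "top_language": top_language,
        "top_share": top_count / total,
        "sparse_languages": sparse,
    }

summary = language_summary(records)
print(summary)
```

On this toy data, one language accounts for most of the records while two others appear only once. A count of distinct languages can therefore look broad even when most of the collection is concentrated in a single language.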
Another aspect that Dr Witte is examining in her Internet Archive research project is the ethical implications of censorship on literary databases. In traditional libraries, safeguards are put in place to create buffers between readers and contentious texts. This practice follows an ethics of care that considers the wellbeing of both the readers and those who might be harmed by such texts. Dr Witte asks: How should online archives address, evaluate, or restrict readers’ access to potentially harmful texts? “Digitalisation of literature helps with representation but also poses a risk of disconnection,” says Dr Witte. She argues that we need to identify appropriate ethical practices for archiving, reading, and analysing digitised texts in the digital age.
Literature and medical ethics
Apart from online archives, Dr Witte is also working on a research project that delves into the connection between medical ethics, history, and literary studies. Her current book project examines the history of anorexia nervosa over the long nineteenth century through an analysis of representations of fasting women in literature and medicine, including a case study of the nineteenth-century “fasting girl” phenomenon. According to Dr Witte, historical cases of anorexia have shaped contemporary diagnosis and treatment, which over-rely on BMI as a symptom. As a result, many people experiencing eating disorders are unable to receive medical care. In future work, she would like to explore how data-driven innovation in the medical humanities could provide avenues for addressing large-scale problems in healthcare, for example by revising our existing medical understanding of symptoms.
For instance, AI and NLP tools could be used to explore qualitative records about illness. Dr Witte points to an existing body of research in the medical humanities showing that narratives written by people who are experiencing a certain condition or disease serve as a valuable source of data. For example, during the early days of the Covid-19 pandemic, first-person testimonials were a significant source of information about the disease. Digital tools developed to identify patterns and insights in patients’ lived experiences could therefore help medical researchers to better understand their patients, which could in turn contribute to improving future treatment.
Researcher profile
Dr Jessica Witte is a postdoctoral research fellow in Text & Data Mining at the University of Edinburgh, affiliated with the Centre for Data, Culture and Society. Originally trained as a literary scholar, she often finds herself analysing the glitches, gaps, bugs, errors, and anomalies in datasets as meaningful points of inquiry. Her research interests include improving natural language processing methods for qualitative analysis and rethinking data ethics for the digital era.