The Polemics Visualised project is a Network Institute Academy Assistants project. In this program, scientists from different disciplines are brought together; every project combines methods and themes from informatics, social sciences and/or humanities. The Polemics Visualised project brings together researchers from Biblical Studies and Computational Linguistics.
The purpose of the Polemics Visualised pilot project is to explore the possibilities of computational linguistics and natural language processing for theological research on Classical Syriac texts. More specifically, we ask whether Ephrem the Syrian, who wrote extensive polemics against Bardaisan, a theologian who lived two centuries earlier, was indeed discussing the same issues that Bardaisan addressed in his only remaining work.
Syriac, a language from the Aramaic family, was the lingua franca of the Middle East for centuries. Many important theological documents from the formative period of the early church were written in Syriac. These texts form a considerably large corpus; for example, the published works of Ephrem the Syrian alone exceed 500,000 words. The theological study of textual corpora of this size would benefit greatly from computational analysis.
So far, we have successfully trained a tokenizer using Apache OpenNLP and the annotated Syriac resources available at the Eep Talstra Center for Bible and Computer. With the resulting model we succeeded in recognizing 96% of word boundaries in the test data. We then applied our tokenizer to the unannotated text of Ephrem, and with the resulting data and that of Bardaisan’s annotated work we trained an LDA topic model. The resulting topic model yields some sensible topic-document relations, but these are not yet sufficiently useful to help answer questions such as the one mentioned above. We now aim to improve the results by adding morphological and part-of-speech tagging to the data, which would allow more efficient filtering of the input data for the topic analysis algorithm.

Hannes Vlaardingerbroek, Marieke van Erp, Wido van Peursen (2015). Polemics Visualised: experiments in Syriac text comparison. Computational Linguistics in the Netherlands, Antwerp, February 2015.