Computational Linguistics & Text Mining Lab

The Computational Linguistics and Text Mining Lab (CLTL), directed by Prof. dr. Piek Vossen, is part of the Department of Language, Literature and Communication in the Faculty of Humanities at the Vrije Universiteit Amsterdam, and of the Network Institute.


Our research

The Computational Linguistics and Text Mining Lab (CLTL) models the understanding of natural language by machines: machines that can read texts and understand what they are about (what, who, when, where), but also machines that create powerful distributional language models from large volumes of text using Deep Learning techniques. Our research aims to better understand these so-called backbone models and to reveal their biases and unwanted errors, but also to combine distributional approaches with explicit symbolic models to add explanatory power. Please go here for an overview of our current research and links to more information.

We see language as a reference system that connects people and systems to their perception of the world. Identity, reference and perspectives are central themes in our research and are studied in combination. You can read more about the Theory of Identity, Reference and Perspective (TIRP) here. Many of our ideas come together in our research projects on Communicative Robots (http://makerobotstalk.nl). In these projects, we try to build robots that communicate with people in real-world situations, taking into account their perception of the context and the common ground they share.

What we teach you as a student

CLTL prepares students for academic careers in Computational Linguistics, but also for careers in industry as Linguistics Engineers for Text Mining. Students at CLTL learn all the technical skills they need, and specifically how to combine these with their knowledge of and passion for language. If you are interested in studying with us and want to enter the job market with your language skills, check out our teaching programmes: research and text mining.

Many of our students started with a background in linguistics only, but within a year they learned to use their skills and knowledge to analyse language as data. Follow this link for examples of our student projects to see what they did and what you can learn to do as well.

Our application perspective is Text Mining: technology that automatically extracts knowledge and information from text and turns unstructured data into structured data that organisations can use. This ranges from simple statements and facts, through events and storylines, to opinions and world-views, and also covers the detection of fake news and toxic language and the analysis of online debates. The world needs specialists in Text Mining who understand the complexity and richness of text. Text Mining is more than Data Mining, because language is ambiguous, abstract without context, dynamic, and highly variable.

Our products

CLTL has created a wide range of Natural Language Processing software modules and resources. All our software is open source and can be accessed through our GitHub repository. An overview of our language resources (annotated corpora and lexicons) can be found here.

Within our main research goal, we study and model lexicons and knowledge graphs in all their facets. We consider the lexicon a neural-cognitive product that includes epistemic and symbolic features. From this perspective, our research question is: “How do these neural-cognitive systems develop and function when we learn and use language?” Furthermore, the lexicon is an interface between language as a system and our knowledge of the world. We consider the lexicon a module for many knowledge-based computer applications. We are especially interested in the role of the lexicon as a bridge between form and meaning. CLTL developed the Open Dutch Wordnet and Framenet as important lexical resources for Dutch.

Our research targets not only Dutch and English but also other languages of the world. An important project is the Global Wordnet Association and the building of the Global Wordnet Grid: this project aims to represent the vocabularies of many languages as semantic networks, or wordnets, and to combine them through a universal index of meaning. Building and studying this grid will tell us more about the universalities and idiosyncrasies of languages, and likewise about the roles and functions of words and expressions.
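
To make the idea of combining wordnets through a shared index of meaning more concrete, here is a minimal Python sketch. It is only an illustration, not the actual Global Wordnet Grid data model: the concept identifiers ("c1", "c2", ...), the tiny word lists and the function names are all invented for this example. The sketch inverts two toy lexicons into one grid keyed by concept, then uses the grid to find words in another language that share a meaning and to list concepts that only one language lexicalises.

```python
from collections import defaultdict

# Hypothetical toy data: each wordnet maps a word to the identifiers of the
# concepts it can express. The identifiers ("c1", "c2", ...) are invented
# for this sketch and are not real index entries.
WORDNETS = {
    "en": {"dog": ["c1"], "bank": ["c2", "c3"]},
    "nl": {"hond": ["c1"], "bank": ["c2", "c4"], "oever": ["c3"]},
}


def build_grid(wordnets):
    """Invert the per-language lexicons into concept -> language -> words."""
    grid = defaultdict(lambda: defaultdict(list))
    for lang, lexicon in wordnets.items():
        for word, concepts in lexicon.items():
            for concept in concepts:
                grid[concept][lang].append(word)
    return grid


def translations(grid, wordnets, word, source, target):
    """Return target-language words sharing at least one concept with `word`."""
    found = []
    for concept in wordnets[source].get(word, []):
        for candidate in grid[concept].get(target, []):
            if candidate not in found:
                found.append(candidate)
    return found


def language_specific(grid, lang):
    """Return concepts that only `lang` lexicalises within this toy grid."""
    return sorted(c for c, langs in grid.items() if set(langs) == {lang})


grid = build_grid(WORDNETS)
print(translations(grid, WORDNETS, "bank", "en", "nl"))
print(language_specific(grid, "nl"))
```

In this toy setting, concepts shared by both languages hint at what is common across them, while a concept found in only one language points to something language-specific. A real grid would of course need far richer relations than a flat list of shared identifiers.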

Video explaining 5 years of research in the Spinoza-projects “Understanding Language by Machines”, 2019
Video explaining NewsReader’s Reading Machine, an example of one of CLTL’s projects
Lecture (in Dutch) Prof. dr. Piek Vossen at Paradiso Amsterdam: ‘To Communicate with an Imperfect Robot — Get It?’ March 25, 2018