Video release: Meet NewsReader’s Reading Machine

A Reading Machine in 4 languages

Meet NewsReader’s Reading Machine! — Video explaining NewsReader’s Reading Machine

The volume of news data is enormous and expanding, covering billions of archived documents with millions of documents added daily. These documents are also getting more and more interconnected with knowledge from other sources such as biographies and company databases. NewsReader built a system that extracts what happened to whom, when and where from these sources and stores them in a structured database, enabling more precise search over this immense stack of information. Currently, our system supports English, Spanish, Italian and Dutch. Pilot projects are underway with government and financial information specialists, but the system can be useful to anyone looking to make sense of large amounts of news text.

NewsReader in a nutshell

NewsReader in a nutshell — From Newspapers to Knowledge, Visualised
The project is described in this brochure (PDF).


LREC2016 Accepted Papers

CLTL has 11 accepted papers at LREC2016. We’ll see you in Portorož in May!


“Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job” by Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis

“Context-enhanced Adaptive Entity Linking” by Giuseppe Rizzo, Filip Ilievski, Marieke van Erp, Julien Plu and Raphael Troncy

“MEANTIME, the NewsReader Multilingual Event and Time Corpus” by Anne-Lyse Minard, Manuela Speranza, Ruben Urizar, Begoña Altuna, Marieke van Erp, Anneleen Schoen and Chantal van Son

“Crowdsourcing Salient Information from News and Tweets” by Oana Inel, Tommaso Caselli and Lora Aroyo

“Temporal Information Annotation: Crowd vs. Experts” by Tommaso Caselli, Rachele Sprugnoli and Oana Inel

“Addressing the MFS bias in WSD systems” by Marten Postma, Ruben Izquierdo, Eneko Agirre, German Rigau and Piek Vossen

“A multi-layered annotation scheme for perspectives” by Chantal van Son, Tommaso Caselli, Antske Fokkens, Isa Maks, Roser Morante, Lora Aroyo and Piek Vossen

“The VU Sound Corpus: Adding more fine-grained annotations to the Freesound database” by Emiel van Miltenburg, Benjamin Timmermans and Lora Aroyo

“NLP and public engagement: The case of the Italian School Reform” by Tommaso Caselli, Giovanni Moretti, Rachele Sprugnoli, Sara Tonelli, Damien Lanfrey and Donatella Solda Kutzman

“Two architectures for parallel processing for huge amounts of text” by Mathijs Kattenberg, Zuhaitz Beloki, Aitor Soroa, Xabier Artola, Antske Fokkens, Paul Huygen and Kees Verstoep

“The Event and Implied Situation Ontology: Application and Evaluation” by Roxane Segers, Marco Rospocher, Piek Vossen, Egoitz Laparra, German Rigau and Anne-Lyse Minard

CLIN 26 Organised by CLTL

CLIN26, Computational Linguistics in The Netherlands, VU University Amsterdam, December 18 2015

The 26th Meeting of Computational Linguistics in the Netherlands (CLIN26) was organized by the CLTL group of the VU University of Amsterdam, and took place at the Hotel Casa 400 in Amsterdam on December 18, 2015.

Presentations CLIN26
Pictures of CLIN26 on Facebook CLTLVU

Invited speaker Miriam Butt, Professor for General and Computational Linguistics at the University of Konstanz.

STIL Thesis Prize awarded to Nikos Voskarides

Organising Committee:
Antske Fokkens
Ruben Izquierdo
Roser Morante
Marten Postma
Piek Vossen

Master’s Evening, Dec. 01 2015


On Tuesday 1 December 2015 you can visit our Master’s Evening, where information sessions cover most of our Master’s degree programmes. Please register and choose which of these sessions you would like to attend.

Date: Tuesday 1 December 2015
Time: 17:00 – 20:30
For whom: Higher education students and professionals
Location: Main building VU Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam

Please find information on our Research Master Specialization ‘Linguistic Engineering’

If you are not able to attend our Master’s Evening on 1 December 2015, you can visit the Master’s Day on Saturday 12 March 2016 or find out more about VU Amsterdam and our study programmes:
• Find your international Master’s degree programme and contact the coordinator for questions
• Meet VU Amsterdam representatives in your own country
• Visit our international students Facebook

Press release VU University on NewsReader: Join the hackathon!

VU professor and Spinoza Prize winner Piek Vossen presents NewsReader
Discover for yourself this new technology that reads the news

In 2013, Piek Vossen (professor of computational lexicology), together with researchers in Trento and San Sebastian and the companies LexisNexis (NL), SynerScope (NL) and the English company ScraperWiki, started the NewsReader project to develop a computer program that ‘reads’ the news every day and keeps precise track of what happened in the world, when and where, and who was involved. The project received a grant of 2.8 million euros from the European Commission.

SynerScope’s visualization: extraction from 1.26M news articles

Reading the news in four languages
Over the past three years, the researchers have developed technology that lets a computer automatically read the news in four languages. Millions of newspaper articles have now been turned into a searchable database in which duplicates have been removed, complementary information from different reports has been merged in a smart way, and the information has been enriched with fine-grained types, so that you can search not only on personal names such as ‘Mark Rutte’ and ‘Diederik Samsom’ but also on entities of the type ‘politician’.
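
The idea of merging reports and searching by entity type can be illustrated with a minimal Python sketch. It is not NewsReader's actual pipeline or data model: the `Event` record, the `merge_events` and `search_by_type` helpers, the `ENTITY_TYPES` table and the toy events are all assumptions made for illustration, and treating two reports as the same event when action and date match is a deliberate simplification.

```python
from dataclasses import dataclass, field

# Hypothetical type table; a real system would draw types from a knowledge base.
ENTITY_TYPES = {"Mark Rutte": "politician", "Diederik Samsom": "politician"}


@dataclass
class Event:
    what: str
    when: str
    who: set = field(default_factory=set)
    where: set = field(default_factory=set)
    sources: set = field(default_factory=set)


def merge_events(extracted):
    """Collapse reports of the same event (same action and date) into one record."""
    merged = {}
    for event in extracted:
        key = (event.what, event.when)
        target = merged.setdefault(key, Event(event.what, event.when))
        # Complementary details from different articles are combined.
        target.who |= event.who
        target.where |= event.where
        target.sources |= event.sources
    return list(merged.values())


def search_by_type(events, entity_type):
    """Return events involving at least one participant of the given type."""
    return [
        event for event in events
        if any(ENTITY_TYPES.get(person) == entity_type for person in event.who)
    ]


# Invented toy input: two articles reporting the same (made-up) debate.
extracted = [
    Event("debate", "2015-11-24", who={"Mark Rutte"}, sources={"article-1"}),
    Event("debate", "2015-11-24", who={"Diederik Samsom"},
          where={"The Hague"}, sources={"article-2"}),
    Event("merger", "2015-11-25", who={"Company X"}, sources={"article-3"}),
]
events = merge_events(extracted)
politician_events = search_by_type(events, "politician")
```

In this toy run the two debate reports collapse into a single record that lists both participants and both source articles, and a search for ‘politician’ returns that record without naming either person.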

NewsReader presentation
On the afternoon of Tuesday 24 November 2015, Piek Vossen’s research group, the Computational Lexicology & Terminology Lab (CLTL), is organising a workshop at which the final results of the project will be presented. Several speakers will also give their view on the project, including VU professor Frank van Harmelen (Knowledge Representation & Reasoning), Bernardo Magnini, researcher at FBK in Trento, and Sybren Kooistra, data journalist at de Volkskrant and co-founder and editor-in-chief of Yournalism, the platform for investigative journalism.

Join the Hackathon!
On 25 November, users can get hands-on with the news database built from millions of newspaper articles. More information and registration.

NewsReader Workshop & Hackathon, Nov. 24–25 2015

Car Wars: Industrial Heroes Going Down Fighting

On 24 and 25 November 2015, we will showcase the NewsReader project and invite you to come explore our technology and its results yourself during our NewsReader Workshop and Hackathon.

Event Details
We have developed a powerful new tool called ‘NewsReader’ which utilises natural language understanding and semantic web technology. This helps you to better understand the interactions between companies and key individuals, derived from news articles.
Our dataset encompasses 12 years of news charting the struggle of automotive players to rule the global market and satisfy their shareholders’ expectations, and their suffering from the financial crisis and new economies: industrial heroes going down!

The Workshop
Tuesday 24 November 2015, 14:00 – 18:00 Amsterdam Public Library
In this workshop, we will bring together start-ups, companies, researchers and developers to present and discuss the NewsReader project, the technological domains it draws from and future applications for these technologies.
This afternoon will feature invited talks, demos, a panel discussion and a networking reception.
Confirmed Speaker:
Prof. dr. Frank van Harmelen, Vrije Universiteit Amsterdam. Frank van Harmelen is a professor in Knowledge Representation & Reasoning in the AI department (Faculty of Science) at the Vrije Universiteit Amsterdam. After studying mathematics and computer science in Amsterdam, he moved to the Department of AI in Edinburgh, where he was awarded a PhD in 1989 for his research on meta-level reasoning.

The Hackathon
Wednesday 25 November 2015, 10:00 – 18:00 Amsterdam Public Library
In June 2014 and January 2015 we ran several hackathons in both London and Amsterdam, in which NewsReader enabled attendees to pull out networks of interactions between entrepreneurs, politicians and companies, and to thoroughly test-drive our technology. This November, we’re releasing a new version of our processing pipeline and we’re scaling up to 10 million processed news articles from sources about the automotive industry to obtain a searchable database of the news. At the hackathon, you can play with this dataset and explore the processing pipeline.
The global automotive industry has a value in the order of $1 trillion annually. The industry comprises a massive network of suppliers, manufacturers, advertisers, marketeers and journalists. Each of these players has his/her own story, often with unexpected origins or endings; one day you may be CEO of a big car company, the next you are out and making pizzas. With NewsReader, you can uncover these stories to reconstruct the past.

This event may be of interest to you if:
• You’re interested in natural language processing and/or semantic web technology;
• You’re a data journalist on an automotive desk;
• You’re an analyst sifting daily news looking for information on your company or on competitors;
• You’re a data analyst looking to understand how your customers operate their supply chain;
• You’re an analyst trying to find secondary events that could influence an investment decision;
• You’re interested in visualising big data.

Attendance is free, but please register by Sunday 22 November 17:00 CET.

NewsReader helps you find a needle in a haystack.

VENI grant for Antske Fokkens

Antske Fokkens received a VENI grant for her proposal “Reading between the lines”. The project aims at identifying so-called implicit perspectives in text.

Perspectives are conveyed in many ways. Explicit opinions or highly subjective terms are easily identified. However, perspectives are also expressed more subtly. For instance, Nick Wing argues that media describe white suspects (e.g. brilliant, athletic) more positively than black victims (e.g. gang member, drug problems). Ivar Vermeulen (p.c.) observes in a small Dutch corpus that Moroccan perpetrators are readily called thieves (implying generic behavior), whereas perpetrators of Dutch origin are said only to have stolen something (implying incidental behavior). These observations are anecdotal, but reveal how choices concerning what information to include or how to describe someone’s role may display a specific perspective.

This project will investigate how linguistic analyses may be used to identify these more implicit ways of expressing perspectives in text. This research will be carried out in three stages. First, large-scale corpus analyses will be applied to identify distributions of semantic roles (what entities do) and other properties assigned to them (their characteristics). In the second stage, generic participants will be linked to the semantic role they imply (e.g. a thief will be linked to the perpetrator of stealing). With these links, we can investigate whether thieves are described differently from people who steal. In the third stage, emotion and sentiment lexica will be used to identify the sentiment associated with descriptions of people, enabling research that investigates whether people are depicted positively or negatively.
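
The second stage can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the project's method: the `NOUN_TO_EVENT` lexicon, the `generic_label_share` function, the neutral group names and the mention list are all invented, and the mentions stand in for the output of the corpus analysis in stage one.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon for stage two: a generic participant noun and the event it implies.
NOUN_TO_EVENT = {"thief": "steal", "killer": "kill"}


def generic_label_share(mentions):
    """For each (group, event), return the share of mentions using the generic noun.

    Each mention is (group, word, kind), where kind is "noun" for a generic
    label such as "thief" and "verb" for an event description such as "steal".
    """
    counts = defaultdict(Counter)
    for group, word, kind in mentions:
        # Link the generic noun to the event it implies; verbs already name the event.
        event = NOUN_TO_EVENT[word] if kind == "noun" else word
        counts[(group, event)][kind] += 1
    return {
        key: kinds["noun"] / (kinds["noun"] + kinds["verb"])
        for key, kinds in counts.items()
    }


# Invented toy mentions, standing in for the output of stage one.
mentions = [
    ("group_a", "thief", "noun"),
    ("group_a", "thief", "noun"),
    ("group_a", "steal", "verb"),
    ("group_b", "thief", "noun"),
    ("group_b", "steal", "verb"),
    ("group_b", "steal", "verb"),
]
shares = generic_label_share(mentions)
```

A higher share for one group than for another, for the same event, would be the kind of difference in description the project sets out to measure. A real analysis would need far more data and a significance test before drawing conclusions.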

The research is carried out in the context of digital humanities and social sciences. Evaluation and experimental setup will be guided towards identifying differences in perspective between sources. In addition to correctness of linguistic analyses (intrinsic evaluation), the possibility of using the method for identifying changes in perspective over time (historic research) or differences in perspective between sources (communication science) will be investigated.

VU University scientists cluster responses to NWO’s National Research Agenda

Led by Piek Vossen, a group of scientists at VU University automatically divided 11,700 questions from NWO’s National Research Agenda into clusters. Using language technology and mathematical comparisons of the most important words, they found slightly over 60 clusters of questions, which were in turn divided into a few hundred sub-clusters. Important themes are health and energy, but also big data, art, and sports. NWO is happy with this analysis. The VU Topic Browser allows NWO to quickly and efficiently process the large number of responses.

National Science Agenda VU Topic Browser — by Emiel van Miltenburg, Kasper Welbers, Hennie van der Vliet, Wouter van Atteveldt, Piek Vossen
The graph shows 60 clusters and a few hundred sub-groups found in 11,700 questions from NWO’s National Research Agenda.
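
To give a feel for how questions can be grouped by their most important words, here is a minimal Python sketch. The source does not say which weighting or clustering method the VU Topic Browser uses, so TF-IDF weighting, cosine similarity, the greedy single-pass clustering, the `threshold` value and the four toy questions are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, weighting words by TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(docs)
    return [
        {word: count * math.log(n_docs / doc_freq[word])
         for word, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.15):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices; the first is its seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Invented toy questions, not taken from the National Research Agenda.
questions = [
    "storing solar energy at home",
    "wind energy versus solar energy",
    "healthy ageing for elderly people",
    "keeping elderly people healthy",
]
groups = cluster(questions)
```

On these four toy questions the sketch yields two groups, one about energy and one about health. Sub-clusters could be obtained by running the same procedure again within each group with a stricter threshold.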

Netherlands Organisation for Scientific Research (NWO): Dutch Science Agenda
Scientists determine the Dutch Science Agenda together with companies, civil society organisations and interested citizens. The agenda consolidates the themes that science will focus on in the coming years. What are the favourable opportunities for Dutch science and how can science contribute to finding solutions for societal challenges and making the most of economic opportunities?

Position AAA Data Science Postdoc or PhD Student in Computational Linguistics

We are hiring: Postdoctoral researcher/PhD candidate in Computational Linguistics


The Amsterdam Academic Alliance (AAA) is a joint initiative of the two Amsterdam-based universities – VU and the UvA – aimed at intensifying collaboration with each other and with other knowledge institutions in the region. The objective of the AAA is to cement Amsterdam’s position as a major international player and hub of academic excellence. The alliance is to result in different outcomes in each scientific field.

This advertisement concerns one of the 14 positions. The Network Institute of the VU University of Amsterdam is looking for a motivated Postdoctoral researcher or PhD student for the project “From text to Deep Data”. The candidate will be part of the Network Institute of the VU University Amsterdam and will work within multidisciplinary teams of humanities researchers and computer scientists.
The work will be done in the context of a larger research program called “QuPiD2: Quality and Perspectives in Deep Data”, in collaboration with other researchers who jointly aim to achieve a formal model of quality and perspectives.

As part of the QuPiD2 research team, the candidate will develop 1) a perspective model for representing the subjective relation between the source of information and the statements in it, and 2) software to detect such interpersonal communication layers and perspectives from text. The project will transform big unstructured text data into deep data that show the emotions, opinions and viewpoints on the changing world. It will reveal the social networks and dynamics within trust networks that influence our world views.

1. Studying existing models for handling provenance, attribution, sentiments, opinions and emotions as expressed in text.
2. Developing an overarching perspective model for representing the subjective relations between sources and their statements. The model will initially be based on textual data but should show the capacity to model perspectives on any type of (big) data.
3. Using semantic web standards, e.g. RDF and SPARQL, to represent and access the data within the project.
4. Studying existing NLP approaches to detect perspective relations in texts. Both English and Dutch texts will be considered.
5. Developing machine-crowd empowered processing of textual sources for populating the QuPiD2 model.
6. Creating data sets for training and evaluation through expert annotation and crowd annotation.
7. Developing new components and approaches to obtain the perspective values within the model from textual data.
8. Evaluating the components against the data sets developed and within an application environment.
9. Collaborating with the QuPiD2 program research team.
10. Publishing the results of the work as scientific articles in high-ranking journals and conferences, and presenting the work at relevant scientific venues.

The candidate should have a strong background (MA) in computational linguistics and semantic web technology with expertise in data modelling, modelling perspectives, subjectivity and attribution relations expressed in natural language. The candidate should have sufficient programming skills and experience with data engineering and text mining.

Further particulars
The appointment will be for a period of three years for a postdoc and four years for a PhD student. Our excellent employment conditions include:
• Remuneration of 8.3% end-of-year bonus and 8% holiday allowance;
• Solid pension scheme (ABP);
• A minimum of 29 holidays in case of full-time employment;
• Possibilities to save up holidays for sabbatical leave.

For a postdoc, the salary will be in accordance with university regulations for academic personnel and, depending on experience, will range from a minimum of € 2,476 gross per month up to a maximum of € 3,908 gross per month (salary scale 10) based on full-time employment.

For a PhD student, the salary will be in accordance with university regulations for academic personnel and will range from a minimum of € 2,125 gross per month in the first year up to a maximum of € 2,717 (salary scale 85.0-3) based on full-time employment.

For additional information please contact:
Prof. Piek Vossen
phone: +31 681773878 or +31 20 59 86457
e-mail: / attention Piek Vossen

Dr. Lora Aroyo
Phone: +31 620329972

Applications may only be submitted via To process your application immediately, please quote the vacancy number and the title of the position you are applying for in the subject line. Applications must include a detailed curriculum vitae, a motivation letter explaining why you are the right candidate, a list of projects you have worked on with brief descriptions of your contributions, and the names and contact addresses of two academic references from whom information about the candidate can be obtained. All these should be grouped in one PDF attachment.

Applications will be accepted until 10 May 2015.

Any other correspondence in response to this advertisement will not be dealt with.