Research Masters meet Language Industry

MEET & GREET Human Language Technology (CLTL) & Language Industry

The Computational Lexicology and Terminology Lab (CLTL) organized a MEET & GREET between companies and master students on Friday December 08, 2017 13:30 – 18:00.

Research Masters Meet Language Industry
In the afternoon of Friday December 8th, 2017, students from the Humanities Research Master meet companies and organizations interested in students in Language Technology and other disciplines for internships and theses. The meeting is organized by the Computational Lexicology and Terminology Lab at the VU, in cooperation with the VU Humanities Graduate School.

CLTL is one of the world’s leading research institutes in Human Language Technology. Prof. Dr. Piek Vossen, recipient of the NWO Spinoza Prize, heads the group of international researchers who are working on interdisciplinary projects, including the Spinoza project ‘Understanding Language by Machines’. At CLTL we are training the next generation of language technology experts. The two-year Research Master Human Language Technology is a program by CLTL.

The Meet & Greet is an excellent opportunity to introduce your company or organisation to Human Language Technology students, and for master students to present their research topic or area of expertise to you.

Join our afternoon program in the presence of the Reference Machine LeoLani, a Pepper robot!

Location
Lecture hall HG 10A.00 (Floor 10, Wing A), Main building, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam.

Program
13:30 – 14:00 Walk-In / Doors open / Registration & Coffee
14:00 – 14:05 Introduction: Prof. Dr. Piek Vossen
14:05 – 14:45 Company pitches I
14:45 – 15:15 Student pitches I
15:15 – 15:30 Coffee Break
15:30 – 16:15 Company pitches II
16:15 – 16:45 Student pitches II
16:45 – 17:00 Q&A Reference Machine
17:00 – 18:00 Networking drinks

LeoLani, a Reference Machine

Call for VU University Research Fellow 2017-2018

Apply for University Research Fellow 2017-2018
Deadline Friday 30 June 2017

Who makes our robots talk?

Who takes up this challenge and the exciting opportunity to work in an inspiring research group that is among the best in the world in the area of natural language understanding?

Spinoza Prize winner Prof. dr. Piek Vossen has the honour to invite you to apply for the position of University Research Fellow for the academic year 2017-2018. As a University Research Fellow, you work one day a week for one year on a prestigious research project within the research group of Prof. Vossen: the Computational Lexicology and Terminology Lab (CLTL).

Humanoid robots: Pepper by Aldebaran Robotics and SoftBank, and NAO by Aldebaran Robotics.

We recently bought a robot and now want you to plug in our natural language processing technology so that the robot can respond to people in an intelligent way. If you are a wise girl or wise guy and you are interested in Artificial Intelligence, Natural Language Processing and robotics, then you are the perfect candidate to turn our robots into wise bots.

You will work with a real Pepper or NAO robot. The programming environment is Choregraphe and some programming skills in Python are recommended.

As a URF, you will have the chance to publish a paper and attend a conference. It is also an honorable position that looks great on your CV. You will work with PhD students and postdocs who do exciting work in the area of natural language understanding. There is an opportunity to present a talking robot at the Weekend of Science (“Weekend van de Wetenschap”) to a general audience and primary school children in October, and your robot can be present with you at the opening of the new Computer Science building in 2018.

If you win the prize, your activities will be funded for one day a week for one year, starting September 2017.

Piek Vossen appointed Pia Sommerauer as VU Fellow for the 2016-2017 academic year.

If you are interested, send an email to Selene Kolman by Friday 30 June 2017, listing:

— a brief motivation
— your interests and ideas related to Natural Language Processing and robotics
— your (Python) programming skills
— your undergraduate degree
— the master courses you have taken and intend to take
— your list of grades

For more information visit websites below or contact:
Prof. dr. Piek Vossen
Selene Kolman

Further information on VU University Research Fellowship (URF)

Prof. dr. Piek Vossen

Professor Computational Lexicology
Language, Literature and Communication
Faculty of Humanities, VU University
de Boelelaan 1105, 1081 HV Amsterdam, The Netherlands

Piek Vossen appointed Soufyan Belkaid as VU Fellow for the 2015-2016 academic year.

Piek Vossen appointed Chantal van Son as VU Fellow for the 2014-2015 academic year.

Team CLARIAH wins Audience Award at Hackalod 2016

(A version of this post previously appeared on http://www.clariah.nl/en/new/blogs/575-team-clariah-wins-audience-award-at-hackalod-2016)

It all seemed rather funny to them, until the very moment they laid eyes upon the prison block. As ‘Team Clariah’ Marieke van Erp (VU, WP3) and Richard Zijdeman (IISG, WP4) participated in the National Library’s HackaLOD on 11-12 November. Alongside seven other teams they faced the challenge of building a cool (prototype) application using Linked Open Data made especially available for this event, by the National Library and Heritage partners. It had to be done within 24 hours… Inside a former prison… Here’s their account of the event.

We set out on Friday, somewhat dispirited as our third team mate Melvin Wevers (UU) was caught out by a cold. Upon arrival, it turned out we had two cells: one for hacking and one for sleeping (well, more like three hours of tossing and turning). As you’d expect, the cells were not exactly cosy, but the organisers had provided goodie bags, the contents of which were put to good use, even for a midnight jaw harp concert.

With that, and our pre-arranged plan to tell stories around buildings, we set out to build our killer app. We found several datasets that contain information about buildings. The BAG, for example, contains addresses, geo-coordinates, information about how a building is used (as a shop or a gathering place) and ‘mutations’ (things that happened to the building). What it doesn’t contain is building names (for example Rijksmuseum or Wolvenburg), which are contained in the Rijksmonumenten dataset. The Rijksmonumenten dataset in turn doesn’t contain addresses, but as both contain geo-coordinates, they can be linked. Yay for Linked Data!
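As a rough illustration of linking two datasets through their geo-coordinates, here is a minimal Python sketch. It is not the code written at the hackathon: the record layout (`address`, `name`, `lat`, `lon`), the 25-metre threshold and the toy coordinates are all assumptions made for this example, and the real BAG and Rijksmonumenten data are published as Linked Open Data with their own schemas.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

def link_by_coordinates(addresses, monuments, max_distance_m=25.0):
    """Pair each named monument with the nearest address record.

    A monument is only linked if an address lies within max_distance_m
    (an assumed threshold, chosen for illustration).
    """
    links = []
    for monument in monuments:
        best, best_distance = None, max_distance_m
        for address in addresses:
            distance = haversine_m(monument["lat"], monument["lon"],
                                   address["lat"], address["lon"])
            if distance <= best_distance:
                best, best_distance = address, distance
        if best is not None:
            links.append({"name": monument["name"],
                          "address": best["address"],
                          "distance_m": best_distance})
    return links

# Toy records with made-up coordinates and hypothetical field names.
ADDRESSES = [
    {"address": "Toy Street 1", "lat": 52.36000, "lon": 4.88520},
    {"address": "Toy Street 99", "lat": 52.37000, "lon": 4.90000},
]
MONUMENTS = [
    {"name": "Rijksmuseum", "lat": 52.36005, "lon": 4.88525},
    {"name": "Wolvenburg", "lat": 52.09000, "lon": 5.12000},
]

if __name__ == "__main__":
    for link in link_by_coordinates(ADDRESSES, MONUMENTS):
        print(link)
```

A nearest-neighbour match with a distance cut-off is only one way to do this. With real data a spatial index would be needed to avoid comparing every pair, and the threshold would have to be tuned to how precisely each dataset records its coordinates.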

To tell the stories, we wanted to find some more information in the National Library’s newspaper collection. With some help from other hackers we managed to efficiently bring up news articles that mention a particular location. With some manual analysis, we found, for example, that for Kloveniersburgwal 73 up until 1890 there was a steady stream of ads asking for ‘decent’ kitchen maids, followed by a sudden spike in ads announcing real estate. It turns out a notary had moved in, for whom another (not linked) dataset could also provide a marriage license, confirmed by a wedding ad in the newspaper. These sorts of stories can give us more insight into what happened in a particular building at a given time.

We have made some steps in starting to analyse these ads automatically to detect these changes in order to automatically generate timelines for locations, but we didn’t get that done in 24 hours. However, the audience was sufficiently pleased with our idea for us to win the audience award! (Admittedly to our great surprise, as the other teams’ ideas were all really awesome as well). We’re now looking for funding to complete the prototype.

In summary, it was all great fun, not least due to the great organisation by the National Library as well as the nice ‘bonding’ atmosphere among the teams. So, our lessons learnt:

  • prison food is really not that bad (and there was lots of it)
  • 24 hours of hacking is heaps of fun
  • the data always turn out to behave differently from what you’d expect
  • isolated from the daily routine, events like these prove crucial to foster new ideas and relations, in order to keep the field in motion.

 

On Wednesday 16 November, CLTL member Marieke van Erp was also interviewed on Radio 1 about the hackathon, together with Martijn Kleppe, one of the HackaLOD organisers. You can listen back to it here (in Dutch).

Papers accepted at COLING 2016

Two papers from our group have been accepted at the 26th International Conference on Computational Linguistics (COLING 2016), held in Osaka, Japan, from 11 to 16 December 2016.

Semantic overfitting: what ‘world’ do we consider when evaluating disambiguation of text? by Filip Ilievski, Marten Postma and Piek Vossen

Abstract
Semantic text processing faces the challenge of defining the relation between lexical expressions and the world to which they make reference within a period of time. It is unclear whether the current test sets used to evaluate disambiguation tasks are representative for the full complexity considering this time-anchored relation, resulting in semantic overfitting to a specific period and the frequent phenomena within.
We conceptualize and formalize a set of metrics which evaluate this complexity of datasets. We provide evidence for their applicability on five different disambiguation tasks. Finally, we propose a time-based, metric-aware method for developing datasets in a systematic and semi-automated manner.

More is not always better: balancing sense distributions for all-words Word Sense Disambiguation by Marten Postma, Ruben Izquierdo and Piek Vossen

Abstract
Current Word Sense Disambiguation systems show an extremely low performance on low frequent senses, which is mainly caused by the difference in sense distributions between training and test data. The main focus in tackling this problem has been on acquiring more data or selecting a single predominant sense and not necessarily on the meta properties of the data itself. We demonstrate that these properties, such as the volume, provenance and balancing, play an important role with respect to system performance. In this paper, we describe a set of experiments to analyze these meta properties in the framework of a state-of-the-art WSD system when evaluated on the SemEval-2013 English all-words dataset. We show that volume and provenance are indeed important, but that perfect balancing of the selected training data leads to an improvement of 21 points and exceeds state-of-the-art systems by 14 points while using only simple features. We therefore conclude that unsupervised acquisition of training data should be guided by strategies aimed at matching meta-properties.
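To make the idea of balancing a sense distribution concrete, here is a small Python sketch. The abstract does not say how the balancing was carried out, so this is not the authors' procedure: it assumes training instances are `(lemma, sense, context)` tuples and simply downsamples every sense of a lemma to the size of that lemma's rarest sense. The function name and the toy sense labels are hypothetical.

```python
import random
from collections import defaultdict

def balance_by_sense(instances, seed=0):
    """Downsample so that every sense of a lemma has equally many instances.

    instances: iterable of (lemma, sense, context) tuples.
    Assumption for this sketch: balancing means downsampling to the
    count of the least frequent sense of each lemma.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_lemma = defaultdict(lambda: defaultdict(list))
    for lemma, sense, context in instances:
        by_lemma[lemma][sense].append((lemma, sense, context))

    balanced = []
    for senses in by_lemma.values():
        target = min(len(items) for items in senses.values())
        for items in senses.values():
            balanced.extend(rng.sample(items, target))
    return balanced

if __name__ == "__main__":
    # Toy training data with made-up sense labels.
    toy = [
        ("bank", "bank%finance", "she opened an account at the bank"),
        ("bank", "bank%finance", "the bank raised its interest rates"),
        ("bank", "bank%finance", "he works at a bank"),
        ("bank", "bank%river", "they picnicked on the river bank"),
    ]
    for instance in balance_by_sense(toy):
        print(instance)
```

Downsampling discards data, which trades volume against balance. Other strategies, such as acquiring extra examples for rare senses, would balance the distribution without shrinking it.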

VU Master’s Day, Mar. 12 2016

Visit our Research Master Linguistic Engineering at VU Master’s Day Saturday 12 March 2016
Flyer Linguistic Engineering, Specialization of the Research Master Linguistics.
Overview Courses Linguistic Engineering.

On 12 March 2016 you will have the opportunity to visit the Master’s Day and obtain detailed information on our Research Master Linguistic Engineering, Specialization of the Research Master Linguistics.

Date: Saturday, 12 March 2016
Time: 9.30 am – 2.30 pm
Target group: Higher education students and professionals
Location: Main Building, VU University Amsterdam, De Boelelaan 1105 (directions)
Please note: Preregistration is open until 12.00 pm on Friday 11 March

Programme

Specialization ‘Linguistic Engineering’ 2017-2018

Linguistic Engineering is a specialization in the Research Master Linguistics at VU Amsterdam. More details: Programme, Admission and Application.

Overview Courses Research Master Specialization: Linguistic Engineering
View/download flyer Research Master Linguistic Engineering. Programme, admission and application.

Language technology is a rapidly developing field of research. In humanities research today, a firm background in language technology is extremely valuable for working with large datasets. The Computational Lexicology and Terminology Lab (CLTL) offers a specialization in the research master Linguistics in which students are trained as linguistic engineers. A linguistic engineer has knowledge of language technology as used in computer applications (e.g. search engines) and of the relevant linguistics.

WHY STUDY AT VU AMSTERDAM?
• The Computational Lexicology and Terminology Lab (CLTL) is one of the world’s leading research institutes in Linguistic Engineering.
• Prof. Dr. Piek Vossen, winner of the NWO Spinoza Prize, leads the group of researchers and several national and international interdisciplinary projects, including the Spinoza project ‘Understanding Language by Machines’.
• Become part of an international group of researchers at Vrije Universiteit Amsterdam!

CAREER PROSPECTS
You can set up your own field of research as a PhD student or you can embark on a career at a research institute. Other opportunities are in the industry, which is in need of linguists with a technical background. Being a graduate of the CLTL will certainly enhance your chances.

ADMISSION REQUIREMENTS
• Applicants must have at least a Bachelor’s degree in Linguistics, Artificial Intelligence or a comparable Bachelor’s programme.
• Applicants who do not meet the requirement(s) are also encouraged to apply, provided that they have a sound academic background and a demonstrated interest in and knowledge of engineering and/or linguistics.

SPECIALIZATION: LINGUISTIC ENGINEERING
IN RESEARCH MASTER: LINGUISTICS
LANGUAGE: ENGLISH
DURATION: 2 YEARS FULLTIME
DEADLINE: APRIL 1 2016 (NON-EU), JUNE 1 2016 FOR DUTCH AND EU STUDENTS

For more details on the programme, admission and application:
WWW.FGW.VU.NL
WWW.VU.NL/MA-LINGUISTICS
Dr. H. D. van der Vliet: +31 (0)20 598 6466
EMAIL: Dr. H. D. van der Vliet

Computational Lexicology and Terminology Lab (CLTL)
Language, Literature and Communication
Faculty of Humanities
VU Amsterdam
de Boelelaan 1105
1081 HV Amsterdam
The Netherlands

General information on the Research Master’s in Linguistics at VU Amsterdam.

Master’s Evening, Dec. 01 2015

Master’s Evening 1 December 2015

On Tuesday 1 December 2015 you can visit our Master’s Evening, where information sessions cover most of our Master’s degree programmes. Please register and choose which of these sessions you would like to attend.

Date: Tuesday 1 December 2015
Time: 17:00 – 20:30
For whom: Higher education students and professionals
Location: Main building VU Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam

Please find information on our Research Master Specialization ‘Linguistic Engineering’

If you are not able to attend our Master’s Evening on 1 December 2015, you can visit the Master’s Day on Saturday 12 March 2016 or find out more about VU Amsterdam and our study programmes:
• Find your international Master’s degree programme and contact the coordinator for questions
• Meet VU Amsterdam representatives in your own country
• Visit our international students Facebook page

Press release VU University on NewsReader: Join the hackathon!

VU professor and Spinoza Prize winner Piek Vossen presents NewsReader
Discover for yourself this new technology that reads the news

In 2013 Piek Vossen (professor of computational lexicology), together with researchers in Trento and San Sebastian and the companies LexisNexis (NL), SynerScope (NL) and the English company ScraperWiki, started the NewsReader project to develop a computer program that ‘reads’ the news every day and keeps precise track of what happened in the world, when and where, and who was involved. The project received a grant of 2.8 million euros from the European Commission.

SynerScope’s visualization: extraction from 1.26M news articles

Reading news in four languages
Over the past three years, the researchers have developed a technology that lets the computer automatically read the news in four languages. Millions of newspaper articles have now been turned into a searchable database in which duplicates have been removed, complementary information from different reports has been merged in a smart way, and the information has been enriched with fine-grained types, so that you can search not only for personal names such as ‘Mark Rutte’ and ‘Diederik Samsom’ but also for entities of the type ‘politician’.

NewsReader presentation
On the afternoon of Tuesday 24 November 2015, Piek Vossen’s research group, the Computational Lexicology & Terminology Lab (CLTL), is organising a workshop at which the final results of the project will be presented. Several speakers will also give their view of the project, such as VU professor Frank van Harmelen (Knowledge Representation & Reasoning), Bernardo Magnini, researcher at FBK in Trento, and Sybren Kooistra, data journalist at de Volkskrant and co-creator and editor-in-chief of Yournalism, the platform for investigative journalism.

Join the Hackathon!
On 25 November, users can get hands-on with the news database built from millions of newspaper articles. More information and registration.

VENI grant for Antske Fokkens

Antske Fokkens received a VENI grant for her proposal Reading between the lines. The project aims at identifying so-called implicit perspectives in text.

Perspectives are conveyed in many ways. Explicit opinions or highly subjective terms are easily identified. However, perspectives are also expressed more subtly. For instance, Nick Wing argues that media describe white suspects (e.g. brilliant, athletic) more positively than black victims (e.g. gang member, drug problems). Ivar Vermeulen (p.c.) observes in a small Dutch corpus that Moroccan perpetrators are easily called thieves (implying generic behavior), whereas perpetrators of Dutch descent only stole something (implying incidental behavior). These observations are anecdotal, but reveal how choices concerning what information to include or how to describe someone’s role may display a specific perspective.

This project will investigate how linguistic analyses may be used to identify these more implicit ways of expressing perspectives in text. This research will be carried out in three stages. First, large-scale corpus analyses will be applied to identify distributions of semantic roles (what entities do) and other properties assigned to them (their characteristics). In the second stage, generic participants will be linked to the semantic role they imply (e.g. a thief will be linked to the perpetrator of stealing). With these links, we can investigate whether thieves are described differently from people who steal. In the third stage, emotion and sentiment lexica will be used to identify the sentiment associated with descriptions of people, enabling research that investigates whether people are depicted positively or negatively.

The research is carried out in the context of digital humanities and social sciences. Evaluation and experimental setup will be guided towards identifying differences in perspective between sources. In addition to correctness of linguistic analyses (intrinsic evaluation), the possibility of using the method for identifying changes in perspective over time (historic research) or differences in perspective between sources (communication science) will be investigated.