Author Archives: selene

The Global Wordnet Association is pleased to announce the 8th International Global Wordnet Conference (GWC2016).
Bucharest, Romania, January 27-30, 2016
Global Wordnet Association: www.globalwordnet.org
Conference website: http://gwc2016.racai.ro/

The conference will be hosted by the Research Institute for Artificial Intelligence “Mihai Drăgănescu” of the Romanian Academy (local organization: Verginica Mititelu and Corina Forăscu).

Details about the Association and the conference can be found on the conference website (http://gwc2016.racai.ro/).

NWO Project granted: CLARIAH

The CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) project has been granted by NWO under the National Roadmap for Large-Scale Research Facilities programme.

CLARIAH is developing a digital infrastructure that brings together large collections of data and software from different humanities disciplines. This will enable humanities researchers, from historians, literature experts and archaeologists to linguists, speech technologists and media scientists, to investigate cross-disciplinary questions, for example about culture and societal change. CLARIAH has received 12 million euros for the development of research instruments and the training of scientists. This project is vitally important for the development of the humanities in the Netherlands; a digital revolution is taking place that will drastically change how humanities research is done. The potential societal impact of this is also considerable.

CLARIAH: BIG DATA, GRAND CHALLENGES background article (pdf).

Organisations involved (applicants): Huygens ING, International Institute for Social History, Meertens Institute, Netherlands Institute for Sound and Vision, DANS, Radboud University Nijmegen, Utrecht University, University of Amsterdam and VU University Amsterdam. Project leader: Prof. A.F. (Lex) Heerma van Voss.

Prof. dr. Piek Vossen is part of the CLARIAH core team.

CLARIAH | NWO-programma Nationale Roadmap on YouTube (in Dutch)

CLARIAH kickoff 2015 on YouTube (in Dutch)

Release Cornetto Demo

Cornetto is a lexical resource for the Dutch language which combines two resources with different semantic organisations: the Dutch Wordnet with its synset organisation and the Dutch Reference Lexicon which includes definitions, usage constraints, selectional restrictions, syntactic behaviours, illustrative contexts, etc. For more information on the contents of Cornetto, see Cornetto user documentation.

Cornetto demo by INL, VU University and CLARIN-NL. Figure: visualization of synset relations for the Dutch word form ‘bloem’ (part of speech: noun, sense number: 3).

The Cornetto demo provides possibilities to query the resource by choosing one of the following options:

Simple Search for Lexical Entries
Advanced Search for Lexical Entries
Visualization of Synset Relations

Cornetto is also available in XML (following the Lexical Markup Framework, LMF) and RDF (for more information, please refer to Open Source WordNet).

The Cornetto demo is realized as part of the Cornetto-LMF-RDF project, which has been funded by CLARIN-NL (www.clarin.nl).

NLeSC Project granted: Visualizing uncertainty and perspectives

Visualizing uncertainty and perspectives: Netherlands eScience Center, project number 027.014.402 (April 01, 2015 – April 01, 2016)

The Netherlands eScience Center is pleased to announce the initiation of six new projects in the areas of Environment and Sustainability, Life Sciences & eHealth, Humanities and Social Sciences and Physics and Beyond. The projects, scheduled to start in 2015, are collaborations with research teams from multiple Dutch academic groups and represent the latest step in the continued development of NLeSC’s project portfolio. Two projects will be funded in the areas of Humanities and Social Sciences.

Visualizing uncertainty and perspectives
Prof. Piek Vossen and Antske Fokkens are co-applicants of the project “Visualizing uncertainty and perspectives”: one of the six proposals that will receive funding to the value of 125K euro. This project aims to develop a tool that visualizes subjectivity, perspective and uncertainty to make them controllable variables in Humanities research. The tool should allow users to compare information from different sources representing alternative perspectives and visualize subjectivity and uncertainty. Such a visualization enables improved and comprehensive source criticism, provides new directions of research and strengthens the methodology of digital humanities.

Hackathon NewsReader Amsterdam: Jan. 21, 2015

Porsches to Pizza – Hack 6,000,000 automotive news articles #NewsReader

The global automotive industry has a value of the order of $1 trillion annually.

The industry comprises a massive network of suppliers, manufacturers, advertisers, marketeers and journalists. Attracting and supporting the industry is a significant goal of industrial policy.

On January 21st 2015 we’re running an event which should be of interest if:

You’re a data journalist on an automotive desk;
You’re an analyst sifting daily news looking for information on your company or on competitors;
You’re a data analyst looking to understand how your customers operate their supply chain;
You’re an analyst trying to find secondary events that could influence an investment decision.

We’ve developed a powerful new tool called ‘NewsReader’ which utilises natural language understanding and semantic web technology. This helps you to better understand the interactions between companies and key individuals, derived from news articles.

We’re processing 6 million news articles from sources around the world, both general and specialist media, to build a searchable database of news on the automotive sector, which you can explore at our Hack event.

Over the summer we ran a Hack Day on news surrounding the World Cup. NewsReader enabled the attendees to pull out networks of interactions between politicians, football players and people in FIFA: not only who they were interacting with, but also what they were doing.

Early analysis of the automotive data is giving us some interesting insights. For example, news stories between 2005 and 2009 reported that Porsche was buying an ever larger stake in Volkswagen, prompting speculations that Porsche would take over Volkswagen. However, in 2009 the tables turned and Volkswagen took a majority stake and eventually took over Porsche. Our system is able to discriminate between articles mentioning that Volkswagen was taking over Porsche and vice versa rather than simple co-occurrences that are generally found in aggregated news analysis systems. In the slipstream of this take-over, Wendelin Wiedeking, longtime Porsche senior executive, was fired from Porsche. In 2013 he opened the first of a chain of Italian restaurants. As we have a structured database in which similar events can easily be retrieved, it is a small effort to find out that Jürgen Schrempp, former CEO of DaimlerChrysler, also opened a restaurant after retiring from the car industry.
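
The difference between a directed query and plain co-occurrence can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Event` record, the action labels and the two toy records (which paraphrase the paragraph above, with 2005 standing in for the 2005-2009 period) are hypothetical and do not reflect NewsReader's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical structured event: who did what to whom, and when."""
    actor: str
    action: str
    target: str
    year: int


# Toy records paraphrasing the paragraph above; not actual NewsReader output.
# 2005 stands in for the reported 2005-2009 period of stake building.
EVENTS = [
    Event("Porsche", "buys_stake_in", "Volkswagen", 2005),
    Event("Volkswagen", "takes_majority_stake_in", "Porsche", 2009),
]


def cooccurring(events, a, b):
    """Co-occurrence: any event involving both companies, direction ignored."""
    return [e for e in events if {e.actor, e.target} == {a, b}]


def directed(events, actor, target):
    """Directed query: only events in which `actor` acts on `target`."""
    return [e for e in events if e.actor == actor and e.target == target]


if __name__ == "__main__":
    print(len(cooccurring(EVENTS, "Porsche", "Volkswagen")))
    for event in directed(EVENTS, "Volkswagen", "Porsche"):
        print(event)
```

A co-occurrence search returns both records because both mention the two companies, while the directed query keeps only the record in which Volkswagen is the acting party.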

We are running an event in London on January 30th 2015; if you cannot make the 21st in Amsterdam, you may want to join us there.

NewsReader helps you find a needle in a haystack.

REGISTER FOR HACKATHON AMSTERDAM: https://www.eventbrite.com/e/porsches-to-pizza-hack-6000000-automotive-news-articles-newsreader-tickets-14504369961

REGISTER FOR HACKATHON LONDON: http://www.eventbrite.com/e/porsches-to-pizza-hack-6000000-automotive-news-articles-newsreader-tickets-14458478699?aff=efbnen

Mini-seminar: Disambiguating entities: Dec. 12, 2014

Presentations from the mini-seminar “Disambiguating entities and their roles in texts based on background knowledge”, held on December 12, 2014:

– Introduction (pdf) by Prof. dr. Piek Vossen
– Towards a Dutch FrameNet-style Semantic Role Labeler (pdf) Presentation by Chantal van Son
– Named Entity Disambiguation with two-stage coherence optimization (pdf) Presentation by Filip Ilievski

Invitation below:

Mini-seminar: Disambiguating entities and their roles in texts based on background knowledge

Dear all,

We cordially invite you to our mini-seminar “Disambiguating entities and their roles in texts based on background knowledge”, in which we will present our Master’s thesis topics and our current and future work. It will take place on Friday, December 12 from 10:00 to 12:00 in room C-121 (W&N Building).

An array of text processing tools is currently used to extract events, recognize and link entities, and discover relations between the two. We, Filip Ilievski and Chantal van Son, tackle these types of Natural Language Processing tasks by using background knowledge from lexical resources and the Semantic Web. The disambiguation of entities and their context is at the core of both approaches: Filip’s thesis aims to disambiguate them by determining their identity, while Chantal’s thesis aims to disambiguate the roles they play in context. You can find the descriptions of both projects below.

Prof. Piek Vossen will kick off the mini-seminar by outlining the background of the problem and presenting the existing approaches. Prof. Frank van Harmelen will conclude the event with a discussion on the integration of background knowledge in language processing.

Program:
10:00 – 10:15 Introduction by Piek Vossen
10:20 – 10:50 Towards a Dutch FrameNet-style Semantic Role Labeler (Chantal van Son)
10:55 – 11:25 Named Entity Disambiguation with two-stage coherence optimization (Filip Ilievski)
11:30 – 12:00 Closing remarks and discussion led by Frank van Harmelen

Towards a Dutch FrameNet-style Semantic Role Labeler (Chantal van Son)
Semantic role labeling (SRL) is one of the key tasks in Natural Language Processing for deep text understanding. Because of its rich and fine-grained categorization of different conceptual scenarios and their specific semantic roles, FrameNet is a popular resource to serve as a basis for SRL systems in English. For Dutch, however, there is currently no FrameNet-like resource available that can be used to train an SRL system, and creating such a resource usually takes a great deal of expensive manual effort. This study investigates how existing tools and resources, such as the SoNaR Semantic Role Labeler (SSRL) and the Predicate Matrix, can be exploited for FrameNet-based SRL in Dutch. In this talk I will present this method and discuss some of its difficulties and possible solutions.

Named Entity Disambiguation with two-stage coherence optimization (Filip Ilievski)
Contemporary Natural Language Processing modules solve Entity Linking, Event Detection, and Semantic Role Labeling as separate problems. From the semantic point of view, each of these processes adds another brush-stroke onto the canvas of meaning: entities and events are components that occur in relations which correspond to roles. The approach presented here extends such NLP processes with a semantic process of coherence optimization. I use both binary logic and probabilistic models built through manual and automatic techniques. The binary filtering phase relies on restrictions from VerbNet and a domain-specific ontology. The optimization phase aims to maximize the coherence between the remaining candidates in a probabilistic manner based on available background knowledge about the entities.
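
The general shape of a two-stage approach (a hard filter followed by a coherence maximization) can be sketched as below. This is only a toy illustration under stated assumptions, not the method of the thesis: the candidate lists, the type restrictions and the pairwise coherence scores are all hypothetical, and the exhaustive search over combinations is a stand-in for whatever probabilistic optimization is actually used.

```python
from itertools import product

# Hypothetical candidates per mention: (entity id, entity type).
CANDIDATES = {
    "Porsche": [("Porsche_AG", "Company"), ("Ferdinand_Porsche", "Person")],
    "Volkswagen": [("Volkswagen_Group", "Company"), ("Volkswagen_Beetle", "Product")],
}

# Hypothetical role restrictions: entity types allowed for each mention.
ALLOWED_TYPES = {
    "Porsche": {"Company", "Person"},
    "Volkswagen": {"Company"},
}

# Hypothetical symmetric coherence scores; unlisted pairs score 0.0.
COHERENCE = {
    frozenset({"Porsche_AG", "Volkswagen_Group"}): 0.9,
    frozenset({"Ferdinand_Porsche", "Volkswagen_Group"}): 0.6,
}


def binary_filter(candidates, allowed):
    """Stage 1: drop candidates whose type violates the restriction."""
    return {
        mention: [c for c in cands if c[1] in allowed[mention]]
        for mention, cands in candidates.items()
    }


def most_coherent(candidates, coherence):
    """Stage 2: pick the combination with the highest summed pairwise score."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates[m] for m in mentions)):
        ids = [entity_id for entity_id, _ in combo]
        score = sum(
            coherence.get(frozenset({a, b}), 0.0)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
        )
        if score > best_score:
            best, best_score = dict(zip(mentions, ids)), score
    return best, best_score


if __name__ == "__main__":
    filtered = binary_filter(CANDIDATES, ALLOWED_TYPES)
    print(most_coherent(filtered, COHERENCE))
```

Filtering first keeps the second stage small: only candidates that survive the hard restrictions are scored against each other.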

Kind regards,
Filip Ilievski & Chantal van Son

Release Open Source Dutch WordNet

Open Source Dutch Wordnet is a Dutch lexical semantic database.

Demo of Open Source Dutch WordNet. The first version of the Open Dutch Wordnet (ODWN) was released on December 02, 2014, by Marten Postma and Piek Vossen.

ODWN was created by removing the proprietary content from Cornetto (http://www2.let.vu.nl/oz/cltl/cornetto), and by using open source resources to replace this proprietary content.

Open Source Dutch WordNet contains 116,992 synsets, of which 95,356 originate from WordNet 3.0 and 21,636 synsets are new synsets. The number of English synsets without Dutch synonyms is 60,743, which means that 34,613 WordNet 3.0 synsets have been filled with at least one Dutch synonym.
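
The counts quoted above are consistent with each other, which can be checked with a few lines of arithmetic. The sketch below uses only the four figures from the paragraph; the constant and function names are chosen here for illustration, and the share of filled WordNet 3.0 synsets is a derived figure that the release note itself does not state.

```python
# Figures quoted in the release note above.
TOTAL_SYNSETS = 116_992
FROM_WORDNET_30 = 95_356
NEW_SYNSETS = 21_636
WITHOUT_DUTCH_SYNONYMS = 60_743


def odwn_coverage():
    """Return (total, filled, share_filled) derived from the quoted counts."""
    total = FROM_WORDNET_30 + NEW_SYNSETS
    filled = FROM_WORDNET_30 - WITHOUT_DUTCH_SYNONYMS
    share_filled = filled / FROM_WORDNET_30
    return total, filled, share_filled


if __name__ == "__main__":
    total, filled, share_filled = odwn_coverage()
    assert total == TOTAL_SYNSETS
    print(total, filled, f"{share_filled:.1%}")
```

In other words, roughly a third of the WordNet 3.0 synsets carry at least one Dutch synonym in this first release.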

The demo of Open Source Dutch WordNet can be inspected by going through these steps:
(1) Use Google Chrome or Mozilla Firefox as your browser
(2) Go to https://debvisdic.let.vu.nl:9002/editor/
(3) Log in with username: gast, password: gast
(4) In the left box, click on ‘ODWN’ and then click the ‘Add’ button
(5) Click the ‘Open dictionaries’ button and inspect the resource

This project has been co-funded by the Nederlandse Taalunie (2013-2014).

Summer School Perspectives on Subjectivity: July 06-17, 2015

Piek Vossen and his team are organizing a Summer School course, “Perspectives on Subjectivity”, at VU University Amsterdam, the Netherlands, from July 06-17, 2015.

Who should join this course?

Advanced Bachelor/Master students of Linguistics (theoretical, applied or computational), Artificial Intelligence and Journalism, and others interested in cultural perspectives in communication.

Course content

Welcome to Perspectives on Subjectivity: an excellent course in Amsterdam! That is a very subjective opening, as you will agree. Here the subjective perspective of the writer is obvious, but as a rule it is much more subtle. Subjectivity is one of the key elements of natural language. Every communicative act is subjective to some degree. Subjectivity starts with the intentions of the producer of the message and affects its associated functions and syntactic structures, not to mention the choice of vocabulary and associated connotations. All of this can be summarized as perspective.

This course combines theoretical linguistic notions about perspectives with hands-on work on real language data in the lab. Moving between theory, discussions, practical data annotation and data use (machine learning and quantitative/qualitative analysis), you explore a wide range of linguistic phenomena: reference, modality, attribution, registers, sentiment analysis, opinions, temporal processing and so on.

Perspectives on Subjectivity is provided by the Computational Lexicology & Terminology Lab (CLTL), which models computer understanding of natural language with a central role for sources like lexicons, ontologies and terminology. The CLTL is part of the Department of Language and Communication in the Faculty of Humanities at VU University Amsterdam.

Learning objectives

You are familiar with linguistic theory on major topics in subjectivity and perspective: reference, modality, attribution, registers, sentiment analysis, opinions and temporal processing.

You can evaluate the relevant linguistic theory in critical discussions.

You can apply the theory to the description and analysis of real language data.

You can annotate that data and analyse it in lab sessions, using machine learning and quantitative techniques (as used in data journalism, for instance).

Round table on ‘Time and Language’: Oct. 30, 2014

Event date:
Thursday, 30 October, 2014 – 18:30 to 20:00

The round table “Time and Language”, organized by Tommaso Caselli (VUA, Amsterdam) and Rachele Sprugnoli (DH-FBK), will be held in Genoa on Thursday, October 30, as part of the “Festival della Scienza”.

Time is a pervasive element of human life that is also reflected in language. But how is time encoded in the various languages of the world? How long is an event? What happens if we want to teach a computer to reconstruct the temporal order of events in a text? The philosopher of language Andrea Bonomi, the linguist Pier Marco Bertinetto and the computational linguist Bernardo Magnini will answer these questions to reveal to the public the role that time plays in language and the challenges technology faces in this field. Philosophy, linguistics and technology come together and introduce the public to the fascinating relationship between time and language.

TiNT: Terminologie in het Nederlandse Taalgebied: Nov. 14, 2014

Impression of TiNT 2014, November 14, 2014 and link to programme

On November 14, 2014, the NL-Term association, in cooperation with the Steunpunt Nederlandstalige Terminologie (the support centre for Dutch-language terminology), is organizing the TiNT day for the sixth time. TiNT stands for Terminologie in het Nederlandse Taalgebied (Terminology in the Dutch Language Area). This annual event is intended to present current research and professional practice in the field of Dutch-language terminology to a broad audience. TiNT 2014 takes place at the Ministry of Foreign Affairs in The Hague, from 9:30 to 18:00. This year’s theme is terminology in communication between government and citizens. We have been able to secure very interesting speakers who are relevant to the theme, including Alex Brenninkmeijer of the European Court of Auditors, until recently National Ombudsman; Geert Joris, General Secretary of the Nederlandse Taalunie; and Jac Brouwer, National Corporate Identity Coordinator of the Belastingdienst (the Dutch tax administration).

We hope for a full house again this year, with interesting presentations and lively discussion.

More information, the provisional programme and the registration form can be found on our website: http://taalunieversum.org/inhoud/tint-2014.

Kind regards,
on behalf of the SNT and NL-Term
Anneleen Schoen
steunpunt@let.vu.nl

CLTL

the Computational Linguistics & Text Mining Lab
