Monthly Archives: December 2014

NLeSC Project granted: Visualizing uncertainty and perspectives

Visualizing uncertainty and perspectives: Netherlands eScience Center, project number 027.014.402 (April 01, 2015 – April 01, 2016)

The Netherlands eScience Center is pleased to announce the initiation of six new projects in the areas of Environment and Sustainability, Life Sciences & eHealth, Humanities and Social Sciences and Physics and Beyond. The projects, scheduled to start in 2015, are collaborations with research teams from multiple Dutch academic groups and represent the latest step in the continued development of NLeSC’s project portfolio. Two projects will be funded in the areas of Humanities and Social Sciences.

Visualizing uncertainty and perspectives
Prof. Piek Vossen and Antske Fokkens are co-applicants of the project “Visualizing uncertainty and perspectives”: one of the six proposals that will receive funding to the value of 125K euro. This project aims to develop a tool that visualizes subjectivity, perspective and uncertainty to make them controllable variables in Humanities research. The tool should allow users to compare information from different sources representing alternative perspectives and visualize subjectivity and uncertainty. Such a visualization enables improved and comprehensive source criticism, provides new directions of research and strengthens the methodology of digital humanities.

Hackathon NewsReader Amsterdam: Jan. 21, 2015


Porsches to Pizza – Hack 6,000,000 automotive news articles #NewsReader

The global automotive industry has a value of the order of $1 trillion annually.

The industry comprises a massive network of suppliers, manufacturers, advertisers, marketeers and journalists. Attracting and supporting the industry is a significant goal of industrial policy.

On January 21st 2015 we’re running an event which should be of interest if:

You’re a data journalist on an automotive desk;
You’re an analyst sifting daily news looking for information on your company or on competitors;
You’re a data analyst looking to understand how your customers operate their supply chain;
You’re an analyst trying to find secondary events that could influence an investment decision

We’ve developed a powerful new tool called ‘NewsReader’ which utilises natural language understanding and semantic web technology. This helps you to better understand the interactions between companies and key individuals, derived from news articles.

We’re processing 6 million news articles from sources around the world, both general and specialist media, to build a searchable database of news on the automotive sector, which you can explore at our Hack event.

Over the summer we ran a Hack Day on news surrounding the World Cup. NewsReader enabled the attendees to pull out networks of interactions between politicians, football players and people in FIFA, showing not only who they were interacting with but also what they were doing.

Early analysis of the automotive data is giving us some interesting insights. For example, news stories between 2005 and 2009 reported that Porsche was buying an ever larger stake in Volkswagen, prompting speculation that Porsche would take over Volkswagen. However, in 2009 the tables turned: Volkswagen took a majority stake and eventually took over Porsche. Our system can distinguish articles stating that Volkswagen was taking over Porsche from articles stating the reverse, rather than relying on the simple co-occurrences generally found in aggregated news analysis systems. In the slipstream of this take-over, Wendelin Wiedeking, longtime Porsche senior executive, was fired from Porsche. In 2013 he opened the first of a chain of Italian restaurants. As we have a structured database in which similar events can easily be retrieved, it is a small effort to find out that Jürgen Schrempp, former CEO of DaimlerChrysler, also opened a restaurant after retiring from the car industry.
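To illustrate why direction matters, here is a minimal sketch in Python. It is not NewsReader code or its data model: the `Event` record, the action labels and the two helper functions are hypothetical names invented for this example, and the two records are hand-written from the story above.

```python
from collections import namedtuple

# Hypothetical record type: who did what to whom.
# This is an illustration only, not the NewsReader schema.
Event = namedtuple("Event", ["actor", "action", "target"])

# Two hand-written records based on the story told in the post.
events = [
    Event("Porsche", "buys_stake_in", "Volkswagen"),
    Event("Volkswagen", "takes_over", "Porsche"),
]


def cooccur(events, a, b):
    """Co-occurrence view: both names appear, direction is ignored."""
    return [e for e in events if {e.actor, e.target} == {a, b}]


def acts_on(events, actor, target):
    """Directed view: only events where `actor` acts on `target`."""
    return [e for e in events if e.actor == actor and e.target == target]


# Co-occurrence cannot tell the two stories apart; the directed query can.
both = cooccur(events, "Porsche", "Volkswagen")
vw_on_porsche = acts_on(events, "Volkswagen", "Porsche")
print(len(both), [e.action for e in vw_on_porsche])
```

A co-occurrence search returns both records for the pair, while the directed query returns only the event in which Volkswagen is the acting party.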

We are also running an event in London on January 30th 2015; if you cannot make the 21st in Amsterdam, you may want to join us there.

NewsReader helps you find a needle in a haystack.

REGISTER FOR HACKATHON AMSTERDAM: https://www.eventbrite.com/e/porsches-to-pizza-hack-6000000-automotive-news-articles-newsreader-tickets-14504369961

REGISTER FOR HACKATHON LONDON: http://www.eventbrite.com/e/porsches-to-pizza-hack-6000000-automotive-news-articles-newsreader-tickets-14458478699?aff=efbnen

Mini-seminar: Disambiguating entities: Dec. 12 2014


Presentations Mini-seminar: Disambiguating entities and their roles in texts based on background knowledge on December 12, 2014:

– Introduction (pdf) by Prof. dr. Piek Vossen
– Towards a Dutch FrameNet-style Semantic Role Labeler (pdf) Presentation by Chantal van Son
– Named Entity Disambiguation with two-stage coherence optimization (pdf) Presentation by Filip Ilievski

Invitation below:

Mini-seminar: Disambiguating entities and their roles in texts based on background knowledge

Dear all,

We cordially invite you to our mini-seminar “Disambiguating entities and their roles in texts based on background knowledge”, in which we will present our Master’s thesis topics and our current and future work. It will take place on Friday, December 12 from 10:00 to 12:00 in room C-121 (W&N Building).

An array of text processing tools is currently used to extract events, recognize and link entities, and discover relations between the two. We, Filip Ilievski and Chantal van Son, tackle these types of Natural Language Processing tasks by using background knowledge from lexical resources and the Semantic Web. The disambiguation of entities and their context is at the core of both approaches: Filip’s thesis aims to disambiguate them by determining their identity, while Chantal’s thesis aims at disambiguating the roles they play in context. You can find the descriptions of both projects below.

Prof. Piek Vossen will kick off the mini-seminar by sketching the background of the problem and presenting the existing approaches. Prof. Frank van Harmelen will conclude the event with a discussion on the integration of background knowledge in language processing.

Program:
10:00 – 10:15 Introduction by Piek Vossen
10:20 – 10:50 Towards a Dutch FrameNet-style Semantic Role Labeler (Chantal van Son)
10:55 – 11:25 Named Entity Disambiguation with two-stage coherence optimization (Filip Ilievski)
11:30 – 12:00 Closing remarks and discussion led by Frank van Harmelen

Towards a Dutch FrameNet-style Semantic Role Labeler (Chantal van Son)
Semantic role labeling (SRL) is one of the key tasks in Natural Language Processing for deep text understanding. Because of its rich and fine-grained categorization of different conceptual scenarios and their specific semantic roles, FrameNet is a popular resource to serve as a basis for SRL systems in English. For Dutch, however, there is currently no FrameNet-like resource available that can be used to train an SRL system, and creating such a resource usually takes a great deal of expensive manual effort. This study investigates how existing tools and resources, such as the SoNaR Semantic Role Labeler (SSRL) and the Predicate Matrix, can be exploited for FrameNet-based SRL in Dutch. In this talk I will present this method while discussing some of its difficulties and possible solutions.

Named Entity Disambiguation with two-stage coherence optimization (Filip Ilievski)
Contemporary Natural Language Processing modules solve Entity Linking, Event Detection, and Semantic Role Labeling as separate problems. From the semantic point of view, each of these processes adds another brush-stroke onto the canvas of meaning: entities and events are components that occur in relations which correspond to roles. The approach presented here extends such NLP processes with a semantic process of coherence optimization. I use both binary logic and probabilistic models built through manual and automatic techniques. The binary filtering phase relies on restrictions from VerbNet and a domain-specific ontology. The optimization phase aims to maximize the coherence between the remaining candidates in a probabilistic manner based on available background knowledge about the entities.

Kind regards,
Filip Ilievski & Chantal van Son

Release Open Source Dutch WordNet


Open Source Dutch Wordnet is a Dutch lexical semantic database.

Demo of Open Source Dutch WordNet. Release first version of the Open Dutch Wordnet (ODWN): December 02, 2014. By Marten Postma and Piek Vossen.

ODWN was created by removing the proprietary content from Cornetto (http://www2.let.vu.nl/oz/cltl/cornetto), and by using open source resources to replace this proprietary content.

Open Source Dutch WordNet contains 116,992 synsets, of which 95,356 originate from WordNet 3.0 and 21,636 synsets are new synsets. The number of English synsets without Dutch synonyms is 60,743, which means that 34,613 WordNet 3.0 synsets have been filled with at least one Dutch synonym.
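The figures above are internally consistent, which a few lines of Python can confirm. This is only an arithmetic check on the numbers quoted in this post. The variable names are made up for the example, and the coverage ratio is derived here rather than stated in the release.

```python
# Figures quoted in the release note above.
total_synsets = 116_992
from_wordnet30 = 95_356   # synsets originating from WordNet 3.0
new_synsets = 21_636      # synsets new in ODWN
english_only = 60_743     # WordNet 3.0 synsets without a Dutch synonym

# The two sources together account for the full total.
assert from_wordnet30 + new_synsets == total_synsets

# WordNet 3.0 synsets that have at least one Dutch synonym.
with_dutch = from_wordnet30 - english_only

# Derived share of WordNet 3.0 synsets covered in Dutch (not stated in the release).
coverage = with_dutch / from_wordnet30

print(with_dutch)
print(f"{coverage:.1%}")
```

In other words, a little over a third of the WordNet 3.0 synsets in this first release carry a Dutch synonym.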

The demo of Open Source Dutch WordNet can be inspected by going through these steps:
(1) Use Google Chrome or Mozilla Firefox as your browser
(2) Go to https://debvisdic.let.vu.nl:9002/editor/
(3) Log in with: username: gast, password: gast
(4) Click on ‘ODWN’ in the left box and click the ‘Add’ button
(5) click the button ‘Open dictionaries’ and inspect the resource

This project has been co-funded by the Nederlandse Taalunie (2013-2014).

Summer School Perspectives on Subjectivity: July 06-17, 2015


Piek Vossen and his team organize a Summer School course “Perspectives on Subjectivity”, VU University Amsterdam, the Netherlands from July 06-17, 2015.

Who should join this course?

Advanced Bachelor/ Master students of Linguistics (theoretical, applied or computational), Artificial Intelligence and Journalism, and others interested in cultural perspectives in communication.

Course content

Welcome to Perspectives on Subjectivity: an excellent course in Amsterdam! That is a very subjective opening, as you will agree. Here the subjective perspective of the writer is obvious, but as a rule it is much more subtle. Subjectivity is one of the key elements of natural language. Every communicative act is subjective to some degree. Subjectivity starts with the intentions of the producer of the message and affects its associated functions and syntactic structures, not to mention the choice of vocabulary and associated connotations. All of this can be summarized as perspective.

This course combines theoretical linguistic notions about perspectives with hands-on work on real language data in the lab. Moving between theory, discussions, practical data annotation and data use (machine learning and quantitative/qualitative analysis), you explore a wide range of linguistic phenomena: reference, modality, attribution, registers, sentiment analysis, opinions, temporal processing and so on.

Perspectives on Subjectivity is provided by the Computational Lexicology & Terminology Lab (CLTL), which models computer understanding of natural language with a central role for sources like lexicons, ontologies and terminology. The CLTL is part of the Department of Language and Communication in the Faculty of Humanities at VU University Amsterdam.

Learning objectives

You are familiar with linguistic theory on major topics in subjectivity and perspective: reference, modality, attribution, registers, sentiment analysis, opinions and temporal processing.

You can evaluate the relevant linguistic theory in critical discussions.

You can apply the theory to the description and analysis of real language data.

You can annotate that data and analyse it in lab sessions, using machine learning and quantitative techniques (as used in data journalism, for instance).