NWO Project granted: CLARIAH

CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) project granted by NWO in the National Roadmap for Large-Scale Research Facilities programme.

CLARIAH is developing a digital infrastructure that brings together large collections of data and software from different humanities disciplines. This will enable humanities researchers, from historians, literature experts and archaeologists to linguists, speech technologists and media scientists, to investigate cross-disciplinary questions, for example about culture and societal change. CLARIAH has received 12 million euros for the development of research instruments and the training of scientists. This project is vitally important for the development of the humanities in the Netherlands: a digital revolution is taking place that will drastically change how humanities research is done. The potential societal impact of this is also considerable.

CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities)
CLARIAH: BIG DATA, GRAND CHALLENGES background article (pdf).

Organisations involved (applicants): Huygens ING, International Institute for Social History, Meertens Institute, Netherlands Institute for Sound and Vision, DANS, Radboud University Nijmegen, Utrecht University, University of Amsterdam and VU University Amsterdam. Project leader: Prof. A.F. (Lex) Heerma van Voss.

Prof. dr. Piek Vossen is part of the CLARIAH core team.

CLARIAH | NWO-programma Nationale Roadmap on YouTube (in Dutch)

CLARIAH kickoff 2015 on YouTube (in Dutch)

Release Cornetto Demo

Cornetto is a lexical resource for the Dutch language which combines two resources with different semantic organisations: the Dutch Wordnet with its synset organisation and the Dutch Reference Lexicon which includes definitions, usage constraints, selectional restrictions, syntactic behaviours, illustrative contexts, etc. For more information on the contents of Cornetto, see Cornetto user documentation.

Cornetto demo by INL, VU University and CLARIN-NL. Visualization of Synset Relations of the Dutch Word Form: ‘bloem’, Part of Speech: Noun, Sense Number: 3.

The Cornetto demo provides possibilities to query the resource by choosing one of the following options:

Simple Search for Lexical Entries
Advanced Search for Lexical Entries
Visualization of Synset Relations

Cornetto is also available in XML (following the Lexicon Markup Format) and RDF (for more information, please refer to Open Source WordNet).

The Cornetto demo is realized as part of the Cornetto-LMF-RDF project, which has been funded by CLARIN-NL (www.clarin.nl).

NLeSC Project granted: Visualizing uncertainty and perspectives

Visualizing uncertainty and perspectives: Netherlands eScience Center, project number 027.014.402 (April 01, 2015 – April 01, 2016)


The Netherlands eScience Center is pleased to announce the initiation of six new projects in the areas of Environment and Sustainability, Life Sciences & eHealth, Humanities and Social Sciences and Physics and Beyond. The projects, scheduled to start in 2015, are collaborations with research teams from multiple Dutch academic groups and represent the latest step in the continued development of NLeSC’s project portfolio. Two projects will be funded in the areas of Humanities and Social Sciences.

Visualizing uncertainty and perspectives
Prof. Piek Vossen and Antske Fokkens are co-applicants of the project “Visualizing uncertainty and perspectives”: one of the six proposals that will receive funding to the value of 125K euro. This project aims to develop a tool that visualizes subjectivity, perspective and uncertainty to make them controllable variables in Humanities research. The tool should allow users to compare information from different sources representing alternative perspectives and visualize subjectivity and uncertainty. Such a visualization enables improved and comprehensive source criticism, provides new directions of research and strengthens the methodology of digital humanities.

Hackathon NewsReader Amsterdam: Jan. 21, 2015

Porsches to Pizza – Hack 6,000,000 automotive news articles #NewsReader


The global automotive industry has a value of the order of $1 trillion annually.

The industry comprises a massive network of suppliers, manufacturers, advertisers, marketeers and journalists. Attracting and supporting the industry is a significant goal of industrial policy.

On January 21st 2015 we’re running an event which should be of interest if:

  • You’re a data journalist on an automotive desk;
  • You’re an analyst sifting daily news looking for information on your company or on competitors;
  • You’re a data analyst looking to understand how your customers operate their supply chain;
  • You’re an analyst trying to find secondary events that could influence an investment decision

We’ve developed a powerful new tool called ‘NewsReader’ which utilises natural language understanding and semantic web technology. This helps you to better understand the interactions between companies and key individuals, derived from news articles.

We’re processing 6 million news articles from sources around the world, both general and specialist media, to obtain a searchable database of the news on the automotive sector, which you can play with at our Hack event.

Over summer we ran a Hack Day on news surrounding the World Cup. NewsReader enabled the attendees to pull out networks of interactions between politicians, football players and people in FIFA. Not only who they were interacting with but what they were doing.

Early analysis of the automotive data is giving us some interesting insights. For example, news stories between 2005 and 2009 reported that Porsche was buying an ever larger stake in Volkswagen, prompting speculations that Porsche would take over Volkswagen. However, in 2009 the tables turned and Volkswagen took a majority stake and eventually took over Porsche. Our system is able to discriminate between articles mentioning that Volkswagen was taking over Porsche and vice versa rather than simple co-occurrences that are generally found in aggregated news analysis systems. In the slipstream of this take-over, Wendelin Wiedeking, longtime Porsche senior executive, was fired from Porsche. In 2013 he opened the first of a chain of Italian restaurants. As we have a structured database in which similar events can easily be retrieved, it is a small effort to find out that Jürgen Schrempp, former CEO of DaimlerChrysler, also opened a restaurant after retiring from the car industry.
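
The difference between role-aware event records and plain co-occurrence can be sketched in a few lines of Python. This is an illustration only: the `Event` record, its field names, the predicate labels and the toy records are assumptions made for this example, not NewsReader's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: who did what to whom."""
    predicate: str
    agent: str
    patient: str


# Toy records paraphrasing the story above; not real NewsReader output.
EVENTS = [
    Event("buy_stake", "Porsche", "Volkswagen"),
    Event("take_over", "Volkswagen", "Porsche"),
    Event("open_restaurant", "Wendelin Wiedeking", "restaurant"),
]


def co_occurring(events, a, b):
    """Co-occurrence only: both names appear, direction is ignored."""
    return [e for e in events if {a, b} <= {e.agent, e.patient}]


def acted_on(events, predicate, agent, patient):
    """Role-aware query: the agent must be the one acting on the patient."""
    return [
        e for e in events
        if e.predicate == predicate and e.agent == agent and e.patient == patient
    ]
```

In this sketch, `co_occurring(EVENTS, "Porsche", "Volkswagen")` matches both company events, whereas `acted_on(EVENTS, "take_over", "Volkswagen", "Porsche")` matches only the record in which Volkswagen is the acting party.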

We are also running an event in London on January 30th 2015; if you cannot make the 21st in Amsterdam, you may want to join us there.

NewsReader helps you find a needle in a haystack.

REGISTER FOR HACKATHON AMSTERDAM: https://www.eventbrite.com/e/porsches-to-pizza-hack-6000000-automotive-news-articles-newsreader-tickets-14504369961

REGISTER FOR HACKATHON LONDON: http://www.eventbrite.com/e/porsches-to-pizza-hack-6000000-automotive-news-articles-newsreader-tickets-14458478699?aff=efbnen

Mini-seminar: Disambiguating entities: Dec. 12 2014

Presentations from the mini-seminar “Disambiguating entities and their roles in texts based on background knowledge”, held on December 12, 2014:

Introduction (pdf) by Prof. dr. Piek Vossen
Towards a Dutch FrameNet-style Semantic Role Labeler (pdf) Presentation by Chantal van Son
Named Entity Disambiguation with two-stage coherence optimization (pdf) Presentation by Filip Ilievski

Invitation below:

Mini-seminar: Disambiguating entities and their roles in texts based on background knowledge

Dear all,

We cordially invite you to our mini-seminar “Disambiguating entities and their roles in texts based on background knowledge”, in which we will present our Master’s thesis topics and our current and future work. It will take place on Friday, December 12 from 10:00 to 12:00 in room C-121 (W&N Building).

An array of text processing tools is currently used to extract events, recognize and link entities, and discover relations between the two. We, Filip Ilievski and Chantal van Son, tackle these types of Natural Language Processing tasks by using background knowledge from lexical resources and the Semantic Web. The disambiguation of entities and their context is at the core of both approaches: Filip’s thesis aims to disambiguate them by determining their identity, while Chantal’s thesis aims at disambiguating the roles they play in context. You can find the descriptions of both projects below.

Prof. Piek Vossen will kick off the mini-seminar by sketching the background of the problem and presenting the existing approaches. Prof. Frank van Harmelen will conclude the event with a discussion on the integration of background knowledge in language processing.

Program:
10:00 – 10:15 Introduction by Piek Vossen
10:20 – 10:50 Towards a Dutch FrameNet-style Semantic Role Labeler (Chantal van Son)
10:55 – 11:25 Named Entity Disambiguation with two-stage coherence optimization (Filip Ilievski)
11:30 – 12:00 Closing remarks and discussion led by Frank van Harmelen

Towards a Dutch FrameNet-style Semantic Role Labeler (Chantal van Son)
Semantic role labeling (SRL) is one of the key tasks in Natural Language Processing for deep text understanding. Because of its rich and fine-grained categorization of different conceptual scenarios and their specific semantic roles, FrameNet is a popular resource to serve as a basis for SRL systems in English. For Dutch, however, there is currently no FrameNet-like resource available that can be used to train an SRL system, and creating such a resource usually takes a great deal of expensive manual effort. This study investigates how existing tools and resources, such as the SoNaR Semantic Role Labeler (SSRL) and the Predicate Matrix, can be exploited for FrameNet-based SRL in Dutch. In this talk I will present this method and discuss some of its difficulties and possible solutions.

Named Entity Disambiguation with two-stage coherence optimization (Filip Ilievski)
Contemporary Natural Language Processing modules solve Entity Linking, Event Detection, and Semantic Role Labeling as separate problems. From the semantic point of view, each of these processes adds another brush-stroke onto the canvas of meaning: entities and events are components that occur in relations which correspond to roles. The approach presented here extends such NLP processes with a semantic process of coherence optimization. I use both binary logic and probabilistic models built through manual and automatic techniques. The binary filtering phase relies on restrictions from VerbNet and a domain-specific ontology. The optimization phase aims to maximize the coherence between the remaining candidates in a probabilistic manner based on available background knowledge about the entities.
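
The two stages described in this abstract (binary filtering, then coherence optimization) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the mentions, candidate entities, type restrictions and relatedness scores are invented, and the exhaustive search over joint assignments is a simple stand-in for the probabilistic optimization, not the method used in the thesis.

```python
from itertools import product

# Hypothetical candidate entities per mention, each with a coarse type.
CANDIDATES = {
    "Ford": [("Ford_Motor_Company", "Organization"), ("Henry_Ford", "Person")],
    "Mustang": [("Ford_Mustang", "Product"), ("Mustang_horse", "Animal")],
}

# Stage 1 input: types each mention may take given its role in the sentence
# (standing in for restrictions derived from VerbNet and a domain ontology).
ALLOWED_TYPES = {
    "Ford": {"Organization", "Person"},
    "Mustang": {"Product"},
}

# Stage 2 input: made-up pairwise relatedness scores from background knowledge.
RELATEDNESS = {
    frozenset({"Ford_Motor_Company", "Ford_Mustang"}): 0.9,
    frozenset({"Henry_Ford", "Ford_Mustang"}): 0.4,
}


def binary_filter(candidates, allowed_types):
    """Stage 1: drop candidates whose type violates the role restriction."""
    return {
        mention: [c for c in cands if c[1] in allowed_types[mention]]
        for mention, cands in candidates.items()
    }


def coherence(assignment, relatedness):
    """Sum of pairwise relatedness over all chosen entities."""
    names = [name for name, _type in assignment]
    return sum(
        relatedness.get(frozenset({a, b}), 0.0)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )


def disambiguate(candidates, allowed_types, relatedness):
    """Stage 2: pick the joint assignment with the highest coherence."""
    filtered = binary_filter(candidates, allowed_types)
    mentions = list(filtered)
    best = max(
        product(*(filtered[m] for m in mentions)),
        key=lambda assignment: coherence(assignment, relatedness),
    )
    return {m: name for m, (name, _type) in zip(mentions, best)}
```

With the toy data above, the filter removes the animal reading of “Mustang”, and the coherence step then prefers the company reading of “Ford” because it is more strongly related to the remaining candidate. Exhaustive search only works for a handful of mentions; a realistic system would need an approximate optimizer.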

Kind regards,
Filip Ilievski & Chantal van Son

Release Open Source Dutch WordNet

Open Source Dutch Wordnet is a Dutch lexical semantic database.
Demo of Open Source Dutch WordNet. Release of the first version of the Open Dutch Wordnet (ODWN): December 02, 2014. By Marten Postma and Piek Vossen.

ODWN was created by removing the proprietary content from Cornetto (http://www2.let.vu.nl/oz/cltl/cornetto), and by using open source resources to replace this proprietary content.

Open Source Dutch WordNet contains 116,992 synsets, of which 95,356 originate from WordNet 3.0 and 21,636 synsets are new synsets. The number of English synsets without Dutch synonyms is 60,743, which means that 34,613 WordNet 3.0 synsets have been filled with at least one Dutch synonym.
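
The figures above can be cross-checked with a few lines of Python. The variable names are chosen for this sketch only; the numbers are the ones stated in the release note.

```python
# Figures as stated in the release note above.
total_synsets = 116_992
from_wordnet = 95_356
new_synsets = 21_636
without_dutch = 60_743

# The two sources of synsets should add up to the total.
assert from_wordnet + new_synsets == total_synsets

# WordNet 3.0 synsets that have at least one Dutch synonym.
with_dutch = from_wordnet - without_dutch
coverage = with_dutch / from_wordnet
print(with_dutch, f"{coverage:.1%}")
```

The subtraction reproduces the 34,613 filled synsets, which is a little over a third of the WordNet 3.0 synsets.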

The demo of Open Source Dutch WordNet can be inspected by going through these steps:
(1) Use Google Chrome or Mozilla Firefox as your browser
(2) Go to https://debvisdic.let.vu.nl:9002/editor/
(3) Log in with username: gast, password: gast
(4) Click on ‘ODWN’ in the left box and click the ‘Add’ button
(5) Click the button ‘Open dictionaries’ and inspect the resource

This project has been co-funded by the Nederlandse Taalunie (2013-2014).

Summer School Perspectives on Subjectivity: July 06-17, 2015

Piek Vossen and his team organize a Summer School course “Perspectives on Subjectivity”, VU University Amsterdam, the Netherlands from July 06-17, 2015.

Who should join this course?

Advanced Bachelor/Master students of Linguistics (theoretical, applied or computational), Artificial Intelligence and Journalism, and others interested in cultural perspectives in communication.

Course content

Welcome to Perspectives on Subjectivity: an excellent course in Amsterdam! That is a very subjective opening, as you will agree. Here the subjective perspective of the writer is obvious, but as a rule it is much more subtle. Subjectivity is one of the key elements of natural language. Every communicative act is subjective to some degree. Subjectivity starts with the intentions of the producer of the message and affects its associated functions and syntactic structures, not to mention the choice of vocabulary and associated connotations. All of this can be summarized as perspective.

This course combines theoretical linguistic notions about perspectives with hands-on work on real language data in the lab. Moving between theory, discussions, practical data annotation and data use (machine learning and quantitative/qualitative analysis), you explore a wide range of linguistic phenomena: reference, modality, attribution, registers, sentiment analysis, opinions, temporal processing and so on.

Perspectives on Subjectivity is provided by the Computational Lexicology & Terminology Lab (CLTL), which models computer understanding of natural language with a central role for sources like lexicons, ontologies and terminology. The CLTL is part of the Department of Language and Communication in the Faculty of Humanities at VU University Amsterdam.

Learning objectives

  • You are familiar with linguistic theory on major topics in subjectivity and perspective: reference, modality, attribution, registers, sentiment analysis, opinions and temporal processing.
  • You can evaluate the relevant linguistic theory in critical discussions.
  • You can apply the theory to the description and analysis of real language data.
  • You can annotate that data and analyse it in lab sessions, using machine learning and quantitative techniques (as used in data journalism, for instance).

Round table on ‘Time and Language’: Oct. 30, 2014

    Event date:
    Thursday, 30 October, 2014 – 18:30 to 20:00

    The round table “Time and Language”, organized by Tommaso Caselli (VUA, Amsterdam) and Rachele Sprugnoli (DH-FBK), will be held in Genoa on Thursday October 30 as part of the “Festival della Scienza”.

    Time is a pervasive element of human life that is also reflected in language. But how is time encoded in the various languages of the world? How long is an event? What happens if we want to teach a computer to reconstruct the temporal order of events in a text? The philosopher of language Andrea Bonomi, the linguist Pier Marco Bertinetto and the computational linguist Bernardo Magnini will answer these questions to reveal to the public the role that time has in language and the challenges for technology in this field. Philosophy, linguistics and technology come together and introduce the public to the fascinating relationship between time and language.

    TiNT: Terminologie in het Nederlandse Taalgebied: Nov. 14, 2014

    Impression of TiNT 2014, November 14, 2014, and link to programme

    On 14 November 2014, the NL-Term association, in cooperation with the Steunpunt Nederlandstalige Terminologie, organises the TiNT day for the sixth time. TiNT stands for Terminologie in het Nederlandse Taalgebied (Terminology in the Dutch Language Area). This annual event is intended to present current research and professional practice in the field of Dutch-language terminology to a broad audience. TiNT 2014 takes place at the Ministry of Foreign Affairs in The Hague, from 9:30 to 18:00. This year's theme is terminology in communication between government and citizens. We have been able to secure very interesting speakers who are relevant to the theme, including Alex Brenninkmeijer of the European Court of Auditors, until recently National Ombudsman; Geert Joris, General Secretary of the Nederlandse Taalunie; and Jac Brouwer, national corporate identity coordinator of the Belastingdienst (the Dutch tax administration).

    We hope once again this year for a full house, interesting presentations and lively discussion.

    More information, the provisional programme and the registration form can be found on our website: http://taalunieversum.org/inhoud/tint-2014.

    Kind regards,
    on behalf of the SNT and NL-Term
    Anneleen Schoen
    steunpunt@let.vu.nl

    1st VU-Spinoza workshop: Oct. 17, 2014

    Understanding of language by machines

    – an escape from the world of language –

    Spinoza Prize projects (2014-2019)
    Prof. dr. Piek Vossen

    Understanding language by machines
    1st VU-Spinoza workshop

    Friday, October 17, 2014 from 12:30 PM to 6:00 PM (CEST)

    Atrium, room D-146, VU Medical Faculty (1st floor, D-wing)
    Van der Boechorststraat 7
    1081 BT Amsterdam

    Please RSVP via Eventbrite before October 03, 2014


    Can machines understand language? According to John Searle, this is fundamentally impossible. He used the Chinese Room thought experiment to demonstrate that computers follow instructions to manipulate symbols without understanding these symbols. Willard Van Orman Quine even questioned the understanding of language by humans, since symbols are only grounded through approximation by cultural situational convention. Between these extreme points of view, we are nevertheless communicating every day as part of our social behavior (within Heidegger’s hermeneutic circle), while more and more computers and even robots take part in communication and social interactions.

    The goal of the Spinoza project “Understanding of language by machines” (ULM) is to scratch the surface of this dilemma by developing computer models that can assign deeper meaning to language that approximates human understanding and to use these models to automatically read and understand text. We are building a Reference Machine: a machine that can map natural language to the extra-linguistic world as we perceive it and represent it in our brain.

    This is the first in a series of workshops that we will organize in the Spinoza project to discuss and work on these issues. It marks the kick-off of 4 projects that started in 2014, each studying different aspects of understanding and modeling this through novel computer programs. Every six months, we will organize a workshop or event that brings the different research lines together around this central theme and on shared data sets.

    We investigate ambiguity, variation and vagueness of language; the relation between language, perception and the brain; the role of the world view of the writer of a text and the role of the world view and background knowledge of the reader of a text.

    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Program ▬▬▬▬▬▬▬▬▬▬▬▬▬▬


    12:30 – 13:00 Welcome

    13:00 – 13:30 Understanding language by machines: Piek Vossen
    13:30 – 14:30 Borders of ambiguity: Marten Postma and Ruben Izquierdo
    14:30 – 15:00 Word, concept, perception and brain: Emiel van Miltenburg and Alessandro Lopopolo
    15:00 – 15:15 Coffee break
    15:15 – 15:45 Stories and world views as a key to understanding: Tommaso Caselli and Roser Morante
    15:45 – 16:15 A quantum model of text understanding: Minh Lê Ngọc and Filip Ilievski
    16:15 – 17:00 Discussion on building a shared demonstrator: a reference machine
    17:00 – 18:00 Drinks

    For more information on the project see Understanding of Language by Machines.

    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬ RSVP ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

    Admission is free. Please RSVP via Eventbrite before October 03, 2014.

    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Location ▬▬▬▬▬▬▬▬▬▬▬▬▬▬

    Atrium, room D-146, VU Medical Faculty (1st floor, D-wing)
    Van der Boechorststraat 7
    1081 BT Amsterdam
    The Netherlands

    Parking info N.B. Campus parking is temporarily unavailable.
