Corpora and Lexica

Lexicon or Corpus name (Main) Developer(s) Description License Repository Associated Paper(s)
Annotating negations van Miltenburg, Morante, Elliott Image descriptions annotated with negation & negation type Descriptions are CC-licensed, taken from the Flickr30K corpus, which is based on Flickr data. GitHub Pragmatic Factors in Image Description: The Case of Negations
BiographyNet Enriched Biographies Corpus Fokkens 130,000 biographies that are enriched automatically, represented in NAF and RDF… RDF: Creative Commons. Text and manual annotations: variety of licenses… to appear to appear
The Circumstantial Event Ontology (CEO) Segers The Circumstantial Event Ontology is a manually constructed OWL ontology that models… CC-BY-SA to appear The Circumstantial Event Ontology (CEO)
Cornetto Maks Free for academic use Cornetto, demo
Dutch FrameNet Maks
DutchSemCor Vossen A one-million-word Dutch corpus that is fully sense-tagged with senses and domain tags… DutchSemCor DutchSemCor: building a semantically annotated corpus for Dutch
The ECB+ Corpus Cybulska, Vossen The ECB+ corpus is an extension to the EventCorefBank (ECB, Bejan and Harabagiu, 2010)… NewsReader ▷ Results ▷ Data Using a sledgehammer to crack a nut?
ECB+-CEO Segers ECB+-CEO is an extension of the ECB+ corpus where the logical relation between… to appear The Circumstantial Event Ontology (CEO)
The Event and Implied Situation Ontology (ESO) Segers ESO is a manually constructed OWL-2 ontology which formalizes the pre-, … CC-BY-SA GitHub NewsReader ▷ ESO The Event and Implied Situation Ontology
ESO-FN-WN Mappings Segers ESO-FN-WN Mappings is a mapping file between ESO classes and FrameNet frames… CC-BY-SA GitHub NewsReader ▷ ESO The Event and Implied Situation Ontology
The Gun Violence Corpus (GVC) Vossen, Postma, Ilievski, Segers GVC contains event coreference annotation for 510 documents from the gun violence domain. It was created following our data-to-text method, mostly as part of the development of… CC-BY-SA GitHub Don’t Annotate, but Validate: a Data-to-Text Method for Capturing Event Data
MEANTIME-ESO Corpus Segers The MEANTIME-ESO Corpus is developed for the evaluation of the ESO Ontology. For this, 120 articles… CC-BY-SA GitHub NewsReader ▷ ESO The Event and Implied Situation Ontology
Event StoryLine Corpus (ESC) Caselli The Event StoryLine Corpus (ESC) is a manually annotated corpus of documents extracted from the ECB+… CC-BY-SA v0.9 GitHub CLTL ▷ EventStoryLine The Event StoryLine Corpus
Open Dutch WordNet Postma Open Dutch WordNet is a Dutch lexical semantic database. It was created by … CC BY-SA 4.0 Open Dutch WordNet Open Dutch WordNet
Referentiebestand Nederland (RBN) Maks, Martin, van der Vliet 50,000 frequent Dutch words annotated with linguistic information Incorporated in Cornetto INL/Lexica. RBN Online
SemEval Long Tail QA Task Ilievski, Postma We propose a ‘referential quantification’ task that requires systems to establish the meaning… to appear
Stereotypes Fokkens Collection of small descriptions automatically extracted from text (in CSV). It comes with… Texts are under copyright. Annotations… to appear
The Vaccination Corpus Morante, van Son This dataset contains online documents around the topic of vaccinations. The set contains news articles… to appear
The VU Sound Corpus van Miltenburg, Timmermans, Aroyo Collection of crowd-sourced annotations for the Freesound database Sounds are CC-licensed GitHub The VU Sound Corpus: Adding…