MultiwordTagger

NafMultiWordTagger is a module that reads the terms in a NAF representation of a text and looks up sequences of those terms in the English WordNet. When a sequence matches a WordNet multiword expression, the module groups those terms into a single term whose components are the original terms.
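The grouping step can be pictured with the minimal sketch below. It is not the module's actual code. It assumes a greedy longest-match strategy and replaces the WordNet lookup with a small in-memory set of lemmas joined by underscores, which is how WordNet writes multiword lemmas (e.g. `ice_cream`). The class name `MultiwordSketch`, the method `group` and the `MAX_LEN` limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of multiword grouping (not NafMultiWordTagger's code).
 * Assumption: greedy longest match against a lexicon of lowercase,
 * underscore-joined multiword lemmas standing in for WordNet.
 */
public class MultiwordSketch {

    /** Hypothetical upper bound on the number of terms in one multiword. */
    static final int MAX_LEN = 5;

    /**
     * Returns one inner list per output term. An inner list with more than
     * one element is a multiword term whose elements are its components.
     */
    static List<List<String>> group(List<String> lemmas, Set<String> lexicon) {
        List<List<String>> result = new ArrayList<>();
        int i = 0;
        while (i < lemmas.size()) {
            int matched = 1; // default: keep the single term unchanged
            // Try the longest candidate first, so "New York City" wins over "New York".
            for (int len = Math.min(MAX_LEN, lemmas.size() - i); len >= 2; len--) {
                String key = String.join("_", lemmas.subList(i, i + len)).toLowerCase();
                if (lexicon.contains(key)) {
                    matched = len;
                    break;
                }
            }
            // Copy the matched span; its elements become the components.
            result.add(new ArrayList<>(lemmas.subList(i, i + matched)));
            i += matched;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> lexicon = new HashSet<>(Arrays.asList("ice_cream", "new_york", "new_york_city"));
        List<String> lemmas = Arrays.asList("I", "like", "ice", "cream", "in", "New", "York", "City");
        System.out.println(group(lemmas, lexicon));
        // prints [[I], [like], [ice, cream], [in], [New, York, City]]
    }
}
```

Longest match is one plausible policy for overlapping candidates. The real module may resolve overlaps differently, and it reads lemmas from the NAF term layer rather than from a hard-coded list.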

The run.sh script reads a NAF file from the input stream and writes to the output stream a NAF file in which the multiwords are represented in the term layer.

NafMultiWordTagger is licensed under GNU GPL v3.

NafMultiWordTagger was compiled on Mac OS X version 10.6.8 with Java 1.6.

The source code can be downloaded from:

https://github.com/cltl/MultiWordTagger
