
05 Lemmatization

When text is tagged, each word form is assigned a lemma (the base form of the word), so that all inflected forms of the same word can be processed uniformly in later steps. The lemmatization system was developed within the project JOS: Linguistic Annotation of Slovene (Holozan et al. 2008), and it follows the MULTEXT-East v4 (JOS) morphosyntactic specifications in determining parts of speech, capitalization and certain other features.
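The idea that a lemma depends on both the word form and its part of speech can be illustrated with a minimal sketch. It assumes a tiny, hypothetical lexicon keyed by (lowercased form, part-of-speech code), where the code is only the first character of a MULTEXT-East-style tag (N = noun, V = verb, P = pronoun). This is not the JOS lemmatizer or its data: real JOS tags carry many more attributes, and the lexicon entries and the `lemmatize` helper are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical toy lexicon: (lowercased word form, POS code) -> lemma.
# POS codes are the first character of MULTEXT-East-style tags
# (N = noun, V = verb, P = pronoun); real JOS tags are much richer.
LEXICON: Dict[Tuple[str, str], str] = {
    ("hiše", "N"): "hiša",            # genitive singular / nominative plural of 'house'
    ("hiši", "N"): "hiša",            # dative/locative singular of 'house'
    ("je", "V"): "biti",              # 'is', a form of the verb 'to be'
    ("je", "P"): "ona",               # clitic genitive of 'she'
    ("ljubljane", "N"): "Ljubljana",  # proper noun: the lemma keeps its capital
}


def lemmatize(form: str, pos: str) -> str:
    """Return the lemma for a tagged word form.

    The form is lowercased before lookup, so a sentence-initial capital
    does not affect the result; capitalization of the lemma comes from
    the lexicon entry. Unknown forms fall back to the lowercased form
    (a naive assumption made only for this sketch).
    """
    return LEXICON.get((form.lower(), pos), form.lower())


if __name__ == "__main__":
    for form, pos in [("Hiše", "N"), ("je", "V"), ("je", "P"), ("Ljubljane", "N")]:
        print(form, pos, "->", lemmatize(form, pos))
```

The same surface form "je" receives different lemmas depending on the part of speech assigned by the tagger. This is why lemmatization is performed on tagged text rather than on isolated word forms.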