07 Universal Dependencies
Universal Dependencies (UD) is an internationally harmonized annotation framework that aims to standardize the morphological and syntactic tagging of texts across languages, in order to foster the development of multilingual language technologies and contrastive linguistic research. Grounded in dependency grammar principles, UD provides a universal set of grammatical categories – including parts of speech (POS), morphological features and syntactic dependency relations – and guidelines for their use. The framework is flexible, incorporating language-specific tags as needed. To date, over 240 corpora worldwide have been manually tagged using the framework, including the universal treebanks of both written and spoken Slovene.
Introduction to Tags
The Universal Dependencies framework establishes a comprehensive and universal set of tags for parts of speech (POS), morphological features and syntactic dependencies that can be adopted in the treebanks of individual languages, or supplemented with new morphological features or derivations of core relations when necessary. For the Slovene language data, this includes the adoption of all 17 parts of speech (see Table 1), 22 morphological features spanning 62 distinct values (see Table 2), and 35 types of dependency relations (see Table 3).
Tag | Description |
---|---|
ADJ | adjective |
ADP | adposition |
ADV | adverb |
AUX | auxiliary |
CCONJ | coordinating conjunction |
DET | determiner |
INTJ | interjection |
NOUN | noun |
NUM | numeral |
PART | particle |
PRON | pronoun |
PROPN | proper noun |
PUNCT | punctuation |
SCONJ | subordinating conjunction |
SYM | symbol |
VERB | verb |
X | other |
Table 1: Part-of-speech tags used in Slovene texts.
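As a minimal sketch of how the tag set above can be used in practice, the following Python snippet checks a part-of-speech tag against the 17 tags listed. It assumes the data is in the CoNLL-U format in which UD treebanks are distributed (ten tab-separated columns per token, with the universal POS tag in the fourth). The helper names and the example line are illustrative assumptions, not part of any official tool or treebank.

```python
# The 17 universal part-of-speech tags listed in the table above.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def upos_of(conllu_line: str) -> str:
    """Return the UPOS column (4th of 10) of a CoNLL-U token line."""
    columns = conllu_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    return columns[3]


def is_valid_upos(tag: str) -> bool:
    """True if the tag belongs to the universal POS inventory."""
    return tag in UPOS_TAGS


# Hypothetical token line, written for illustration only.
line = "1\tLjubljana\tLjubljana\tPROPN\t_\tCase=Nom|Gender=Fem|Number=Sing\t0\troot\t_\t_"
print(upos_of(line), is_valid_upos(upos_of(line)))
```

A check like this is mainly useful as a sanity test when converting or post-processing annotated data; it does not replace the official UD validation tools.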
Feature | Value | Description |
---|---|---|
Abbr | Yes | abbreviation |
Animacy | Anim, Inanim | animacy |
Aspect | Imp, Perf | aspect |
Case | Nom, Gen, Dat, Acc, Loc, Ins | case |
Definite | Ind, Def | definiteness or state |
Degree | Pos, Cmp, Sup | degree |
Foreign | Yes | is this a foreign word? |
Gender | Masc, Fem, Neut | gender |
Gender[psor] | Masc, Fem, Neut | possessor’s gender |
Mood | Ind, Imp, Cnd | mood |
Number | Sing, Dual, Plur | number |
Number[psor] | Sing, Dual, Plur | possessor’s number |
NumForm | Word, Digit, Roman | numeral form |
NumType | Card, Ord, Mult, Sets | numeral type |
Person | 1, 2, 3 | person |
Polarity | Neg, Pos | polarity |
Poss | Yes | possessive |
PronType | Prs, Int, Rel, Dem, Tot, Neg, Ind | pronominal type |
Reflex | Yes | reflexive |
Tense | Pres, Fut | tense |
Variant | Bound, Short | alternative form of word |
VerbForm | Fin, Inf, Sup, Part, Conv | form of verb or deverbative |
Table 2: Morphological features used in Slovene texts.
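The morphological features listed above are usually stored as a single string of `Name=Value` pairs joined by `|` (or `_` when a token has no features), which is how the FEATS column of the CoNLL-U format encodes them. The sketch below parses such a string and reports pairs that fall outside the inventory above. The function names are hypothetical, and the sketch deliberately ignores multi-valued features.

```python
# Feature inventory from the table above (feature name -> permitted values).
_RAW_FEATURES = {
    "Abbr": "Yes", "Animacy": "Anim Inanim", "Aspect": "Imp Perf",
    "Case": "Nom Gen Dat Acc Loc Ins", "Definite": "Ind Def",
    "Degree": "Pos Cmp Sup", "Foreign": "Yes", "Gender": "Masc Fem Neut",
    "Gender[psor]": "Masc Fem Neut", "Mood": "Ind Imp Cnd",
    "Number": "Sing Dual Plur", "Number[psor]": "Sing Dual Plur",
    "NumForm": "Word Digit Roman", "NumType": "Card Ord Mult Sets",
    "Person": "1 2 3", "Polarity": "Neg Pos", "Poss": "Yes",
    "PronType": "Prs Int Rel Dem Tot Neg Ind", "Reflex": "Yes",
    "Tense": "Pres Fut", "Variant": "Bound Short",
    "VerbForm": "Fin Inf Sup Part Conv",
}
FEATURES = {name: set(values.split()) for name, values in _RAW_FEATURES.items()}


def parse_feats(feats: str) -> dict[str, str]:
    """Split a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def unknown_feats(feats: dict[str, str]) -> list[str]:
    """List the Name=Value pairs that are not in the inventory above."""
    return [
        f"{name}={value}"
        for name, value in feats.items()
        if value not in FEATURES.get(name, set())
    ]


parsed = parse_feats("Case=Nom|Gender=Masc|Number=Sing")
print(parsed, unknown_feats(parsed))
```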
Tag | Description |
---|---|
acl | clausal modifier of noun |
advcl | adverbial clause modifier |
advmod | adverbial modifier |
amod | adjectival modifier |
appos | appositional modifier |
aux | auxiliary verb |
case | case marking preposition |
cc | coordinating conjunction |
ccomp | clausal complement |
conj | conjunct |
cop | copula verb |
csubj | clausal subject |
dep | unspecified dependency |
det | determiner |
discourse | discourse element |
dislocated | dislocated element |
expl | expletive |
fixed | fixed multi-word expression |
flat | flat multi-word expression |
goeswith | disjointed token |
iobj | indirect object |
list | list |
mark | marker (subordinating conjunction) |
nmod | nominal modifier |
nsubj | nominal subject |
nummod | numeric modifier |
obj | (direct) object |
obl | oblique nominal (adjunct) |
orphan | dependent of missing parent |
parataxis | parataxis |
punct | punctuation symbol |
reparandum | overridden disfluency |
root | root element |
vocative | vocative |
xcomp | open clausal complement |
Table 3: Dependency relations used in Slovene texts.
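To illustrate how the dependency relations above combine into a tree, the following sketch represents each token by its index, form, head index and relation, with head `0` marking the root, as in the CoNLL-U format. The three-token sentence ("Pes laja.", "The dog barks.") and its analysis are a hypothetical example written for this sketch, not taken from a treebank, and the helper names are assumptions. Subtyped relations are assumed to follow the usual UD `relation:subtype` pattern.

```python
from collections import defaultdict

# The 35 dependency relations listed in the table above.
DEPRELS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "conj", "cop", "csubj", "dep", "det", "discourse", "dislocated", "expl",
    "fixed", "flat", "goeswith", "iobj", "list", "mark", "nmod", "nsubj",
    "nummod", "obj", "obl", "orphan", "parataxis", "punct", "reparandum",
    "root", "vocative", "xcomp",
}


def base_relation(deprel: str) -> str:
    """Strip an optional subtype, e.g. 'flat:name' -> 'flat'."""
    return deprel.split(":", 1)[0]


# Hypothetical sentence: (token id, form, head id, relation).
tokens = [
    (1, "Pes", 2, "nsubj"),
    (2, "laja", 0, "root"),
    (3, ".", 2, "punct"),
]


def children_by_head(sentence):
    """Group tokens by the index of their syntactic head."""
    children = defaultdict(list)
    for token_id, form, head, deprel in sentence:
        children[head].append((token_id, form, deprel))
    return dict(children)


def find_root(sentence) -> str:
    """Return the form of the single token attached to head 0."""
    roots = [form for _, form, head, _ in sentence if head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


print(find_root(tokens), children_by_head(tokens)[2])
```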
Annotation Guidelines
This chapter summarizes the annotation guidelines for the Universal Dependencies (UD) morphology and syntax as applied to Slovene texts. The guidelines are arranged from the latest, up-to-date version to the oldest version.
Version 1.3
Project SPOT
DOBROVOLJC, Kaja and TERČON, Luka: 2023. Universal Dependencies: Smernice za označevanje besedil v slovenščini. Različica 1.3. Rezultat projekta Na drevesnici temelječ pristop k raziskavam govorjene slovenščine. [DOCX] [PDF] - only in Slovene
Version 1.0
Project Development of Slovene in a Digital Environment
DOBROVOLJC, Kaja and TERČON, Luka: 2023. Universal Dependencies: Smernice za označevanje besedil v slovenščini. Rezultat projekta Razvoj slovenščine v digitalnem okolju. [DOCX] [PDF] - only in Slovene
Priloga k smernicam: Odprta vprašanja pri prenosu označevalne sheme Universal Dependencies na slovenska besedila
[DOCX] [PDF] - only in Slovene
References and Links
This chapter compiles relevant references and provides links to projects in which the Universal Dependencies (UD) morphology and syntax have been developed and applied to Slovene texts.
Main website of the Universal Dependencies project: https://universaldependencies.org/
General guidelines: https://universaldependencies.org/guidelines.html
Slovene guidelines (in English): https://universaldependencies.org/sl/index.html
Corpora containing manually revised UD tags
Slovenian-SSJ written language treebank: https://github.com/UniversalDependencies/UD_Slovenian-SSJ
Slovenian-SST spoken language treebank: https://github.com/UniversalDependencies/UD_Slovenian-SST
Slovenian-SSJ treebank as part of the SUK 1.0 training corpus: Zwitter Vitez, Ana; et al., 2023, Spoken corpus Gos 2.0 (transcriptions), Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1771.
References
Dobrovoljc, K., Terčon, L., Ljubešić, N. (2023). Universal Dependencies za slovenščino: nove smernice, ročno označeni podatki in razčlenjevalni model. Slovenščina 2.0, 11(1): 218–246. https://doi.org/10.4312/slo2.0.2023.1.218-246 [PDF] - only in Slovene
Dobrovoljc, K., Terčon, L., & Ljubešić, N. (2022). Universal Dependencies za slovenščino: nadgradnja smernic, učnih podatkov in razčlenjevalnega modela. In D. Fišer & T. Erjavec (Eds.), Jezikovne tehnologije in digitalna humanistika: zbornik konference (pp. 30–39). Inštitut za novejšo zgodovino. https://nl.ijs.si/jtdh22/pdf/JTDH2022_Dobrovoljc-et-al_Universal-Dependencies-za-slovenscino.pdf
de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal Dependencies. Computational Linguistics, 47(2), 255–308. https://doi.org/10.1162/coli_a_00402
Nivre, J., de Marneffe, M.-C., Ginter, F., Hajič, J., Manning, C. D., Pyysalo, S., Schuster, S., Tyers, F., & Zeman, D. (2020). Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. Proceedings of the Twelfth Language Resources and Evaluation Conference, 4034–4043. https://aclanthology.org/2020.lrec-1.497
Dobrovoljc, K., Erjavec, T., & Krek, S. (2017). The Universal Dependencies Treebank for Slovenian. Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 33–38. https://doi.org/10.18653/v1/W17-1406
Dobrovoljc, K., & Nivre, J. (2016). The Universal Dependencies Treebank of Spoken Slovenian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 1566–1573. https://aclanthology.org/L16-1248