07 Universal Dependencies

Universal Dependencies (UD) is an internationally harmonised annotation framework that aims to standardize the morphological and syntactic tagging of texts across languages in order to foster the development of multilingual language technologies and contrastive linguistics research. Grounded in dependency grammar principles, UD provides a universal set of grammatical categories – including parts of speech (POS), morphological features and syntactic dependency relations – and guidelines for their use. The framework is flexible and allows language-specific tags to be added where needed. To date, over 240 corpora worldwide have been manually tagged using the framework, including the universal treebanks of both written and spoken Slovene.

Introduction to Tags

The Universal Dependencies framework establishes a comprehensive and universal set of tags for parts of speech (POS), morphological features and syntactic dependencies. These tags can be adopted as they are in the treebanks of individual languages, or supplemented with new morphological features or subtypes of core relations when necessary. For the Slovene language data, this includes the adoption of all 17 parts of speech (see Table 1), 22 morphological features spanning 62 distinct values (see Table 2), and 35 types of dependency relations (see Table 3).

| Tag | Description |
|---|---|
| ADJ | adjective |
| ADP | adposition |
| ADV | adverb |
| AUX | auxiliary |
| CCONJ | coordinating conjunction |
| DET | determiner |
| INTJ | interjection |
| NOUN | noun |
| NUM | numeral |
| PART | particle |
| PRON | pronoun |
| PROPN | proper noun |
| PUNCT | punctuation |
| SCONJ | subordinating conjunction |
| SYM | symbol |
| VERB | verb |
| X | other |

Table 1: Part-of-speech tags used in Slovene texts.

| Feature | Value | Description |
|---|---|---|
| Abbr | Yes | abbreviation |
| Animacy | Anim, Inanim | animacy |
| Aspect | Imp, Perf | aspect |
| Case | Nom, Gen, Dat, Acc, Loc, Ins | case |
| Definite | Ind, Def | definiteness or state |
| Degree | Pos, Cmp, Sup | degree |
| Foreign | Yes | is this a foreign word? |
| Gender | Masc, Fem, Neut | gender |
| Gender[psor] | Masc, Fem, Neut | possessor’s gender |
| Mood | Ind, Imp, Cnd | mood |
| Number | Sing, Dual, Plur | number |
| Number[psor] | Sing, Dual, Plur | possessor’s number |
| NumForm | Word, Digit, Roman | numeral form |
| NumType | Card, Ord, Mult, Sets | numeral type |
| Person | 1, 2, 3 | person |
| Polarity | Neg, Pos | polarity |
| Poss | Yes | possessive |
| PronType | Prs, Int, Rel, Dem, Tot, Neg, Ind | pronominal type |
| Reflex | Yes | reflexive |
| Tense | Pres, Fut | tense |
| Variant | Bound, Short | alternative form of word |
| VerbForm | Fin, Inf, Sup, Part, Conv | form of verb or deverbative |

Table 2: Tags for morphological features used in Slovene texts. In the corpus, these are listed in the form of feature and value pairs (e.g., Tense=Pres).

| Tag | Description |
|---|---|
| acl | clausal modifier of noun |
| advcl | adverbial clause modifier |
| advmod | adverbial modifier |
| amod | adjectival modifier |
| appos | appositional modifier |
| aux | auxiliary verb |
| case | case-marking preposition |
| cc | coordinating conjunction |
| ccomp | clausal complement |
| conj | conjunct |
| cop | copula verb |
| csubj | clausal subject |
| dep | unspecified dependency |
| det | determiner |
| discourse | discourse element |
| dislocated | dislocated element |
| expl | expletive |
| fixed | fixed multi-word expression |
| flat | flat multi-word expression |
| goeswith | disjointed token |
| iobj | indirect object |
| list | list |
| mark | marker (subordinating conjunction) |
| nmod | nominal modifier |
| nsubj | nominal subject |
| nummod | numeric modifier |
| obj | (direct) object |
| obl | oblique nominal (adjunct) |
| orphan | dependent of missing parent |
| parataxis | parataxis |
| punct | punctuation symbol |
| reparandum | overridden disfluency |
| root | root element |
| vocative | vocative |
| xcomp | open clausal complement |

Table 3: Tags for syntactic dependency relations (without subtypes) used in Slovene texts.

Annotation Guidelines

This chapter summarizes the annotation guidelines for the Universal Dependencies (UD) morphology and syntax as applied to Slovene texts. The guidelines are listed from the most recent version to the oldest.

Version 1.7
Project SPOT
DOBROVOLJC, Kaja in TERČON, Luka: 2024.
Universal Dependencies: Smernice za označevanje besedil v slovenščini. Različica 1.7. Rezultat projekta Na drevesnici temelječ pristop k raziskavam govorjene slovenščine. [DOCX] [PDF] - only in Slovene

Version 1.3
Project SPOT
DOBROVOLJC, Kaja in TERČON, Luka: 2023.
Universal Dependencies: Smernice za označevanje besedil v slovenščini. Različica 1.3. Rezultat projekta Na drevesnici temelječ pristop k raziskavam govorjene slovenščine. [DOCX] [PDF] - only in Slovene

Version 1.0
Project Development of Slovene in a Digital Environment
DOBROVOLJC, Kaja in TERČON, Luka: 2023.
Universal Dependencies: Smernice za označevanje besedil v slovenščini. Rezultat projekta Razvoj slovenščine v digitalnem okolju. [DOCX] [PDF] - only in Slovene
Priloga k smernicam: Odprta vprašanja pri prenosu označevalne sheme Universal Dependencies na slovenska besedila [DOCX] [PDF] - only in Slovene

References and Links

This chapter compiles relevant references and provides links to projects in which the Universal Dependencies (UD) morphology and syntax have been developed and applied to Slovene texts.

Main website of the Universal Dependencies project: https://universaldependencies.org/
General guidelines: https://universaldependencies.org/guidelines.html
Slovenian-specific guidelines (in English): https://universaldependencies.org/sl/index.html
Platform for discussing and proposing improvements to Slovenian-specific guidelines (in English): https://github.com/UniversalDependencies/UD_Slovenian-SSJ/issues

Corpora containing manually revised UD tags

Slovenian-SSJ written language treebank: https://github.com/UniversalDependencies/UD_Slovenian-SSJ
Slovenian-SST spoken language treebank: https://github.com/UniversalDependencies/UD_Slovenian-SST
Slovenian-SSJ treebank as part of the latest SUK 1.1 training corpus: Arhar Holdt, Špela; et al., 2024, Training corpus SUK 1.1, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1959.

References

Dobrovoljc, K. (2024). Extending the Spoken Slovenian Treebank. Conference on Language Technologies and Digital Humanities (JT-DH-2024), Ljubljana, Slovenia. https://doi.org/10.5281/zenodo.13936394

Dobrovoljc, K., Terčon, L., & Ljubešić, N. (2023). Universal Dependencies za slovenščino: nove smernice, ročno označeni podatki in razčlenjevalni model. Slovenščina 2.0, 11(1), 218–246. https://doi.org/10.4312/slo2.0.2023.1.218-246 [PDF] - only in Slovene

Dobrovoljc, K., Terčon, L., & Ljubešić, N. (2022). Universal Dependencies za slovenščino: nadgradnja smernic, učnih podatkov in razčlenjevalnega modela. In D. Fišer & T. Erjavec (Eds.), Jezikovne tehnologije in digitalna humanistika: zbornik konference (pp. 30–39). Inštitut za novejšo zgodovino. https://nl.ijs.si/jtdh22/pdf/JTDH2022_Dobrovoljc-et-al_Universal-Dependencies-za-slovenscino.pdf

de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal Dependencies. Computational Linguistics, 47(2), 255–308. https://doi.org/10.1162/coli_a_00402

Nivre, J., de Marneffe, M.-C., Ginter, F., Hajič, J., Manning, C. D., Pyysalo, S., Schuster, S., Tyers, F., & Zeman, D. (2020). Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. Proceedings of the Twelfth Language Resources and Evaluation Conference, 4034–4043. https://aclanthology.org/2020.lrec-1.497

Dobrovoljc, K., Erjavec, T., & Krek, S. (2017). The Universal Dependencies Treebank for Slovenian. Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 33–38. https://doi.org/10.18653/v1/W17-1406

Dobrovoljc, K., & Nivre, J. (2016). The Universal Dependencies Treebank of Spoken Slovenian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 1566–1573. https://aclanthology.org/L16-1248
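
Supplementary example: reading feature and value pairs

As a supplementary illustration of the feature and value notation noted under Table 2 (e.g., Tense=Pres), the following is a minimal sketch of how such pairs might be read into a lookup structure. It assumes the treebank files follow the CoNLL-U convention used for UD releases, in which a token's pairs are joined with a vertical bar and an underscore marks a token without features. The function name parse_feats and the sample string are hypothetical and are not taken from the Slovene treebanks.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a feature string such as 'Number=Sing|Tense=Pres' into a dict."""
    # Assumption: an underscore (or an empty string) means "no features".
    if feats in ("", "_"):
        return {}
    pairs = {}
    # Assumption: pairs are separated by "|" and each pair is Feature=Value.
    for item in feats.split("|"):
        name, _, value = item.partition("=")
        pairs[name] = value
    return pairs


# Hypothetical feature string built only from features and values in Table 2.
example = "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
feats = parse_feats(example)
print(feats["Tense"])
```

Layered features such as Gender[psor] are kept as plain dictionary keys here, so the possessor's gender and the word's own gender can be looked up separately.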