# 12 Slovene learner corpus KOST

# Introduction to Tags

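This chapter summarises the KOST error tags. For each tag, the table gives the linguistic level and the type of correction or part of speech it covers. For a more detailed description, see the manual listed in the Annotation Guidelines chapter.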

<table id="bkmrk-tag-linguistic-level"><thead><tr><th>Tag</th><th>Linguistic level</th><th>Type of correction/part of speech</th></tr></thead><tbody><tr><td>Z-LOC</td><td>orthography</td><td>punctuation</td></tr><tr><td>Z-CRK</td><td>orthography</td><td>spelling</td></tr><tr><td>Z-SN</td><td>orthography</td><td>joined or divided words</td></tr><tr><td>Z-MV</td><td>orthography</td><td>capitalization</td></tr><tr><td>Z-KR</td><td>orthography</td><td>abbreviation</td></tr><tr><td>B-SAM</td><td>vocabulary</td><td>noun</td></tr><tr><td>B-GLAG</td><td>vocabulary</td><td>verb</td></tr><tr><td>B-ZAIM</td><td>vocabulary</td><td>pronoun</td></tr><tr><td>B-PRID</td><td>vocabulary</td><td>adjective</td></tr><tr><td>B-PRISL</td><td>vocabulary</td><td>adverb</td></tr><tr><td>B-PRED</td><td>vocabulary</td><td>preposition</td></tr><tr><td>B-VEZ</td><td>vocabulary</td><td>conjunction</td></tr><tr><td>B-OST</td><td>vocabulary</td><td>other</td></tr><tr><td>O-SAM</td><td>word form</td><td>noun</td></tr><tr><td>O-GLAG</td><td>word form</td><td>verb</td></tr><tr><td>O-ZAIM</td><td>word form</td><td>pronoun</td></tr><tr><td>O-PRID</td><td>word form</td><td>adjective</td></tr><tr><td>O-PRISL</td><td>word form</td><td>adverb</td></tr><tr><td>O-OST</td><td>word form</td><td>other</td></tr><tr><td>S-STR</td><td>syntax</td><td>structure</td></tr><tr><td>S-BR</td><td>syntax</td><td>word order</td></tr><tr><td>S-IZP</td><td>syntax</td><td>omission</td></tr><tr><td>S-ODV</td><td>syntax</td><td>insertion</td></tr><tr><td>POV</td><td>/</td><td>related correction</td></tr><tr><td>[???]</td><td>/</td><td>incomprehensible, unclear correction</td></tr></tbody></table>
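The table can also be read as a small lookup structure. The Python sketch below is a hypothetical helper, not part of KOST, CJVT Svala or the corpus distribution: it stores the tags listed above and turns a tag into a readable description. The names `KOST_TAGS`, `LEVEL_BY_PREFIX`, `level_of` and `describe` are illustrative, and the assumption that the letter before the hyphen (Z, B, O, S) identifies the linguistic level is inferred from the table rather than stated in the guidelines; the loop at the end only checks that the table is consistent with that reading.

```python
from typing import Dict, Optional, Tuple

# Hypothetical helper (not part of KOST or CJVT Svala): tag inventory from
# the table, mapping each tag to (linguistic level, type of correction /
# part of speech). "/" marks tags that have no linguistic level.
KOST_TAGS: Dict[str, Tuple[str, str]] = {
    "Z-LOC": ("orthography", "punctuation"),
    "Z-CRK": ("orthography", "spelling"),
    "Z-SN": ("orthography", "joined or divided words"),
    "Z-MV": ("orthography", "capitalization"),
    "Z-KR": ("orthography", "abbreviation"),
    "B-SAM": ("vocabulary", "noun"),
    "B-GLAG": ("vocabulary", "verb"),
    "B-ZAIM": ("vocabulary", "pronoun"),
    "B-PRID": ("vocabulary", "adjective"),
    "B-PRISL": ("vocabulary", "adverb"),
    "B-PRED": ("vocabulary", "preposition"),
    "B-VEZ": ("vocabulary", "conjunction"),
    "B-OST": ("vocabulary", "other"),
    "O-SAM": ("word form", "noun"),
    "O-GLAG": ("word form", "verb"),
    "O-ZAIM": ("word form", "pronoun"),
    "O-PRID": ("word form", "adjective"),
    "O-PRISL": ("word form", "adverb"),
    "O-OST": ("word form", "other"),
    "S-STR": ("syntax", "structure"),
    "S-BR": ("syntax", "word order"),
    "S-IZP": ("syntax", "omission"),
    "S-ODV": ("syntax", "insertion"),
    "POV": ("/", "related correction"),
    "[???]": ("/", "incomprehensible, unclear correction"),
}

# Assumption inferred from the table: the letter before the hyphen
# identifies the linguistic level.
LEVEL_BY_PREFIX = {"Z": "orthography", "B": "vocabulary", "O": "word form", "S": "syntax"}


def level_of(tag: str) -> Optional[str]:
    """Return the level implied by the tag prefix, or None for tags without a hyphenated prefix."""
    prefix, sep, _ = tag.partition("-")
    return LEVEL_BY_PREFIX.get(prefix) if sep else None


def describe(tag: str) -> str:
    """Return a readable description of a tag; raises KeyError for tags not in the table."""
    level, kind = KOST_TAGS[tag]
    return kind if level == "/" else f"{level}: {kind}"


# Confirm that every prefixed tag in the table agrees with the prefix reading.
for tag, (level, _) in KOST_TAGS.items():
    if "-" in tag:
        assert level_of(tag) == level, tag

print(describe("B-GLAG"))  # vocabulary: verb
print(describe("POV"))     # related correction
```

Keeping the inventory in a single dictionary makes it straightforward to check a list of tags against the table; how tags are actually serialised in the KOST corpus files is not covered by this sketch.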

# Annotation Guidelines

This chapter lists the guidelines for annotating errors in Slovene texts with the KOST system. Guideline versions are listed from the most recent to the oldest.

**Version 1.0 (04-2022)  
Project [Development of Slovene in a Digital Environment](https://rsdo.slovenscina.eu/en)**  
STRITAR KUČUK, Mojca, 2023: *KOST 1.0: Priročnik za označevanje napak, delovna verzija*. Version 1.0. [\[PDF\]](https://www.cjvt.si/korpus-kost/wp-content/uploads/sites/24/2022/04/Prirocnik-za-oznacevanje-napak-v-KOST-u-2022-04-13.pdf) (available in Slovene only)

# References and Links

This chapter compiles references and links related to KOST: the project in which the system was developed, the corpus annotated with KOST tags, the annotation tool, and related publications.

**Projects in which the system has been developed:** [Development of Slovene in a Digital Environment](https://rsdo.slovenscina.eu/en)

**Corpora containing manually revised KOST tags:** STRITAR KUČUK, Mojca, ŠTER, Helena, PISEK, Staša, PETRIC LASNIK, Ivana, KETE MATIČIČ, Jana, PIRIH SVETINA, Nataša, PREGLAU, Daniela, ARHAR HOLDT, Špela, KRSNIK, Luka, ERJAVEC, Tomaž, 2023, *Slovene learner corpus KOST 1.0*, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, [http://hdl.handle.net/11356/1753](http://hdl.handle.net/11356/1753).

**The CJVT Svala tool for manual annotation following the KOST system:** ARHAR HOLDT, Špela, KOSEM, Iztok, STRITAR KUČUK, Mojca, KRSNIK, Luka, JOVAN, Leon Noe, 2022: *CJVT Svala* (an indicator of the project Development of Slovene in a Digital Environment), v1.0, [https://orodja.cjvt.si/svala/](https://orodja.cjvt.si/svala/), accessed 2 March 2023.

**References:** STRITAR KUČUK, Mojca, 2022: *KOST med korpusi usvajanja tujega jezika*. Obdobja 41: Na stičišču svetov: slovenščina kot drugi in tuji jezik. 323–334. [https://centerslo.si/wp-content/uploads/2022/11/Stritar-Kucuk\_Obdobja-41.pdf](https://centerslo.si/wp-content/uploads/2022/11/Stritar-Kucuk_Obdobja-41.pdf)

ARHAR HOLDT, Špela, KOSEM, Iztok, STRITAR KUČUK, Mojca, 2022: *Metode in orodja za lažjo pripravo korpusov usvajanja jezika*. Obdobja 41: Na stičišču svetov: slovenščina kot drugi in tuji jezik. 23–30. [https://centerslo.si/wp-content/uploads/2022/11/Arhar-Holdt-et-al\_Obdobja-41.pdf](https://centerslo.si/wp-content/uploads/2022/11/Arhar-Holdt-et-al_Obdobja-41.pdf)

STRITAR KUČUK, Mojca, 2020: *Modul Leto plus – prvi korak do korpusa slovenščine kot tujega jezika*. Zbornik konference Jezikovne tehnologije in digitalna humanistika 2020. 131–135. [http://nl.ijs.si/jtdh20/pdf/JT-DH\_2020\_StritarKucuk\_Modul-Leto-plus%e2%80%93prvi-korak-do-korpusa-slovenscine-kot-tujega-jezika.pdf](http://nl.ijs.si/jtdh20/pdf/JT-DH_2020_StritarKucuk_Modul-Leto-plus%e2%80%93prvi-korak-do-korpusa-slovenscine-kot-tujega-jezika.pdf)