11 Developmental corpus Šolar

The Šolar annotation system, developed alongside the Slovene developmental corpus Šolar (Arhar Holdt et al. 2022), is designed for categorizing teachers' language corrections in texts written by pupils in Slovene primary schools and students in Slovene secondary schools. The initial framework for annotating corrections was established in the first edition of the corpus (Kosem et al. 2012) and substantially extended in version 2.0 (Kosem et al. 2016), for which the first annotation guidelines were also developed (Arhar Holdt et al. 2018). The system is hierarchical, with three tiers: the first identifies the linguistic level of the correction (e.g. spelling or morphology), the second its general category, and the third the specific language problem. This structure allows corrections to be analysed at different levels of granularity, from the broad linguistic level down to the individual problem type.
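
The three tiers are encoded directly in each tag, which joins a level code, a category code and a problem code with slashes (e.g. Č/VOK/odveč: Spelling, Vowels, Superfluous vowel). The following Python sketch splits a tag into its tiers. It assumes every tag has exactly three slash-separated segments, with an empty middle segment allowed for the N tags (e.g. N//preveri). The names LEVELS, SolarTag and parse_tag are hypothetical illustrations, not part of any official Šolar tooling; the level names are taken from the tag table in the Introduction to Tags chapter.

```python
import unicodedata
from dataclasses import dataclass

# Level codes (first tier) and their linguistic levels, as listed in the tag table.
LEVELS = {
    "Č": "Spelling",
    "O": "Morphology",
    "B": "Vocabulary",
    "S": "Syntax",
    "Z": "Orthography",
    "P": "Related corrections",
    "N": "Illegible and dubious examples",
}


@dataclass(frozen=True)
class SolarTag:
    level: str     # first tier, e.g. "Č"
    category: str  # second tier, e.g. "VOK" (empty for N tags)
    problem: str   # third tier, e.g. "odveč"

    @property
    def level_name(self) -> str:
        """Human-readable linguistic level, e.g. "Spelling"."""
        return LEVELS[self.level]


def parse_tag(tag: str) -> SolarTag:
    """Split a tag such as "Č/VOK/odveč" into its three tiers."""
    # Normalize to NFC so a decomposed "Č" (C + combining caron) still matches.
    parts = unicodedata.normalize("NFC", tag.strip()).split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three slash-separated tiers: {tag!r}")
    level, category, problem = parts
    if level not in LEVELS:
        raise ValueError(f"unknown level code {level!r} in {tag!r}")
    if not problem:
        raise ValueError(f"missing specific problem in {tag!r}")
    return SolarTag(level, category, problem)
```

For example, parse_tag("Č/VOK/odveč").level_name returns "Spelling", and parse_tag("N//preveri").category is an empty string.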

Introduction to Tags

This chapter summarizes the Šolar tags. A more detailed description is available in the guidelines listed in the Annotation Guidelines chapter.

Tag Linguistic level Category of correction Specific language problem
Č/VOK/odveč Spelling Vowels Superfluous vowel
Č/VOK/izpust Spelling Vowels Omitted vowel
Č/VOK/menjava-ao Spelling Vowels AO substitution
Č/VOK/menjava-ei Spelling Vowels EI substitution
Č/VOK/menjava-uo Spelling Vowels UO substitution
Č/VOK/menjava-drugo Spelling Vowels Other vowel substitutions
Č/KONZ/odveč Spelling Consonants Superfluous consonant
Č/KONZ/izpust Spelling Consonants Omitted consonant
Č/KONZ/menjava-sz Spelling Consonants Substitution SZ
Č/KONZ/menjava-td Spelling Consonants Substitution TD
Č/KONZ/menjava-kgh Spelling Consonants Substitution KGH
Č/KONZ/menjava-mn Spelling Consonants Substitution MN
Č/KONZ/menjava-šž Spelling Consonants Substitution ŠŽ
Č/KONZ/menjava-strešice Spelling Consonants Substitution of diacritic (caron)
Č/KONZ/menjava-drugo Spelling Consonants Other consonant substitution
Č/W/začetek Spelling Labio-velar approximant w Word-initial
Č/W/sredina Spelling Labio-velar approximant w Word-medial
Č/W/konec Spelling Labio-velar approximant w Word-final
Č/W/v Spelling Labio-velar approximant w Prepositional V
Č/SKLOP/zlog Spelling Letter clusters A syllable is missing or is superfluous
Č/SKLOP/lj Spelling Letter clusters Cluster LJ 
Č/SKLOP/nj Spelling Letter clusters Cluster NJ
Č/SKLOP/ij Spelling Letter clusters Cluster IJ
Č/SKLOP/podvojene Spelling Letter clusters Doubled letters
Č/SKLOP/premet Spelling Letter clusters Metathesis
Č/PRED/sz Spelling Variable (allophonic) prepositions Preposition s/z
Č/PRED/kh Spelling Variable (allophonic) prepositions Preposition k/h 
O/KAT/sklon-rt Morphology Categorical corrections Case: genitive-accusative
O/KAT/sklon-dm Morphology Categorical corrections Case: dative-locative 
O/KAT/sklon-mo Morphology Categorical corrections Case: locative-instrumental
O/KAT/sklon-drugo Morphology Categorical corrections Other case substitutions
O/KAT/število-em Morphology Categorical corrections Number: singular-plural
O/KAT/število-dm Morphology Categorical corrections Number: dual-plural
O/KAT/število-ed Morphology Categorical corrections Number: singular-dual
O/KAT/spol Morphology Categorical corrections Gender
O/KAT/vid Morphology Categorical corrections Aspect
O/KAT/čas Morphology Categorical corrections Tense
O/KAT/oseba Morphology Categorical corrections Person
O/KAT/nedoločnik-kratki Morphology Categorical corrections Reduced infinitive
O/KAT/nedoločnik-namenilnik Morphology Categorical corrections Infinitive and supine
O/KAT/nedoločnik-osebna Morphology Categorical corrections Infinitive and finite verb
O/KAT/povratnost Morphology Categorical corrections Reflexivity
O/KAT/naklon Morphology Categorical corrections Mood
O/KAT/način Morphology Categorical corrections Voice
O/KAT/oblika-zaimka Morphology Categorical corrections Pronominal form
O/KAT/določnost Morphology Categorical corrections Definiteness
O/KAT/stopnjevanje Morphology Categorical corrections Comparison
O/PAR/glagolska-osnova Morphology Paradigmatic Corrections Verbal root
O/PAR/glagolska-končnica Morphology Paradigmatic Corrections Verbal ending
O/PAR/neglagolska-osnova Morphology Paradigmatic Corrections Non-verbal root
O/PAR/neglagolska-končnica Morphology Paradigmatic Corrections Non-verbal ending 
O/PAR/neobstojni-vokal Morphology Paradigmatic Corrections Epenthetic vowel
O/PAR/preglas-in-cč Morphology Paradigmatic Corrections Umlaut and cč
O/DOD/variante Morphology Additional Annotation Morphological variants
O/DOD/besede-mati-hči Morphology Additional Annotation The words mati and hči
O/DOD/besede-otrok Morphology Additional Annotation The word otrok
B/SAM/napačno-lastno Vocabulary Nouns Erroneous proper noun
B/SAM/lastno-občno Vocabulary Nouns Proper and common noun
B/SAM/občno-besedišče Vocabulary Nouns Common vocabulary
B/GLAG/predpona Vocabulary Verbs Verbal prefixes
B/GLAG/moči-morati Vocabulary Verbs Substitution moči-morati 
B/GLAG/naklonski Vocabulary Verbs Other substitutions of modal verbs
B/GLAG/drugo Vocabulary Verbs Other substitutions of verbs
B/ZAIM/povratna-svojilnost Vocabulary Pronoun Reflexive possessive
B/ZAIM/ki-kateri Vocabulary Pronoun Substitution ki-kateri
B/ZAIM/oziralni Vocabulary Pronoun Other problems with relative pronouns
B/ZAIM/noben Vocabulary Pronoun Substitution of negative pronouns
B/ZAIM/drugo Vocabulary Pronoun Other pronominal substitutions
B/PRED/glagolske-zveze Vocabulary Preposition Prepositions in verbal phrases
B/PRED/neglagolske-zveze Vocabulary Preposition Prepositions in non-verbal phrases
B/PRED/lokacijske-dvojnice Vocabulary Preposition Locative doublets
B/PRED/drugo Vocabulary Preposition Other substitutions of prepositions
B/VEZ/in-pa-ter Vocabulary Conjunction Substitutions of in-pa-ter
B/VEZ/protivni Vocabulary Conjunction Coordinating adversative conjunction
B/VEZ/sprememba-odnosa Vocabulary Conjunction Change to subordination
B/VEZ/drugo Vocabulary Conjunction Other substitutions of conjunctions
B/PRID/drugo Vocabulary Adjective All problems related to adjectives
B/PRISL/drugo Vocabulary Adverb All problems related to adverbs
B/OST/drugo Vocabulary Other parts of speech All problems related to other parts of speech
B/MEN/polnopomenska-v-zaimek Vocabulary Substitutions across parts of speech Lexical word or phrase changed into pronoun
B/MEN/zaimek-v-polnopomensko Vocabulary Substitutions across parts of speech Pronoun changed into lexical word or phrase
B/MEN/veznik-zaimek Vocabulary Substitutions across parts of speech Substitution of conjunction and pronoun
B/MEN/besedna-družina Vocabulary Substitutions across parts of speech Word family
B/MEN/samostalnik-bz Vocabulary Substitutions across parts of speech Noun and phrase
B/MEN/glagol-bz Vocabulary Substitutions across parts of speech Verb and phrase
B/MEN/prislov-pridevnik-bz Vocabulary Substitutions across parts of speech Adverb/adjective and phrase
B/MEN/drugo Vocabulary Substitutions across parts of speech Other types of substitutions
B/DOD/zaznamovano Vocabulary Additional Annotation Stylistically marked vocabulary
S/BR/povedek-osebek Syntax Word order Constituent order: predicate-subject
S/BR/povedek-predmet Syntax Word order Constituent order: predicate-object
S/BR/povedek-prislovno-določilo Syntax Word order Constituent order: predicate-adverbial
S/BR/členek Syntax Word order Order: particle
S/BR/znotraj-stavčnega-člena Syntax Word order Order within clausal constituents
S/BR/naslonski-niz-znotraj Syntax Word order Clitic string: order of clitics
S/BR/naslonski-niz-prirednost-podrednost Syntax Word order Clitic string: coordination-subordination
S/BR/drugo Syntax Word order Other changes to word order
S/IZPUST/samostalnik-občno-ime Syntax Omitted constituents Noun: common noun
S/IZPUST/samostalnik-lastno-ime Syntax Omitted constituents Noun: proper noun
S/IZPUST/glagol-biti Syntax Omitted constituents The verb biti
S/IZPUST/glagol-drugo Syntax Omitted constituents Other omitted verbs
S/IZPUST/veznik-pa Syntax Omitted constituents The word pa
S/IZPUST/veznik-drugo Syntax Omitted constituents Other omitted conjunctions
S/IZPUST/predlog-ponovljen Syntax Omitted constituents Repeated prepositions
S/IZPUST/predlog-drugo Syntax Omitted constituents Other omitted prepositions
S/IZPUST/zaimek-osebni Syntax Omitted constituents Personal pronoun
S/IZPUST/zaimek-drugo Syntax Omitted constituents Other omitted pronouns
S/IZPUST/pridevnik Syntax Omitted constituents Adjective
S/IZPUST/prislov Syntax Omitted constituents Adverb
S/IZPUST/členek Syntax Omitted constituents Particle
S/IZPUST/stavek Syntax Omitted constituents Clause
S/ODVEČ/ponavljanje Syntax Superfluous constituents Literal repetition
S/ODVEČ/samostalnik-občno-ime Syntax Superfluous constituents Noun: common noun
S/ODVEČ/samostalnik-lastno-ime Syntax Superfluous constituents Noun: proper noun
S/ODVEČ/glagol-biti Syntax Superfluous constituents The verb biti
S/ODVEČ/glagol-drugo Syntax Superfluous constituents Other superfluous verbs
S/ODVEČ/veznik-pa-vezniki Syntax Superfluous constituents The word pa with another conjunction
S/ODVEČ/veznik-pa-drugo Syntax Superfluous constituents Other examples including the word pa
S/ODVEČ/veznik-začetek Syntax Superfluous constituents Conjunction at the beginning of a sentence
S/ODVEČ/veznik-dvojni Syntax Superfluous constituents Doubled conjunction
S/ODVEČ/veznik-drugo Syntax Superfluous constituents Other superfluous conjunctions
S/ODVEČ/predlog Syntax Superfluous constituents Preposition
S/ODVEČ/zaimek-osebni Syntax Superfluous constituents Personal pronoun
S/ODVEČ/zaimek-kazalni Syntax Superfluous constituents Demonstrative pronoun
S/ODVEČ/zaimek-svojilni Syntax Superfluous constituents Possessive pronoun
S/ODVEČ/zaimek-drugo Syntax Superfluous constituents Other superfluous pronouns
S/ODVEČ/pridevnik Syntax Superfluous constituents Adjective
S/ODVEČ/prislov-mera Syntax Superfluous constituents Adverb of degree 
S/ODVEČ/prislov-drugo Syntax Superfluous constituents Other superfluous adverbs
S/ODVEČ/členek Syntax Superfluous constituents Particle
S/ODVEČ/stavek Syntax Superfluous constituents Clause
S/ODVEČ/poved Syntax Superfluous constituents Sentence
S/STR/svojina-od Syntax Structure Possessives with od
S/STR/svojina-rodilnik Syntax Structure Possessives with the genitive
S/STR/ločilo-veznik Syntax Structure Substitution punctuation-conjunction
S/STR/združevanje-stavkov Syntax Structure Merged clauses
S/STR/deljenje-stavkov Syntax Structure Separation of clauses/sentences
S/STR/besedna-zveza-stavek Syntax Structure Word/phrase instead of clause and vice versa 
S/STR/preoblikovanje-stavka Syntax Structure Reworked clause
S/DOD/pleonazem Syntax Additional Annotation Pleonasm
S/DOD/vsebina-drugo Syntax Additional Annotation Superfluous content
S/DOD/vsebina-napake Syntax Additional Annotation Erroneous content
S/DOD/pomensko-prazni Syntax Additional Annotation Semantically null
Z/MV/pridevnik-ski Orthography Capital/lowercase letters Adjectives ending in -ski 
Z/MV/pridevnik-drugo Orthography Capital/lowercase letters Other adjectives
Z/MV/občna-imena Orthography Capital/lowercase letters Common noun
Z/MV/osebna-imena Orthography Capital/lowercase letters Personal name with a lowercase letter
Z/MV/narodnost Orthography Capital/lowercase letters Nationality with a lowercase letter
Z/MV/zemljepisna-imena Orthography Capital/lowercase letters Geographical name with a lowercase letter
Z/MV/stvarna-imena Orthography Capital/lowercase letters Proper noun with a lowercase letter
Z/MV/premi-govor Orthography Capital/lowercase letters Direct speech
Z/MV/začetek-povedi Orthography Capital/lowercase letters Sentence initial
Z/MV/hiperkorekcija-ločila Orthography Capital/lowercase letters Hypercorrection following a period
Z/MV/drugo Orthography Capital/lowercase letters Other problems with initial letters
Z/SN/skupaj-glagol Orthography Together/separate Verb together
Z/SN/skupaj-predlog Orthography Together/separate Preposition together
Z/SN/narazen-predlog Orthography Together/separate Preposition separate
Z/SN/skupaj-prislov Orthography Together/separate Adverb together
Z/SN/narazen-prislov Orthography Together/separate Adverb separate
Z/SN/narazen-pridevnik Orthography Together/separate Adjective separate
Z/SN/narazen-drugo Orthography Together/separate Other separate
Z/SN/skupaj-drugo Orthography Together/separate Other together
Z/KR/drugo Orthography Abbreviations All problems related to abbreviations
Z/ŠTEV/drugo Orthography Numbers All problems related to numbers
Z/LOČ/nerazvrščeno Orthography Punctuation Unclassified punctuation corrections
Z/LOČ/vzorec-vejica-stavki Orthography Punctuation Comma before subordinate clauses
Z/LOČ/vzorec-vejica-stavčni-členi Orthography Punctuation Comma between clause constituents
Z/LOČ/vzorec-vejica-vezniki Orthography Punctuation Comma and multi-word conjunctions
Z/LOČ/vzorec-vejica-kot Orthography Punctuation Comma and comparative structures
Z/LOČ/vzorec-vejica-pristavki Orthography Punctuation Comma and appositions etc.
Z/LOČ/vzorec-vejica-vrinjen-odvisnik Orthography Punctuation Comma and inserted subordinate clauses
Z/LOČ/vzorec-vejica-priredja-zvez Orthography Punctuation Comma and coordinate phrases
Z/LOČ/vzorec-vejica-priredja-odvisnikov Orthography Punctuation Comma and coordinated subordinate clauses
Z/LOČ/vzorec-vejica-pridevniški-niz Orthography Punctuation Comma in adjective strings
Z/LOČ/vzorec-vejica-elipsa-povedka Orthography Punctuation Comma and predicate ellipsis
Z/LOČ/vzorec-vejica-kopičenje-ločil Orthography Punctuation Comma and punctuation accumulation
Z/LOČ/vzorec-vejica-kopičenje-veznikov Orthography Punctuation Comma and conjunction accumulation
Z/LOČ/vzorec-vejica-navajanje Orthography Punctuation Comma and quotation
P/OBL/drugo Related corrections Related morphology corrections All corrections related to morphology
P/SKLA/osebek Related corrections Related syntax corrections Corrections of subject
P/SKLA/drugo Related corrections Related syntax corrections Other corrections related to syntax
P/ZAP/mala-velika Related corrections Related orthography corrections Corrections of initial letter
N//nečitljivo Illegible and dubious examples Illegible examples
N//preveri Illegible and dubious examples Dubious examples
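
Because each tag is a path from linguistic level to specific problem, corrections can be summarized at any tier simply by truncating the tag. The Python sketch below assumes the tags assigned to a text are available as plain strings; the function count_by_tier and the sample tag list are hypothetical and do not reflect the actual export format of the Šolar corpus or the Svala tool.

```python
from __future__ import annotations

from collections import Counter


def count_by_tier(tags: list[str], depth: int) -> Counter[str]:
    """Count tags after truncating each one to its first `depth` tiers.

    depth=1 groups by linguistic level (e.g. "Z"), depth=2 by category
    (e.g. "Z/LOČ"), and depth=3 keeps the full tag.
    """
    if depth not in (1, 2, 3):
        raise ValueError("depth must be 1, 2 or 3")
    return Counter("/".join(tag.split("/")[:depth]) for tag in tags)


# Hypothetical set of tags assigned to a single pupil's text.
tags = [
    "Z/LOČ/vzorec-vejica-stavki",
    "Z/LOČ/vzorec-vejica-stavki",
    "Z/MV/začetek-povedi",
    "Č/KONZ/menjava-sz",
    "S/BR/povedek-osebek",
]

print(count_by_tier(tags, 1))  # Counter({'Z': 3, 'Č': 1, 'S': 1})
print(count_by_tier(tags, 2))  # Counter({'Z/LOČ': 2, 'Z/MV': 1, 'Č/KONZ': 1, 'S/BR': 1})
```

At depth 2, the two comma corrections are grouped under Z/LOČ. An N tag such as N//preveri truncates to "N/" at depth 2, because its category tier is empty.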

Annotation Guidelines

This chapter lists the versions of the annotation guidelines for the Šolar system, from the most recent to the oldest.

Version 1.2 (22/11/2023)
Project: Empirical foundations for digitally-supported development of writing skills

ARHAR HOLDT, Špela, LAVRIČ, Polona, ROBLEK, Rebeka, GOLI, Teja, BON, Mija, 2023: Categorizing Teachers’ Corrections: Guidelines for Annotating the Šolar Corpus. Version 1.2. Prepared in the project Empirical foundations for digitally-supported development of writing skills. [DOCX] [PDF]

Version 1.1 (12/8/2022)
Project: Development of Slovene in a Digital Environment

ARHAR HOLDT, Špela, LAVRIČ, Polona, ROBLEK, Rebeka, GOLI, Teja, 2022: Categorizing Teachers’ Corrections: Guidelines for Annotating the Šolar Corpus. Version 1.1. Prepared in the project Development of Slovene in a Digital Environment. [DOCX] [PDF]

Version 1.0 (16/12/2018)
Project: Upgrade of Šolar Corpus

ARHAR HOLDT, Špela, LAVRIČ, Polona, ROBLEK, Rebeka, GOLI, Teja, 2018: Kategorizacija učiteljskih popravkov: Smernice za označevanje korpusa Šolar 2.0. Različica 1.0. Rezultat projekta Nadgradnja korpusa Šolar. [Categorizing Teachers’ Corrections: Guidelines for Annotating the Šolar 2.0 Corpus. Version 1.0. Result of the project Upgrade of Šolar Corpus.] [PDF] (Slovene only)

References and Links

This chapter compiles relevant references and provides links to projects where the Šolar system has been developed and applied to Slovene texts.

Projects in which the system has been developed:
Communication in Slovene
Upgrade of Šolar corpus
Development of Slovene in a Digital Environment
Empirical foundations for digitally-supported development of writing skills

Corpora containing manually revised Šolar tags:
ARHAR HOLDT, Špela, ROZMAN, Tadeja, STRITAR KUČUK, Mojca, KREK, Simon, KRAPŠ VODOPIVEC, Irena, STABEJ, Marko, PORI, Eva, GOLI, Teja, LAVRIČ, Polona, LASKOWSKI, Cyprian Adam, KOCJANČIČ, Polonca, KLEMENC, Bojan, KRSNIK, Luka, KOSEM, Iztok, 2022, Developmental corpus Šolar 3.0, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1589.

The CJVT Svala tool for manual annotation following the Šolar system:
ARHAR HOLDT, Špela, KOSEM, Iztok, STRITAR KUČUK, Mojca, KRSNIK, Luka, JOVAN, Leon Noe, 2022: CJVT Svala (Kazalnik projekta Razvoj slovenščine v digitalnem okolju), v1.0, https://orodja.cjvt.si/svala/, Accessed on 2 March 2023.

References:
ARHAR HOLDT, Špela, KOSEM, Iztok, 2023: Šolar, the developmental corpus of Slovene. Preprint (Version 1), Research Square, 24 August 2023. https://doi.org/10.21203/rs.3.rs-3274669/v1

ARHAR HOLDT, Špela, KOSEM, Iztok, STRITAR KUČUK, Mojca, 2022: Metode in orodja za lažjo pripravo korpusov usvajanja jezika. Nataša Pirih Svetina, Ina Ferbežar (eds.): Na stičišču svetov: slovenščina kot drugi in tuji jezik. Zbirka Obdobja 41. Ljubljana: Založba Univerze. 23–30. https://centerslo.si/wp-content/uploads/2022/11/Arhar-Holdt-et-al_Obdobja-41.pdf

ARHAR HOLDT, Špela, KOSEM, Iztok, GANTAR, Polona, 2017: Corpus-based resources for L1 teaching: the case of Slovene. Ann Marcus-Quinn, Tríona Hourigan (eds.): Handbook on digital learning for K-12 schools. Cham: Springer. 91–113.

KOSEM, Iztok, ROZMAN, Tadeja, ARHAR HOLDT, Špela, KOCJANČIČ, Polonca, LASKOWSKI, Cyprian Adam, 2016: Šolar 2.0: nadgradnja korpusa šolskih pisnih izdelkov. Tomaž Erjavec, Darja Fišer (eds.): Zbornik konference Jezikovne tehnologije in digitalna humanistika. Ljubljana: Znanstvena založba Filozofske fakultete. 95–100. https://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Kosem-et-al_Solar-2-0-nadgradnja-korpusa-solskih-pisnih-izdelkov.pdf

KOSEM, Iztok, STRITAR KUČUK, Mojca, MOŽE, Sara, ZWITTER VITEZ, Ana, ARHAR HOLDT, Špela, ROZMAN, Tadeja, 2012: Analiza jezikovnih težav učencev: korpusni pristop. Ljubljana: Znanstvena založba Filozofske fakultete. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/229/329/5311-1