# 08 Named Entities

# Introduction to Labels

This chapter summarizes the labels for named entities (NEs). The Annotation Guidelines chapter gives a more detailed presentation.

<table id="bkmrk-category-subcategory"><thead><tr><th>**Category**</th><th>**Subcategory**</th><th>**Examples**</th><th>**Does not belong in the category**</th></tr></thead><tbody>
<tr><td>**PER**</td><td>Person (name and/or surname)</td><td>Janez Novak, da Vinci, Ludvik XIV.</td><td>dr., gospa, sv.</td></tr>
<tr><td></td><td>Pet name</td><td>Fifi</td><td></td></tr>
<tr><td></td><td>Artistic name, pseudonym</td><td>Madonna, mati Terez(ij)a, Banksy</td><td></td></tr>
<tr><td></td><td>Fictional characters (from books, films etc.)</td><td>Ana Karenina, Rdeča kapica</td><td></td></tr>
<tr><td></td><td>Nicknames</td><td>(Boštjan Gorenc -) Pižama, Zvezdica89</td><td></td></tr>
<tr><td></td><td>Named group of people (place-related or family name)</td><td>Angleži, Nemec, Ljubljančan; Novakovi</td><td></td></tr>
<tr><td></td><td>Twitter mentions</td><td>@pizama, @Nike</td><td></td></tr>
<tr><td>**DERIV-PER**</td><td>Personal possessive adjectives</td><td>Novakov (pes)</td><td>Alzheimerjeva (bolezen)</td></tr>
<tr><td>**ORG**</td><td>Organizations</td><td>EU, Nato, Rimskokatoliška cerkev</td><td>parlament, vlada</td></tr>
<tr><td></td><td>Companies</td><td>Microsoft, Pasadena d.o.o.</td><td></td></tr>
<tr><td></td><td>Airport operators</td><td>Aerodrom Ljubljana</td><td>Letališče Jožeta Pučnika</td></tr>
<tr><td></td><td>Educational institutions</td><td>Filozofska fakulteta</td><td></td></tr>
<tr><td></td><td>Institutes</td><td>Institut “Jožef Stefan”</td><td></td></tr>
<tr><td></td><td>Museums, libraries</td><td>Prirodoslovni muzej</td><td></td></tr>
<tr><td></td><td>Theatres, cinemas etc.</td><td>MGL, Kinodvor</td><td></td></tr>
<tr><td></td><td>Media (TV, radio, newspaper etc.)</td><td>Dnevnik, Delo, Radio Center</td><td></td></tr>
<tr><td></td><td>Restaurants, hotels, bars, pubs etc.</td><td>Kavarna Zvezda, \[hH\]otel Lev</td><td></td></tr>
<tr><td></td><td>Healthcare facilities</td><td>\[zZ\]dravstveni dom Ribnica</td><td></td></tr>
<tr><td></td><td>Music bands and other art-related groups</td><td>U2, Beatli, \[aA\]nsambel Avsenik</td><td></td></tr>
<tr><td></td><td>Other public and private institutions</td><td>\[oO\]bčina Piran, NPK</td><td></td></tr>
<tr><td></td><td>Political parties, civic societies, NGOs</td><td>DeSUS, Zveza potrošnikov Slovenije</td><td></td></tr>
<tr><td></td><td>Sports clubs, associations</td><td>(HDD SIJ) Acroni Jesenice, (FC) Barcelona</td><td></td></tr>
<tr><td></td><td>Cultural organizations (also amateur)</td><td>\[mM\]ešani pevski zbor Divača</td><td></td></tr>
<tr><td>**LOC**</td><td>Celestial bodies (planets, comets etc.)</td><td>Mars, Andromeda, Halleyjev komet</td><td></td></tr>
<tr><td></td><td>Continents</td><td>Južna Amerika</td><td></td></tr>
<tr><td></td><td>Countries, provinces, lands (historic and modern)</td><td>Slovenija, Združene države (Amerike)</td><td>EU</td></tr>
<tr><td></td><td>Regions</td><td>Primorska, Valonija, Nova Anglija</td><td></td></tr>
<tr><td></td><td>Cities and settlements (including parts)</td><td>Ljubljana, Šiška, Vrhnika, Na klancu</td><td></td></tr>
<tr><td></td><td>Streets, squares</td><td>Jamova cesta 39</td><td>A2, gorenjska AC</td></tr>
<tr><td></td><td>Shopping centres</td><td>Citypark, Supernova</td><td></td></tr>
<tr><td></td><td>Airports</td><td>Letališče Jožeta Pučnika</td><td></td></tr>
<tr><td></td><td>Churches (named building)</td><td>\[cC\]erkev sv. Nikolaja</td><td>Rimskokatoliška cerkev</td></tr>
<tr><td></td><td>Local sights (cultural, natural)</td><td>Tromostovje, Triglavski narodni park</td><td></td></tr>
<tr><td></td><td>Other named buildings (without org. structure)</td><td>\[kK\]ulturni dom Ljubno, WTC 2</td><td>Cankarjev dom (has an org. structure, e.g. a director)</td></tr>
<tr><td></td><td>Mountains, lakes, rivers and other named geographical objects</td><td>Triglav, Blejsko jezero, Sava, Logarska dolina</td><td></td></tr>
<tr><td>**MISC**</td><td>Computer systems, programs, apps</td><td>Windows 10, Word, Android 5.1 Lollipop</td><td>.docx, pdf, OCR</td></tr>
<tr><td></td><td>Titles of books, films, paintings and other works of art; titles of documents</td><td>Vojna in mir, Ko jagenjčki obmolknejo, Sopranovi, Guernica; Uradni list RS</td><td></td></tr>
<tr><td></td><td>Registered names or models of products (cars, mobile phones, computers, games etc.) and other commercial products (brands)</td><td>Galaxy Note 7, Nokia Lumia 950, Toyota RAV4, Minecraft, Človek ne jezi se</td><td></td></tr>
<tr><td></td><td>Titles of events</td><td>Oskarji, Zlata lisica, 10. mednarodna konferenca Jezikovne tehnologije</td><td>shod nacifašistov</td></tr>
<tr><td></td><td>Project names</td><td>Obzorje 2020</td><td></td></tr>
<tr><td></td><td>Stock market indices</td><td>SBI20, Dow Jones, Nasdaq</td><td>Credit ratings (AAA)</td></tr>
</tbody></table>
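
The label inventory and a few of its boundary cases can be sketched in code. The following is only an illustration, not part of the Janes-NER guidelines: the names `NELabel`, `EXAMPLES`, `COUNTER_EXAMPLES` and `classify` are hypothetical, and the data structure is an assumption. The mentions themselves are copied from the table.

```python
from enum import Enum


class NELabel(Enum):
    """The five labels from the table; each value is the tag as written there."""
    PER = "PER"
    DERIV_PER = "DERIV-PER"
    ORG = "ORG"
    LOC = "LOC"
    MISC = "MISC"


# A small selection of positive examples per label (copied from the table).
EXAMPLES = {
    NELabel.PER: {"Janez Novak", "Fifi", "Banksy"},
    NELabel.DERIV_PER: {"Novakov"},
    NELabel.ORG: {"Aerodrom Ljubljana", "Rimskokatoliška cerkev", "EU"},
    NELabel.LOC: {"Letališče Jožeta Pučnika", "Ljubljana", "Triglav"},
    NELabel.MISC: {"Windows 10", "Obzorje 2020", "Dow Jones"},
}

# Mentions the table explicitly excludes from a given label.
COUNTER_EXAMPLES = {
    NELabel.PER: {"dr.", "gospa", "sv."},
    NELabel.ORG: {"parlament", "vlada", "Letališče Jožeta Pučnika"},
    NELabel.LOC: {"EU", "Rimskokatoliška cerkev", "Cankarjev dom"},
    NELabel.MISC: {".docx", "pdf", "OCR"},
}


def classify(mention: str, label: NELabel) -> str:
    """Report how the table treats `mention` with respect to `label`."""
    if mention in EXAMPLES.get(label, set()):
        return "example"
    if mention in COUNTER_EXAMPLES.get(label, set()):
        return "counter-example"
    return "not listed"


# The airport operator is an organization, the airport itself a location.
print(classify("Aerodrom Ljubljana", NELabel.ORG))
print(classify("Letališče Jožeta Pučnika", NELabel.ORG))
print(classify("Letališče Jožeta Pučnika", NELabel.LOC))
```

Keeping the exclusions per label rather than in one global list reflects how the table is organized: a mention excluded from one category (for example EU under LOC) can still be a valid example of another (EU under ORG).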

# Annotation Guidelines

This chapter summarizes the annotation guidelines for named entity recognition (NER) as applied to Slovene texts.

**Version 1.1  
Project [Janes - Resources, Tools and Methods for the Research of Nonstandard Internet Slovene](https://nl.ijs.si/janes/english)**  
ZUPAN, Katja; LJUBEŠIĆ, Nikola and ERJAVEC, Tomaž, 2017: *Annotation guidelines for Slovenian named entities Janes-NER*: Version 1.1. [\[PDF\]](https://wiki.cjvt.si/attachments/31)

# References and Links

This chapter compiles relevant references and provides links to projects where named entity recognition (NER) has been developed and applied to Slovene texts.

**Projects in which the system has been developed or applied**  
[MUC-6 Named Entity Task Definition](http://cs.nyu.edu/faculty/grishman/NEtask20.book_1.html)  
[CONLL 2003](http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt)   
[BSNLP 2017 shared task](http://bsnlp-2017.cs.helsinki.fi/shared_task.html)  
[Janes - Resources, Tools and Methods for the Research of Nonstandard Internet Slovene](https://nl.ijs.si/janes/english/)  
[Development of Slovene in a Digital Environment](https://rsdo.slovenscina.eu/en)

**References**  
Marc Reznicek: *Linguistische Annotation von Nichtstandardvarietäten / Guidelines und „Best Practices“ Guidelines NER* (version 1.5). [https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/nosta-d/nosta-d-ner-1.5](https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/nosta-d/nosta-d-ner-1.5)

LDC - Linguistic Data Consortium: ACE (Automatic Content Extraction) English Annotation Guidelines for Entities, Version 6.6 2008.06.13, [http://projects.ldc.upenn.edu/ace](http://projects.ldc.upenn.edu/ace) (Accessed on 2 November 2020).