API use cases

Use cases

In addition to providing general public access to the database, the REST API can also be used to integrate data and services with external organisations in a coordinated, structured and systematic way. Two current examples of this are integration with terminology portals and speech technologies, both of which use a mix of public (read-only) and restricted (read-write) routes of the API.

Terminology Portal

(Conceptual intro: Simon?)

To that end, the portal needs routes to register its dictionaries in the database, search and create terms, attach/detach them to/from the dictionaries, and fetch their forms and statuses. The API supports this as follows:

Register the dictionary as a resource in the database using /get-or-create/resource/, providing a code name for the dictionary (e.g., "slm") as input. The API will return the resource's ID (e.g., 87), first creating it if it does not yet exist.
Get IDs of terms in the database, creating them if necessary, using /get-or-create/lexical-unit. Input can be either the term's raw string (e.g., "okrogla miza"), or its pre-analysed sequence of tokens, with each token represented with corpus-style data (e.g., [{"lemma":"okrogel", "msd":"Ppnzei", "form":"okrogel"}, {"lemma":"miza", "msd":"Sozei", "form":"miza"}]). If a raw string is provided, the API uses a standard tool to get a sequence of tokens itself. Either way, it then checks if a matching lexical unit exists in the database, creates one if necessary, and returns its basic data, including its ID (e.g., 54321). If many terms need processing, this can be done by using the /get-or-create-batch/lexical-unit call instead and providing a list of inputs.
Get the word parts and their forms with statuses for a specific term, by using /retrieve/lexical-unit-lexemes/. Input would be the term's ID (e.g., 54321) and specifying that form statuses and all form types are also requested (e.g., "extra-data":["status-form-types", "forms-orthography", "forms-accentuation", "forms-pronunciation"]). The output then a list, with each element is one of the term's word constituents, represented with both its basic data (such as id, lemma, part of speech, basic word-level features) and the extra requested data (the list of all the forms of all the types for the word and aggregated statuses for each type). See redoc for a full example.
Search for term candidates in the datases, using /search/lexical-unit. This has similarities to /get-or-create/lexical-unit but differs in a few key ways. First, it does not create a lexical unit if no match exists, and thus does not require authentification, so less central components of the portal can also make use of it. Second, it does not require complete data in the input (e.g., perhaps only one or two of lemma/msd/form are specified for one or more of the term's components), making the search more flexible and potentially returning multiple matches (e.g., {"lemma":"klop"}). See redoc for a full example.
Attach the lexical unit to a resource, using /attach/lexical-unit/. The input would be the IDs of the term as a lexical unit (e.g., 54321) and dictionary as a resource (e.g., 87). This would then connect the two in the database.
Detach the lexical unit from a resource, using /attach/lexical-unit/. The input would be the IDs of the term as a lexical unit (e.g., 54321) and dictionary as a resource (e.g., 87). This would then remove the term from the resource in the database, without deleting the term from the database in general.

Speech technologies

(Conceptual intro: Simon?)

To that end, the service needs API routes to preprocess text, search for word forms, retrieve the forms of word and create a new word along with its forms. The API supports this as follows:

(Bullet points: Cyprian)