REST API

Public REST API for accessing the database.

API design

Principles of the API design:

  1. All documented routes should be appended to https://blisk.ijs.si/api/.

  2. All the routes are available as POST calls, even if they do not result in changes in the database, because:

    • some routes will have non-trivial input parameters (structured data, arbitrary strings), which are difficult and clunky to encode as path parameters, and expecting request body parameters in GET calls can be problematic and misleading
    • we can have short, clear URLs for all routes, and there are limits on URL length in some contexts.
  3. Some routes also have a GET counterpart, which behaves the same way as the POST call but does not accept request body parameters (default values are used instead).

  4. All request parameters are provided as JSON request body parameters, except for the object's id, which is used to identify a given object and provided as an obligatory path parameter for certain types of calls (e.g., retrieve).

  5. The following HTTP response codes are used:

    • 200: for most successful requests
    • 201: for successful get-or-create requests where no matching object was found and a new one was created
    • 400: an error occurred due to invalid or unexpected request parameters or combinations
    • 401: authorisation denied (suitable credentials are needed for routes which write to the database)
    • 404: objects were not found for the value (usually id) provided
    • 501: the specifications for this route are designed but it has not yet been implemented
  6. Each route falls under a particular type of operation identified with a particular verb as the first part of the route. The verbs include:

    • retrieve: return data for a given object
    • search: return all the objects which match the set of search parameters
    • get-or-create: get the object matching the parameters provided, creating one if necessary, along with any other missing objects it depends on
    • update: update the properties of a given object based on the parameters provided
    • delete: delete a given object
    • attach: attach the given object to a particular resource, if not yet attached
    • detach: detach the given object from a particular resource, if attached
    • process: process the input data with an appropriate independent tool (e.g., the CLASSLA NLP library)
  7. If the operation verb has a "-batch" suffix, it differs from its non-batch counterpart as follows:

    • users can make 1 API call instead of N API calls for N items
    • the input data should be a list, with each element in the format expected by the non-batch route
    • the output data is a list, with each element corresponding to the element at the same position in the input, where each element has three fields:
      • status: the HTTP response code that would be used if the element was processed in a non-batch call
      • message: a message describing the results of the operation (e.g., whether an object was found or created, or the cause of the warning or error)
      • data: the output data (for successful calls), in the same format as non-batch output
  8. Routes which do not change data in the database (retrieve, search, process) are publicly available. Routes which may result in changes in the database (get-or-create, update, delete, attach, detach) require authentication credentials.
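
The principles above can be illustrated with a small client-side helper. This is only a sketch under stated assumptions: the placement of the id directly after the object type, the use of an Authorization header for credentials, and the "lemma" search parameter are hypothetical and should be checked against the route specifications. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://blisk.ijs.si/api/"

# Verbs from principle 6, grouped by access level as in principle 8.
READ_ONLY_VERBS = {"retrieve", "search", "process"}
WRITE_VERBS = {"get-or-create", "update", "delete", "attach", "detach"}


def base_verb(verb):
    """Strip the optional "-batch" suffix from an operation verb."""
    suffix = "-batch"
    return verb[:-len(suffix)] if verb.endswith(suffix) else verb


def needs_credentials(verb):
    """Return True for verbs whose routes may change the database."""
    core = base_verb(verb)
    if core not in READ_ONLY_VERBS | WRITE_VERBS:
        raise ValueError(f"unknown operation verb: {verb}")
    return core in WRITE_VERBS


def build_request(verb, object_type, params=None, object_id=None, auth_header=None):
    """Build (but do not send) a POST request with a JSON body."""
    if needs_credentials(verb) and auth_header is None:
        raise PermissionError(f"{verb} routes require credentials")
    url = f"{BASE_URL}{verb}/{object_type}/"
    if object_id is not None:
        # Assumption: the id path parameter follows the object type.
        url += f"{object_id}/"
    headers = {"Content-Type": "application/json"}
    if auth_header is not None:
        # Assumption: credentials are passed in an Authorization header.
        headers["Authorization"] = auth_header
    body = json.dumps(params or {}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# "lemma" is a hypothetical parameter name used only for illustration.
request = build_request("search", "lexeme", params={"lemma": "miza"})
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; a 400, 401, 404 or 501 response would then surface as urllib.error.HTTPError, whose code attribute holds the status.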

API routes

The API is still being designed and developed, with priority given to current needs. Specifications are available in Redoc (which is better formatted visually) and Swagger (which allows you to try the API via the interface).

Here is a list of the current routes (last update: 06.12.2022). All routes are available with POST, while some of them also have GET or batch POST alternatives (ref). The routes that are not read-only have restricted access.

Route                                    Read-only  Description
/search/lexical-unit/                    yes        search for lexical units based on their properties and parts
/retrieve/lexical-unit/                  yes        get a lexical unit's basic data
/get-or-create/lexical-unit/             no         get or create a lexical unit based on properties and components
/search/lexeme/                          yes        search for lexemes
/retrieve/lexeme/                        yes        get a lexeme's data
/get-or-create/lexeme/                   no         get or create a lexeme based on defining properties
/retrieve/lexical-unit-lexemes/          yes        get the lexical unit's component lexemes
/search/category/                        yes        search for a lexeme's category (part of speech) by string
/search/form/                            yes        search for word forms by string
/search/sense/                           yes        search for senses
/retrieve/lexical-unit-senses/           yes        get the senses of a lexical unit
/retrieve/lexical-unit-sense-relations/  yes        get the sense relations of a lexical unit's senses
/retrieve/lexical-unit-collocations/     yes        get the collocations of a lexical unit
/retrieve/lexical-unit-translations/     yes        get the translations of a lexical unit
/retrieve/lexical-unit-sense-examples/   yes        get corpus examples for the senses of a lexical unit
/get-or-create/resource/                 no         get or create a dictionary or other resource
/search/resource/                        yes        search or list resources available
/attach/lexical-unit/                    no         attach a lexical unit to a resource
/detach/lexical-unit/                    no         detach a lexical unit from a resource
/search/syntactic-structure/             yes        get the XML definitions of syntactic structures
/process/string-to-tokens/               yes        parse a Slovene string to get a list of tokens
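
For routes that have a batch POST alternative, each output element carries its own status, message and data and corresponds to the input element at the same position. A minimal sketch of how a client might pair them up is shown below; the input items and the contents of the data field are made up for illustration and do not reflect the actual format of any route.

```python
def pair_batch_results(items, results):
    """Match each batch output element to the input at the same position."""
    if len(items) != len(results):
        raise ValueError("batch output must have one element per input item")
    succeeded, failed = [], []
    for item, result in zip(items, results):
        status = result["status"]
        if status in (200, 201):
            # 200: found or processed; 201: newly created by get-or-create.
            succeeded.append((item, status, result.get("data")))
        else:
            failed.append((item, status, result.get("message")))
    return succeeded, failed


# Hypothetical input items and the corresponding batch output.
items = [{"lemma": "miza"}, {"lemma": "stol"}, {"lemma": ""}]
results = [
    {"status": 200, "message": "object found", "data": {"id": 1}},
    {"status": 201, "message": "object created", "data": {"id": 2}},
    {"status": 400, "message": "invalid parameters", "data": None},
]

succeeded, failed = pair_batch_results(items, results)
for item, status, message in failed:
    print(f"item {item} failed with {status}: {message}")
```

Because each element has its own status, one invalid item does not prevent the other items in the same call from being reported as successful.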

API implementation

The public API is being implemented using the Django REST Framework and APIViews in particular. It is part of the Python codebase, Django project and Git repository that is used to manage the database in general. We are striving to keep the business logic and API route definitions in separate modules, so that different APIs (e.g., editor API, internal API) can use the same utils module.
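
The separation between business logic and route definitions can be sketched as follows. This is a framework-free illustration, not the project's Django REST Framework code: the in-memory dictionary stands in for the database, and the function names and the "lemma" and "category" fields are hypothetical.

```python
# "utils" layer: business logic that knows nothing about HTTP, so that
# several APIs (public, editor, internal) could call the same function.
_LEXEMES = {}  # in-memory stand-in for the database


def get_or_create_lexeme(lemma, category):
    """Return (lexeme, created) for the given defining properties."""
    if not lemma or not category:
        raise ValueError("lemma and category are required")
    key = (lemma, category)
    created = key not in _LEXEMES
    if created:
        _LEXEMES[key] = {"id": len(_LEXEMES) + 1, "lemma": lemma, "category": category}
    return _LEXEMES[key], created


# Route layer: translates between the request body and the utils call,
# and picks the HTTP response code.
def get_or_create_lexeme_route(body):
    """Return (status, payload) for a parsed JSON request body."""
    try:
        lexeme, created = get_or_create_lexeme(body.get("lemma"), body.get("category"))
    except ValueError as error:
        return 400, {"message": str(error)}
    return (201 if created else 200), lexeme


first = get_or_create_lexeme_route({"lemma": "miza", "category": "noun"})
second = get_or_create_lexeme_route({"lemma": "miza", "category": "noun"})
invalid = get_or_create_lexeme_route({"lemma": "miza"})
print(first[0], second[0], invalid[0])
```

In a Django REST Framework setting, the route layer would typically be the post method of an APIView subclass, with the utils function left unchanged.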

Most of the logic and processing of the API is internal. However, there are a few aspects that rely on other tools, such as Slovene string parsing and fetching of corpus examples.

API use cases

In addition to providing general public access to the database, the REST API can also be used to integrate data and services with external organisations in a coordinated, structured and systematic way. Two current examples of this are integration with terminology portals and speech technologies, both of which use a mix of public (read-only) and restricted (read-write) routes of the API.

Terminology Portal

One of the main parts of the Development of Slovene in a Digital Environment project is a terminology portal that will feature various terminological resources and offer an openly accessible tool for term extraction from specialized corpora, as well as the server infrastructure needed to create new terminological resources. The main components of the portal include a search engine for all integrated resources and a terminology resource editor. The resources are designed to be easily integrated with other language tools and services, including the Digital Dictionary Database.

As such, the portal uses API routes to register its dictionaries in the database, search and create terms, attach/detach them to/from the dictionaries, and fetch their forms and statuses. The API supports this as follows (see the route links for full examples):

Speech technologies

The project Tolmač (Eng. Interpreter) is focused on developing a system for automatically translating lectures from Slovene to other languages. It is coordinated at the Faculty of Computer and Information Science at the University of Ljubljana, in close collaboration with the Centre for Language Resources and Technologies. The results of the project will be important for a wide range of people: real-time translations will make it easier for foreign students to follow lectures in Slovene, automatic subtitles will help people with hearing loss, and lecture excerpts and recordings will be accessible on a dedicated website. The speech technologies underlying the system rely on search and retrieval of both orthographic and pronunciation word forms of Slovene words.

To that end, the system can use API routes to preprocess text, search for different kinds of word forms, retrieve the forms of a word, and create new words along with their forms. The API supports this as follows: