Data model

The central entity types of the data model are lexical_unit and sense. They connect the morphological and semantic data in the data model. The model is designed to be multilingual; currently, however, it is used as a monolingual model that connects to multilingual data (which does not have the same granularity) via special entity types.
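As a rough illustration of how the two central entity types might relate, here is a minimal Python sketch. Only the names lexical_unit and sense come from the model; every field (id, lemma, language, definition), the one-to-many link between a lexical unit and its senses, and the add_sense helper are assumptions made for the example, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """Hypothetical stand-in for the sense entity (semantic side)."""
    id: int
    lexical_unit_id: int  # assumed reference back to the owning lexical unit
    definition: str = ""  # assumed field, for illustration only


@dataclass
class LexicalUnit:
    """Hypothetical stand-in for the lexical_unit entity (morphological side)."""
    id: int
    lemma: str      # assumed field
    language: str   # assumed field; a multilingual design needs some language marker
    senses: List[Sense] = field(default_factory=list)

    def add_sense(self, sense_id: int, definition: str) -> Sense:
        # Create a sense that points back to this lexical unit and store it.
        sense = Sense(id=sense_id, lexical_unit_id=self.id, definition=definition)
        self.senses.append(sense)
        return sense


# One lexical unit carrying two senses.
unit = LexicalUnit(id=1, lemma="bank", language="en")
unit.add_sense(10, "financial institution")
unit.add_sense(11, "edge of a river")
```

In this sketch the lexical unit is the anchor for form-related data and each sense is the anchor for meaning-related data; the other clusters would attach to one of the two.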

At the top level, the model can be divided into clusters (color-coded in the model):

  • sense data (blue)
  • lexical unit data (yellow-green)
  • morphological data (green)
  • structure data (grey-brown)
  • multilingual data (pink, red)
  • sense frame data (violet)
  • example data (yellow)
  • resource data, i.e. the division into different dictionaries (orange)
  • feature data (grey)
  • tables that reference other tables via meta-attributes (white)

The corpus data is not contained in the database itself but is referenced and accessed via a concordancer. Some parts of the data model (e.g. structure data) are defined as XML. They are used directly in existing processing pipelines but can be ported to an ER model if necessary.