Data model

The central entity types of the data model are lexical_unit and sense. They connect the morpho-syntactic and semantic data in the model. The model is designed to be multilingual; currently, however, it is used as a monolingual model that connects to multilingual data (which does not have the same level of granularity) via special entity types.
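
As a rough illustration of how the two central entity types might relate, here is a minimal Python sketch. Only the entity names lexical_unit and sense come from the model; every field name, the one-to-many link from a lexical unit to its senses, and the helper `senses_of` are assumptions made for this example and do not reflect the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One meaning of a lexical unit (all fields are hypothetical)."""
    sense_id: int
    definition: str


@dataclass
class LexicalUnit:
    """An entry tying form-side data to its senses (all fields are hypothetical)."""
    unit_id: int
    lemma: str
    # Placeholder for morpho-syntactic data, e.g. a part-of-speech label.
    morphosyntax: dict[str, str] = field(default_factory=dict)
    # Assumed one-to-many link: a lexical unit owns a list of senses.
    senses: list[Sense] = field(default_factory=list)


def senses_of(unit: LexicalUnit) -> list[str]:
    """Return the definitions of all senses attached to a lexical unit."""
    return [s.definition for s in unit.senses]


# Build one lexical unit and attach two senses to it.
unit = LexicalUnit(unit_id=1, lemma="bank", morphosyntax={"pos": "noun"})
unit.senses.append(Sense(sense_id=1, definition="financial institution"))
unit.senses.append(Sense(sense_id=2, definition="edge of a river"))
```

In this sketch the lexical unit is the meeting point: morpho-syntactic data hangs off it on one side and semantic data on the other, which is the role the two entity types play in the model.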

At the top level, the model can be divided into clusters (color-coded in the model):

  • sense data (blue)
  • lexical unit data (yellow-green)
  • morphological data (green)
  • syntactic structure data (grey-brown)
  • multilingual data (pink, red)
  • sense frame data (violet)
  • example data (yellow)
  • data pertaining to the division into resources, i.e. different dictionaries (orange)
  • feature data for various entities (grey)
  • entity types that reference other entity types via meta-attributes (white)

The corpus data is not contained in the database itself but is referenced and accessed via a concordancer. Some parts of the data model (e.g. structure data) are defined as XML. They are used directly in existing processing pipelines but can be ported to the ER model if necessary.