Publications Office of the EU
Labels and data models, why and how to use them - EU Vocabularies

Labels and data models, why and how to use them

Across the articles published in the knowledge base, most of the SPARQL scripts listed mention a so-called preferred label (skos:prefLabel), and more specifically a particular language version of that label.

A simple way to query for the preferred labels is to use a script like this:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?prefLabel
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      # every concept that belongs to a concept scheme
      ?c skos:inScheme ?scheme .

      # select the English prefLabel, if the concept has one
      OPTIONAL {
            ?c skos:prefLabel ?Label .
            FILTER (lang(?Label) = "en")
            # keep only the text of the label, without its language tag
            BIND (str(?Label) AS ?prefLabel)
      }
}

Yet along with the preferred labels, other types of labels are available in EuroVoc and the authority tables. While some cover specific needs, not all of them are needed to describe or use a concept. Depending on the needs of your system you might use just one, or at most two, of them. At the Publications Office of the European Union we use many of them, for backwards compatibility and general accessibility and because of the different needs we have to cover. To understand the reasoning behind this decision it helps to understand the data models first, so we will start with that.

Data models and why we need them all

SKOS, or in the long form Simple Knowledge Organization System, is the de facto model for the creation of thesauri, taxonomies and code lists. We use it extensively, and the main labels offered by this model are described in the sections below.

Simple Knowledge Organization System eXtension for Labels (SKOS-XL) is a dedicated extension of SKOS, created to provide facilities that can be useful depending on how you decide to maintain and, not least, exploit your vocabularies. It might sound strange at first, but every label type of the SKOS model described in the sections below has a mirror-like representation in the SKOS-XL model. You might wonder why that is, and for good reason. While the names are identical, the function, or better said the mechanism of application, differs.

While SKOS is the de facto model for defining and managing internationalised vocabularies, for reasons of simplicity it imposes a very specific limitation: the label is just a property and can only take the form of a string (an RDF plain literal, optionally with a language tag). This makes SKOS labels easy to understand and work with, but it does not allow you to attach other metadata to the label. As a simple example, think about a situation where you want to record when a particular label was last edited. Editing the label is not the same as editing the concept itself, so using date properties at concept level is not necessarily a smart approach.

To solve that limitation, the SKOS-XL approach adds a level of indirection to the workings of the SKOS labels. You do not add the label directly as a property of the concept; instead you add a link to an object containing the label, and that object can itself be referenced. The actual label thus becomes an object that can have its own properties, including its own identifier, dates of creation or editing, and so on.
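
The indirection described above can be sketched as a query. This is only an illustration under assumptions: it reuses the graph from the first script in this article, and it assumes that the label objects carry a dct:modified date, which may not be the property actually used in a given vocabulary. The skosxl:prefLabel and skosxl:literalForm properties come from the SKOS-XL specification.

```sparql
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct:    <http://purl.org/dc/terms/>

SELECT ?c ?labelObject ?prefLabel ?modified
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      # the concept points to a label object, not to a string
      ?c skosxl:prefLabel ?labelObject .

      # the text of the label is stored on the label object
      ?labelObject skosxl:literalForm ?Label .
      FILTER (lang(?Label) = "en")
      BIND (str(?Label) AS ?prefLabel)

      # assumption: the label object carries its own modification date
      OPTIONAL { ?labelObject dct:modified ?modified . }
}
LIMIT 20
```

Because ?labelObject is a resource with its own identifier, any further metadata about the label can be queried from it in the same way, without touching the concept.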

RDFS, short for RDF Schema, or in the long form Resource Description Framework Schema, is the other major data model used to represent knowledge in a semantic form.

At this point you might wonder why anyone would choose to use RDFS when SKOS is also available. The answer is both simple and complicated. Hopefully we can manage to give the simple answer.

As noted in the comparison with SKOS-XL, SKOS is deliberately simple and avoids going into too much detail about the meaning of things. For SKOS everything is about concepts. This is what makes it great. You can also define hierarchical relations between concepts (such as narrower and broader), but that does not capture the meaning of a higher-level object that is more than a concept. This is where RDFS has an advantage: it introduces the notion of classes and supports inference (rdfs:subClassOf, rdfs:subPropertyOf). In practice, RDFS is used to build more specific definitions of vocabularies, while SKOS can be used for less formal vocabularies where we do not need the level of detail offered by RDFS.
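
As a rough sketch of what classes add, the query below walks the class hierarchy with a SPARQL 1.1 property path. It assumes that the queried graph contains rdfs:subClassOf statements leading to skos:Concept, which may not be the case for every vocabulary; the graph name is simply reused from the first script in this article.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?c ?type
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      # the class the resource is declared to belong to
      ?c a ?type .

      # keep skos:Concept itself and any class declared, directly or
      # indirectly, as one of its subclasses (zero or more steps)
      ?type rdfs:subClassOf* skos:Concept .
}
LIMIT 20
```

A skos:broader link, by contrast, only relates one concept to another and says nothing about the class either of them belongs to.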

 

In conclusion, we encourage you to understand the meaning of each model and how it can be useful for you and your type of application, and then decide which labels you will work with.

 

Below you can see a list of the labels we use from each of the data models and how we employ them for the definition of the vocabularies.

Labels

skos:prefLabel

The preferred label is in most cases (see also skosxl:prefLabel) the label that defines the concept when you use the SKOS model to represent your vocabulary. It can be considered the face of the concept, as it is the one presented to human users. From a linguist's perspective it is the core element of the record represented by the concept, and is called a “term”.

In most cases, and EuroVoc is a good example, concepts have multiple prefLabels. While this seems illogical at first, the reason is that we have to express the meaning of the concepts in different languages. As a result we have one preferred label per language. That means concepts can have 1, 2, 3, 5, 24 or even more prefLabels, depending on the vocabulary you are looking at and how many language variants its context of use requires. Yet only one preferred label per language is allowed. This is a critical condition for clarity of expression and, not least, for the technical integrity and usability of the vocabulary.

When the vocabulary is deployed inside an application, the user mainly sees the preferred label, in the language he or she chose to use.

Adding more preferred labels for the same language would make the vocabulary unusable for users and machines alike, because it would not be clear which label should be displayed.
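
The "one preferred label per language" rule can be checked with a query along the following lines. It is a minimal sketch that reuses the graph from the first script in this article; the variable names are our own.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c ?language (COUNT(DISTINCT ?Label) AS ?labelCount)
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      ?c skos:prefLabel ?Label .
      # group the labels of each concept by their language tag
      BIND (lang(?Label) AS ?language)
}
GROUP BY ?c ?language
# report only the concepts that break the rule
HAVING (COUNT(DISTINCT ?Label) > 1)
```

A vocabulary that respects the rule should return no rows.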

skos:altLabel

Every idea can be expressed in different ways, and this is the raison d'être of the altLabels. The so-called alternative label gives the creator of the vocabulary the possibility to associate synonyms of the main label (prefLabel) with a concept. The altLabel can contain acronyms, abbreviations, spelling variants, plural forms, etc. Think of the altLabel as an additional form of expression alongside the prefLabel that makes the concept easier to identify.
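
To see the alternative labels next to the preferred one, a query like the sketch below can be used. It assumes the same graph as the first script in this article and that the concepts in it carry English altLabels, which is not guaranteed for every vocabulary.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c ?prefLabel
       (GROUP_CONCAT(DISTINCT str(?alt); separator=" | ") AS ?altLabels)
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      ?c skos:prefLabel ?pref .
      FILTER (lang(?pref) = "en")
      BIND (str(?pref) AS ?prefLabel)

      # a concept may have zero, one or many alternative labels
      OPTIONAL {
            ?c skos:altLabel ?alt .
            FILTER (lang(?alt) = "en")
      }
}
GROUP BY ?c ?prefLabel
LIMIT 20
```

Each concept is returned once, with all its English alternative labels collected in a single column.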

skos:hiddenLabel

Hidden labels are a very powerful but much underrated type of label that the SKOS model offers. They can be used when a specific expression that identifies the concept should be searchable but not directly visible to a human user. Think of how many times you have entered a misspelled word in your preferred search engine and it still found the proper term. This is exactly the use case for such a label. Here you can add, or find, labels that contain intentional errors, so that the concept can still be found despite obvious typing mistakes.
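
A search that uses hidden labels without showing them could be sketched as follows. The input string "exmaple" is a made-up misspelling used purely for illustration, and the graph is reused from the first script in this article; whether it contains any matching hidden label is an assumption.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?c ?prefLabel
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      # hypothetical user input, possibly misspelled
      VALUES ?input { "exmaple" }

      # search in the preferred, alternative and hidden labels alike
      ?c skos:prefLabel|skos:altLabel|skos:hiddenLabel ?searchLabel .
      FILTER (lcase(str(?searchLabel)) = lcase(?input))

      # display only the preferred label, never the hidden one
      ?c skos:prefLabel ?pref .
      FILTER (lang(?pref) = "en")
      BIND (str(?pref) AS ?prefLabel)
}
```

The label that matched stays in ?searchLabel and is not part of the result, so the user only ever sees the correct term.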

rdfs:label

The rdfs:label fulfils in many ways the same function as the skos:prefLabel. The language of the label is defined in the same way, and we apply the same limit of a single label per language.

We use RDFS labels purely for backwards compatibility. It must be added that we also apply rdfs:label at dataset level: the title of each of our vocabularies is also represented as an rdfs:label.
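
A system that has to cope with both kinds of label could fall back from one to the other. The sketch below prefers the skos:prefLabel and uses the rdfs:label only when no preferred label is found; the graph is reused from the first script in this article and the variable names are our own.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?c ?displayLabel
FROM <http://publications.europa.eu/resource/authority/browser>
WHERE {
      ?c skos:inScheme ?scheme .

      OPTIONAL { ?c skos:prefLabel ?pref .   FILTER (lang(?pref) = "en") }
      OPTIONAL { ?c rdfs:label ?rdfsLabel .  FILTER (lang(?rdfsLabel) = "en") }

      # take the first label that is bound, in order of preference
      BIND (str(COALESCE(?pref, ?rdfsLabel)) AS ?displayLabel)
}
LIMIT 20
```

If a concept has neither label in English, ?displayLabel is simply left unbound for that row.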

skosxl:prefLabel, skosxl:altLabel, skosxl:hiddenLabel

As said in the description of the model, SKOS-XL uses exactly the same label types as SKOS, with a slight twist. Because of this direct relation we will not describe the meaning of prefLabel, altLabel and hiddenLabel again, as they are the same in SKOS-XL. All that changes with the SKOS-XL labels is how the data architecture is implemented to best serve the needs of the vocabularies or of the systems interacting with them. You can therefore read the descriptions given in the SKOS label sections and go back to the description of the models to decide whether using SKOS-XL makes sense in your case.

In the case of the vocabularies published by the Publications Office of the European Union, SKOS-XL was chosen because it allows us to extend the use of labels further over time.

Note: All the SPARQL queries mentioned in the article can be tested on the SPARQL endpoint of the Publications Office of the European Union found at the following address: http://publications.europa.eu/webapi/rdf/sparql
