Semantic knowledge base

Identifiers and how to use more than one

If you believe in the principles of reference data, you might have asked yourself why the Publications Office does not “walk the talk” by using a single unique identifier for the concepts listed in its authority tables. The answer is mixed: the Publications Office (OP) does believe in the utility of a unique identifier, but it also has to publish multiple identifiers at the same time.

If that sounds illogical, the following description of the context should help explain how that decision is reached.

As you might already know, the reference data catalogue of the Publications Office contains close to two hundred vocabularies. Some of them are domain-specific, and others are technical tables that serve our common needs for accessing and organising the data (e.g., Use context).

The main subject-based vocabularies have to follow the standards applicable in each field. In most cases those standards are maintained by organisations such as ISO or IANA, and following their identifiers is a logical decision. However, this can extend to situations where:

multiple standards exist in that specific field
some applications have used other codes in the past, and these must be kept for backward compatibility

As a result, some of the vocabularies you can find in the Publications Office catalogue do list a multitude of identifiers.

Let’s take one well-known example: the Country table.

If you look inside it for a concept such as Austria, you will find the following list of identifiers:

Identifier	Associated code
IANA	.at
ISG COU	AT
ISO 3166-1 α-2	AT
ISO 3166-1 α-3	AUT
ISO 3166-1 num	040
TIR	A
UNSD M49	040
FD_010	A
FD_040	AT
FD_050	A
FD_110	AT
FD_140	AT
FD_160	AT
FD_290	A
FD_325	A
FD_375	AT
FD_380	AT
FD_400	AT
MNE	AT
PUB_LOC	AT
PUB_LOC	{AUT}
TED	AT
TED Schema	AT

Two areas are easy to distinguish in this list of identifiers. The first part contains the identifiers associated with standards: IANA, the Interinstitutional Style Guide, ISO, TIR and the UNSD notation. The second part consists of codes that have been used in various other systems (e.g., TED) or references to other datasets that contain the same value (FD_nnn).
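The two areas can be made concrete with a small Python sketch. It copies the Austria rows from the table above into a list of (identifier, code) pairs and splits them into the standards-based group and the remaining codes. The grouping set `STANDARD_TYPES` and the helper names are illustrative assumptions, not part of any Publications Office tool, and treating "ISG COU" as the Interinstitutional Style Guide entry is inferred from the description above.

```python
# Identifier rows for Austria, copied from the table above.
# A list of pairs is used rather than a dict because PUB_LOC appears twice.
AUSTRIA_IDENTIFIERS = [
    ("IANA", ".at"), ("ISG COU", "AT"), ("ISO 3166-1 α-2", "AT"),
    ("ISO 3166-1 α-3", "AUT"), ("ISO 3166-1 num", "040"), ("TIR", "A"),
    ("UNSD M49", "040"), ("FD_010", "A"), ("FD_040", "AT"), ("FD_050", "A"),
    ("FD_110", "AT"), ("FD_140", "AT"), ("FD_160", "AT"), ("FD_290", "A"),
    ("FD_325", "A"), ("FD_375", "AT"), ("FD_380", "AT"), ("FD_400", "AT"),
    ("MNE", "AT"), ("PUB_LOC", "AT"), ("PUB_LOC", "{AUT}"), ("TED", "AT"),
    ("TED Schema", "AT"),
]

# Assumption: these identifier types form the "standards" part of the list.
STANDARD_TYPES = {
    "IANA", "ISG COU", "ISO 3166-1 α-2", "ISO 3166-1 α-3",
    "ISO 3166-1 num", "TIR", "UNSD M49",
}


def split_identifiers(rows):
    """Return (standards, others): rows whose type is in STANDARD_TYPES, and the rest."""
    standards = [(kind, code) for kind, code in rows if kind in STANDARD_TYPES]
    others = [(kind, code) for kind, code in rows if kind not in STANDARD_TYPES]
    return standards, others


def codes_for(rows, identifier_type):
    """Return every code recorded for one identifier type, in table order."""
    return [code for kind, code in rows if kind == identifier_type]


standards, others = split_identifiers(AUSTRIA_IDENTIFIERS)
print(len(standards), "standards-based identifiers;", len(others), "other codes")
print(codes_for(AUSTRIA_IDENTIFIERS, "ISO 3166-1 α-3"))
```

A consuming system would normally call something like `codes_for` with the one identifier type it understands and ignore the rest.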

More than 20 identifiers attached to a single concept may not look very tidy. Yet this is acceptable for a reference data asset that is constructed as a knowledge graph, because you do not need to interact with all of them. A SPARQL query can be written to return just one of them, or a chosen set of identifiers, as follows:

PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX lbl: <http://publications.europa.eu/resource/authority/label-type/>
PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>
  
SELECT ?label (?code AS ?ISO_3166_1_3)
 
# choose the table you are looking into
FROM <http://publications.europa.eu/resource/authority/country>
 
WHERE
{
    ?c skosxl:prefLabel|skosxl:altLabel ?xlLabel .

    # pick the label type: Standard label
    VALUES ?Labeltype { <http://publications.europa.eu/resource/authority/label-type/STANDARDLABEL> }
    ?xlLabel dct:type ?Labeltype .

    ?xlLabel skosxl:literalForm ?xlLiteralForm.

    filter (lang(?xlLiteralForm)="en")

    # convert label to string
    BIND ( str(?xlLiteralForm) as ?label)

    ?c euvoc:xlNotation ?notation .
 
    # look for a specific notation type: ISO 3166-1 α-3
    VALUES ?notationType { <http://publications.europa.eu/resource/authority/notation-type/ISO_3166_1_ALPHA_3> }
 
    ?notation dct:type ?notationType .
 
    # select the code
    ?notation euvoc:xlCodification ?xcode.

    # convert code to string
    BIND ( str(?xcode) as ?code)

     
}
ORDER BY ?label
LIMIT 10

Executed on the CELLAR SPARQL endpoint, this query returns the following:

label	ISO_3166_1_3
Afghanistan	AFG
Albania	ALB
Algeria	DZA
American Samoa	ASM
Andorra	AND
Angola	AGO
Anguilla	AIA
Antarctica	ATA
Antigua and Barbuda	ATG
Argentina	ARG

As you can see, the data returned is clean and clear, listing only the values that a particular system or project needs. This is a very simple demonstration of the practical use of the knowledge architecture used by the Publications Office for the storage and dissemination of reference data.

Note: All the SPARQL queries mentioned in the article can be tested on the SPARQL endpoint of the Publications Office found at the following address: http://publications.europa.eu/webapi/rdf/sparql
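For readers who prefer to call the endpoint from a script, the following Python sketch shows one possible way to parameterise the query by notation type and read the answer. It is a minimal illustration under several assumptions: the helper names (`build_query`, `parse_results`, `run_query`) are hypothetical, the endpoint is assumed to accept the standard SPARQL protocol `query` parameter and to return the W3C SPARQL JSON results format when asked for it, and only the `ISO_3166_1_ALPHA_3` notation type is confirmed by this article. Any other notation-type name would have to be checked in the catalogue first.

```python
import json
import re
import urllib.parse
import urllib.request
from string import Template

# Endpoint address taken from the note above.
ENDPOINT = "http://publications.europa.eu/webapi/rdf/sparql"
NOTATION_TYPE_BASE = "http://publications.europa.eu/resource/authority/notation-type/"

# Same graph pattern as the article's query, with the notation type as a parameter.
QUERY_TEMPLATE = Template("""\
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>

SELECT ?label ?code
FROM <http://publications.europa.eu/resource/authority/country>
WHERE {
    ?c skosxl:prefLabel|skosxl:altLabel ?xlLabel .
    ?xlLabel dct:type <http://publications.europa.eu/resource/authority/label-type/STANDARDLABEL> .
    ?xlLabel skosxl:literalForm ?xlLiteralForm .
    FILTER (lang(?xlLiteralForm) = "en")
    BIND (str(?xlLiteralForm) AS ?label)

    ?c euvoc:xlNotation ?notation .
    ?notation dct:type <$notation_type> .
    ?notation euvoc:xlCodification ?xcode .
    BIND (str(?xcode) AS ?code)
}
ORDER BY ?label
LIMIT $limit
""")


def build_query(notation_local_name, limit=10):
    """Return the query text for one notation type, e.g. "ISO_3166_1_ALPHA_3"."""
    # Only letters, digits and underscores are accepted, so the name cannot
    # break out of the IRI it is inserted into.
    if not re.fullmatch(r"[A-Za-z0-9_]+", notation_local_name):
        raise ValueError("unexpected notation type name: %r" % notation_local_name)
    return QUERY_TEMPLATE.substitute(
        notation_type=NOTATION_TYPE_BASE + notation_local_name,
        limit=int(limit),
    )


def parse_results(payload):
    """Turn a decoded SPARQL JSON result into a list of (label, code) pairs."""
    return [
        (row["label"]["value"], row["code"]["value"])
        for row in payload["results"]["bindings"]
    ]


def run_query(query):
    """Send the query to the endpoint (network access required) and parse the reply."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_results(json.load(response))


# Example call, left commented out because it needs network access:
# for label, code in run_query(build_query("ISO_3166_1_ALPHA_3")):
#     print(label, code)
```

Keeping the query construction separate from the network call means the generated SPARQL text can be inspected, or pasted into the endpoint's web form, before anything is sent.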
