ENDORSE programme 2021 - Endorse 2021

ENDORSE - The European Data Conference on Reference Data and Semantics 16-19 March 2021



Programme Day 1


Download Day 1 programme as PDF

Times mentioned correspond to Central European Time (CET)


Reference data governance in cross-organisational environments


Welcome & instructions


Opening speeches

Johannes Hahn

European Commissioner for Budget and Administration

António Carneiro

Acting Director-General of the Publications Office of the European Union


Linked Data Event Streams: the core API for publishing base registries and sensor data

Keynote – Pieter Colpaert

Postdoctoral researcher on public Web APIs at Ghent University






Knowledge Scientist – Unlocking the data-driven organization

Juan Sequeda (data.world) & George Fletcher (TU Eindhoven)

You can often tell whether an organization is going to be successful by the way it treats its data. Does it view data as a key asset that should be nurtured, questioned, and broadly adopted? Or does it see data as a security risk that must be tightly guarded and available only to the few scientific minds who can understand it? Does the organization know who is responsible for the data? In this talk we will discuss the importance of a data product manager and knowledge scientist as the bridge between data producers and consumers.


Introduction to the W3C Data Catalog Vocabulary (DCAT)

Peter Winstanley (Semantic Arts)

Introduction to the W3C Data Catalog Vocabulary (DCAT), the changes in version 2 and the proposals for version 3
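As a taste of what DCAT descriptions look like in practice, here is a minimal, illustrative dataset record serialized as JSON-LD using only Python's standard library. The catalogue, dataset and URLs are invented for the example; only the dcat:/dcterms: terms come from the W3C vocabulary.

```python
import json

# A minimal, illustrative DCAT dataset description serialized as JSON-LD.
# The identifiers and values are invented for this sketch; only the
# dcat:/dcterms: terms come from the W3C DCAT vocabulary.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/dataset/air-quality",
    "@type": "dcat:Dataset",
    "dcterms:title": "Air quality measurements",
    "dcterms:publisher": {"@id": "https://example.org/org/env-agency"},
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": "https://example.org/files/air-quality.csv",
            "dcat:mediaType": "text/csv",
        }
    ],
}

doc = json.dumps(dataset, indent=2)
print(doc)
```

The same record could equally be expressed in Turtle or RDF/XML; JSON-LD is used here only because it needs no extra libraries.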


Creating a Corporate Reference Data Management Policy for the European Commission

Makx Dekkers (Independent consultant) & Willem van Gemert (Publications Office of the European Union)

The development and implementation of a Corporate Reference Data Management Policy in the European Commission. The presentation will cover several aspects of the work, including the policy objectives, the establishment of a coordination group, the policy content, the development of best practices and the application profile for the description of corporate reference data assets based on DCAT, the investigation of tools for reference data management, and discussions on specific types of reference data, e.g. geo-referencing.


Building a metadata space for cultural heritage at Europeana, using best practices for modeling, enriching and sharing data

Antoine Isaac (Europeana)

Europeana gathers metadata to enable access to a wide range of digital cultural collections from libraries, museums and archives across all European countries. To solve issues of data quality, especially regarding completeness and heterogeneity, and to provide an access service that works across specific institutions, domains and languages, Europeana has embraced a semantic web approach. We gather metadata using a shared data model, the Europeana Data Model (EDM). This model allows Europeana to maintain a sustainable aggregation (and publication) of metadata about digital representations of cultural artefacts, supporting contextualization and multilingualism. The metadata about cultural objects is enriched by links to multilingual linked data sets that describe entities providing context for these objects (persons, places, concepts). We will present how Europeana benefits from the re-use of reference data from outside its sector, be they very general, large datasets like DBpedia or more specialized vocabularies published for representing languages or rights statements. We will also insist on the principles that make such an approach workable. Europeana aims in general to be usable, mutual and reliable. In a context with so many stakeholders and existing data practices (and varying levels of technical ability), this especially requires a great deal of flexibility and putting the community at the heart of every move. We also try to re-use (and contribute to) reference best-practice frameworks, such as the W3C Data on the Web Best Practices, as much as possible.


Automating Reference Data Crosswalks Using Knowledge Graphs

Irene Polikoff (TopQuadrant)

Reference data is found in practically every enterprise application, including back-end systems, front-end commerce applications, and the data warehouse. It has a major impact on everything from the integrity of business intelligence reports to the success or failure of system integration efforts. When well managed and consistently used, reference data is a key enabler of information interoperability, yet it is rare for an enterprise to have a single canonical reference dataset for a given entity – be it a location code, product category or gender. For legacy and other reasons, organizations often use multiple alternative codesets in different applications. Further, organizations often must combine and cross-reference public reference data with their private reference data covering the same categories. For example, health care and pharma companies typically use local codesets to capture and categorize information about patients and clinical studies. These local codesets are commonly mapped to the public vocabularies of diseases, medications and other life sciences information (e.g., SNOMED, NCBI, MedDRA). Once these different, but related, reference datasets are brought together to form a knowledge graph, they can be connected and mapped. Crosswalks connecting related reference data offer a way to accomplish such mappings. However, the number and variations of reference data used by different groups in organizations is often large. Building and maintaining crosswalks manually is tedious and expensive, so there is strong interest in automating these processes.
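To make the crosswalk idea concrete, here is a minimal Python sketch of chaining two code mappings so a local codeset resolves to a public vocabulary. The codesets and mappings are invented for illustration; real crosswalks to vocabularies such as SNOMED are far larger and professionally curated.

```python
# Illustrative sketch: composing two reference-data crosswalks so a local
# application codeset can be resolved to a public vocabulary. All codes
# below are invented for the example.
local_to_internal = {"HT": "COND-001", "DM2": "COND-002"}    # app codes -> canonical
internal_to_public = {"COND-001": "SNOMED:38341003",         # canonical -> public
                      "COND-002": "SNOMED:44054006"}

def crosswalk(code, *maps):
    """Follow a chain of mappings; return None when any hop is missing."""
    for m in maps:
        code = m.get(code)
        if code is None:
            return None
    return code

public_code = crosswalk("HT", local_to_internal, internal_to_public)
```

Automating crosswalk construction then amounts to inferring these dictionaries from labels, definitions and graph structure rather than typing them in by hand.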


The Language Metadata Table

Yonah Levenson (MESA)

The Language Metadata Table (LMT) was conceived to resolve the lack of a single unified standard for language codes in the broadcast and media industry. In the absence of such a standard, each point of distribution/information exchange creates its own language code table or follows different language code standards, creating a kind of Tower of Babel. The LMT provides a single-source solution for language codes. The LMT released in July 2018 had 125 language codes; today there are over 230, and another 50+ codes are in the process of being added. This presentation covers the current state of the LMT and where it is headed as adoption increases among organizations from content creators to distributors and beyond.
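A toy Python sketch of the problem the LMT addresses: reconciling per-distributor language code tables into one canonical code. The vendor tables and codes below are invented for illustration; LMT itself defines the actual canonical entries.

```python
# Hypothetical sketch: each distribution point historically used its own
# language code table. The vendor tables and code values are invented;
# a standard like LMT supplies the single canonical column.
vendor_a = {"FRE": "fr-FR", "SPA-LA": "es-419"}            # vendor codes -> canonical tag
vendor_b = {"french": "fr-FR", "latam_spanish": "es-419"}

def canonical(code):
    """Resolve any known vendor code to the canonical language tag."""
    for table in (vendor_a, vendor_b):
        if code in table:
            return table[code]
    raise KeyError(f"unmapped language code: {code}")
```

With one canonical table, two partners exchanging content can verify that "FRE" and "french" mean the same asset language before distribution.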


The ERA Knowledge Graph: Raising data interoperability in the railway domain

Julián Rojas (Ghent University – imec)

A common interpretation of different concepts, assets and systems that exist within the railway domain is vital for achieving safe vehicle operations and facilitating infrastructure maintenance and evolution. For these purposes, the European Railway Agency supports a set of base registry databases where organizations input different aspects of the information they manage, such as infrastructure topology (RINF), authorized vehicle types (ERATV) and particular vehicle data (ECVVR), among others. Answering queries across these base registries remains challenging due to different modeling approaches and the lack of a canonical identifier strategy for entities and concepts across base registries. An interoperability strategy is thus required to enable more complex queries and applications that support the goals and needs of the railway domain. Based on open and standard Semantic Web technologies, we (i) created an initial ontology defining common railway domain concepts, inspired by existing related data models; (ii) generated an RDF Knowledge Graph from the RINF, ERATV and ECVVR base registries, using a set of declarative rules based on the RML specification; and (iii) developed an application to support the route compatibility check use case, which requires information originating from multiple registries. We show how graph-based data models facilitate the development of innovative applications by raising the interoperability of heterogeneous and distributed data.
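The declarative-mapping step (item ii) can be illustrated with a small Python sketch in the spirit of RML rules: registry rows become triples, and a shared URI pattern ensures the same entity gets the same identifier no matter which registry the row came from. The base URI, field names and predicates below are invented for the example.

```python
BASE = "http://data.era.example/"  # invented base URI for the sketch

def map_vehicle(row):
    """Declarative-style rule: one registry row -> RDF-like triples.
    A shared URI pattern gives an entity the same identifier
    regardless of which base registry described it."""
    subject = f"{BASE}vehicle/{row['vehicle_id']}"
    return [
        (subject, "rdf:type", "era:Vehicle"),
        (subject, "era:vehicleType", f"{BASE}vehicle-type/{row['type_id']}"),
    ]

triples = map_vehicle({"vehicle_id": "V123", "type_id": "T9"})
```

Real RML mappings express the same idea declaratively in RDF, so the transformation rules themselves are data that can be published and reused.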


The Quest for Content in Context: Using Standards and Semantics for Interoperability

Rahel Anne Bailie (Content Seriously Consulting)

Government organisations understand the importance of having good data hygiene, good reference data sets, and data standards for interoperability. But data alone doesn't tell the whole story. Content engenders context, which, in turn, completes the story. To allow content to operate at its full potential, it needs its own standards, its own semantics, and ultimately its ability to interoperate. 
Content has been largely ignored in the data arena. Content often gets treated like data, confined to cells in a database, being moved around in static chunks like so much boxed cargo. The mechanisms meant for processing data limit content in so many ways. The complexity and nuance are dampened; the contexts are limited; its potential is hobbled. 
The need for automatically discoverable, re-usable, reconfigurable, and adaptive content has been spurred on by the fourth industrial revolution. This has taken the study and application of "intelligent content" to new levels of automation on the delivery side, a demand shaped by the offering of Content as a Service. This has opened the door for both content standards and applied semantics as ways of automatically delivering content alongside data for more targeted contexts.



EuroSciVoc taxonomy and EURIO ontology: CORDIS as (semantic) data provider

Enrico Bignotti & Baya Remaoun (Publications Office of the European Union)

The Community Research and Development Information Service (CORDIS) is the European Commission's primary source of results from the projects funded by the EU's framework programmes for research and innovation (FP1 to Horizon 2020). The CORDIS mission is to bring research results to professionals in the field to foster open science, create innovative products and services and stimulate growth across Europe. CORDIS has a rich and structured public repository with all project information held by the European Commission, such as project factsheets, participants, reports, deliverables and links to open-access publications. CORDIS began to redesign its data model in accordance with Semantic Web standards with the long-term vision of making all CORDIS data Linked Open Data, thus linking the data with other inter- and extra-institutional data and increasing their value for both private and public stakeholders. This vision is what brought CORDIS to move from acting as a data repository to playing an active role as a data provider. Data curation of CORDIS content is instrumental to this end, and consists of two aspects: Data Quality: detecting and solving issues with data, e.g., entity duplicates, identity conflicts, missing values, etc.; Data Enrichment: supplementing any missing details with data retrieved from external sources, e.g. importing Italian VAT numbers from a national repository to enrich data about Italian organizations. To enable data curation processes, CORDIS developed two main assets: 1. In order to categorize its projects, improving their discoverability and findability in a controlled and standardized way, the “European Science Vocabulary” (EuroSciVoc) was created. It is a multilingual, SKOS-XL based taxonomy that represents all the main fields of science discovered from the CORDIS content, e.g., project abstracts.
It was built starting from the hierarchy of the OECD's Fields of R&D classification (FoRD) as root and extended through a semi-automatic process based on NLP techniques. It contains almost 1 000 categories in 6 languages (English, French, German, Italian, Polish and Spanish), and each category is enriched with relevant keywords extracted from the textual descriptions of CORDIS projects. It is constantly evolving and is available on the EU Vocabularies website. 2. In order to transform CORDIS data into Linked Open Data, thus aligning with Semantic Web standards, best practices and tools in industry and public organizations, the need for an ontology emerged. CORDIS created EURIO (European Research Information Ontology) based on data about research projects funded by the EU's framework programmes for research and innovation. EURIO is aligned with EU ontologies such as DINGO and FRAPO and de facto standard ontologies such as schema.org and the W3C Organization Ontology. It models projects, their results and actors such as people and organizations, and includes administrative information like funding schemes and grants. EURIO, which is available on the EU Vocabularies website, was the starting point for developing a Knowledge Graph of CORDIS data that will be publicly available via a dedicated SPARQL endpoint. These two assets will be presented as the main outcomes of the evolution of CORDIS's role, focusing on their development to enable the curation, enrichment and reuse of all CORDIS data.
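A minimal Python sketch of the SKOS-style structure behind a taxonomy like EuroSciVoc: concepts carry multilingual preferred labels and broader links that can be walked to find a concept's ancestors. The concept identifiers and labels below are invented for the example.

```python
# Minimal sketch of a SKOS-style taxonomy: multilingual prefLabels plus
# skos:broader links. Identifiers and labels are invented for this example.
concepts = {
    "c1": {"prefLabel": {"en": "natural sciences", "fr": "sciences naturelles"},
           "broader": None},
    "c2": {"prefLabel": {"en": "physics", "fr": "physique"}, "broader": "c1"},
    "c3": {"prefLabel": {"en": "optics", "fr": "optique"}, "broader": "c2"},
}

def ancestors(cid):
    """Walk skos:broader links up to the scheme's top concept."""
    path = []
    parent = concepts[cid]["broader"]
    while parent is not None:
        path.append(parent)
        parent = concepts[parent]["broader"]
    return path
```

Walking broader links like this is what lets a project tagged with a narrow concept also be found under its parent fields of science.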






Eight causes of bad data quality – and what you can do about it

Helen Lippell (Independent consultant)

I want to present eight causes of bad-quality data, each drawn from examples from my own career. By 'data' I mean a very wide definition that encompasses data, information and content produced either for use inside an organisation or for outside customers of a product or service. The costs of bad-quality data can be felt in two broad ways: wasted time and money, and missed opportunities to exploit the value of data. Some of my examples come from organisations whose core focus is selling information and who therefore needed to differentiate themselves from others in their markets by virtue of the quality and timeliness of their services. My other examples come from organisations whose main focus was not, in and of itself, selling information. The eight causes of bad-quality data that I will cover are: lack of agreed definitions for concepts and categories; lack of process for managing data; poor understanding of how different audiences use different language for the same concepts; poor understanding of the information domain; ambiguous language (e.g. homonyms); poor-quality source data; lack of data; and data entropy. I will highlight steps that can be taken to reduce the impact of bad-quality data, and conclude by restating the value of good-quality data for all types of organisations, regardless of whether they are in the information business or not, and no matter what size they are or what sector they operate in.
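Two of the causes above (lack of agreed definitions and lack of data) lend themselves to simple automated checks. The following Python sketch is illustrative only; the field name and the controlled category list are invented.

```python
# Toy data-quality check for two of the eight causes: values outside an
# agreed controlled list, and missing data. The field and the controlled
# list are invented for illustration.
CONTROLLED_CATEGORIES = {"news", "analysis", "opinion"}

def quality_issues(record):
    """Return a list of human-readable issues found in one record."""
    issues = []
    cat = record.get("category")
    if cat is None:
        issues.append("missing category")
    elif cat not in CONTROLLED_CATEGORIES:
        issues.append(f"uncontrolled category: {cat}")
    return issues
```

Checks like these are cheap to run continuously, which is one practical way to slow the "data entropy" the talk describes.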


The European Data Portal – A Large-scale Application of Semantic Vocabularies

Fabian Kirstein (Fraunhofer FOKUS)

Open Data is a successful and continuously evolving idea. The very core concept is the publication and reuse of datasets. Common publishers are public administrations, research institutes, and non-profit organizations. Typical users are journalists, companies, and governments. The established method of publishing Open Data is on a web platform, which is responsible for retrieving, storing, and publishing the data. The European Data Portal (EDP) makes all metadata of Open Data published by public authorities of the European Union and beyond available in one portal. It was launched in November 2015 by the European Commission and is managed by the Publications Office of the European Union. As of November 2020, it offers more than 1 million datasets harvested from more than 80 data providers. The core metadata properties are available in all 24 official languages of the EU. It is considered Europe’s one-stop shop for open public sector information and is not limited to the provision of metadata, but constitutes a platform for supporting the broad dissemination, reuse and improvement of Open Data in Europe. Therefore, it offers curated content, reports, news and guidelines for data providers and data users. The central component of the EDP is the metadata registry, which pioneered the application of the DCAT Application Profile for data portals in Europe (DCAT-AP). DCAT-AP is designed to describe public sector datasets and is based on Linked Data principles. It is derived from the Resource Description Framework (RDF) vocabulary Data Catalogue Vocabulary (DCAT). DCAT-AP makes extensive use of the controlled vocabularies and reference data provided by the EU Publications Office (EU Vocabularies). For example, metadata like spatial information, languages or MIME types can be expressed and harmonised based on this reference vocabulary. Furthermore, validation shapes based on the Shapes Constraint Language (SHACL) exist, which allow verifying the conformance of DCAT-AP datasets.
For the EDP we developed a novel metadata registry from scratch to manage and practically apply these specifications and harness their full potential. It is based on Semantic Web technologies, uses a triplestore as its primary database and makes RDF a first-class citizen. The solution has three main components: a harvester, a registry and a quality service. The harvester periodically retrieves the metadata from the various source portals and harmonises the data according to the DCAT-AP specification. As the core data format is RDF, sources that are non-RDF or use unsupported RDF dialects require a transformation. It supports a variety of data formats: CKAN-API, uData, OAI-PMH, RDF and SPARQL. The registry stores the metadata as RDF and manages a high-performance search index. For a user-friendly and multilingual presentation of the metadata, it resolves certain properties that are based on reference data, such as the EU Vocabularies. This makes it possible to present the metadata in a much more human-readable manner than vanilla RDF. Finally, the quality service validates each dataset based on the SHACL reference data and additional metrics. The results themselves are also stored as RDF, applying the RDF Data Quality Vocabulary (DQV); a human-readable version is also available, e.g. as PDF. All metadata is accessible via multiple APIs, including a SPARQL endpoint and a RESTful interface. Furthermore, our solution is published as Open Source.
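The mandatory-property checks that SHACL shapes express for DCAT-AP can be illustrated with a pure-Python stand-in. This is a sketch of the idea, not a SHACL implementation, and the chosen properties are just examples.

```python
# Pure-Python stand-in for the kind of constraint a SHACL shape expresses
# for DCAT-AP: certain properties are mandatory on a dcat:Dataset.
# Illustration of the idea only, not a SHACL engine; the property choice
# is an example.
MANDATORY = ("dcterms:title", "dcterms:description")

def validate_dataset(metadata):
    """Return the list of missing mandatory properties (empty = conforms)."""
    return [p for p in MANDATORY if p not in metadata]
```

A real pipeline would run the published DCAT-AP SHACL shapes against the harvested RDF and record the validation report in DQV, as the abstract describes.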


Round table

Filing plans – Reference data from the past or bridge to auto-classification


Seth van Hooland


Prisca Giordani (International Monetary Fund)

Kuldar Aas (National Archives, Estonia)

Frederik Rosseel (Agoria, SCIO, DocShifter, docbyte)

Bernice Ibiricu (European Central Bank)

Constantin Stancu (Council of the European Union)

Isabel Salgueiro (Council of the European Union)


Three artifacts to manage your metadata and reference data

George Firican (LightsOnData)

An overview of three important tools any organization should be using to manage its metadata and reference data. In this session you will learn what a business glossary, a data dictionary and a data catalogue are, the benefits they bring, and what the relationship among the three is.
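A rough Python sketch of how the three artifacts relate: a data dictionary entry points back to a business glossary term, and a catalogue entry lists dictionary entries. The field names and example values are invented.

```python
from dataclasses import dataclass, field

# Invented, minimal sketch of the three artifacts and their links.
@dataclass
class GlossaryTerm:            # business glossary: the business meaning
    name: str
    definition: str

@dataclass
class DictionaryEntry:         # data dictionary: the technical shape
    column: str
    datatype: str
    glossary_term: str         # link back to the business glossary

@dataclass
class CatalogueDataset:        # data catalogue: where the data lives
    title: str
    columns: list = field(default_factory=list)  # DictionaryEntry items

term = GlossaryTerm("Customer", "A party that has purchased a product or service")
entry = DictionaryEntry("cust_id", "VARCHAR(10)", glossary_term=term.name)
dataset = CatalogueDataset("CRM extract", columns=[entry])
```

The value of the trio comes from exactly these links: a column in the catalogue can always be traced back to an agreed business definition.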




SSIF 2021 – Slovak Semantic Interoperability Framework 2021

Miroslav Líška (Ministry of Investments, Regional Development and Informatization of the Slovak Republic)

The Slovak Semantic Interoperability Framework (SSIF) was initially created in 2013. Its objective is to establish semantic interoperability across all Slovak public administration systems and to be data-interoperable with all European countries. From 2016, a community focused on linked open government data started to publish LOD Slovakia in order to show how interoperability can be established using semantic web technologies. At present this agenda is fully incorporated and supported by the Central Data Office at the Ministry of Investments, Regional Development and Informatization of the Slovak Republic. However, the hardest part of steering Slovak public data toward semantic interoperability is happening right now. Several new, complex public information systems are being developed or updated and will start to produce large amounts of open data. There is currently a major effort to handle requirements and questions from many public authorities so as to preserve the consistency and coverage of the SSIF for all such information systems. The agreed strategy is for the Central Data Office to support all involved stakeholders as much as possible in adopting and creating new semantic representations of data structures in public services, information systems and published open data. Although these activities are neither easy nor trivial, they are crucial for improving the quality of all public data and, consequently, the quality of living. Once semantic interoperability of data is achieved, the development of public services, data analysis and the consumption of open data become substantially easier, faster and more precise.


Role of standards organizations in the USA and their impact on knowledge representation in the global context

Joseph Busch (Taxonomy Strategies)

This talk describes what an enterprise knowledge graph is, provides advice on how to assess the readiness of an organization to undertake an enterprise knowledge graph project, and describes a method for how to develop a plan to build and maintain a knowledge graph. An enterprise knowledge graph is a representation of an organization’s knowledge domain and artifacts that is understood by both humans and machines. It represents an organization’s knowledge assets, content, and data—people, places, documents, photos, data in relational databases, etc.—and how these things are related to one another. Those things and relationships are represented using a metadata model specific to the industry or developed for the organization. Typically, this is an ‘ontology’ that defines classes for the things, properties for the things, and relationships between the things. It takes a lot of work to build the infrastructure for an enterprise knowledge graph, and then establish and implement the policies and procedures to link up a critical mass of an organization’s materials.


Closure & SliDo survey (input from participants in the form of recommendations for the future)



Download Day 2 programme as PDF

Times mentioned correspond to Central European Time (CET)


‘Law as code’ and interoperability


Welcome & instructions



Knowledge representation and semantically enriched information extraction for interoperable legal technologies

Keynote – Serena Villata

Researcher at the Centre National de la Recherche Scientifique

Track 1 – LAW AS CODE




ELI and ELI-DL standards: toward interoperability and smart reuse of the descriptions of legislations and legislative activities in the European Union, and anywhere else

John Dann (Central Legislative Service, Ministry of State, Luxembourg)

Initiated by the Forum of Official Journals in 2011, and supported by the European Commission, the ELI (European Legislation Identifier) project aims to provide, for the legislation of the countries of the European Union and any other states that wish to take part, permanent identifiers for legislative texts and a standardized data model (ontology) for describing them. The ELI standard has today been adopted by the official journals of 16 countries and the EU Publications Office, in the form of identifiers that allow direct access to legislative texts on the web, and of standardized metadata embedded in the web pages of legislative texts. The ELI standard is based on semantic web technologies, which makes it possible to extract metadata to build a knowledge graph of the legislation of each European country and of the European Union. In 2019, the ELI task force designed an additional standard, ELI for Draft Legislation (ELI-DL), to allow a standardized identification and description of the legislative work of governments and parliaments, thus promoting greater transparency of legislative activity in Europe and the reuse of homogeneous data by legal publishers and citizens. The ELI task force, thanks to funding under the ISA2 programme, carried out important training work for the official journal teams and their IT contractors. These numerous workshops all over Europe allowed them to be trained in semantic web standards, permanent identifiers, ontologies, controlled vocabularies and interoperability constraints. With ELI, Europe participates in a worldwide effort to build more transparency, efficiency and interoperability.
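The identifier side of ELI can be sketched as a URI template. The real ELI template has more (optional) components, such as agent, point in time, version and language; the simplified pattern and the sample values below are for illustration only.

```python
# Simplified, illustrative ELI-style identifier builder. The real ELI
# template defines more optional path components; the base URI and the
# sample act are invented for this sketch.
def eli_uri(base, jurisdiction, act_type, year, number):
    """Build a permanent, human-guessable identifier for a legal act."""
    return f"{base}/eli/{jurisdiction}/{act_type}/{year}/{number}"

uri = eli_uri("http://data.example.org", "lu", "loi", 2019, 32)
```

Because the pattern is stable and shared, the same act is addressable the same way forever, and metadata harvesters can dereference the URI to build legislation knowledge graphs.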



Building, enhancing, and integrating taxonomies – Part 1 

Heather Hedden & Helmut Nagy (Semantic Web Company)

Knowledge organization systems, and specifically controlled vocabularies (including taxonomies, thesauri, term lists, ontologies), play a key role in making content and data easier to find, whether within an organization or published externally. Knowledge organization systems are utilized in many ways in information management and retrieval: topic browsing, search support, discovery, filtering results, sorting results, curated content and alerts, content management workflows, data analysis, recommendations, etc. Many organizations already have various controlled vocabularies designed for different purposes but lack an enterprise-wide taxonomy, have taxonomies that are out of date, or have taxonomies that are under-utilized. This workshop focuses on several basic methods to get started in either building a new taxonomy or combining and enhancing existing controlled vocabularies. The workshop’s first section is an introduction to knowledge organization systems with a focus on the suitability of different types for different purposes: term lists, name authorities, classification systems, hierarchical and faceted taxonomies, thesauri, and ontologies. Standards (especially ISO 25964 and SKOS) will briefly be mentioned. Participants will be quizzed on which kind of knowledge organization system is suitable for different kinds of situations and for different kinds of concepts (subjects, names, etc.). The next section, which is the focus of the workshop, addresses different methods for coming up with concepts in a knowledge organization system, whether as a new KOS or to further build out and enhance an existing one. Because knowledge organization systems connect users to content, they need to be designed to take into consideration the users' needs and inputs as well as the specific corpus of content. So, we will consider both users and content as sources for taxonomy concepts and their labels.
Methods of obtaining inputs for concepts from users include brainstorming workshops, stakeholder and sample user interviews, search log analysis, and card sorting exercises. A brainstorming session is more suitable for participants from the same organization, so our workshop’s main interactive activity will be a collaborative card sorting exercise through an online tool. Methods of deriving concepts from content include a manual content audit, identifying concepts as a trained indexer would do, and automated term extraction. We will have one interactive exercise of manual content analysis for concept identification and then present a demo of a tool for term extraction from a large corpus of documents. The final section of the workshop will briefly consider the issue of combining existing knowledge organization systems in overlapping subject areas. They can either be merged into one, if there is no business need to keep them distinct, or they can be kept separate and linked at the individual concept level. If there is considerable overlap, they may be “mapped”, which means the concepts are designated as equivalent or nearly equivalent, and one controlled vocabulary may be used for the other, such as one at the tagging back end and one at the retrieval front end. Sometimes controlled vocabularies, which do not have equivalent concepts but rather related concepts, are linked with other related-type relationships. Participants will learn when each method of merging, mapping, and linking is suitable and the methods involved. This workshop is suitable for both those who are building new knowledge organization systems and those who need to integrate or enhance existing knowledge organization systems.
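The automated term extraction mentioned above can be caricatured in a few lines of Python. Real extraction tools apply linguistic processing; this sketch simply counts non-stopword tokens in a corpus, and the stopword list is a tiny invented example.

```python
import re
from collections import Counter

# Naive sketch of automated term extraction: count non-stopword tokens
# and surface the most frequent ones as candidate taxonomy concepts.
# The stopword list is a tiny invented example.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}

def candidate_terms(corpus, top=5):
    tokens = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top)]

terms = candidate_terms("the taxonomy of taxonomy terms and terms")
```

In practice the candidate list is then reviewed by a taxonomist, exactly the human-plus-tool workflow the workshop demonstrates.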


Is an upper level ontology useful?

Peter Winstanley (Semantic Arts)

This will be a talk about developing ontologies for beginners and will focus on the benefits of using an upper-level ontology. The Semantic Arts "gist" upper-level enterprise ontology will be used as an example.


ELI use case (Spain)

Susana Gómez (State Official Journal (BOE))


Semantic technologies for public administration data: some use cases in Italian PAs

Aldo Gangemi (University of Bologna and Institute for Cognitive Sciences and Technologies, Italian National Research Council)

The contribution is aimed at the main topic "Semantic interoperability, data exchange and sharing", while adopting methods for "Knowledge and language management" and providing sufficient "Quality of data". The talk presents projects that apply semantic web, knowledge graph, and automated knowledge extraction methods to public administrations: OntoPiA (a network of ontologies for Italian PA interoperability) and ArCo (an ontology network that shapes the data of one million objects from Italian Cultural Heritage), both in collaboration with Italian national agencies, and Framester (a large factual-linguistic graph), a research project to foster the interoperability of data bearing a heterogeneous, lightweight semantics. Methods include advanced design to create formal and well-founded knowledge graphs (in the form of linked data), state-of-the-art methods for knowledge graph extraction from text, and automated linking of extracted graphs to existing ones. A frame semantics foundation is suggested as a flexible, cognitively founded approach to semantic interoperability.


Legal knowledge representation and reasoning in a linked open data framework

Enrico Francesconi (European Parliament)

Machine-readable, actionable rules are a precondition for developing systems endowed with automatic reasoning facilities for advanced information services in the legal domain. In this talk we present an approach to legal knowledge representation and reasoning within a Linked Open Data framework. It is based on the distinction between the concepts of provisions and norms, and is able to provide reasoning facilities for advanced legal information retrieval (such as implementing Hohfeldian reasoning) and legal compliance checking for deontic notions. It is also shown how the approach can handle norm defeasibility. The methodology is implemented through decidable fragments of OWL 2, while legal reasoning is implemented through available decidable reasoners.


Towards standardized ontologies-driven interoperability for industrial data value chain

Hedi Karray (Toulouse INP – ENIT)

In recent years there have been a number of promising technical and institutional developments regarding the use of ontologies in industry. At the same time, however, most industrial ontology development work remains within the realm of academic research, without significant uptake in commercial applications. The talk will highlight the capability of ontologies to empower trustful data sharing and valorization in industry and to overcome existing interoperability bottlenecks. It will introduce an international and a European initiative, the Industrial Ontologies Foundry and the OntoCommons H2020 CSA project, which aim to foster data-driven innovation through an open ontology ecosystem, providing harmonized semantic facilities to describe data in several industrial domains, together with tested tools and methodologies for data documentation in practice, facilitating compliance with the FAIR principles. These two collaborative initiatives lay the foundation for interoperable and standardized data documentation across industrial domains, thereby facilitating data sharing and pushing data-driven innovation to bring about a truly Digital Single Market and new business models for European industry to meet the opportunities of digitalization and sustainability challenges.


Enabling semantic interoperability in legal drafting

Mikael af Hällström (Finnish Tax Administration)

The Interoperability Framework - a working solution ready for implementation. Imagine you could draft legislation and, at the same time, publish the data model and even the application programming interfaces for the IT-based implementation of the proposed rule of law. What we are experimenting with in the Finnish public administration is a toolset providing the capabilities needed to create a seamless link between the regulatory ideas of the legal drafter and the API developer trying to enable the data flows between systems that would support the application of the drafted rule of law. We started with the obvious - promoting interoperability as defined in the EIF (European Interoperability Framework) - by creating tools to enable one central part of all data governance initiatives, namely the creation of common, reusable and shareable metadata about all data resources created, used and maintained by public sector agencies. With the Interoperability Platform, all government, regional and municipal actors can jointly create terminologies, data models and codelists that use the same terms for the same concepts - a prerequisite for common understanding and cross-organisational use of disparate IT systems (each with its own inherent data model, some very badly documented). The foundation of the platform is the creation of metadata artifacts that can be linked together according to the principles of Linked Data. Even global domain-specific ontologies can be linked to a specific class or attribute, enabling not only domestic but also international semantic interoperability. The Data Vocabularies tool was then equipped with a state-of-the-art API documentation feature, namely the capability to export application profiles (data models) in a number of formats, including JSON-LD and OpenAPI 3.0.
To a non-geek this is gibberish, but it basically makes it possible to link the business-developed descriptive model of a dataset to a technical interface (API), enabling machines to access and understand the dataset in question. Finally, we embarked on the hardest part of the journey: linking the original legal texts governing every action in the public sector to the data models (and APIs) created for the implementation of the rules embedded in those texts (whether laws or other normative documents). The tricky part here is not so much developing a user-friendly drafting environment with easy "search & link to a concept" functionality as convincing the legal drafters in the ministries of the benefits of this approach. Structured documents tend to be perceived as just a way of improving the content management and time-to-publish features of a document management initiative, not as benefiting the legal drafting process in any other way. Luckily the Ministry of the Environment was bold enough to finance the pilot phase as part of its efforts to support the digitalisation of the Built Environment domain. The next steps on the path from beta testing to production are still a bit foggy; with negotiations ongoing with the Prime Minister's Office, the Ministry of Justice, the Ministry of Finance and the Finnish Parliament, we remain confident that further development of the proposed model will continue and lead to huge benefits in the implementation of new and revised legal rules in public agencies' IT systems. Adding the idea of "Rules as Code" as presented by the governments of New Zealand and Australia would be, well, simply fantastic. Digitalisation at its best, for sure.
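For readers unfamiliar with JSON-LD, the core idea of exporting a data model so that machines can interpret API payloads can be sketched as follows. This is a hypothetical illustration with invented field names and schema.org mappings, not the actual output of the Finnish Interoperability Platform:

```python
import json

# A JSON-LD @context maps business-level field names to shared vocabulary
# IRIs, so an API payload and the descriptive data model refer to the same
# concepts. The mappings below are assumptions, for illustration only.
context = {
    "@context": {
        "schema": "http://schema.org/",
        "businessId": "schema:taxID",    # invented mapping
        "name": "schema:legalName",      # invented mapping
    }
}
payload = {**context, "businessId": "1234567-8", "name": "Example Oy"}
doc = json.dumps(payload, indent=2)
print(doc)
```

A consumer that understands the context can resolve `businessId` to the full IRI, regardless of what the field is called locally.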


Break & poster

Lynx: an ecosystem of smart cloud services to manage compliance across jurisdictions and languages

Elena Montiel & Patricia Martin Chozas (Universidad Politécnica de Madrid)

In this talk we present the main results of the Lynx project, an Innovation Action funded by the European Union under the Horizon 2020 Framework Program. In Lynx we have created an ecosystem of smart cloud services to help users comply with applicable legislation in Europe. This ecosystem of services relies on a knowledge graph of legal and regulatory data coming from various jurisdictions in several natural languages. The talk will provide an overview of the so-called Lynx Services Platform (LySP), the process performed to annotate and structure documents in the form of a Legal Knowledge Graph (LKG), and the three pilots built on top of the LKG by orchestrating Lynx services.

Legal Act Visualization as a Three-dimensional Tree

Ermo Täks & Helina Kruuk (Taltech)

Riigi Teataja (www.riigiteataja.ee) is the legal database of all Estonian legal acts, published only in digital format since 2010. The legislator changes legal acts relatively often; the Estonian Penal Code, for example, has been changed 92 times. Consolidated legal texts are signed digitally to maintain validity and authenticity over time; each change is consolidated accordingly and published together with the amending legal act. In this way each version of a legal act is connected to the previous and following versions. It is possible to compare two versions of a legal act, but such analysis is limited when more than two versions are available. To assist users, an experimental visual model for legal act versioning has been developed. The experimental web application allows the user to visualize consolidated texts of legal acts, and the changes made to them, as a three-dimensional tree. The visual model reflects the structure of a legal act and indicates the location of changes over time. The website consists of three pages: a front page, a selected consolidated act visualized as a tree, and a view of changes to the legal act. Users can manipulate the model to get better views. The website is accessible here: http://dijkstra.cs.ttu.ee/~hekruu/iaib/resources/index.html Such a visual layout enriches the existing database with time-related semantic references, revealing how legal thinking reflects time. It also keeps users up to date in an ever-changing legal environment, spanning all versions of a specific legal text and helping them to pinpoint parts of the text changed in the past or amendments not yet in force.


Break & solution room

LeReTo: Connecting Documents with Knowledge in Realtime

Veronika Haberler (LeReTo)

The Austrian legal tech start-up LeReTo was chosen by the EU Publications Office to develop a project for the EU Datathon 2019 working with EUR-Lex data and visualizing it in citation network maps. The Smartfiles Network was born. While we already had some key technology for the project up and running, it was our first time working with EUR-Lex data and visualizing data in citation network maps. We used our framework to extract information, enriching the plain PDFs from EUR-Lex and the Austrian RIS databases with source links to case law, and to calculate the in- and out-degrees of the case law decisions. The results consist of four main elements: 1. Landscape & Citation Network Map, 2. Interactive Data Overview, 3. Case Law Infobox, 4. Check-Out Options. The best way to support legal professionals is to improve the potential of the PDF and to assist them in their daily research workflows. This was done by enhancing their preferred file format while reducing unnecessary manual work. We have created a solution for the recognition and linking of cited legal sources with European and national case law data, including an AI search function. The outcome of the project, presented to the jury in Brussels, was a proof of concept and an open-beta-ready version. The project won the first prize. Since then, we have been invited to present at the EU Publications Office as well as the European Court of Justice. https://smartfiles.lereto.at/


Artificial Intelligence for Knowledge Graphs

Dermot Doyle & Silvio Cardoso (Dynaccurate)

Dynaccurate AI is the product of seven years of dedicated research and engineering at the Luxembourg Institute of Science and Technology (LIST) into resolving the issue of how legacy database information can be easily, efficiently and accurately updated, so that older data can be continuously refreshed with new annotations, preserving the value of the data and reducing or eliminating inaccuracies and out-of-date references. In the Life Sciences domain in particular, it is estimated that up to 50% of knowledge becomes out of date every 10 years due to ongoing research. New terms for genetics appear on an almost daily basis, for example, and knowledge graphs must incorporate these new terms if they are to remain exact. However, it is not just the Life Sciences where terms evolve: controlled vocabularies are also required within international organisations such as the EU and UN (e.g. EuroVoc and AGROVOC). Over the course of 2020 the team at LIST interacted with various stakeholders across industry and the public sector to examine use cases for Dynaccurate. We discovered that evolving terminologies were one of the major bottlenecks for life sciences organisations, hospitals and health agencies, as well as for public sector administration. Being able to automatically update and remap knowledge graphs now opens up exciting possibilities in semantic interoperability, allowing for greater efficiency, accuracy and exploitation of data. Many manual tasks associated with knowledge graph maintenance can now be automated, and semantic interoperability can be achieved on an ongoing basis.


Break & solution room


Fernando Nubla Durango (European Commission)


Laurent Vinesse (European Commission)

A hub for emergency services

Rink W. Kruk (Nationaal Geografisch Instituut)

In the case of events, disasters and incidents, it is important that the emergency services of the various disciplines, i.e. hundreds of emergency and police zones, are coordinated. When this is not the case, it is more difficult to:
- dispatch the right teams;
- find and share the relevant data with the emergency services;
- organise the flow of emergency services when they come together;
- form a uniform picture of the incident or disaster;
- organise and structure the emergency services in the event of evacuations, warnings, predictions of toxic gas clouds, etc.
Currently, ad hoc solutions, such as maps from advertising leaflets, are sometimes used in the absence of better alternatives, which is not an ideal situation. In addition, all supplementary information from different digital sources has to be brought together, which takes up too much valuable time. The emergency services therefore see this lack of means of cooperation and of uniform location-based solutions as a pressing problem.


The visualization of complex legal knowledge as a much-needed Ariadne’s thread to navigate the labyrinth of the different legislative frameworks

Gabriele Di Matteo & Valérie Saintot (European Central Bank)

As a Knowledge Management lawyer, my daily work relies heavily on searching, capturing, sharing and concretely explaining complex legal concepts. Due to time constraints, it is often necessary to show in a compact and captivating way the main principles related to a specific legal issue. To do so, visualization techniques and tools have become fundamental in smoothing the operation of nesting and materializing textual or numerical data linked to different legal topics. The main aim of this light talk is to explain the reasons for new forms of visualization of the law, which make full use of the latest advances in technology, as well as to show concrete examples of visualization of the law produced by the ECB KM team.



Building, enhancing, and integrating taxonomies – Part 2 

Heather Hedden & Helmut Nagy (Semantic Web Company)




Lifecycling Knowledge Graphs with VocBench: Eliciting, Developing and Curating Information 

Armando Stellato, Andrea Turbati & Manuel Fiorelli (University of Rome Tor Vergata)

Initially developed by FAO in the context of the NeOn project as a collaborative environment for the development of the AGROVOC thesaurus, and later generalized to a SKOS development platform in collaboration with the University of Rome Tor Vergata, VocBench reached its third incarnation in 2017. VocBench 3 (or simply VB3) is the new version of VocBench, funded by the European Commission ISA² programme, with development managed by the Publications Office of the EU under contract 10632 (Infeurope S.A.). In this ENDORSE workshop, we will introduce the platform to newcomers and interested users, present the latest forthcoming version of VocBench (9.0), and guide attendees through the various functionalities of the platform. Attendees will learn how to import data from spreadsheets, transform it into meaningful information modeled according to semantic standards, and maintain it in a powerful and dynamic collaborative environment. A look ahead at new directions will conclude the talk.


A visual exploration of European Case Law: The Smartfiles Network

Veronika Haberler (LeReTo)

The Austrian legal tech start-up LeReTo was chosen by the EU Publications Office to develop a project for the EU Datathon 2019 working with EUR-Lex data and visualizing it in citation network maps. The Smartfiles Network was born. While we already had some key technology for the project up and running, it was our first time working with EUR-Lex data and visualizing data in citation network maps. We used our framework to extract information, enriching the plain PDFs from EUR-Lex and the Austrian RIS databases with source links to case law, and to calculate the in- and out-degrees of the case law decisions. The results consist of four main elements: 1. Landscape & Citation Network Map, 2. Interactive Data Overview, 3. Case Law Infobox, 4. Check-Out Options. The best way to support legal professionals is to improve the potential of the PDF and to assist them in their daily research workflows. This was done by enhancing their preferred file format while reducing unnecessary manual work. We have created a solution for the recognition and linking of cited legal sources with European and national case law data, including an AI search function. The outcome of the project, presented to the jury in Brussels, was a proof of concept and an open-beta-ready version. The project won the first prize. Since then, we have been invited to present at the EU Publications Office as well as the European Court of Justice. https://smartfiles.lereto.at/


Court decisions: the real big challenge for legal big data – How the ECLI framework helps to improve the accessibility of huge case law databases

Marc van Opijnen (Publications Office of the Netherlands – UBR|KOOP)

Online publication of court decisions is necessary for the transparency of justice and to inform the public on the development of the law. Therefore, in many countries a selection of decisions is placed online, but the Open Data movement and the data hunger of legal tech applications put additional pressure on the judiciary to make more decisions available. But whatever the magnitude of the database, court decisions are still poorly accessible: they are not well structured, lack unambiguous identification, metadata and computer-readable links to other legal documents, are not classified by taxonomy terms, and are scattered over many websites. The problem has a European dimension: since the national judge is the gatekeeper of the European legal order, national court decisions are relevant for all Member States. To improve the accessibility of court decisions, in 2010 the Council of the EU introduced ECLI: the European Case Law Identifier. This framework offers not only an unequivocal European identification code for all court decisions, but also a set of 17 metadata fields and a search engine that has indexed 12 million judgments. Although it is a voluntary standard, eighteen Member States and three European courts have implemented ECLI. In 2019 a revised version of the standard was published. It is more closely aligned with the semantic web, adds 24 metadata fields, and defines an ‘eXtension Language’ enabling more granular references. This renewed framework offers many possibilities, but many challenges remain to be solved before all the legal knowledge contained in those many millions of court decisions is available within the semantic web, and for the benefit of the end user.
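The ECLI syntax itself is simple enough to illustrate: an identifier consists of five colon-separated parts (the literal 'ECLI', a country code, a court code, the year of the decision, and an ordinal number). A minimal parsing sketch (the sample identifier is illustrative):

```python
from typing import NamedTuple

class Ecli(NamedTuple):
    country: str
    court: str
    year: str
    ordinal: str

def parse_ecli(identifier: str) -> Ecli:
    """Split an ECLI into its colon-separated components.

    Format: the literal prefix 'ECLI', a country code, a court code,
    the decision year, and an ordinal number.
    """
    parts = identifier.split(":")
    if len(parts) != 5 or parts[0].upper() != "ECLI":
        raise ValueError(f"not a valid ECLI: {identifier!r}")
    return Ecli(*parts[1:])

# A syntactically valid, illustrative identifier:
print(parse_ecli("ECLI:NL:HR:2011:BP4952"))
```

Because the code is unambiguous and machine-parsable, references between decisions can be resolved automatically, which is what enables the cross-border search engine the abstract mentions.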


Modelling legislation as a graph in the Open Trust Fabric’s Prose Object Model

James Hazard (CommonAccord.org)




Panel discussion

Semantic interoperability in the legal domain: challenges, benefits and use cases


John Dann (Central Legislative Service, Ministry of State, Luxembourg)

Enrico Francesconi (European Parliament)

Anikó Gerencsér (Publications Office of the European Union)


Dr. Fotis Fitsilis (Hellenic Parliament)

Raf Buyle (Informatie Vlaanderen)

Veronique Volders (Informatie Vlaanderen)


 Jean Delahousse (Independent consultant)

Aldo Gangemi (University of Bologna)

Marc Küster (Publications Office of the European Union)

Víctor Rodríguez Doncel (Universidad Politécnica de Madrid)



How to leverage taxonomies for content findability and usability

Joyce van Aalten (Invenier) 

In this hands-on workshop you will learn how to make the most out of a taxonomy or thesaurus for improving findability. We will discuss a wide range of taxonomy functionalities that could be useful within (the search engine of) an information management system. One of them being query suggestions, which will help users to specify their search query. In the second part of the workshop you will work at your own case: how could a taxonomy or thesaurus work for your organization?
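The query-suggestion functionality discussed in the workshop can be sketched in a few lines: match the user's prefix against both preferred labels and synonyms, but always surface the preferred label as the suggestion. The taxonomy entries below are invented for illustration:

```python
# Toy taxonomy: preferred label -> list of synonyms (hypothetical entries).
TAXONOMY = {
    "myocardial infarction": ["heart attack", "MI"],
    "hypertension": ["high blood pressure"],
}

def suggest(prefix: str) -> list[str]:
    """Suggest preferred terms whose label OR any synonym starts with prefix."""
    prefix = prefix.lower()
    hits = set()
    for preferred, synonyms in TAXONOMY.items():
        for label in [preferred, *synonyms]:
            if label.lower().startswith(prefix):
                hits.add(preferred)  # always suggest the preferred label
    return sorted(hits)

print(suggest("hea"))  # matched via the synonym "heart attack"
```

This is why taxonomy-backed suggestions help users: typing a colloquial synonym still leads them to the controlled term used to index the content.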



AI and business decisions - The business decision making with AI 

Agata Majchrowska (CODR.PL Research Lab, ORA Wrocław, WWSiS)

The idea of this workshop on Business Process Management based on AI and machine learning is linked to a few different decision-making areas. Among them, we can point out data-oriented models versus process-oriented ones. When considering both models, business strategies should respect the acceptable level of standardisation with reference to the appropriate quality of the output data to be received. Thus, one should work on preparing an adequate framework, creating relations and teams to discuss projects' goals and the processes on data, meant to achieve an acceptable result for the business decisions made while implementing the projects. Expectations about re-using publicly shared data and third parties' data, depending on the characteristics of the input data and the expected outputs, might not be fulfilled. All the presented models of implementing the solutions will exemplify how important it is to underline the supplementary role of AI-based tools and the general rules governing the use and re-use of data, including the different ways of gathering public datasets and third parties' information. The workshop focuses on working with legal datasets and databases, obtaining the data, improving the quality of the data possessed, and the limitations generally imposed by law on re-using them for specific purposes. The main goal is to underline how important the value of adequate input and output data quality is in creating data-oriented processes based on subjectively framed algorithms, and how to prevent unexpected loss of data quality or its context.


Closure & SliDo survey (input from participants in the form of recommendations for the future)



Download Day 3 programme as PDF

Time mentioned corresponds to the Central European Time (CET)


Procurement and interoperability


Welcome & instructions


Achieving simplicity in transacting:

The role of contracts in our society and how we can model the EU economy via contracts

Keynote – Sally Guyer & Luigi Telesca

Global Chief Executive Officer of World Commerce & Contracting (WorldCC) &

Co-Founder and CEO of TRAKTI and CEO of EXRADE Srl





Creation of a semantic base for European Public Procurement

Natalie Muric (Publications Office of the European Union)

A lot of work has been done in the area of public procurement interoperability; however, until now the semantic interoperability that ensures reuse of concepts across systems and the procurement lifecycle has been largely ignored. The Publications Office, funded by the ISA² programme, has been tackling this problem by providing reference data in the field of public procurement and developing the eProcurement Ontology in close coordination with the implementation of eForms for the Supplement to the Official Journal (OJ/S). The availability of these artefacts promotes a common understanding of the concepts used in procurement, reducing the need for extensive mapping while facilitating the move to linked open data. The Ontology is created through stakeholder participation, which is open and free. Participation in the eProcurement Ontology Working Group and input on authority table requirements are encouraged, to bring in expertise from all areas of public procurement.


AGROVOC: managing a multilingual thesaurus through networked collaboration

Imma Subirats Coll & Kristin Kolshus (Food and Agriculture Organization of the United Nations)

AGROVOC is a multilingual thesaurus, coordinated by FAO and maintained by an international community of experts and institutions active in the areas of agriculture, fisheries, forestry and related domains. AGROVOC is available as a SKOS concept scheme, also published as a Linked Data set composed of more than 38,000 concepts available in up to 40 languages. By means of Linked Data, AGROVOC is aligned with other open datasets related to agriculture and the environment. This presentation will address multilingualism in thesauri and controlled vocabularies, and especially how AGROVOC manages it. Internationalization is not only about translations; it is also about context and knowledge. The AGROVOC Editorial network comprises national and international institutions and expert networks, each curating a language or a technical topic (for example land governance, or aquatic sciences and fisheries). Working with expert communities is essential to keep AGROVOC as up to date as possible with current scientific terminology. In this regard, AGROVOC is now including specialized schemes, which is proving to be very interesting. Having the new AGROVOC Editorial Guidelines for shared understanding, and maintaining a strong community focus, is also essential, as this work is done on a volunteer basis by the participating organizations.
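As a rough illustration of what a multilingual SKOS concept looks like when published as Linked Data (the concept URI and labels below are invented; real AGROVOC URIs differ):

```python
# Sketch: serialize one SKOS concept with language-tagged preferred labels
# as Turtle. Hypothetical example data, not an actual AGROVOC record.
def skos_concept(uri: str, labels: dict[str, str]) -> str:
    """Emit a skos:Concept with one skos:prefLabel per language."""
    lines = [f"<{uri}> a skos:Concept ;"]
    lines += [f'    skos:prefLabel "{text}"@{lang} ;'
              for lang, text in sorted(labels.items())]
    # Turn the final ';' separator into the statement-terminating '.'
    lines[-1] = lines[-1].rstrip(";").rstrip() + " ."
    return "\n".join(lines)

ttl = skos_concept(
    "http://example.org/agrovoc/c_maize",
    {"en": "maize", "fr": "maïs", "es": "maíz"},
)
print("@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n" + ttl)
```

The language tags (`@en`, `@fr`, ...) are what let a single concept carry its labels in up to 40 languages while remaining one node in the Linked Data graph.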


Kohesio: a knowledge platform on projects funded by Regional Policy, using a wikibase (semantic web and linked data open source tool)

Anne Thollard & Max De Wilde (European Commission)

The Kohesio project aims to build a knowledge base on projects and beneficiaries funded by Regional Policy (European Regional Development Fund - ERDF, Cohesion Fund - CF) and the European Social Fund (ESF). The objectives are threefold:
- increase visibility and transparency on projects and beneficiaries, mainly for the general public;
- re-use and link existing data through a Wikibase, the open-source and free tool behind Wikidata, developed by Wikimedia Deutschland, the German chapter of the Wikimedia movement, providing extensive open and linked datasets;
- create a community of practice with internal and external stakeholders, including European Commission communication partners in the EU Member States.


The European vision for Public Procurement Data

Isabel da Rosa (European Commission)


Building a ‘common language’ for semantic interoperability

Elena Shulman (D.E.Solution)

This talk will describe the steps, collaborations and challenges involved in building a shared language (semantic interoperability) across the public-facing websites of the European Commission. The first part of this undertaking involved the design of an information model and controlled vocabularies for describing and restructuring content from approximately 60 websites of various EC departments. The second part involved mapping heterogeneous legacy vocabularies to a common and centrally managed set of controlled vocabularies with the support of the Publications Office of the European Union. Lessons learned confirmed the necessity of establishing, early on, a central point of coordination and communication dedicated to promoting and enabling semantic interoperability, strong management buy-in, vocabulary management tools, and an institutional commitment to long-term vocabulary stewardship.


Crowdsourcing and citizen engagement for people-centric smart cities

Elena Simperl (King’s College London)

Smart cities are as much about the needs, expectations and values of the people they serve as they are about the underlying technology. In this talk, I am going to present several areas of system design where human and social factors play a critical role, from fostering participation to augmented intelligence and responsible innovation. I will present two Horizon 2020 programmes: Qrowd, which delivered a humane AI approach for transport and mobility using digital tools, crowdsourcing and citizen-led innovation; and ACTION, which runs an accelerator for participatory initiatives against pollution with 15 pilots in several European countries. In both contexts, I will present approaches to collect and curate open and reference datasets and comment on ongoing challenges, solutions and opportunities.


Towards a comprehensive EU data strategy for eProcurement

Roberto Reale (Eutopian)

The eProcurement Ontology, as developed by the Publications Office of the EU, is foundational work for shaping the eProcurement landscape across the EU, especially as key components (eForms, ESPD, eCertis, ...) are continuously improved and harmonised. While the Ontology itself is a very comprehensive effort, there appears to be a missing link as far as new data architecture initiatives (notably the Data Strategy, the European Blockchain Services Infrastructure and GAIA-X) are concerned. Liaising with such initiatives, at the governance, semantic and data architecture levels, is of the utmost importance to create a rich and effective data ecosystem. We will examine governance and technical aspects of such liaison, with a special focus on the European Blockchain Services Infrastructure.


Using a Covid taxonomy to improve the findability of e-learning sessions for health professionals

Maura Moran (Mekon Ltd.)

Health Education England is a body in the Department of Health with a responsibility to provide online education and training for the health and public health workforce. It currently provides over 400 e-learning programmes to 1.5 million registered users.
Feedback indicated that it was difficult to find useful search results on HEE’s e-learning website. This is partly because of the complexity of the health domain itself, where content could be described with many synonyms and terms from many facets. In addition, the content is sourced from a large number of providers in the government and the healthcare community, and this has presented a challenge in describing the content in a useful and consistent manner.
The Covid-19 pandemic created an urgent need to improve the site so that new information could be disseminated, and refresher training could be offered to staff being re-deployed.
This presentation will explain how HEE improved search accuracy and navigation features through the use of taxonomy. The main focus is the development of a Coronavirus taxonomy to describe the e-learning content. This allowed us to make use of synonyms and hierarchical relationships to improve search accuracy. It also includes the facets that healthcare workers need for search. In addition, we licensed further taxonomies from other government bodies. Together with the new search tools, the taxonomies have made search and navigation more accurate and useful.


You said interoperability? Lessons learned from two #AI #ML projects

Valérie Saintot (European Central Bank)

A cultural shift across the board (make data sexy for non-data scientists, and engage executives and leaders to practice evidence-based decision-making and communication) - showcased with the Problem-Data-Facilitation dialogic framework, demonstrating how to create the conditions to achieve the above. PURPOSE: realise that data science is about data science and much more. VALUE PROPOSITION: working with data requires a lot of thinking, or at least it should. Three different forms of thinking are clearly needed: analytical, creative and critical thinking. Each of these types of thinking corresponds to a different part of the encoding and decoding phases of the data analysis process. When we define the PROBLEM we are trying to solve, we need to mobilise analytical thinking. When searching the DATA to inform decision-makers about the problem, we need to think creatively to gain insights from the data. When we have harvested a useful dataset and insight, we need critical thinking for an evidence-based FACILITATION of the necessary discussions. With the PDF framework (Problem-Data-Facilitation) we enable rich discussions to support decision-makers in their work and outreach communication.


Putting in place an eProcurement BDTI pilot

Cécile Guasch & Sander van Dooren (European Commission)


The FAO Caliper Platform

Caterina Caracciolo & Carola Fabi  (Food and Agriculture Organization of the United Nations)

In this talk I will present Caliper, a platform developed at FAO and dedicated to improving the management and dissemination of statistical classifications. Caliper is inspired by the principles of open and linked data. While all the classifications and correspondences it disseminates are already in the public domain, it adds new possibilities to the ways they can be accessed and consumed. In Caliper, laypeople can use user-friendly, multilingual look-up services, the IT-savvy can access and reuse data in a variety of ways, and classification custodians have an integrated platform supporting both the curation and the publication of classifications. Since it relies on open-source, free-of-charge tools, it speaks to people and organizations interested in transparency, possibly operating on a low budget.


Information transferring in the benefit of policy building

Sabrina Medeiros (InterAgency Institute)

The adaptive approach to public policies suggests that policy building depends on the various perceptions of efficiency among actors (Walker, Rahman, and Cave 2001). Efficiency is mostly qualified by the original expectation that public policies will follow from arising demands, naturally and systematically organized by institutional processes. Both the expectations of society and the electoral arena condition the demands that arise in terms of public policies. A great variety of theories has been dedicated to explaining how public policies emerge and why institutional development occurs (Wallner 2008; Walker, Rahman, and Cave 2001; Shipan and Volden 2008). Rational-choice theories point to a great amount of rationality within the process of institutional development (Simon 1957; Simon 1991; Simon et al. 1987), while institutionalists turn their attention to the various frameworks that eventually sub-judge interferences. Still within an institutional theoretical framework, there is the possibility of observing and understanding institutional processes and policy building by taking into account the implications of actors’ trajectories in their natural institutional developments. As Walker has stated, “most policies must be devised in spite of profound uncertainties about the future” (Walker, Rahman, and Cave 2001). At the same time, governance schemes are built to observe the necessity of change and institutional evolution, but less effort is dedicated to appropriating practices and experience than to the general view from the top of institutions. While developments are integrated naturally, and actors along those institutional lines of communication may incorporate them, this appropriation is not visible. From the bottom of institutional processes, practices are mostly considered natural responses to the level and strides of institutional development (Plessner, Betsch, and Betsch 2011; Chhotray and Stoker 2008; Bardach 1998).
So, there are two main problems connected: guarantee more visibility of how policy innovation is manifested through institutional developments, and how to better achieve this appropriation from actors’ behaviour in a systematic manner, including semantics. Walker’s adaptive approach diagnostics that policymaking is different in three different aspects, once it is the option: the analytical approach; the types of policies considered, and the decision-making processes (Walker, Rahman, and Cave 2001). So, I propose simulations as methods of observing and analysing institutional change. In sequence, the proposal is to add to the simulation presentation, the schemes for observing and treating information from the various participants as to build policies oriented by fundamental concepts derived from the debates induced. That is why my proposal is both centred on a regular presentation and/or a dynamic eighter to be presented as innovative method and software (online) or to use it inside a workshop as to achieve conclusions for the events main themes.


Break & demonstration

Juremy Legal Term Search

Robin & Timea Palotai (Juremy.com)

Juremy is an EU corpus-based legal and technical terminology search tool. It provides fast, authentic concordance search within the EUR-Lex database in all 24 official languages. Juremy is streamlined for exact phrase lookup, so matches are presented in a narrow bilingual paragraph context. First, we present how Juremy makes terminology search on the EUR-Lex database fast and customizable, with special regard to the requirements of the EU linguists' translation workflow. Then we present the advantages of our search tool by looking at some typical terminology examples. Finally, special functionalities are presented: since Juremy focuses on legal terminology research, it can provide specific, EU-translation-focused functionality such as sorting, filters and additional metadata, which will be shown in detail during the demonstration.


Break & demonstration


Caterina Caracciolo (Food and Agriculture Organization of the United Nations)



Towards semantic interdisciplinarity by linking diverse scientific knowledge graphs

Panagiotis-Marios Filippidis, Charalampos Bratsas & Evangelos Chondrokostas (Aristotle University of Thessaloniki)

In this work we build a scientific Knowledge Graph (KG) consisting of generic and specific scientific taxonomies, joining them through their related concepts. The purpose is to combine the scope of the generic taxonomies with the level of detail of the specific taxonomies, in order to obtain a KG that can be used properly across all scientific fields. The unified KG is used for semantic enhancement and search of science-related content, enriching its annotation with multiple concepts from heterogeneous classifications. The joining of the various taxonomies is accomplished using ontology-matching applications such as the Silk Framework and the Alignment Tool, based on similarity-metric suggestions from the software together with domain experts' contribution and verification. A total of 3 500 links between generic and specific classifications have been created and imported into the KG, benefiting semantic annotation of science-related content. To this end, we built a Semantic Engine (SE) to enhance numerous PhD offers with concepts from the semantic KG by analysing the offer text. These concepts are stored as semantic annotations for each offer, including their KG links, and the offer is indexed with concepts from various classifications. Another application is the tagging of journals and the indexing and search of publications with related content, providing more concepts corresponding to the journal's content and facilitating its search through scientific publication repositories. Finally, graph analysis features provide valuable insights into the similarity and relevance of diverse scientific fields, allowing better results through statistical concept modelling.
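The similarity-based link suggestion at the heart of this approach can be sketched as follows (a minimal illustration using simple string similarity; the labels and threshold are assumptions, not the authors' actual Silk/Alignment Tool configuration):

```python
# Propose candidate links between a generic and a specific taxonomy by
# comparing concept labels; suggestions are then verified by domain experts.
from difflib import SequenceMatcher

def suggest_links(generic, specific, threshold=0.85):
    """Return (generic_label, specific_label, score) candidates above threshold."""
    links = []
    for g in generic:
        for s in specific:
            score = SequenceMatcher(None, g.lower(), s.lower()).ratio()
            if score >= threshold:
                links.append((g, s, round(score, 2)))
    return links

generic = ["Computer Science", "Mathematics"]
specific = ["Computer science", "Applied mathematics"]
links = suggest_links(generic, specific)
```

Real ontology-matching tools combine several such metrics (label, structure, instance overlap), but the workflow — automated suggestion followed by expert verification — is the one described in the abstract.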



Diverse Uses of a Semantic Graph Database for Knowledge Organization and Research


Vladimir Alexiev (Ontotext Corp.)

Semantic graph databases are the foundation of Enterprise Knowledge Graphs. They are used in numerous industrial applications, but also in knowledge organization management systems (thesaurus and ontology management systems) such as VocBench, SWC PoolParty and Synaptica Semaphore. Through VocBench, semantic databases manage or publish some of the most important thesauri: EuroVoc, AGROVOC, the Getty Vocabularies, etc. Semantic databases are also used in a wide variety of research domains and projects, and some have open-source or free editions that make them an easy choice for academic research. We searched Google Scholar and found 1 000-1 200 academic papers and theses mentioning one of the popular databases, as well as at least 50 books on Google Books that mention it. We started a Zotero bibliography on the topic (currently about 150 papers) and captured about 220 research topics based on the titles of about 250 papers. We will present an analysis of reference data and research domains using a semantic database. Traditional topics include social media analytics, data marketplaces, business process management, enterprise data integration, statistical data, engineering, smart cities, sensor networks, life sciences, biomedical ontologies, medicines, chemistry, linguistic data, semantic publishing, semantic text analysis, geographic information and master data management. Newer or more exotic topics include academic/research data, the COVID and Zika viruses, the Quran and bilingual Arabic-English data, art history, Holocaust research, musical events and adaptations, iconography, food and drink, tourism, investment decision support, economic research, offshore leaks, maritime data, construction projects, building information management, crisis management, critical incidents and infrastructures, data journalism, clinical trials and specific medical topics (e.g. intestinal cells, intracoronal tooth restorations, vaccines, toxicology), investment recommendations, etc.


Break & solution room

Common Mapping of Innovation Supporting Actors – How does interoperable mapping help SMEs innovate?

Szabolcs Szekacs (European Commission)

There are several websites that support businesses in innovating by providing information on advanced technologies, testing facilities, financing, etc. in the area of innovative solutions. However, in practice, many of those websites provide only fragmented information rather than a comprehensive overview, forcing authorities and businesses to visit many websites to obtain all the data they need about innovation. This fragmentation and lack of data interoperability represents a burden and additional costs for the various interested stakeholders. The European Commission's CMISA project aims to address these challenges and stakeholder needs by allowing the websites that collect information on innovation supporting actors to easily share and reuse each other's datasets using common, interoperable tools.

Creating Semantic Reference Models for Spaceflight - An Ontology and Knowledge Graph Suite to support Astronautics

Robert Rovetto (Independent consultant)

This work is about creating semantic models for the discipline of astronautics. These domain models will capture the data, knowledge and semantics of the discipline in order to support safe spaceflight, improve space situational awareness, and help resolve the hazard of orbital debris for future generations. More specifically, I am developing a suite of ontologies to encode domain knowledge and annotate data for applications such as artificial intelligence, knowledge representation and reasoning, data fusion, and search and retrieval. The project is open to sponsorship and formal collaborations, and is currently described at http://ontospace.wordpress.com and https://purl.org/space-ontology.


Panel discussion

Data strategy for eProcurement & interoperability: future actions


Giorgia Lodi (Institute of Cognitive Sciences and Technologies of the Italian National Council of Research)


Patrizia Cannuli (Consip S.p.A)

Carmen Ciciriello (European Commission)

Cécile Guasch (European Commission)

Isabel da Rosa (European Commission)

Steve Graham (Open Peppol)



Cross-organisational collaborative activities around open-source semantic applications to better maintain, link, visualise and disseminate semantic assets

Caterina Caracciolo (Food and Agriculture Organization of the United Nations)

Luca Gramaglia (Eurostat)

Anthony Camilleri (European Commission)

Anikó Gerencsér (Publications Office of the European Union)

Denis Dechandon (Publications Office of the European Union)

This workshop will be run by representatives of the Food and Agriculture Organization of the United Nations, the statistical office of the European Union (Eurostat) and the Publications Office of the European Union. It will mainly focus on the maintenance, alignment and dissemination of 3 major assets: the Statistical Classification of Products by Activity (CPA), the Central Product Classification (CPC) and EuroVoc, the multilingual, multidisciplinary thesaurus covering the activities of the EU. Furthermore, it will demonstrate that open-source semantic technologies and online accessible collaborative platforms play a crucial role in and for developing the semantic interoperability and supporting the sharing and implementation of international standards, as well as further strengthening the dissemination of semantic assets and supporting the interlinking of data.


Prêt-à-LLOD: Making Linguistic Data Ready-to-use with Linked Data


John McCrae (National University of Ireland Galway)

The Prêt-à-LLOD project is building a data value chain that makes linguistic data ready to use in a variety of applications, and in this talk we will provide an overview of the toolkit the project is developing for this chain. There are six main steps. The first is data discovery, supported by a new data portal for linguistic data, Linghub, that combines different sources using state-of-the-art linked data technologies. Secondly, we are concerned with the preparation of data, including converting it into a format that enables further analysis, for which we are developing an innovative tool called FINTAN. The next step is organization, where our focus is on the legal aspects of data reuse, especially licensing; here we are looking at the use of blockchain technologies such as IPFS to ensure that information about resources can be shared and combined through open standards such as ODRL. Then comes the integration of data, for which we are developing a new tool called Naisc to semi-automatically integrate datasets. Next, we develop our analyses using Docker and linked data, combined in a new workflow management platform called Teanga. Finally, these tools will lead to real-world actions, especially for under-resourced languages throughout Europe and the world.


An Opportunity to Unite us in Terminology

Stine Jensen (DALeXI)

Knowledge management and terminology go hand in hand, as terminology is the vocabulary of professional knowledge covering a given knowledge field; both are ordered and categorised through ontologies and metadata, respectively. To give an example, EuroVoc is used as a terminology classification system for IATE and EuroTermBank. Knowledge sharing on the scale that ISA2 and the Publications Office offer – in terms of support and guidance on best practices drawn from experience with ISA2 – is an extraordinary opportunity to perform large-scale terminology management, involving not only sharing but also revising the terminology, since all the field work, and thus the first step of collaborative terminology management (involving users, experts, creators, legislators, aligning material, etc.), has already been taken. Such terminology management would create one source of reference that could serve member states with weak or no public terminology management, and would most likely also attract private-sector engagement. This would only enrich a resource of this kind, due to the interconnection of the private and public sectors, thereby also encouraging cross-border and cross-sector interoperability. Furthermore, a promotional aspect and contribution motivation might help break down silos: up-to-date terminology plays an essential role, from an AI perspective, in areas of interest such as machine learning, digitalisation, medical report writing, chatbot technology and machine translation. ISA2 presents a chance to prepare terminology for when the (language) technology is ready to integrate it!


Endorse 5 star deployment scheme for Linked Data

Ivo Velitchkov (KVISTGAARD)

The promise of Linked Data was to complement the web of documents with a web of data, so that humans and machines can use the Internet as if it were a single database while enjoying the benefits of decentralisation. Today, 15 years later, the web of data has grown to 1 470 Linked Open Data publications. Yet that is just 0.005% of all publicly known datasets. Apart from LOD, there is Linked Enterprise Data, which can unite heterogeneous corporate data sources into a single enterprise knowledge graph and, in doing so, achieve data integration at a small fraction of the cost of current data integration projects, while bringing additional value due to its low cost of change. Yet corporations keep wasting IT investments on creating data silos and applying application-centric approaches to data integration. So why is Linked Data, both open and restricted, still marginal? The main reasons are that Linked Data is still not attractive to big software vendors or to developers, and that it is still perceived as too academic. But there are a few other, less obvious reasons, hiding where nobody would look for them: in the successful LOD practices themselves. This talk will examine this finding, recommend some remedies, and suggest complementing the 5-star open data deployment scheme with a 5-star Linked Data deployment scheme, to distinguish the different impact of the ways Linked Data is implemented and used.







Search of Knowledge (Tables of Contents and Indexes)

Meral Alakuş (Dalhousie University)

Looking into the future, and realizing the fast accumulation of knowledge, especially during the last century and up to the present day, it is obvious that controlling knowledge will be the most crucial problem people will have to deal with in the coming years. Coping with the information explosion will be the responsibility of "knowledge and/or information managers", e.g. librarians and indexers, subject specialists and researchers, as well as computer scientists. Organizing knowledge will be carried out through analytic study of content; topics and subtopics will be identified with keywords; personal and other names will also be given just as they are recorded in the text, indicating relationships between them and showing page/location numbers. All this is done with the help of human intellect, formulating and rearranging content under the "Table of Contents" and "Index". The same rules apply to books, journals, reports and other recorded materials, print or virtual. It is rightly claimed by the British Council Information Centre that "for the organization of information and of documents, the contribution of technology in comparison to human intellect, is 85% to 15% in favour of intellect." In this paper, in search of knowledge, I endeavour to explore the uses of two traditional tools of the publishing world: the "Book Index" and the "Table of Contents". It certainly seems that indexing will never disappear as long as knowledge in various formats grows and expands; in application, it might require different ways of approaching the full content, which is in fact content analysis and defining topics with keywords. With the continuing increases in computer processing and storage capabilities, the barriers to and benefits of electronic access to more information content are becoming serious issues in information science research.


Interoperability of Semantically-Enabled Web Services on the WoT: Challenges and Prospects

Mahdi Bennara (École des Mines de Saint-Étienne)

The advent of the Web of Things as an application layer for the Internet of Things has led to the proliferation of Web services exposing the data and functionality of networked objects. Since the resource-oriented paradigm aligns well with WoT architectures, RESTful services have been the go-to interface for exposing connected devices on the Web in a lightweight, resource-oriented manner. However, the heterogeneity of descriptions of devices and services, as well as the data formats they exchange, has led to a number of interoperability issues. Recently, the growing popularity of semantically-enabled services has led to the emergence of services described with, and exchanging, RDF. RDF can be seen as a universal abstract data model, or as a lingua franca for the data formats of the Web. By leveraging the power of REST and RDF and combining them with the Thing-oriented nature of the WoT, we believe that true semantic interoperability on the Web of Things can be achieved. In this work, we attempt to frame the challenges encountered in enabling semantic interoperability of heterogeneous WoT services, with the help of a real-world production line scenario. We also propose preliminary solutions to the main issues hampering the establishment of true semantic interoperability on the WoT.
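The "RDF as lingua franca" idea can be illustrated with a minimal sketch (the payload shapes and vocabulary URI are invented for illustration): two devices expose the same kind of reading in different JSON structures, and mapping both to subject-predicate-object triples makes them uniformly queryable.

```python
# Map heterogeneous device payloads onto a shared set of RDF-style triples.
TEMP = "http://example.org/vocab#temperature"  # illustrative predicate URI

def device_a_to_triples(payload):
    """Device A sends {"id": ..., "temp_c": ...}."""
    return {(payload["id"], TEMP, payload["temp_c"])}

def device_b_to_triples(payload):
    """Device B nests its reading under a "sensor" object."""
    return {(payload["sensor"]["uri"], TEMP, payload["sensor"]["celsius"])}

graph = device_a_to_triples({"id": "urn:dev:a", "temp_c": 21.5})
graph |= device_b_to_triples({"sensor": {"uri": "urn:dev:b", "celsius": 19.0}})

# A single "query" now works over both devices, regardless of original format.
readings = {s: o for s, p, o in graph if p == TEMP}
```

In practice one would use an RDF library and SPARQL rather than Python sets, but the abstraction is the same: per-device mappings into a common triple model, then uniform access.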


A Web-Based Operating System for Integrating Heterogeneous Data and Code Using Semantic Web Technologies as Interlingua

Sergejs Kozlovičs (Institute of Mathematics and Computer Science, University of Latvia)

The purpose of this lightning talk is to discuss the idea of a web operating system that would unify the web at both the data and computational levels. Instead of imposing a particular standard or specific requirements, our idea is to allow individuals, governments and private companies to publish their data using formats of their choice. Similarly, we allow them to provide access to their code using technologies of their choice. The proposed web operating system will provide the means to integrate these data and code by utilizing the OS driver metaphor. It will also raise the level of abstraction by factoring out network-specific aspects, including security and access management. During the talk, we will demonstrate an initial working version of the proposed OS prototype, currently known as webAppOS (an open-source project licensed under the EUPL). We propose to use semantic web technologies (RDF, OWL and SPARQL) as an interlingua for integrating heterogeneous data and code. Drivers will provide on-the-fly access to concrete data formats and storage (including cloud drives) and implement different code invocation methods. In addition, we will join heterogeneous data and distributed code by introducing the innovative concepts of web memory and web calls. The proposed OS will draw us closer to the idea of a web-based computational infrastructure of migrating objects, proposed by Alan Kay. That would facilitate cooperation between individuals, governments and private companies and boost interoperability between their services.


Challenges to promote Interoperability in a bicameral Parliament: the example of Cortes Generales

Marina Cueto (Senate of Spain)

I would like to speak about the first steps taken by the Spanish Senate towards interoperability through the use of semantic assets and drafting work. The first conclusion of my presentation is that collaboration with the other chamber of the Spanish Parliament, the Congress of Deputies, is necessary: both Chambers worked together from the beginnings of the Eurovoc thesaurus in the 1980s; it is important to promote a better understanding of parliamentary work in the digital world, in which both chambers must speak through a single semantics; and the necessary collaboration with the Government (Ministry of the Presidency, Relations with the Parliament and Democratic Memory) in the making of legal documents, traditionally done on paper and by post, today must be achieved with drafting techniques, where LEOS seems to be the tool for sharing and amending documents.



Semantic data for better operations

Lambert Hogenhout (United Nations Secretariat)

The Emerging Tech Lab at the UN Secretariat is building a semantic layer to connect the overlapping areas of our different departments and SDG-related programmes, with the aim of facilitating research, strengthening IT systems and fostering collaboration among staff. Using existing taxonomies and linked data, we established a basic knowledge graph (for various SDG-related topics and some specific topics such as humanitarian affairs). We have enriched this knowledge graph with automated methods (NLP and machine learning) and developed various algorithms to leverage it for particular purposes, for instance to determine document similarity or assess the alignment of documents (such as project proposals) with strategic frameworks. The resulting graph will be used (and is already being used) in various ways: (a) to give human users better insight into the relations between topics, including through immersive experiences such as augmented reality (we have implemented this on smartphones and are working on a version for headsets); (b) to underpin IT applications with a layer of knowledge, e.g. our conversational AI platform (chatbot) may use this layer in the future to improve its natural language understanding (NLU), and our search engine could use it to allow interactive discovery as an alternative to search results; and (c) to allow UN staff to more easily find peers in the organization working on issues, or possessing expertise, relevant to their own work.
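One common way to compute document similarity over such a concept-annotated graph is set overlap between the documents' concept annotations — a sketch under assumed data (the concept IDs are invented, and this is not necessarily the lab's actual algorithm):

```python
# Compare two documents by the Jaccard similarity of their
# knowledge-graph concept annotations.

def jaccard(a, b):
    """Shared concepts divided by all concepts across both documents."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative annotations for a project proposal and a strategic framework.
proposal = {"sdg:health", "sdg:water", "topic:sanitation"}
framework = {"sdg:health", "sdg:education", "topic:sanitation"}

alignment = jaccard(proposal, framework)
```

Richer variants weight concepts by specificity or walk the graph to credit related (not just identical) concepts, but set overlap is the usual baseline.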


Using Wikibase to connect & collaborate around structured data

Sam Alípio (Wikimedia)

Wikibase, the free and open source software behind Wikidata, is being increasingly adopted by organizations and researchers as a platform for creating knowledge bases of linked (open) data. Through a selection of case studies, learn how projects around the world are using Wikibase to power structured data repositories that break down knowledge silos and enable cross-disciplinary collaboration.


Europeana, the reference of European digital cultural heritage

Fulgencio Sanmartín & Katerina Moutogianni (European Commission)

Europeana is the Commission's flagship in the wider policy area of digitisation, online accessibility and digital preservation of European cultural heritage. It is currently funded as a Digital Service Infrastructure under the Connecting Europe Facility programme. Europeana incorporates and works with thousands of European archives, libraries and museums to share cultural heritage for enjoyment, education and research. There are currently over 50 million records in its repository (images, publications, music, artworks and more), and 2.8 billion pieces of linked open data about them. The Europeana Data Model, the Europeana Publishing Framework (content and metadata components) and the International Rights Statements are key components for interoperability, sharing and re-use of cultural heritage data in Europeana. A future Common Data Space for Cultural Heritage will build on this initiative.






Lucy Walhain & Anikó Gerencsér (Publications Office of the European Union)

PMKI platform

Najeh Hajlaoui & Denis Dechandon (Publications Office of the European Union)

Semantic MediaWiki

Bernhard Krabina (KDZ - Zentrum für Verwaltungsforschung)


Wikidata as a central hub in the linked open data web

Lydia Pintscher (Wikimedia)

Wikidata, Wikipedia’s sister project, is a large knowledge graph containing general-purpose data about the world, collected and maintained by thousands of people. Wikidata’s data is used in a wide variety of applications, from digital personal assistants to data visualisations to making Wikipedia better. But it has another superpower: it connects more than 5 000 catalogues, databases, social networking sites and more through identifiers, and has thereby become the biggest identifier hub on the web today. This talk will dive into what this means and why it matters for you.
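The identifier-hub idea can be made concrete with a toy model (the mappings below are invented for illustration): once each external database's identifiers are linked to a Wikidata item, any two databases can be joined through that item without knowing about each other.

```python
# A tiny stand-in for the hub: Wikidata item IDs mapped to the external
# identifiers recorded on them (property values such as VIAF or GND IDs).
hub = {
    "Q42000000": {"viaf": "32197206", "gnd": "118584596"},  # illustrative item
}

def translate(qid, to_db, hub):
    """Resolve the identifier an item carries for a given external database."""
    return hub[qid][to_db]

# Holding a VIAF-linked item, look up the corresponding GND identifier.
gnd_id = translate("Q42000000", "gnd", hub)
```

The real lookup is a SPARQL query against the Wikidata Query Service over external-identifier properties, but the join-through-the-hub structure is the same.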


Closure & SliDo survey (input from participants in the form of recommendations for the future)



Download Day 4 programme as PDF

Time mentioned corresponds to the Central European Time (CET)


Linking data spaces


Welcome & instructions


Linked data and EU digital sovereignty

Keynote – Fabien Gandon

Senior Researcher and Research Director at Inria


The EU data strategy and its approach on the interoperability within and across the data spaces

Daniele Rizzi (European Commission)


Advancing interoperability of Digital Public Services for Europe in the GAIA-X ecosystem

Christoph Lange-Bever (Fraunhofer Institute for Applied Information Technology FIT, RWTH Aachen University)


SDGR as a stepping stone for a public sector dataspace - the case of cross-border exchange of educational credentials

Sebastian Sklarß (init) & Peter Hassenbach (BMBF, German Ministry of Education and Research)




Round table

Connecting research and public sector data through FAIR principles


Joke Meeus (Research Foundation Flanders)

Raf Buyle (Flemish government)

The linking and re-use of data has been recognized as a common goal across different data communities working on different topics (data spaces) in different sectors (public sector, industry and research). Nevertheless, it has been difficult in practice to align different approaches to handling data. This round table aims to make concrete how different data communities can cooperate to apply FAIR principles. We will start with concrete use cases in which cooperation between different data communities has been successful. Speakers will reflect on their approach to FAIR practices, its benefits and remaining challenges. A panel composed of representatives of research, industry and the public sector will elaborate on these findings to derive policy priorities and concrete actions for sharing knowledge, expertise and technology. (The results of the round table will be documented in a position paper.)


Stefan Lefever (Imec)

Carlos Parra-Calderón (Institute of Biomedicine of Seville, FAIR4Health)

George Yannis (National Technical University of Athens)


Sarah Jones (Géant)

Sadia Vancauwenbergh (UHasselt, euroCRIS)

Robert Krimmer (University of Tartu, Estonia and Tallinn University of Technology)


Eva Mendez (Universidad Carlos III de Madrid)

Sören Auer (TIB)


Closing speeches

Franck Noël (Publications Office of the European Union)

& Natalia Aristimuño Perez (Directorate-General for Informatics)