Publications Office of the EU
ENDORSE 2025 conference documents

Conference documents

You can find the presentations and recorded videos of the sessions in the programme.


Follow-up questions

Due to time limitations, some questions from the live audience and from Slido could not be answered during the Q&A sessions. The presenters' detailed answers can be found below.

As we've heard earlier this morning: reaching agreement in the first place is hard. What role does your organization play herein? How do your tools help with that?

We have different governance frameworks in place to define the collaboration with EU institutions, and the Publications Office has the role of coordinating and communicating with the stakeholders. As an example, we collect all modification proposals for an interinstitutional vocabulary, share them with the institutions, collect feedback via written procedure or at meetings and, once agreement is reached, implement the modifications and publish the updated vocabulary. We currently support these procedures with standard tools such as SharePoint, Teams, Confluence and email.

VocBench supports the collaborative editing and validation workflow, and in one specific case, EuroVoc, we also use it to collect feedback. In this use case our editorial team adds the proposals for new concepts to VocBench, and the subgroup of the EuroVoc governance committee responsible for content improvement reviews and comments on them directly in VocBench. After this step, the final agreement is discussed at dedicated meetings.

What is the reliability of alignments generated by AI?

The reliability of alignments generated by AI is generally good but not perfect. AI tools can quickly identify many correct matches between terms, especially when the wording is similar, but they can still make mistakes when concepts are complex or depend on context. These alignments should therefore be seen as useful suggestions that save time and effort, not as final results. Human review and validation are still essential to ensure accuracy and trust in the final mappings.

For now there is still a human in the loop. But for how long? Will this still be needed once the AI model is better trained?

For now, having humans in the loop is still important to make sure alignments are accurate and meaningful. As AI models become more advanced and better trained on domain-specific data, the need for human intervention will decrease, but it won’t disappear completely. Experts will still be needed to check sensitive or ambiguous cases and to ensure that the final alignments follow agreed standards and interpretations.

What solution do you use to tag documents with semantic artifacts: in the document itself or in the hosting platform? How do you ensure the documents keep their semantics when moved around?

We usually tag documents with semantic information directly in the hosting platform rather than inside the document itself. This allows the tags to be managed, updated, and linked to official vocabularies without changing the document content. To make sure the semantics are not lost when documents are shared or moved, we keep the links to stable identifiers (persistent URIs) that point to the official vocabularies. In this way, even if a document changes location, its meaning and connections to the reference data remain intact.
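
As a rough illustration of this pattern (not the Office's actual platform code), the sketch below uses Python with rdflib to attach a subject tag to a document's metadata record as a link to a persistent vocabulary URI; the document identifier and the EuroVoc-style concept URI are purely illustrative.

    # Minimal sketch: tag a document record with a persistent vocabulary URI.
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS

    # Hypothetical platform-side identifier for the document being tagged.
    doc = URIRef("https://example.org/documents/12345")

    # Persistent URI of the reference concept (illustrative EuroVoc-style URI).
    concept = URIRef("http://eurovoc.europa.eu/100141")

    g = Graph()
    g.add((doc, DCTERMS.subject, concept))

    # The tag lives in the metadata record, not in the file itself, so the link
    # to the official vocabulary survives even if the document is moved or copied.
    print(g.serialize(format="turtle"))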

Do you create your semantic assets from scratch, or can a Member State offer its assets to be used across the whole EU?

We have different types of semantic assets:

  • created by the Publications Office from scratch;
  • created by other EU institutions and agencies, either from scratch or as an export of existing vocabularies, and published by the Publications Office.

Our role is to support EU institutions in the creation, maintenance and publication of semantic assets. Member States can benefit from free, open-source solutions and platforms such as ShowVoc and Cellar to retrieve the data, and from VocBench, which they can install on their own premises. Additionally, the datasets of Member States are harvested by data.europa.eu, the central portal for European data.
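
For example, data published through Cellar can be queried over its public SPARQL endpoint. The snippet below is an illustrative sketch using Python and SPARQLWrapper, not an official client; the query simply lists a few SKOS concept labels.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Public SPARQL endpoint of the Publications Office (Cellar).
    sparql = SPARQLWrapper("http://publications.europa.eu/webapi/rdf/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT ?concept ?label WHERE {
            ?concept a skos:Concept ;
                     skos:prefLabel ?label .
            FILTER (lang(?label) = "en")
        } LIMIT 10
    """)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["concept"]["value"], "-", row["label"]["value"])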

How do you evaluate the quality of the semantic tagging and alignment after having trained your algorithms?

We evaluate the quality of semantic tagging and alignment by comparing the AI results with a set of manually validated examples, often called a “golden dataset.” This helps us see how many of the AI’s suggestions are correct and where it still makes mistakes. We also review cases where the AI disagrees with human experts to understand why. Over time, these checks help us fine-tune the algorithms and improve their accuracy, ensuring that the tags and alignments become more consistent and trustworthy.
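
A minimal sketch of this kind of check, with invented data rather than the production pipeline: compare the AI-suggested alignment pairs against the manually validated golden set and report precision, recall and F1, keeping the disagreements aside for expert review.

    def evaluate(suggested: set[tuple[str, str]], golden: set[tuple[str, str]]) -> dict:
        """Score suggested (source, target) alignment pairs against a golden set."""
        true_positives = suggested & golden
        precision = len(true_positives) / len(suggested) if suggested else 0.0
        recall = len(true_positives) / len(golden) if golden else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Pairs the AI proposed but experts rejected are the ones worth reviewing
        # to understand systematic errors.
        return {"precision": precision, "recall": recall, "f1": f1,
                "to_review": sorted(suggested - golden)}

    # Toy example:
    print(evaluate({("euA", "natB"), ("euC", "natD")}, {("euA", "natB"), ("euE", "natF")}))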

When comparing the ATTO and Authority tables, did you encounter challenges that you were not anticipating? If so, which one(s)?

Yes. Certain tables were especially challenging due to semantic discrepancies.

One example was the comparison of our Corporate authors table (PUB_CORP) with the AT Corporate body, where there was a semantic difference. The ATTO table, respecting the cataloguing rule that a corporate body is established under ‘the name by which it is commonly identified’, contained more entities than the AT Corporate body, where corporate bodies are established as entities whose labels vary over time. With the help of the Vocabularies Team, mappings from ATTO to the AT Historic labels for Corporate Bodies were established. This enables the extraction of bibliographic data using either the historic label (valid at the time of publication) or the generic label (the corporate body as an entity in the linked data universe).

Another example was the definition of semantics for the tables AT Resource Type and AT Product Form. The conclusion of a one-year analysis was that AT Resource Type relates mostly to content, while AT Product Form relates to format. This led to the revision of many values and their definitions, as well as the deprecation of some values and the creation of new tables, i.e. AT Carrier and AT Type of binding.

Authority Tables are not used only by the cataloguing team or librarians. There are other users, and their needs had to be respected too. For this reason, we found it useful to think outside the box and include other potential users who might want to use our ATs in the future. Very often we needed to adapt our wording to the standard, but with the help of the Vocabularies Team we managed well.

Not all datasets have developed SHACL shape validation, but all of them have a need for automatic documentation/UML-diagram generation. Could your approach be applied, with due workarounds, to datasets missing SHACL shapes?

Yes. SHACL Play contains a feature to automatically derive the SHACL structure from a dataset (see https://shacl-play.sparna.fr/play/generate#documentation). The documentation can then be produced from this auto-generated SHACL profile.

How do you take care of governance if you work with SHACL at the surface level? I think OWL should describe the WHAT and SHACL should describe the restrictions, used for validation, not so much as the primary description. (S = shape, like shadow)

I agree that "OWL should describe the WHAT and SHACL describes the restrictions". OWL ontologies still exist and declare classes and properties; the difference with the pre-SHACL era is that the OWL contains far fewer restrictions than before (typically, very few domain or range declarations). This makes the semantics of the ontology more reusable across contexts.

SHACL is not "at the surface level"; it simply applies at a different scope. OWL ontologies are scoped to the knowledge domain to which they apply, but do not care about how datasets and data flows are implemented in real systems. SHACL specifications apply to a precise dataset and encode the precise structure of that dataset, be it at the data acquisition step or at the data dissemination step.
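
A toy illustration of this division of labour, with invented names and shapes: the OWL ontology only declares the terms, while a dataset-specific SHACL shape constrains how one particular dataset must use them (validated here with Python and pyshacl).

    from rdflib import Graph
    from pyshacl import validate

    # The ontology declares terms but imposes no domain/range restrictions.
    ontology = Graph().parse(data="""
        @prefix owl: <http://www.w3.org/2002/07/owl#> .
        @prefix ex: <http://example.org/> .
        ex:Publication a owl:Class .
        ex:title a owl:DatatypeProperty .
    """, format="turtle")

    # Dataset-specific constraint: in this data flow, every publication needs a title.
    shapes = Graph().parse(data="""
        @prefix sh: <http://www.w3.org/ns/shacl#> .
        @prefix ex: <http://example.org/> .
        ex:PublicationShape a sh:NodeShape ;
            sh:targetClass ex:Publication ;
            sh:property [ sh:path ex:title ; sh:minCount 1 ] .
    """, format="turtle")

    data = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:doc1 a ex:Publication .
    """, format="turtle")

    conforms, _, report = validate(data, shacl_graph=shapes, ont_graph=ontology)
    print(conforms)   # False: the shape flags the missing title
    print(report)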

The VC spec uses JSON-LD, but the schema is described in JSON Schema (as far as I understand). Is there any future plan to support a more LOD-native schema instead?

As far as I know, the Working Group has no plan in this direction. But nothing prevents users from using SHACL or another RDF-based shape/schema language to validate their credentials. This was done, for example, in the CIRPASS project (https://cirpassproject.eu/).

Could these identity standards use existing authorities, like ORCID (Open Researcher and Contributor ID), VIAF (Virtual International Authority File) or ISNI (International Standard Name Identifier) in any way?

In general, yes, it is possible. One could define an `orcid` DID method (or `viaf` or `isni`, respectively) and describe what the DID document for these DID methods would contain. In those cases, the “Verifiable Data Registry” would be the respective services. Some people in the DID community may consider this an abuse of the DID technology, because each of these registries is in fact centralized. Others would counter that decentralization can come from the choice of multiple registries, not necessarily from all individual registries being decentralized. I personally sympathize with the latter argument.
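
Purely as a hypothetical illustration (no such DID method is registered), an `orcid` DID and the kind of minimal DID document its resolver might return could look like this; the ORCID iD is the well-known sample value and the structure follows the general DID Core data model.

    # Hypothetical identifier for an imagined did:orcid method.
    hypothetical_did = "did:orcid:0000-0002-1825-0097"

    # Minimal DID document a resolver for this method might return.
    did_document = {
        "@context": "https://www.w3.org/ns/did/v1",
        "id": hypothetical_did,
        "alsoKnownAs": ["https://orcid.org/0000-0002-1825-0097"],
        # Verification material would be managed by the registry itself (ORCID),
        # which is exactly the centralisation trade-off discussed above.
        "verificationMethod": [],
    }
    print(did_document)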

How does the LLM orchestrator basically work? Perhaps in an agentic fashion?

The orchestrator receives the chat history, user question, and tool descriptions. Based on this information, it creates a plan, often involving tool calls, which is subsequently executed to answer the user's query.
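
In pseudo-Python, the pattern looks roughly like the sketch below; this is a generic planner/executor outline with placeholder objects, not the project's actual code.

    def orchestrate(chat_history, user_question, tools, llm):
        """Ask the LLM for a plan, execute any tool calls, then compose the answer."""
        plan = llm.plan(history=chat_history, question=user_question,
                        tool_descriptions=[t.description for t in tools])
        observations = []
        for step in plan.tool_calls:              # e.g. a SPARQL lookup or a search
            tool = next(t for t in tools if t.name == step.tool_name)
            observations.append(tool.run(**step.arguments))
        # The final answer is generated from the question plus the tool outputs,
        # so each tool call can also be surfaced in the UI for traceability.
        return llm.answer(question=user_question, context=observations)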

Are there any drawbacks you have encountered using this system?

Like other LLM tools, this one may yield inconsistent answers. We are improving reliability by “hardcoding” parts of the process and adding domain knowledge to reduce hallucinations, but some variability remains.

How do you know what portion of the response comes from the underlying LLM, instead of the data reached via MCP?

By displaying each tool call's details in the user interface, we can monitor the LLM's access to the knowledge base and identify possible hallucinations efficiently.

What is the rate of consistency when running the same question/prompt?

We don’t have a clear metric on this aspect. As mentioned above, this remains a very common issue, but we are improving reliability by “hardcoding” parts of the process and adding domain knowledge.

Can I try the modelling assistant myself, locally?

We can provide temporary access to the tool upon request. The decision to publish the code on GitHub is currently under discussion at SEMIC, as we aim to eliminate dependencies on proprietary software or platforms prior to making it publicly available.

For a future CORDIS, you could consider obliging organisations to provide a standard identifier like ROR or ISNI to identify themselves; that might help with reconciliation. Is this on the roadmap?

Yes, this is something we plan to achieve by linking with Wikidata, which contains ROR identifiers, while for ISNI we can investigate relying on their linked data platform.
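
As a rough illustration of such a reconciliation step (assuming Wikidata property P6782 is the ROR ID, and using a made-up ROR value), one could look an organisation up on Wikidata's public SPARQL endpoint:

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="example-reconciliation-sketch/0.1")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        SELECT ?org ?orgLabel WHERE {
            ?org wdt:P6782 "05f950310" .   # ROR ID (example value)
            SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
    """)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["org"]["value"], row["orgLabel"]["value"])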

Question for Xueying Deng: Why does Flanders need a different data space from Belgium, or from Europe? Isn't the biology of humans and the legislation about the same?

Xueying Deng: Even though the human body and European health-data laws are common, the ecosystem of actors, data flows, standards, language, and governance in Flanders is different enough that a dedicated Flemish Health Data Space makes sense. It allows us to build trust, harmonise locally, deliver value regionally — and then plug into the bigger Belgian and European stage.

Question for Xueying Deng: In March this year, the EU adopted a Regulation to build the European Health Data Space (ref: EHDS Regulation). The DCAT Health Application Profile, which extends DCAT-AP with vocabularies like DPV, CSVW or DQV, is a pillar for achieving interoperability in the health data landscape.

Xueying Deng: Thanks so much for the information — this is very good to know. The Flemish Health Data Space (FHDS) is designed to adhere to the key principles of the European Health Data Space (EHDS), such as data sovereignty, interoperability, secure exchange, and governed access. The EHDS indeed serves as a regulatory and conceptual framework for FHDS. For more details, please refer to our final report: Vlaamse Health Data Space project. Eindrapport.

What was considered to calculate the average score? What would be an acceptable value to achieve in order to consider the transformation high quality?

The score is derived from two components. First, an XML schema validation function returns 0 if the generated XML does not conform to the AKN schema. Second, we compute the ROUGE-L score to measure structural and content similarity between the generated AKN and the reference document. The final score is the average of all evaluation scores across the test set.
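
A sketch of that scoring, assuming lxml for XSD validation and the rouge-score package for ROUGE-L; the schema path is a placeholder and this is not the authors' exact evaluation code.

    from lxml import etree
    from rouge_score import rouge_scorer

    def akn_score(generated_xml: str, reference_xml: str, schema: etree.XMLSchema) -> float:
        # Hard gate: a document that does not parse or validate against AKN scores 0.
        try:
            doc = etree.fromstring(generated_xml.encode("utf-8"))
        except etree.XMLSyntaxError:
            return 0.0
        if not schema.validate(doc):
            return 0.0
        # Otherwise measure structural/content similarity with ROUGE-L (F-measure).
        scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
        return scorer.score(reference_xml, generated_xml)["rougeL"].fmeasure

    # schema = etree.XMLSchema(etree.parse("akomantoso.xsd"))   # placeholder path
    # The reported average is then the mean of akn_score over the whole test set.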

Can you elaborate on your "document splitter" step? Is the whole pipeline hosted open-source somewhere?

Document splitting currently relies on traditional techniques, primarily regular expressions, because the overall structure of legal documents remains consistent. However, we are also developing a new LLM-based splitter to improve flexibility and handle more complex or unstructured cases.
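
A toy version of such a splitter is shown below; the heading pattern is a simplified stand-in for the project's real rules, not the actual pipeline.

    import re

    # Simplified cue: article headings such as "Article 1" or "Artikel 1".
    ARTICLE_HEADING = re.compile(r"^(Article|Artikel)\s+\w+", re.MULTILINE)

    def split_into_articles(text: str) -> list[str]:
        """Split a legal text into chunks, each starting at an article heading."""
        starts = [m.start() for m in ARTICLE_HEADING.finditer(text)]
        if not starts or starts[0] != 0:
            starts = [0] + starts            # keep any preamble as its own chunk
        bounds = starts + [len(text)]
        return [text[a:b].strip() for a, b in zip(bounds, bounds[1:]) if text[a:b].strip()]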

Did you test without optimization?

Yes, we tested the models without optimization. In all cases, the performance was consistently lower compared to the optimized versions. This indicates that LLMs struggle to reliably convert content to AKN format without applying the optimization steps.

How do you deal with new versions of AKN?

When the AKN standard evolves, we incorporate examples from the updated schema into our dataset, update our processing pipeline accordingly, then recompile and validate the system against the new specification. This maintains full compatibility and accurate interpretation as the structure evolves.

Since when have you been using the framework you propose?

We have used this framework since February 2025, when we began this proof of concept.

If it works, what is the hindrance in using more powerful (i.e., commercial) LLMs instead of open-source ones?

As we stated in our introduction, we opted for an open-source, local solution, which served our purposes. There is no hindrance per se to using commercial tools; it just brings additional costs.

You mention GraphDB (I presume Ontotext's GraphDB) as a vector database - how does that work? GraphDB is a triple store, not a vector database. Are you using something like ChromaDB or Pinecone in addition to GraphDB?

We specified that we use PostgreSQL with pgvector as the vector database. In our project, GraphDB is used as a second retrieval-augmentation source, where we have stored all the TTL files.
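
The retrieval side of such a setup might look like the sketch below, assuming a pgvector-enabled table chunks(id, content, embedding); the table, column and connection details are invented for the example.

    import psycopg2

    def top_k_chunks(conn, query_embedding: list[float], k: int = 5):
        vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
        with conn.cursor() as cur:
            # "<=>" is pgvector's cosine-distance operator; smaller means more similar.
            cur.execute(
                "SELECT id, content FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
                (vector_literal, k),
            )
            return cur.fetchall()

    # conn = psycopg2.connect("dbname=rag user=rag")   # connection details are placeholders
    # top_k_chunks(conn, embed("What is EuroVoc?"))    # embed() stands for the embedding model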

What metadata for example was missing in the documents that the LLM found?

We tried with many of our usual production metadata, like persons’ names, organisations’ names, legislative terms, parliamentary committees, ...

What is needed to expand this approach to other types of documents?

The same approach can be followed with other documents. The more structured the source documents are, and the more relevant the XML tagging, the better the results. The parsing can be enhanced by refining the semantic support of the chunks made from the XML structure. For longer documents, it may also be necessary to consider only the parts where the metadata are most likely to be found, such as cover pages, abstracts, headers and footers. These last two measures are also likely to improve response times.
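
As a minimal sketch of that chunking idea (element names such as coverPage, header and abstract are placeholders, not a real schema), one could keep only the XML sections where metadata is likely to appear and preserve their tags as light semantic context:

    from lxml import etree

    LIKELY_METADATA_SECTIONS = {"coverPage", "header", "abstract"}

    def metadata_chunks(xml_bytes: bytes) -> list[str]:
        root = etree.fromstring(xml_bytes)
        chunks = []
        for element in root.iter():
            if not isinstance(element.tag, str):      # skip comments and PIs
                continue
            tag = etree.QName(element).localname
            if tag in LIKELY_METADATA_SECTIONS:
                text = " ".join(element.itertext()).strip()
                if text:
                    # Prefixing each chunk with its tag keeps some of the XML
                    # semantics available to the retrieval step.
                    chunks.append(f"{tag}: {text}")
        return chunks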