You can find all the links to the presentations and videos directly in the programme.
Is data ownership by individual(s) or collective(s) factored into your research?
Data ownership is an important issue, but most of my work has looked at data available under an open license, owned by public authorities. Publishers did not publish any personal data from individual citizens as such.
Can the education of translators (e.g. at universities) keep up with, for example, the growing need for localisation in the EU?
The growing need for EU localisation is likely to be met by a mix of expanding translation programmes and using new technologies. Educating translators therefore has two layers: training more translators, and providing ongoing professional development for existing translators so that they can work within new paradigms and focus on high-value tasks. The ability to help set up, and work within, frameworks that process more content while retaining quality is a much-needed skill set.
Could you please explain in more detail which benefits the EDCC KB provides, in terms of knowledge transfer or otherwise?
The EDCC KB collects information that can be reused in other contexts. For example, it can be used to extract resources for answering questions related to the EU, to get an overview of what citizens are interested in, and to produce statistics on the languages, topics and sentiments involved.
Is the underlying ontology publicly available?
The ontology is published on the web, but you will probably not be able to access it from your network. You can download the file here.
Do you also manually edit triples sometimes? Can you show how it is queried?
The data is generally modified through interaction with the UI or through data sent by the CRM connected to the system. The triples can also be changed via SPARQL Update operations (INSERT/DELETE), but this is done only by admins, not by end users.
Can you please share the link to the EDCC KB?
The EDCC KB contains sensitive information and is therefore accessible only from certain networks. If you wish to access it, please contact Morten Espelund at the following address: Morten.Espelund@ec.europa.eu
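To illustrate the querying and the admin-side updates mentioned above, here is a minimal sketch (Python standard library only) that builds, but does not send, SPARQL 1.1 Protocol requests: a query sent via GET and an update sent via POST. The endpoint URL, the `ex:` vocabulary and the `ex:id "Q-123"` record are hypothetical placeholders, not the actual EDCC KB endpoint or ontology terms.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the real EDCC KB endpoint is not public.
ENDPOINT = "https://example.org/sparql"


def build_query_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET, with the query in the URL."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def build_update_request(endpoint: str, update: str) -> Request:
    """SPARQL 1.1 Protocol update via POST, with the raw update as the body."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Hypothetical vocabulary: list questions and their languages.
QUERY = """
PREFIX ex: <http://example.org/ns#>
SELECT ?question ?language WHERE { ?question ex:language ?language } LIMIT 10
"""

# Hypothetical admin update: replace the status of one record.
UPDATE = """
PREFIX ex: <http://example.org/ns#>
DELETE { ?q ex:status ?old }
INSERT { ?q ex:status "closed" }
WHERE  { ?q ex:status ?old ; ex:id "Q-123" }
"""
```

Passing either request to `urllib.request.urlopen` would execute it; a real update endpoint would also require admin authentication, which is omitted here.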
Are private providers involved in the initiative?
The tender documents explicitly refer to innovation procurement, more specifically to Competitive Dialogue, in order to have several stages of negotiations/dialogue with private tenderers.
Is this extensible at EU level?
The platform, though designed for the Italian national health service, is extensible at EU level since the reference legal framework (GDPR, draft AI Act, and so on) is already at EU level.
Should there be an EU "Healthcare" Vocabulary to deal with differences in terminology across laws (AI, GDPR, etc.), healthcare standards and countries?
The project per se does not envisage developing an EU-wide vocabulary; however, it does comply with the European Health Data Space, which states that: An EHR system shall allow personal electronic health data to be shared between health professionals or other entities from the health system, and between health professionals and patient or health professional portals in a commonly used electronic interoperable format, which includes, inter-alia, dataset content, data structures, formats, vocabularies, taxonomies, exchange formats, standards, specifications, profiles for exchange and code lists, thus enabling system to system communication.
SPARQL is still quite complicated for some users. What about other ways to interact?
Thank you for the question. So far we have only explored SPARQL as the interface between users and data, but the question is very much on point. The Facade-X approach is based on the idea that a simple, unique meta-model can be used to query and manipulate many different data sources. This opens opportunities for designing advanced user interfaces that could help non-SPARQL users design the KG construction pipeline in a more agile way. This is one of our future research directions.
Do you have any experience using AI to deal with these issues, or references to studies on this matter?
I have worked on using AI to verify and remediate different types of data quality issues. In the talk I mentioned data accuracy. You can find examples of truth discovery algorithms usable in this context in this paper, for instance. Other types of data quality issues, such as completeness and interoperability, can also be improved by AI approaches.
Does detecting "circular ownership of companies" help with tracking fraud?
Identifying and analysing any complex funding structure helps detect fraud. This particular pattern is illegal in many jurisdictions. However, concluding that fraud exists requires understanding why the pattern appears and what reasons other than fraud could cause it. Fraud detection will most likely come from a network of hints rather than a single anomaly. Still, even in the case of a genuine data-recording mistake, there is a legal responsibility to know and understand such anomalies in a client or commercial partner, and therefore to ask the question.
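To make the "simple, unique meta-model" idea concrete, here is a simplified sketch of how a Facade-X-style view might turn both JSON and CSV into the same shape of triples: containers become blank nodes, object keys become properties, and list positions become `rdf:_1`, `rdf:_2`, ... slots. The `fx:`/`xyz:` namespace IRIs follow those used by the SPARQL Anything project as far as we know and should be treated as assumptions; this is an illustration of the idea, not that tool's implementation.

```python
import csv
import io
import json
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FX = "http://sparql.xyz/facade-x/ns/"     # assumed Facade-X core namespace
XYZ = "http://sparql.xyz/facade-x/data/"  # assumed namespace for source keys


def to_facade_x(data):
    """Map nested dicts/lists to (subject, predicate, object) triples.

    Containers become blank nodes; dict keys become xyz: properties and
    list positions become rdf:_1, rdf:_2, ... slots. Keys are used as-is
    (a real implementation would percent-encode them)."""
    if not isinstance(data, (dict, list)):
        raise ValueError("top-level value must be an object or array")
    ids = count()
    triples = []

    def new_node():
        return f"_:b{next(ids)}"

    def visit(value, node):
        if isinstance(value, dict):
            slots = [(XYZ + key, child) for key, child in value.items()]
        else:
            slots = [(f"{RDF}_{i}", child) for i, child in enumerate(value, start=1)]
        for predicate, child in slots:
            if isinstance(child, (dict, list)):
                child_node = new_node()
                triples.append((node, predicate, child_node))
                visit(child, child_node)
            else:
                triples.append((node, predicate, child))

    root = new_node()
    triples.append((root, RDF + "type", FX + "root"))
    visit(data, root)
    return triples


def json_to_facade_x(text):
    return to_facade_x(json.loads(text))


def csv_to_facade_x(text):
    # Each row becomes a numbered slot of the root, keyed by column header.
    return to_facade_x(list(csv.DictReader(io.StringIO(text))))
```

Because both sources end up in the same container-and-slot shape, a single query pattern (for example "every node with an `xyz:name` property") works on either, which is what makes a generic user interface on top of the meta-model conceivable.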
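As a rough illustration of what truth discovery does, here is a minimal sketch of one simple variant: iterative weighted voting, where the estimated truth and each source's reliability are refined in alternation. It is not necessarily the algorithm in the paper mentioned; the input format `(source, object, value)` and the Laplace-smoothed reliability update are our own assumptions.

```python
from collections import defaultdict


def discover_truth(claims, iterations=10):
    """claims: iterable of (source, object, value) triples.

    Alternates between (1) picking, for each object, the value with the
    highest reliability-weighted vote and (2) re-estimating each source's
    reliability as its smoothed agreement rate with the current picks."""
    claims = list(claims)
    by_source = defaultdict(list)
    for source, obj, value in claims:
        by_source[source].append((obj, value))
    weights = {source: 1.0 for source in by_source}  # start with equal trust
    truths = {}
    for _ in range(iterations):
        votes = defaultdict(lambda: defaultdict(float))
        for source, obj, value in claims:
            votes[obj][value] += weights[source]
        # Ties are broken by insertion order; later rounds use learned weights.
        truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}
        for source, own in by_source.items():
            agree = sum(truths[obj] == value for obj, value in own)
            weights[source] = (agree + 1) / (len(own) + 2)  # Laplace smoothing
    return truths, weights
```

The useful property is that a source which keeps disagreeing with the consensus loses weight, so on objects reported by only two conflicting sources the historically more reliable source wins, which plain majority voting cannot decide.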
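Detecting the circular-ownership pattern itself is a graph problem: find a cycle in the directed "owns" graph. Below is a minimal sketch using depth-first search; the `(owner, owned)` edge format is an assumption, and, as the answer stresses, a detected cycle is only a hint to investigate, not evidence of fraud.

```python
from collections import defaultdict


def find_ownership_cycle(edges):
    """edges: iterable of (owner, owned) pairs. Returns one cycle as a list
    of entities that starts and ends with the same entity, or None."""
    graph = defaultdict(list)
    for owner, owned in edges:
        graph[owner].append(owned)
    state = {}  # missing = unvisited, "active" = on the DFS path, "done" = finished
    path = []

    def dfs(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph.get(node, []):
            if state.get(nxt) == "active":  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in list(graph):
        if node not in state:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None
```

This recursive version is fine for illustration; very long ownership chains would need an iterative DFS to stay within Python's recursion limit.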
How is conformity different from Compliance?
Conformity refers to adherence to some rules or guidelines. It is typically understood to be a choice. Compliance refers to a mandatory obligation or requirement, such as one specified by a law. Another way to understand the difference is that non-conformity does not mean there is something wrong, just that something 'differs' from a stated norm. Non-compliance, instead, means something is (probably) illegal, as it doesn't fit or satisfy the legal requirements.
Do you see multilingualism as the most important element of the language space?
Multilingualism is an important aspect of our European values of diversity and equality. It is an essential angle of the technologies we want to develop and, therefore, an important aspect of the language data to be collected and shared through the Language Data Space. It is a differentiating factor in the language data collection race we are seeing globally, and Europe should take advantage of it. That being said, many language technologies are not necessarily multilingual (speech transcription, summarisation, …), and monolingual corpora also need to be considered.
Is the language data space focused both on machine translation and on common multilingual vocabularies, or do you see them as separate directions?
Philippe Gelin: The Language Data Space will focus on language data. Monolingual and multilingual corpora, whether text, speech or even images and videos, all have an important role to play in the development of language technologies.
Multilingualism is an important aspect of our European values of diversity and equality.
As multilingual corpora are a differentiating factor in the language data collection race we are seeing globally, Europe should take advantage of them.
What is the use context of the 'Linked Data Solutions Ontology'? Where will it be available?
The Linked Data Solutions Ontology (LDSO) is used to describe Linked Data solutions (LDS). An LDS is defined as “A technical solution enabling the utilisation or management of Linked Data, with at least one of its capabilities.” Currently, LDS are described using the EU Knowledge Graph, an enhanced Wikibase instance managed by the European Commission, whose internal model is aligned with LDSO. The LDSO is not officially published yet, but it is fully described on a page of the user guide.
If users are sharing information with your system, do they need to buy a specific tool/application in order to be able to share, or are they operating through a platform?
Currently, the public part of the platform is freely accessible to anyone, but it only allows users to retrieve, through APIs, the metadata published by our participating institutions. The administration part of the platform, which allows users to create and publish datasets, data services and concepts, is in a "private beta" phase, and we manually create the accounts for the institutions onboarded on the platform. Use of the platform is currently free for these users, and they can publish their metadata at no cost.
As part of the lessons learnt, the development of an integrated software tool is mentioned; based on your study, could existing semantic technologies or tools not be used or reused?
In the framework of our research we used existing tools, for example OpenRefine and Virtuoso Universal Server, as well as some tools developed by the European Commission, such as the CPSV-AP validator. In addition, to publish public service descriptions as linked data we also had to define a specific URI design policy for public service modelling, extract public service descriptions from an HTML public service catalogue, etc. We suggest that an integrated tool could guide practitioners through this process and hide the underlying technical complexity behind a user-friendly graphical user interface. We anticipate that the proposed integrated tool could incorporate or interact with existing tools; however, we did not experiment with developing it in the framework of this research. It could be considered part of the proposed future work.
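Two of the steps mentioned, extracting service descriptions from an HTML catalogue and giving each one a URI under a design policy, can be sketched with the standard library alone. The markup (`<h2 class="service">`), the base URI and the slug-based URI pattern below are all hypothetical; a real URI policy would also need to handle collisions and guarantee stable identifiers, and in CPSV-AP terms each extracted entry would then be described as a public service.

```python
import re
import unicodedata
from html.parser import HTMLParser

BASE = "http://example.org/id/public-service/"  # hypothetical base URI


def slugify(name):
    """Hypothetical URI policy: ASCII-fold, lowercase, hyphen-separate."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


class ServiceCatalogueParser(HTMLParser):
    """Collects the text of <h2 class="service"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_service = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "service") in attrs:
            self._in_service = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_service:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_service:
            self.names.append("".join(self._buffer).strip())
            self._in_service = False


def extract_services(html):
    """Return {minted URI: service name}; duplicate slugs would overwrite."""
    parser = ServiceCatalogueParser()
    parser.feed(html)
    parser.close()
    return {BASE + slugify(name): name for name in parser.names}
```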
What is the main advantage that AKN4EU is bringing to the drafting of consolidations?
The Akoma Ntoso schema has been customised to the particularities of EU legislation, resulting in the AKN4EU specifications. To cover the needs of consolidation, we further refined and enriched AKN4EU with specific metadata and structures, arriving at the AKN4EU-CONSLEG specifications, which have been the basis of our project.
Compared to the current FORMEX format (more details available here), AKN4EU-CONSLEG offers several significant improvements:
- Versatility in document versions: AKN4EU-CONSLEG allows for dynamic generation of single-version documents based on any date and type of date using XSLT technology.
- Enhanced metadata: AKN4EU-CONSLEG provides comprehensive metadata for each modification, including references to the exact location within the amending act, as well as the application date for each modification.
- Consistent format: AKN4EU employs a uniform format for both legislative drafting and consolidation. With AKN4EU-CONSLEG we can streamline the consolidation process, facilitate automation and improve efficiency when producing consolidated texts.
- Inclusion of editorial content: AKN4EU-CONSLEG enables the aggregation of preambles from the basic and amending acts, the use of annotations and references to legal provisions that are affected but not modified (e.g. annulment by the Court of Justice of the EU).
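The "versatility in document versions" point can be pictured with a small point-in-time filter: each provision carries a validity window per type of date, and a single-version document is produced by dropping everything not valid on the requested date. The project uses XSLT for this; the sketch below is a Python analogue over a hypothetical, heavily simplified markup (`application-from`/`application-to` attributes), not the actual AKN4EU-CONSLEG structures.

```python
import xml.etree.ElementTree as ET
from datetime import date


def is_valid_on(elem, on, date_type):
    """True if the element's [from, to) window for date_type contains `on`.

    Missing bounds are treated as open-ended."""
    start = elem.get(f"{date_type}-from")
    end = elem.get(f"{date_type}-to")
    if start and date.fromisoformat(start) > on:
        return False
    if end and date.fromisoformat(end) <= on:
        return False
    return True


def point_in_time(xml_text, on, date_type="application"):
    """Parse the document and keep only the provisions valid on `on`."""
    root = ET.fromstring(xml_text)

    def prune(parent):
        for child in list(parent):  # iterate over a snapshot while removing
            if is_valid_on(child, on, date_type):
                prune(child)
            else:
                parent.remove(child)

    prune(root)
    return root


# Hypothetical consolidated act: article text replaced on 2022-06-01.
SAMPLE = (
    '<act>'
    '<article application-from="2020-01-01" application-to="2022-06-01">Old text</article>'
    '<article application-from="2022-06-01">New text</article>'
    '</act>'
)
```

Switching `date_type` (for example to a hypothetical `force` prefix) would select versions by a different kind of date from the same source document.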
What efforts are needed to come from plain text to structured data & to finally reach automation of the legal analysis?
From the beginning of the project, we were aware that our legal analysis methodology contained relations and dependencies between different parts that were only described in plain text, not properly structured. We also recognised the dependencies between the methodology and the Common Data Model (CDM) or the translation lists (NAL). So a very important phase was to discover how the legal analysis could be correctly structured and how such a structure could be formalised, also in view of its future use (as an extension to the CDM, and its possible benefits for automating the legal analysis process).
Afterwards, we had to transform this logical model into LAM structured data using the correct technical means. Especially in this phase, the support and active cooperation of experts in the field of ontology was crucial for the project. Regarding the automation of the legal analysis process, some effort is still needed, because we need to establish an environment able to take advantage of the formalised LAM, the CDM and proper mark-up.
Although RDF and XML do not play in the same world, classic developers see them as competitors. How do we move away from this view?
This comes back to my message in slide 9: RDF should not be seen as a competitor to any data format, but as a facilitator of interoperability across heterogeneous formats. If XML data works for you internally, stick to XML; but when you need to connect your XML data to other data, looking at it through RDF glasses (e.g. using GRDDL) becomes useful.
Is the exchange of data between WhatsApp and other open apps in the scope of Solid?
Importing data from proprietary apps into one's Solid pod is definitely in scope, although I am not aware of active projects in this field (see old discussions here). Exchange in the other direction (getting WhatsApp to use data from my pod) is much trickier, as it would require WhatsApp to implement a Solid client in its own application.
We often hear that there are too many standards and that implementations are too expensive. Does that discourage data owners from moving towards semantics and the related technologies?
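"Looking at XML through RDF glasses" can be as simple as a transformation that reads the XML you already have and emits triples. GRDDL attaches such a transformation (typically XSLT) to the XML document; the sketch below is only a Python stand-in for one, reading a hypothetical `<person>` XML shape and writing schema.org terms in N-Triples. The input shape and the example.org base IRI are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
BASE = "http://example.org/person/"  # hypothetical base IRI for minted subjects


def literal(text):
    # Escape the characters N-Triples requires inside a quoted literal.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def xml_to_ntriples(xml_text):
    """Read <person id="..."><name>...</name></person> elements (a hypothetical
    XML shape) as schema:Person resources serialised in N-Triples."""
    root = ET.fromstring(xml_text)
    lines = []
    for person in root.iter("person"):
        subject = f"<{BASE}{person.get('id')}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{SCHEMA}Person> .")
        name = person.findtext("name")
        if name:
            lines.append(f"{subject} <{SCHEMA}name> {literal(name)} .")
    return "\n".join(lines)
```

The XML stays the system of record; the RDF view exists only where the data has to meet other data.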
I can't deny the fact that there are many standards, even when considering a single standard body like W3C. We are continuously evolving our process to make it clearer which standards are superseded by others, or allowing a single standard to be incrementally improved rather than creating a new one.
As for "too expensive implementations", I don't think that W3C standards suffer much from this, since we do not publish a standard unless there are (at least) two independent implementations of it. This is a way to ensure that:
- the standard can reasonably be implemented,
- the standard is of enough relevance to some people that they spend some time implementing it, and
- when the standard is published, people already have some implementations available to use it.
Is NOGA linked to NACE or ISIC?
The General Classification of Economic Activities (NOGA) is the Swiss version of the European classification of economic activities NACE. It is published on the I14Y Interoperability platform at this address.
Question to Stephen Abbott Pugh: Beneficial ownership is just one facet of the possible ways to avoid taxation and launder money. How do you see the connection with the other facets?
Globalised financial flows which allow money to constantly shift around the world are here to stay and are likely to only increase in speed and frequency. Company formation is also increasing in speed in many jurisdictions where electronic filing is becoming the norm. Those carrying out due diligence or know your customer (KYC)/know your business (KYB) processes are being asked to keep up with this.
Protocols and standards to capture and share beneficial ownership data about the real people behind corporate vehicles have not kept up with the pace of innovation and efforts around the payments or entity data space. Any efforts such as country-by-country reporting, automatic exchange of information, a move towards "perpetual KYC or pKYC" checks can only benefit from having more accurate, adequate and up-to-date beneficial ownership information to keep up with the pace.
Question to Gernot Friedrich: Semantic data currently suffers in the area of quality. Are there any quality initiatives or standards that the military is working on?
Military decision-making is "the cognitive process leading to the selection of a course of action among alternatives"; considering that in the fog of war information is often incomplete, each decision is a risk-taking judgment. In the world of Command and Control, Commanders are using more and more data analytics to create an information advantage, deliver decision dominance and decrease the cumulative risk accepted when making a decision. I don’t think that a single “data quality standard” will solve the issue of data quality. We need to address each dimension of data quality (Completeness, Consistency, Conformity, Accuracy, Integrity and Timeliness) with a tailor-made strategy. A typical cause of data quality issues is source data housed in a patchwork of different national systems. Each of these data sources can have scattered or misplaced values, outdated and duplicate records, and inconsistent (or undefined) data standards and formats. In NATO we have more than 35 different data standards developed by Communities of Interest (COIs); many of them address the same concepts or types of real-world objects and events but use their own data definitions, data types and formats.
With our Data Standards Harmonization initiative, we try to address the dimensions of conformity and accuracy at the root. For example, our NATO Core Data Framework Standard recommends the use of a common Semantic Reference Model (Core Ontology) for the concepts that are common across most of our Communities of Interest (e.g. Actors, Locations, Events, Tasks, Information Resource, Feature, Facility, Materiel, …), and we are working closely with our NATO Terminology Office to ensure that we use agreed definitions. Each COI may have its own ontology, but we need to harmonize, or at least map, the concepts that are also used by other COIs. Then we need to identify the Data Elements within our existing standards that are required to fuse datasets shared by COIs using their specific data exchange specifications, so that we can update those specifications in the next version.
Question to Gernot Friedrich: Gernot mentioned this wonderful principle, "we need to work together". How can the players be motivated in this direction when technical matters are being discussed?
Collective defence means that an attack against one Ally is considered an attack against all Allies. The principle of collective defence is enshrined in Article 5 of the Washington Treaty. “The ability to act together coherently, effectively and efficiently to achieve Allied objectives” is how we define interoperability within NATO. This sums up what we do as an Alliance and is a cornerstone of our standardization work. Through our Federated Mission Networking (FMN) initiative, we are on a path toward ‘day zero’ interoperability. This means that Allies and partners will ‘hard-wire’ interoperability into their capabilities, ready for the very first day of a NATO mission. Within FMN we ensure full traceability from the operational (or business) requirements to the technical interoperability specifications. So reminding people of the operational interoperability problem we are trying to solve often helps our technical specialists focus on solving the technical issues.
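Several of the dimensions listed above lend themselves to simple ratio metrics, which is one way a per-dimension strategy can be measured. The sketch below scores completeness, conformity and timeliness for a batch of records; the field names, the three-letter country-code pattern and the 30-day threshold are hypothetical, and this is an illustration rather than any NATO standard. Accuracy is left out because it needs a trusted reference to compare against.

```python
import re
from datetime import datetime, timedelta


def completeness(records, required):
    """Share of required fields that are present and non-empty."""
    cells = [r.get(f) for r in records for f in required]
    return sum(v not in (None, "") for v in cells) / len(cells) if cells else 1.0


def conformity(records, field, pattern):
    """Share of non-empty values of `field` that fully match `pattern`."""
    rx = re.compile(pattern)
    values = [r[field] for r in records if r.get(field)]
    return sum(bool(rx.fullmatch(v)) for v in values) / len(values) if values else 1.0


def timeliness(records, field, now, max_age):
    """Share of timestamped records updated within `max_age` of `now`."""
    stamps = [r[field] for r in records if r.get(field)]
    return sum(now - s <= max_age for s in stamps) / len(stamps) if stamps else 1.0


# Hypothetical records from different national systems.
RECORDS = [
    {"id": "u1", "country": "DEU", "updated": datetime(2024, 1, 10)},
    {"id": "u2", "country": "Germany", "updated": datetime(2023, 1, 1)},
    {"id": "u3", "country": "", "updated": datetime(2024, 1, 12)},
]
```

Scoring each dimension separately makes it visible which remedy is needed: a low conformity score points to harmonizing formats, while a low timeliness score points to update frequency at the source.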
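The "harmonize, or at least map" step can be pictured as a per-COI mapping from local field names to shared core concepts, after which records from different COIs can be fused on a shared key. Everything in this sketch (the COI names, the field names, and the choice of `Actor` as the join key) is hypothetical and far simpler than the NATO Core Ontology.

```python
# Hypothetical COI-specific field names mapped to shared core concepts.
COI_MAPPINGS = {
    "logistics": {"unit_name": "Actor", "grid_ref": "Location"},
    "intelligence": {"actorLabel": "Actor", "position": "Location", "observed": "Event"},
}


def to_core(coi, record):
    """Rename mapped fields to core concepts; unmapped fields are dropped,
    which is exactly where missing Data Elements would show up."""
    mapping = COI_MAPPINGS[coi]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def fuse(records_by_coi, key="Actor"):
    """Merge core-model records that share the same key value.

    Later COIs overwrite earlier ones on conflicting concepts; a real
    fusion step would need an explicit conflict policy."""
    fused = {}
    for coi, records in records_by_coi.items():
        for record in records:
            core = to_core(coi, record)
            if key in core:
                fused.setdefault(core[key], {}).update(core)
    return fused
```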
How can other institutions benefit from the services offered by the Publications Office for collaboration? Under what conditions?
The reference data management service provided by the Publications Office to the European Commission falls within the scope of the policy on Corporate Reference Data Management in the European Commission. Nevertheless, similar requests from other EU institutions will also be considered and processed in a timely manner, prioritised according to the urgency of each request and to resource availability. Furthermore, a knowledge base is available on the EU Vocabularies website, and tutorials on VocBench and ShowVoc are about to be released by the Publications Office.
You mentioned using a vocabulary for assessing statistical quality. Could you explain how it is used?
The ESS Quality Glossary is a comprehensive collection of terms and concepts related to the main activities of the ESS (European Statistical System) in the field of statistical quality, such as Peer Reviews. It also addresses conceptual and methodological work within the ESS, taking stock of current research and scientific publications on quality in official statistics. The ESS Quality Glossary is maintained by Eurostat in collaboration with the Working Group on Quality in Statistics (more information can be found here).