The IMMC protocol
Have you ever wondered how the Publications Office of the European Union (OP) is able to serve all 76 EU institutions, bodies and agencies, allowing them to exchange a multitude of documents among themselves in a lightweight and flexible way? Or how the OP is able to publish a wide range of documents in an orderly and consistent manner, keep them up to date and allow users to quickly retrieve the specific document they are looking for?
Keeping track of metadata is a big part of the answer. The other part lies in having a standardised way of receiving documents, which allows information systems (IS) to sustain data flows efficiently and at scale while ensuring that sufficient quality levels are met. The IMMC protocol fulfils both needs.
Born out of the Interinstitutional Metadata Maintenance Committee (IMMC) in 2010, the IMMC protocol was the result of an agreement between EU services on a standardised way of exchanging documents to support the interinstitutional decision-making process. Fast-forward to today: over 1.5 million IMMC messages per year are exchanged between the institutions and bodies, the OP, its contractors and stakeholders at national level. Of these standardised and agreed information flows (contracts), 150 are mainly bilateral flows between the OP and its partners, and 90 are IMMC exchange flows between the European Commission and its partners.
The IMMC protocol supports several business purposes, ranging from granting “public access” to documents of the European Parliament, the Council of Ministers and the European Commission to keeping track of (ordinary and special) legislative documents and case-law documents. Currently, the standard:
- supports interinstitutional flows that fulfil the legal obligations set out in the treaties, alongside initiatives such as the Better Regulation Portal, thereby facilitating decision-making,
- supports flows to the OP and internal flows within the OP (archiving, general publications...).
Broadly speaking, two IMMC schema-lines are currently in use:
- IMMC v2 mainly supports interinstitutional legislative procedures (e.g. allowing documents and event information to be transferred consistently among EU bodies). Since 2021, this schema-line has also supported the Commission’s exchanges with the OP regarding pre-legislative information (e.g. Staff Working Documents, or delegated and implementing acts subject to publication for feedback on the Have Your Say Portal),
- IMMC v3 mainly supports the production of jurisprudence documents (i.e. transfers of metadata for case law) and internal OP procedures (General Publications, Consolidated Legislation and Summaries, among others).
Contracts, drawn up for a given business purpose, sender and receiver, ensure that information systems (IS) can exchange information easily. In IMMC, these contracts are documented through IMMC transmission protocols and technically enforced through XSD schemas. The business purpose of the transmission determines the schema-line (IMMC v2 or IMMC v3), and each schema-line has its own communication protocol.
Although the schema lines
- rely on the IMMC vocabulary,
- follow the same design practices (e.g. same folder structure) and
- are designed for long-term stability and usage (following the same well-defined process for defining, maintaining and publishing schemas),
they are incompatible with each other (i.e. an IMMC v2 descriptor cannot be validated by an IMMC v3 schema).
However, a sender can compose a descriptor against a deprecated release of the implementing contract (e.g. release 123 of a domain-specific XSD in IMMC v2) and validate it against a newer release of that XSD (e.g. release 200 of the same domain-specific XSD). This is possible because the OP ensures that every new schema release is backward compatible with prior descriptors (i.e. XML instances composed with items from previous releases of the domain-specific XSD can still be validated by later releases). Exchange partners who have not requested changes can therefore keep working as before, with the schemas they downloaded at the time, without being forced to move to the latest version of the IMMC schema. To monitor backward compatibility, the OP also includes descriptors composed with previous releases among the sample IMMC descriptors released for testing purposes.
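The backward-compatibility rule can be illustrated with a deliberately simplified model. In the sketch below, which is not how XSDs are actually processed, a schema release is reduced to the set of element names it allows and the subset it makes compulsory; `SchemaRelease`, `accepts`, `is_backward_compatible` and the element names are hypothetical. Under this assumption, a newer release accepts every descriptor the older one accepted as long as it removes no allowed element and adds no new compulsory element.

```python
from dataclasses import dataclass


# Hypothetical, highly simplified model of one release of a domain-specific XSD:
# only the element names it allows and the ones it makes compulsory.
@dataclass(frozen=True)
class SchemaRelease:
    number: int
    allowed: frozenset
    required: frozenset = frozenset()


def accepts(release, descriptor_elements):
    """True if a descriptor (reduced to its element names) is valid for this release."""
    elements = set(descriptor_elements)
    return release.required <= elements and elements <= release.allowed


def is_backward_compatible(old, new):
    """Every descriptor valid for `old` stays valid for `new` when `new` removes
    no allowed element and introduces no new compulsory element."""
    return old.allowed <= new.allowed and new.required <= old.required


# Usage: release 200 only adds an optional element, so release-123 descriptors still pass.
r123 = SchemaRelease(123, frozenset({"transmission", "work", "title"}),
                     frozenset({"transmission", "work"}))
r200 = SchemaRelease(200, r123.allowed | {"procedure_reference"}, r123.required)
old_descriptor = {"transmission", "work", "title"}
print(accepts(r123, old_descriptor), accepts(r200, old_descriptor))  # True True
```

Real XSD releases can also change types, cardinalities and element order, which this set-based model ignores; that is why validating the sample descriptors from earlier releases against each new schema remains the practical check.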
Sending metadata (data about the data) alongside the transmitted documents allows systems and people to retrieve data with ease. Coupling this with a standardised way of exchanging data facilitates the exchange and long-term storage of metadata and contributes to an intuitive categorisation of data.
Looking into the protocol, we note that in 2011 the Management Committee agreed on a minimum set of descriptive metadata and values from named authority lists (NAL) to be included in interinstitutional transmissions. By standardising how information is exchanged and how disseminated data is described, this protocol:
- facilitates communication between institutions, bodies, agencies and other agents at large
- contributes to the EU’s principle of transparency and openness of the EU institutions
- contributes to EU priorities related to streamlining the decision-making process and fostering open, accessible and reusable data.
The IMMC protocol hierarchically specifies three categories of data in XML files: transmission metadata, core metadata, and extensions. This division allows for a flexible, scalable and maintainable approach with regards to data exchange.
- General-use elements related to administrative data are specified in the core metadata transmission file (e.g. date-of-publication, processing instructions...). In the past, the publication-request file (now deprecated) served a similar purpose. Additional domain-specific administrative metadata are stored in the domain-specific transmission extension.
- Descriptive metadata elements which are common to all exchange domains are specified in the core metadata. Born out of the minimal set of information to be exchanged to support legal decision-making processes, this file has since expanded to contain both mandatory and optional elements.
- Descriptive metadata that expresses information additional to the core metadata, and that is necessary for given exchanges, is specified in extension metadata files. Extension metadata depends on the business needs of the sender; thus each sender-receiver pair can have an additional domain-specific extension file per business domain. In line with software-engineering best practices, metadata used by multiple business domains is stored in the common metadata extension (cmext), while metadata specific to one business domain is stored in the domain-specific extension file.
This layered structure brings two further benefits:
- The receiver has better visibility of the flows and can keep track of how exchanges and messages evolve. Indeed, as the sender must include a minimal set of information identifying the document, the receiver can use this information to link new messages to previous ones.
- The flows that IMMC can sustain are extensible, scalable and durable. As the sender composes each message by including only the information relevant to that specific transmission, the protocol is lightweight; and because the information sent can be customised (e.g. enriched) through extensions, the protocol is both flexible and scalable.
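To make the categories concrete, the sketch below sorts qualified element names by their namespace prefix. The prefixes `cortrans`, `cmt`, `cm` and `cmext` appear in this article; `domext` is a made-up stand-in for a domain-specific extension prefix, and the mapping itself is an illustrative assumption rather than part of the IMMC schemas.

```python
# Illustrative mapping from namespace prefix to metadata category.
# "domext" is a hypothetical placeholder for a domain-specific extension prefix.
CATEGORY_BY_PREFIX = {
    "cortrans": "domain-specific transmission extension",
    "cmt": "core metadata transmission",
    "cm": "core metadata",
    "cmext": "common metadata extension",
    "domext": "domain-specific extension",
}


def group_by_category(qualified_names):
    """Group names such as 'cm:work' by the metadata category of their prefix."""
    groups = {}
    for name in qualified_names:
        prefix, _, local_name = name.partition(":")
        category = CATEGORY_BY_PREFIX.get(prefix, "unknown")
        groups.setdefault(category, []).append(local_name)
    return groups


print(group_by_category(["cmt:transmission", "cm:work", "domext:internal_ref"]))
```

Keeping each category behind its own prefix mirrors the separation described above: administrative and descriptive metadata, shared and domain-specific, can evolve independently.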
The metadata to be exchanged between partners is agreed on by the stakeholders concerned, documented in the transmission protocol document, and implemented and enforced by the domain-specific XML schema. The IMMC descriptor (XML file) sent alongside the documents provides additional information on the categorisation of the documents and the processing of the files.
On the sender’s side, the sender submits a message (e.g. a zip file packaging PDF documents, among other files, together with an XML descriptor specifying the documents’ metadata). The metadata ranges from descriptive data (i.e. data relevant to the purpose of the document, e.g. keywords, important dates, places...) to administrative data (i.e. data related to processing instructions). The message, sent over the Internet, lands in the receiver’s IT system, where it can then pass through different phases of validation.
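On the sender's side, packaging can be sketched with Python's standard `zipfile` module. The descriptor layout (a `descriptor` root with one `file` element per document), the file name `descriptor.xml` and the use of SHA-256 checksums are all assumptions made for illustration; the real structure and checksum rules are fixed by the transmission protocol and its XSD.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile


def build_package(documents):
    """Bundle documents and one XML descriptor into an in-memory zip archive.

    documents: mapping of file name -> file content (bytes), e.g. PDFs.
    The descriptor layout and SHA-256 checksums are illustrative assumptions.
    """
    descriptor = ET.Element("descriptor")
    for name, content in documents.items():
        # Record each document's name and checksum so the receiver can cross-check it.
        ET.SubElement(descriptor, "file", name=name,
                      sha256=hashlib.sha256(content).hexdigest())

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("descriptor.xml", ET.tostring(descriptor, encoding="unicode"))
        for name, content in documents.items():
            archive.writestr(name, content)
    return buffer.getvalue()


package = build_package({"doc_en.pdf": b"%PDF-1.7 placeholder content"})
print(zipfile.ZipFile(io.BytesIO(package)).namelist())  # ['descriptor.xml', 'doc_en.pdf']
```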
If the receiver configures its information systems (IS) to support quality assurance and to increase processing capacity and/or speed, the standardisation of messages brings additional advantages, allowing institutions to keep pace with the evolving landscape of information technology.
We note that the OP’s receiving information system (CERES):
- implements automatic validation rules on the structure and the content of the message. More specifically, CERES validates the message’s structure against the implementation of the transmission protocol (the domain-specific transmission XSD) and cross-checks the technical data in the IMMC descriptor against the accompanying files (e.g. checksums for PDFs ...).
- processes messages automatically, redirecting them to the correct destinations.
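The cross-check of technical data can be sketched as follows, using a simplified descriptor layout in which each document is declared as `<file name="..." sha256="..."/>`. This layout, the SHA-256 algorithm and the function name `cross_check` are hypothetical; the sketch only shows the kind of comparison involved, not how CERES implements it.

```python
import hashlib
import xml.etree.ElementTree as ET


def cross_check(descriptor_xml, files):
    """Compare the files declared in a descriptor with the files actually received.

    descriptor_xml: descriptor text using the hypothetical <file name=... sha256=...> layout.
    files: mapping of received file name -> content (bytes).
    Returns a list of problems; an empty list means the technical data match.
    """
    declared = {f.get("name"): f.get("sha256")
                for f in ET.fromstring(descriptor_xml).iter("file")}
    problems = []
    for name, expected in declared.items():
        if name not in files:
            problems.append(f"missing file: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != expected:
            problems.append(f"checksum mismatch: {name}")
    # Files in the package that the descriptor does not mention.
    for name in sorted(files.keys() - declared.keys()):
        problems.append(f"undeclared file: {name}")
    return problems


digest = hashlib.sha256(b"abc").hexdigest()
descriptor = f'<descriptor><file name="a.pdf" sha256="{digest}"/></descriptor>'
print(cross_check(descriptor, {"a.pdf": b"abc"}))  # []
print(cross_check(descriptor, {"a.pdf": b"tampered"}))  # ['checksum mismatch: a.pdf']
```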
As automated validation (of the structure and, partially, of the contents) can take place in a matter of seconds, the standardisation of these messages increases the OP’s capacity to process messages efficiently (scalability of the IT systems), allowing it to support more institutions and more flows, since some validation is performed early on. Besides simplifying existing processes and enabling more scalable tools, standardisation also gives better visibility of the flows, leading to better quality-assurance and quality-control practices.
As the sender and the receiver have already agreed on the structure of the descriptors to be transmitted (by agreeing on the XSD, as per the IMMC transmission contracts), a properly equipped IT system can validate the XML structure automatically. Institutions, agencies and bodies that configure their receiving IS to perform this validation automatically therefore also benefit from these advantages.
Looking into the OP’s modus operandi with regard to validating metadata, CERES:
- checks that the message sent is well-formed (e.g. that it follows the naming convention)
- parses the descriptor (XML document) and validates its structure against the agreed domain-specific XSD.
If all these checks are passed successfully, the IS performs further automatic validation checks on the metadata (i.e. on the contents of the IMMC descriptor) and on the contents of the documents referred to in the descriptor. It may then perform other actions, such as redirecting the message to the right flow (e.g. storage or publication flow) or sending acknowledgement (ACK) responses to the sender. For example, if the documents are to be published, the IT system forwards the message to the designated proofreading team, which then proceeds with the manual validation of the documents.
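The sequence of checks above can be sketched as a short pipeline that stops at the first failure. Python's standard library cannot validate against an XSD, so structural validation is passed in as a callable (in practice a schema-aware XML library would fill that role); the naming-convention pattern, the `REJECTED` messages, the `publish` attribute and the routing rule are all invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naming convention: <SENDER>_<YYYYMMDD>_<sequence>.zip
PACKAGE_NAME = re.compile(r"^[A-Z]+_\d{8}_\d+\.zip$")


def process(package_name, descriptor_xml, validate_structure, route):
    """Run the checks in order, stopping at the first failure.

    validate_structure: callable standing in for validation against the agreed XSD.
    route: callable choosing the destination flow (e.g. storage or publication).
    Returns (acknowledgement, destination); destination is None when rejected.
    """
    if not PACKAGE_NAME.match(package_name):
        return "REJECTED: naming convention", None
    try:
        root = ET.fromstring(descriptor_xml)  # well-formedness of the descriptor
    except ET.ParseError:
        return "REJECTED: descriptor is not well-formed XML", None
    if not validate_structure(root):
        return "REJECTED: descriptor does not match the agreed XSD", None
    # Further checks on metadata and document contents would run here.
    return "ACK", route(root)


def is_descriptor(root):
    # Toy stand-in for XSD validation: only checks the root tag.
    return root.tag == "descriptor"


def choose_flow(root):
    # Toy routing rule based on a hypothetical "publish" attribute.
    return "publication" if root.get("publish") == "yes" else "storage"


print(process("OP_20240101_1.zip", '<descriptor publish="yes"/>', is_descriptor, choose_flow))
```

Ordering the cheap checks (file name, well-formedness) before structural validation means malformed packages are rejected before any heavier processing starts.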
IMMC is a point-to-point communication: messages are sent from one sender to one receiver. The purpose of the message (i.e. the business process it supports) defines the exchange domain of the communication. This explicitly defines the structure and vocabulary to be used in the message (implemented in XSDs) and implicitly defines the IMMC schema-line (IMMC v2 or IMMC v3). Note that the vocabulary covers the whole range of what can be expressed, not necessarily what is expressed in a transmitted XML document (which is usually a subset of it). For a given purpose, each sender and receiver agree on a transmission protocol contract: a document specifying, in terms of structure and vocabulary, the metadata of interest (compulsory or optional) to be included in the IMMC descriptor. To draw up this contract, the sender and the receiver agree a priori on the range of items of interest (selected from the vocabulary) allowed in the transmission. These contracts can only be concluded with the approval of the OP and are technically implemented by means of a domain-specific transmission XSD. Each institution thus knows which type of IMMC message it will receive, from whom it will receive it and how to handle it.
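A transmission protocol contract can be thought of as the subset of the vocabulary that a given sender and receiver have agreed on, split into compulsory and optional items. The hypothetical `TransmissionContract` class below checks a descriptor, reduced to the set of item names it contains, against such a contract; the partners and item names are illustrative, and in IMMC itself this check is performed by validating against the domain-specific transmission XSD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransmissionContract:
    """Hypothetical in-memory view of a transmission protocol contract."""
    sender: str
    receiver: str
    compulsory: frozenset
    optional: frozenset

    def check(self, items):
        """List the problems in a descriptor, given the set of item names it uses."""
        items = set(items)
        problems = [f"missing compulsory item: {i}"
                    for i in sorted(self.compulsory - items)]
        # Anything outside the agreed subset of the vocabulary is not allowed.
        problems += [f"item not in contract: {i}"
                     for i in sorted(items - self.compulsory - self.optional)]
        return problems


contract = TransmissionContract(
    sender="EC", receiver="OP",  # illustrative partners
    compulsory=frozenset({"cmt:transmission", "cm:work"}),
    optional=frozenset({"cm:title"}),
)
print(contract.check({"cm:work", "cm:title", "cmext:extra"}))
```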
Moving on from the transmission protocol, the sender can create descriptors (optionally with the help of the IMMC Template tool). Descriptors are XML files which contain the metadata relevant to the processing and categorisation of the documents, and they should respect the transmission protocol. To create an IMMC package, the sender bundles the IMMC descriptor (one per package) with (optionally) the documents to be sent to the receiver and the. An example of a message composed for a given exchange protocol is described below.
An IMMC descriptor (XML) contains information related to the transmission (i.e. administrative metadata) and information related to the content of the message (i.e. descriptive metadata). This metadata is specified in supporting IMMC files in a way which supports maintainability and scalability of flows. Any descriptor must include the compulsory items of the core-metadata (cm) file (general descriptive metadata) and the compulsory items of the core-metadata transmission (cmt) file (general administrative metadata). A descriptor can then be enriched by including elements from extensions: the common-metadata extensions, the domain-specific metadata extensions or the domain-specific transmission file.
A sample is a valid example of an IMMC descriptor and groups metadata elements from the supporting files in accordance with the specific contract. With regards to its structure, the XML file of a sample starts from the domain-specific transmission elements (e.g. cortrans:cor_transmission_request), and gradually incorporates lower levels (e.g. cmt:transmission, then cm:work elements, incorporating domain-specific extension elements and common extension elements where needed). In other words, domain-specific transmission levels contain core-metadata transmission levels alongside other nested elements (e.g. from common-metadata extensions or domain-specific extensions). Generally the following rules apply:
- Transmitting documents in the context of general publications implies having metadata elements from the transmission levels and the document levels (work, expression, manifestation). The minimum set of metadata is specified in the OP Core Metadata.
- Transmitting documents in the context of legislative procedures implies having metadata elements from the transmission levels, the procedure and event levels, and the subsequent document levels. The minimum set of metadata (e.g. updates to metadata) is specified in the IMMC agreement on legal data. (In short, a procedure contains at least one event, which in turn can contain works, expressions and manifestations)
- Transmitting metadata-only may imply having metadata elements from the core-metadata transmission levels and the core-metadata from the procedure and event level.
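The nesting described above can be sketched with `xml.etree.ElementTree`. The namespace URIs are placeholders rather than the real IMMC namespaces, and apart from `cortrans:cor_transmission_request`, `cmt:transmission` and `cm:work`, the element and attribute names (and the exact placement of the procedure under the transmission element) are assumptions; the agreed structure for any real flow is defined by its domain-specific transmission XSD.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real ones are defined in the published IMMC schemas.
NS = {
    "cortrans": "urn:example:cortrans",
    "cmt": "urn:example:cmt",
    "cm": "urn:example:cm",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def tag(prefix, local_name):
    """Return a qualified tag in ElementTree's '{uri}local' notation."""
    return f"{{{NS[prefix]}}}{local_name}"


def legislative_descriptor(procedure_ref, event_type, language, file_name):
    """Nest transmission > procedure > event > work > expression > manifestation.

    Apart from cor_transmission_request, transmission and work, the element and
    attribute names are illustrative assumptions.
    """
    root = ET.Element(tag("cortrans", "cor_transmission_request"))
    transmission = ET.SubElement(root, tag("cmt", "transmission"))
    procedure = ET.SubElement(transmission, tag("cm", "procedure"), reference=procedure_ref)
    event = ET.SubElement(procedure, tag("cm", "event"), type=event_type)
    work = ET.SubElement(event, tag("cm", "work"))
    expression = ET.SubElement(work, tag("cm", "expression"), language=language)
    ET.SubElement(expression, tag("cm", "manifestation"), file=file_name)
    return root


descriptor = legislative_descriptor("PROC-1", "adoption", "EN", "doc_en.pdf")
print(ET.tostring(descriptor, encoding="unicode"))
```

For a general publication, the procedure and event levels would simply be left out and the work attached directly under the transmission level.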
Regarding the structuring of the metadata files, the compulsory items in the core-metadata (minimal metadata to be sent, as agreed by all institutions) are systematically exchanged for all interinstitutional exchanges, while additional metadata is made compulsory in specific contexts.
In compliance with information technology (IT) best practices, metadata (additional to the core metadata) used exclusively for one purpose is kept locally (i.e. in domain-specific extensions), while additional metadata common to more than one business domain is stored in the common-extensions file. The core metadata and common extensions belong to metadata Level 1, and any change at this level must be agreed on by the different stakeholders in the IMFC Metadata subgroup meeting. Domain-specific extensions, instead, belong to Level 2 and only impact a given sender-receiver couple; changes at this level can be agreed upon in bilateral exchanges.
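The governance rule can be restated as a small lookup: a change touching any Level 1 file needs agreement in the IMFC Metadata subgroup, whereas a change confined to a domain-specific extension can be settled bilaterally. The file labels, partner names and the function `approval_route` are illustrative only.

```python
# Level of each metadata file, following the rule above (labels are illustrative).
LEVEL_BY_FILE = {
    "core metadata": 1,
    "common extension": 1,
    "domain-specific extension": 2,
}


def approval_route(changed_files, sender, receiver):
    """Return who must agree on a change touching the given metadata files."""
    levels = {LEVEL_BY_FILE[name] for name in changed_files}
    if 1 in levels:
        # Any Level 1 change affects all stakeholders.
        return "IMFC Metadata subgroup"
    # Level 2 changes only affect one sender-receiver couple.
    return f"bilateral agreement: {sender} and {receiver}"


print(approval_route(["domain-specific extension"], "EC", "OP"))
```

A change spanning several files takes the strictest route, so promoting a domain-specific element into the common extension would require subgroup agreement.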
Structuring metadata in this way ensures the long-term stability and usability of the IMMC schemas. Indeed, these practices
- minimise maintenance efforts by having a clear structure and minimising duplication (i.e. changes to the common metadata can happen in one place)
- separate concerns (allowing more flexible updates to metadata used by fewer institutions)
- ensure that each message contains sufficient metadata to understand the context of the exchange (e.g. core metadata).
Delving into the pre-IMMC world, we notice that contractors relied on sending pre-formatted OJEEP messages to the OP to transfer metadata and files in the context of OJ publications, while institutions sent documents by email (with metadata that was either absent or of widely varying quality).
With no standardised way of exchanging information, institutions relied on their own sets of tools to disseminate information. As these tools used diverse, proprietary metadata for documents and events, programming or configuring IS to process this information proved impossible: machines could not communicate efficiently with each other (they were not interoperable), and information had to be validated manually. Moreover, very few institutions had formalised specific contracts for data exchanges, so receivers did not know a priori what metadata (if any) they would receive.
For the receiver, this way of dealing with document flows was extremely time-consuming, as staff had to sift through the various documents and manually verify both the documents and their widely differing metadata formats (checking for coherence and completeness).
Although tailoring OJEEP to the institutions could have been a solution, this would have proved arduous and resource-intensive. Indeed, although OJEEP messages were structured to a certain extent, they only catered for flows between contractors and the OP and had not been built for flexibility or scalability.
And thus began the quest for another way of exchanging standardised information across multiple bodies. To reduce uncertainty, contracts prescribed a priori would allow each user to send and receive messages in a pre-formatted manner, and would let the receiver know what to expect from the sender. The receiver (together with the OP) would then create a schema (the implementation of the prescribed contract) allowing information systems to process messages (and could also configure its IS so as to perform some automatic validation).
In 2011, the IMMC agreement set out the minimum requirements of the set of metadata to be sent along with the disseminated content in the legal domain (e.g. legislative and non-legislative procedures). Manuscripts (to be published in the OJ) and legal documents (to be published in EUR-LEX) would therefore have to be sent alongside the minimum set of metadata (metadata on procedures, events and their associated works, expressions and manifestations). Through the years the protocol expanded to cater for a range of activities (from the dissemination of judicial procedures to digitisation and archiving activities). Look into the timeline for further details: