Dataset publication guidelines
To ensure the proper ingestion of your datasets and facilitate their dissemination on the EU Vocabularies website, we advise you to comply with the following basic rules:
Packaging format and communication
The content of the future publication will be delivered as a zip archive.
The delivery has to take place in accordance with the scheduled “code freeze” date.
Any change of date has to be communicated at least 2 weeks before the “code freeze” date.
Unless defined otherwise, the package will be sent to the following email address:
Content of the publication package
A package will not be accepted for publication unless the following components are included:
- The actual dataset files will always be located in the root folder of the archive
- Depending on the type, the files will be in one of the following formats:
  - Semantic vocabularies: RDF, TTL, XML, JSON-LD
  - Generic vocabularies: CSV, GC, XML, SVG
  - Models: OWL, XML Schema, DTD, XML, TTL
  - Alignments: RDF, TTL, XML
- Every dataset type intended for publication will be accompanied by at least a documentation file and a release note
- All documentation files associated with the dataset will be stored in the Documentation folder
- The Documentation folder will be located in the root folder of the main package
- The documentation will be provided only in HTML or PDF format
- Every documentation file will clearly state the dataset name and the title of the document at the beginning (first page or first screen to be displayed)
- If only one documentation file is provided, this file will contain at least the following sections:
  - Title of the document
  - Title of the dataset
  - The scope and intended target of the document
  - A basic description of the dataset
A main section presenting the dataset at large, as well as its intended use, should be included. Such a description might give details about the structure, usage principles, data models, associated statistics, etc.
- The release notes will be stored in the Release folder, which is located in the root folder of the main package
- The release notes will be delivered as an HTML, PDF or TXT file
- The release notes will contain at a minimum: the version ID, a list of the distribution formats included in the release, contact details of the copyright owner and, if possible, a list of the new elements that the release provides
Optionally, and if relevant for the scope of the dataset, a publication package might also contain:
- Sample files – Packed together as a zip file with the name Samples. Stored in the root folder of the main package
- Diff files – Stored as independent files under the folder Diff that is located in the root folder of the main package
Depending on the type of dataset, some elements of the package might differ.
Any such deviation has to be clarified in advance with the publication team (OP-EU-VOCABULARIES@publications.europa.eu)
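The layout described above can be checked before sending a package. The following is a minimal sketch in Python, not an official validation tool. The function name check_package is hypothetical, and the file extensions used for the accepted documentation and release note formats (.html, .pdf, .txt) are assumptions derived from the format names listed above.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed file extensions for the accepted formats (not stated in the guidelines).
DOC_EXTENSIONS = {".html", ".pdf"}
RELEASE_EXTENSIONS = {".html", ".pdf", ".txt"}


def check_package(archive):
    """Return a list of problems found in a publication package (empty if none).

    `archive` is a path to a zip file or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries and keep only file paths.
        files = [PurePosixPath(n) for n in zf.namelist() if not n.endswith("/")]

    # Files in the root folder, ignoring the optional Samples archive.
    root_files = [
        p for p in files
        if len(p.parts) == 1 and p.name.lower() != "samples.zip"
    ]
    docs = [
        p for p in files
        if len(p.parts) >= 2 and p.parts[0] == "Documentation"
        and p.suffix.lower() in DOC_EXTENSIONS
    ]
    notes = [
        p for p in files
        if len(p.parts) >= 2 and p.parts[0] == "Release"
        and p.suffix.lower() in RELEASE_EXTENSIONS
    ]

    problems = []
    if not root_files:
        problems.append("no dataset file in the root folder")
    if not docs:
        problems.append("no HTML or PDF file in the Documentation folder")
    if not notes:
        problems.append("no HTML, PDF or TXT file in the Release folder")
    return problems
```

Calling check_package("package.zip") on a local archive returns an empty list when all three required components are present. The sketch only checks that the components exist. It does not verify the dataset formats or the content of the documentation and release notes.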
File naming and conventions
In order to ensure clarity in communicating the scope of each file to the intended users it is advisable to use a proper naming convention for the various files stored in the publication package.
Our preferred file naming structure follows the rules below:
DA – [Required] Dataset name or acronym (e.g. EuroVoc, IMMC, ECLAS, etc.)
FC – [Required] File content, intent or distribution (e.g. Alignment, Example, User_manual, Release_note, Diff, SKOS, MARC, etc.)
VS – [Optional] Version ID or date of the dataset
EXT – File extension (e.g., RDF, TTL, XML, PDF, CSV, etc.)
File name = DA_FC_VS.EXT
No spaces are accepted in the name of the package or in the names of the files included in the publication package.
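As an illustration, the naming structure above can be applied programmatically. The sketch below is only one possible reading of the convention. The function names build_file_name and looks_compliant are hypothetical, and the version value "4.2" in the usage example is invented for illustration.

```python
def build_file_name(da, fc, ext, vs=None):
    """Assemble DA_FC_VS.EXT, leaving out VS when no version is given."""
    parts = [da, fc] + ([vs] if vs else [])
    name = "_".join(parts) + "." + ext.lstrip(".")
    # The guidelines do not accept spaces in file names.
    if any(ch.isspace() for ch in name):
        raise ValueError(f"file names must not contain spaces: {name!r}")
    return name


def looks_compliant(name):
    """Loose check: no whitespace, an extension, and at least DA_FC in the stem."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not ext or any(ch.isspace() for ch in name):
        return False
    fields = stem.split("_")
    # FC itself may contain underscores (e.g. User_manual), so only a
    # minimum of two non-empty fields is required.
    return len(fields) >= 2 and all(fields)


print(build_file_name("EuroVoc", "SKOS", "rdf", vs="4.2"))
print(build_file_name("IMMC", "User_manual", "pdf"))
```

Because the FC part may itself contain underscores, a file name cannot always be split back into DA, FC and VS unambiguously. For that reason the sketch builds names from their parts and keeps the compliance check deliberately loose.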
In case of non-compliance
If an already existing convention (for content, labels, etc.) was defined and/or used for previously published packages, please inform the publication team (OP-EU-VOCABULARIES@publications.europa.eu) to identify the best approach to be followed.