DataCite Best Practice Guide
The DataCite Metadata Schema [external link] has become a de facto standard for describing research data. Despite its efforts to standardize metadata, the schema still leaves considerable leeway and offers alternatives in many details. For example, it leaves open whether languages used in the metadata are identified via ISO 639-1, ISO 639-2, ISO 639-2/B (language codes commonly used by libraries) or ISO 639-3 language tags. The value of metadata increases with its consistency, which is achieved through compliance with standards. One of the aims of the Best Practice Guide presented here is therefore to limit the choices provided by DataCite by specifying a preference, and in this way to ensure as much consistency as possible.
This document is a guideline for using the official DataCite Metadata Schema documentation [external link], version 4.6 [external link]. A more convenient HTML version with better navigation is available as DataCite Metadata Schema Documentation [external link]. This guide is intended for researchers as well as IT and library support staff. Further information on the schema can be found on the DataCite support site [external link].
To create a DataCite XML file, we recommend using the DataCite Metadata Generator [external link]. This tool is kept in sync with this guideline, except for short transition periods between versions. If you want to create metadata for research data on a scale that is too large for manual procedures, please contact one of the institutions named above.
Overview
The first part, General Best Practice, is a selection of recommendations and obligations for using DataCite in general and is written in FAQ (Frequently Asked Questions) style.
The second part, Best Practice for specific fields, gives more details for each of the 20 metadata fields of the DataCite metadata standard.
The third part, Examples, is a compilation of DataCite examples.
A. General Best Practice
B. Best Practice for specific fields
- 1 identifier [m]
- 2 creator [m]
- 3 title [m]
- 4 publisher [m]
- 5 publicationYear [m]
- 6 subject [m]*
- 7 contributor [r]
- 8 date [r]
- 9 language [o]
- 10 resourceType [m]
- 11 alternateIdentifier [o]
- 12 relatedIdentifier [r]
- 13 size [r]*
- 14 format [o]
- 15 version [o]
- 16 rights [m]*
- 17 description [m]*
- 18 geoLocation [r]
- 19 fundingReference [o]
- 20 relatedItem [o]
Mandatory fields are indicated by the tag [m], recommended fields by [r] and optional fields by [o]. Note: This guide deviates from the DataCite Metadata Schema 4.6 [external link] in its assessment of recommended and optional properties and assigns different levels of obligation to some of them. These properties are marked with an * in the list above.
These fields improve discoverability, make long-term management of the datasets easier for the hosting institution and help future (re-)users of the dataset. The benefits of providing the additional information outweigh the effort, as researchers usually have most of it at hand already, for example a short abstract for the description.
C. Examples
- Digital encyclopedia: “Bayerisches Musiker-Lexikon Online”
- Meteorological project: “ClimEx”
- Volume as part of a series: “Discourses on Corruption”
- Article in conference proceedings: “High-Energy Physics”
- Critical edition (digital & print): “Richard Strauss Kritische Werkausgabe”
- Digital lexicographical information system: “VerbaAlpina”
A. General Best Practice
What do the metadata describe?
Unless otherwise specified, all information in the metadata concerns the research data (also referred to as the “resource”), not the project in whose context the data were created or collected, nor the metadata themselves.
What is the language of the metadata?
- The default language of the metadata is English. If another language is used, the same information must additionally be specified in English.
- Where language variations are possible (e.g. title, description, affiliations), the language should be specified by xml:lang attributes:
<title xml:lang="de">Bayerisches Musiker-Lexikon Online (BMLO)</title>
<title xml:lang="en" titleType="TranslatedTitle">Digital Encyclopedia of Bavarian Musicians</title>
- Proper nouns do not need to be translated.
- Use standardized data (e.g. controlled vocabularies) whenever possible. This might allow data aggregators to display the information in the language most suitable to the use case at hand.
- Recommendation: use either the two-letter language codes from ISO 639-1 or the three-letter language codes from ISO 639-2 (listed on Wikipedia [external link]). Be advised: the three-letter codes are used in library systems. If you use a different standard (e.g. BCP 47 [external link]), be consistent and do not alternate between standards. In any case, DataCite expects a language code conforming to the regular expression pattern [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*, which covers most common language codes.
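The pattern can be checked locally before submission. The following Python sketch uses only the standard re module; the function names are our own, and the check is purely syntactic: a string such as “english” also passes even though it is not an ISO 639 code. The second helper is a hypothetical aid for spotting a mix of two-letter and three-letter codes within one record.

```python
import re

# Format pattern quoted in this guide; it checks syntax only, not whether a code exists.
LANGUAGE_CODE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_language_code(code: str) -> bool:
    """Return True if the entire string matches the DataCite language code pattern."""
    return LANGUAGE_CODE_PATTERN.fullmatch(code) is not None


def primary_subtag_lengths(codes: list[str]) -> set[int]:
    """Hypothetical consistency aid: lengths of the primary subtags (2 = ISO 639-1, 3 = ISO 639-2)."""
    return {len(code.split("-")[0]) for code in codes}


if __name__ == "__main__":
    for code in ["en", "ger", "en-US", "de_DE", "english"]:
        print(f"{code!r}: {is_valid_language_code(code)}")
    # A result with more than one length means two standards are mixed.
    print(primary_subtag_lengths(["de", "eng"]))
```

A set with more than one element returned by primary_subtag_lengths is a hint that the record alternates between standards, which this guide advises against.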
How should I specify a person?
- A person should be identified by name, persistent ID and affiliation.
- It is recommended not to use titles or academic degrees in names, as they are subject to change.
- State the name in the order “family name, given name”. For example:
<creatorName nameType="Personal">Krefeld, Thomas</creatorName>
- Recommendation: Additionally, separate family name and given name, each in a specific subfield:
<givenName>Thomas</givenName>
<familyName>Krefeld</familyName>
- Add a persistent identifier for persons, preferably a GND-ID (Gemeinsame Normdatei) [external link] or an ORCID-ID [external link] (Open Researcher and Contributor ID). This will make attributions robust to changes of names or affiliations:
- Recommendation for GND-ID entries:
- GND-ID entries can be conveniently searched for on WebGND [external link] or lobid-gnd [external link]; for further search options see the GND website [external link].
- Only use individualized GND entries that clearly identify a person (usually by year of birth and/or profession).
- Examples of GND and ORCID entries:
<!-- GND entry -->
<nameIdentifier
    schemeURI="https://d-nb.info/gnd/"
    nameIdentifierScheme="GND">123778689</nameIdentifier>
<!-- ORCID entry -->
<nameIdentifier
    schemeURI="http://orcid.org/"
    nameIdentifierScheme="ORCID">0000-0001-9657-6052</nameIdentifier>
- It is also recommended to indicate the affiliation to an institution (Note: An affiliation is an institution, not a project).
- See: How should I specify an institution
- If a person has multiple affiliations:
- It is recommended to state only one institution (the context of the resource determines the affiliation).
- If unavoidable, multiple affiliations can be specified in the order of importance for the dataset published.
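ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so many typing errors can be caught before submission. The Python sketch below is an illustration with hypothetical function names; a valid checksum does not prove that the iD is registered or belongs to the right person, and GND-IDs are not covered.

```python
import re

# Four hyphen-separated blocks of four characters; only the last one may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the hyphenated format and the final check character of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0001-9657-6052"))  # iD used in the examples of this guide
    print(is_valid_orcid("0000-0001-9657-6053"))  # same iD with the last digit altered
```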
How should I specify an institution?
- Institutions are mainly entered in the affiliation or fundingReference fields.
- Follow the policy of the institution.
- State the name of the institution as specifically as possible (e.g. start with the chair/group, not with the university). If the name of the institution has changed, use the name as it was at the time of creation of the resource.
- Start with the most specific organizational unit and end with the most generic one, separating the units by semicolons:
<affiliation xml:lang="de">Institut für Romanische Philologie; Ludwig-Maximilians-Universität München</affiliation>

<affiliation
    xml:lang="de"
    affiliationIdentifier="https://ror.org/05591te55"
    affiliationIdentifierScheme="ROR">Ludwig-Maximilians-Universität München</affiliation>
- If there is no policy or multiple names in multiple languages are given, use the English name.
- Always specify the language in which the name is given using an xml:lang attribute.
<publisher xml:lang="en">Leibniz Supercomputing Centre</publisher>
<publisher xml:lang="de">Leibniz-Rechenzentrum</publisher>
- Affiliations are to be specified as of the time of creation of the resource.
- Add a persistent identifier (PID) for the institution, preferably a ROR-ID [external link] (Research Organization Registry); if there is no entry in ROR, use an ISNI-ID [external link] (International Standard Name Identifier) or a GND-ID.
- For research funding organizations it is recommended to additionally provide the CrossRef Funder Registry ID [external link].
<fundingReference>
    <funderName>Deutsche Forschungsgemeinschaft (DFG)</funderName>
    <funderIdentifier
        funderIdentifierType="Crossref Funder ID">http://dx.doi.org/10.13039/501100001659</funderIdentifier>
</fundingReference>
How should I handle different versions of the same research data?
Metadata can be updated without releasing a new version of the research data, but not vice versa; if the research data change, you need to update the metadata to reflect these changes.
If you want to publish several versions of the research data but also want a common point of reference for all of them, we recommend using a form of DOI versioning [external link]:
- Specify a set of metadata that is valid for all versions.
- Specify a set of metadata for each version.
- Update all these metadata with the corresponding references (e.g. include the relation type IsNewVersionOf in the metadata of the new version, see relatedIdentifier for details).
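The cross-references between the version-independent record and the individual versions can be generated instead of typed by hand. The following Python sketch assumes one record for the metadata valid for all versions and one record per version, and uses the DataCite relation types HasVersion/IsVersionOf and IsNewVersionOf/IsPreviousVersionOf; the function name is our own and the DOIs in the usage example are hypothetical placeholders.

```python
from collections import defaultdict


def version_relations(concept_doi: str, version_dois: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Map each record's DOI to the (relationType, relatedIdentifier) pairs it should contain.

    version_dois must be ordered from the oldest to the newest version.
    """
    relations: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for index, doi in enumerate(version_dois):
        # Link the version-independent record and each version in both directions.
        relations[concept_doi].append(("HasVersion", doi))
        relations[doi].append(("IsVersionOf", concept_doi))
        if index > 0:
            # Chain consecutive versions in both directions.
            previous = version_dois[index - 1]
            relations[doi].append(("IsNewVersionOf", previous))
            relations[previous].append(("IsPreviousVersionOf", doi))
    return dict(relations)


if __name__ == "__main__":
    # Hypothetical DOIs for illustration only.
    rels = version_relations("10.1234/example", ["10.1234/example.v1", "10.1234/example.v2"])
    for record, pairs in rels.items():
        print(record, pairs)
```

Each pair can then be written as a relatedIdentifier of the respective record; when a new version is published, the previous version's record needs an update as well.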
B. Best practice for specific fields
1 identifier [m]
DataCite documentation [external link]
- This field can be omitted on submission: it is mandatory according to the DataCite standard, but it will be set by the data publisher.
- The assigned Digital Object Identifier (DOI) will be provided to you by the data publisher.
Example
<identifier identifierType="DOI">10.5282/ubm/data.158</identifier>
2 creator [m]
DataCite documentation [external link]
- This field is mandatory.
- Consult the sections on how to specify a person and how to specify an institution.
- Always prefer natural persons over institutions.
- You can use the xml:lang attribute to provide the language of the creatorName. This may be helpful if an institution uses different names in different languages.
Example
<creators>
    <creator>
        <creatorName nameType="Personal">Krefeld, Thomas</creatorName>
        <givenName>Thomas</givenName>
        <familyName>Krefeld</familyName>
        <nameIdentifier
            schemeURI="http://orcid.org/"
            nameIdentifierScheme="ORCID">0000-0001-9657-6052</nameIdentifier>
        <nameIdentifier
            schemeURI="https://d-nb.info/gnd/"
            nameIdentifierScheme="GND">123778689</nameIdentifier>
        <affiliation xml:lang="de">Institut für Romanische Philologie; Ludwig-Maximilians-Universität München</affiliation>
    </creator>
</creators>
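When creator entries are generated from existing data (e.g. a staff list), Python's standard xml.etree.ElementTree module can assemble them according to the rules above. The following is a minimal sketch: build_creator and its parameters are hypothetical, the ORCID and GND scheme URIs are copied from the examples in this guide, and the DataCite XML namespace that a complete record declares is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# ElementTree serializes this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_creator(family: str, given: str, orcid: Optional[str] = None,
                  gnd: Optional[str] = None, affiliation_units: Sequence[str] = (),
                  affiliation_lang: str = "de") -> ET.Element:
    """Build a namespace-free <creator> element for a natural person."""
    creator = ET.Element("creator")
    # creatorName in the order "family name, given name"
    ET.SubElement(creator, "creatorName", nameType="Personal").text = f"{family}, {given}"
    ET.SubElement(creator, "givenName").text = given
    ET.SubElement(creator, "familyName").text = family
    if orcid:
        ET.SubElement(creator, "nameIdentifier", schemeURI="http://orcid.org/",
                      nameIdentifierScheme="ORCID").text = orcid
    if gnd:
        ET.SubElement(creator, "nameIdentifier", schemeURI="https://d-nb.info/gnd/",
                      nameIdentifierScheme="GND").text = gnd
    if affiliation_units:
        # Most specific unit first, most generic last, separated by semicolons.
        affiliation = ET.SubElement(creator, "affiliation", {XML_LANG: affiliation_lang})
        affiliation.text = "; ".join(affiliation_units)
    return creator


if __name__ == "__main__":
    element = build_creator(
        "Krefeld", "Thomas", orcid="0000-0001-9657-6052", gnd="123778689",
        affiliation_units=["Institut für Romanische Philologie",
                           "Ludwig-Maximilians-Universität München"],
    )
    ET.indent(element)  # pretty-printing, available since Python 3.9
    print(ET.tostring(element, encoding="unicode"))
```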
3 title [m]
DataCite documentation [external link]
- This field is mandatory.
- Be as specific as you would be in the context of a journal publication.
- It is recommended to avoid filenames (e.g. “survey.csv”) or generic descriptions (e.g. “survey data”) as a title.
- The main title is specified without a titleType.
- The language of every title must be specified (see section on metadata language for the use of xml:lang attribute).
- If the main title is not specified in English, a title of type “TranslatedTitle” must be given in English.
- Title types “AlternativeTitle” and “Subtitle” are supported but not recommended. “Other” must not be used as a title type.
Example
<titles>
    <title xml:lang="de">Bayerisches Musiker-Lexikon Online (BMLO)</title>
    <title
        xml:lang="en"
        titleType="TranslatedTitle">Digital Encyclopedia of Bavarian Musicians</title>
</titles>
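The title rules can be checked automatically. This Python sketch is our own illustration, assuming a <titles> fragment without the DataCite XML namespace; it treats “en” and “eng” (with optional region subtags) as English and does not cover every case (for example, it does not warn about the discouraged types AlternativeTitle and Subtitle).

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def is_english(lang: Optional[str]) -> bool:
    """Treat ISO 639-1 'en' and ISO 639-2 'eng' (with optional subtags) as English."""
    return (lang or "").split("-")[0].lower() in {"en", "eng"}


def check_titles(titles_xml: str) -> list[str]:
    """Return rule violations for a namespace-free <titles> fragment."""
    titles = ET.fromstring(titles_xml).findall("title")
    problems = []
    main_titles = [t for t in titles if t.get("titleType") is None]
    if not main_titles:
        problems.append("no main title (title without titleType)")
    for title in titles:
        if title.get(XML_LANG) is None:
            problems.append(f"title {title.text!r} has no xml:lang")
        if title.get("titleType") == "Other":
            problems.append(f"title {title.text!r} uses the forbidden titleType 'Other'")
    if main_titles and not any(is_english(t.get(XML_LANG)) for t in main_titles):
        has_english_translation = any(
            t.get("titleType") == "TranslatedTitle" and is_english(t.get(XML_LANG))
            for t in titles
        )
        if not has_english_translation:
            problems.append("non-English main title needs an English TranslatedTitle")
    return problems


if __name__ == "__main__":
    example = ('<titles><title xml:lang="de">Bayerisches Musiker-Lexikon Online (BMLO)</title>'
               '<title xml:lang="en" titleType="TranslatedTitle">'
               'Digital Encyclopedia of Bavarian Musicians</title></titles>')
    print(check_titles(example) or "titles OK")
```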
4 publisher [m]
DataCite documentation [external link]
- This field is mandatory.
- The language of the publisher name must be specified (see section on metadata language for the use of xml:lang attribute).
- If possible, the attributes publisherIdentifier, publisherIdentifierScheme and schemeURI should be used.
- This field can be omitted on submission: it is mandatory according to the DataCite standard, but it will be set by the data publisher, i.e. the institution that hosts the (meta)data.
Example
<publisher
    xml:lang="de"
    publisherIdentifier="https://ror.org/05591te55"
    publisherIdentifierScheme="ROR"
    schemeURI="https://ror.org/">Universitätsbibliothek der Ludwig-Maximilians-Universität München</publisher>
5 publicationYear [m]
DataCite documentation [external link]
- This field is mandatory.
- This field can be omitted on submission: it is mandatory according to the DataCite standard, but it will be set by the data publisher.
Example
<publicationYear>2019</publicationYear>
6 subject [m]
DataCite documentation [external link]
- This field is mandatory, whereas the DataCite standard only recommends it.
Mandatory subject annotations
- The following subject annotations are mandatory (must occur at least once):
Type of Subject | Standard | Type of standard | Usage hint |
---|---|---|---|
Discipline | DDC | Classification | Use the English term for the discipline and include the three digit DDC notation via the classificationCode attribute (Canonical Source [external link]). |
Keywords | Wikidata QID and GND | Keyword | Wikidata and GND terms are both mandatory, even if they are redundant (if an appropriate entry does not exist, contact the responsible institution). Use Wikidata-Search [external link] and GND-Search [external link] to find the appropriate identifiers. |
- It is also mandatory to include at least the valueURI or the classificationCode attribute.
- It is recommended to include an xml:lang attribute for the subject.
- To improve machine readability, we recommend using both valueURI and classificationCode.
Example
<subjects>
    <!-- discipline specification using Dewey Decimal Classification (DDC) -->
    <subject
        xml:lang="en"
        subjectScheme="DDC"
        classificationCode="521">Celestial mechanics</subject>
    <!-- keywords -->
    <subject
        xml:lang="en"
        subjectScheme="Wikidata QID"
        schemeURI="https://www.wikidata.org/wiki/"
        valueURI="https://www.wikidata.org/wiki/Q223776"
        classificationCode="Q223776">gravity assist</subject>
    <subject
        xml:lang="en"
        subjectScheme="GND"
        schemeURI="https://d-nb.info/gnd/"
        valueURI="https://d-nb.info/gnd/1135686874"
        classificationCode="1135686874">Gravity Assist</subject>
</subjects>
There should be no overlap between the discipline specifier(s) and the keywords.
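A minimal Python check of the mandatory subject annotations, assuming a <subjects> fragment without the DataCite XML namespace. The function name is hypothetical; since the examples in this guide write the Wikidata scheme both as “Wikidata QID” and as “wikidata”, the sketch accepts any subjectScheme starting with “wikidata” (case-insensitive). It does not verify that the codes exist in DDC, Wikidata or GND, and it does not test the overlap rule.

```python
import xml.etree.ElementTree as ET


def check_mandatory_subjects(subjects_xml: str) -> list[str]:
    """Return violations of the mandatory subject rules for a namespace-free <subjects> fragment."""
    subjects = ET.fromstring(subjects_xml).findall("subject")

    def scheme(subject: ET.Element) -> str:
        return (subject.get("subjectScheme") or "").lower()

    ddc = [s for s in subjects if scheme(s) == "ddc"]
    # Accepts "Wikidata QID" as well as "wikidata" (both spellings occur in this guide).
    wikidata = [s for s in subjects if scheme(s).startswith("wikidata")]
    gnd = [s for s in subjects if scheme(s) == "gnd"]

    problems = []
    if not any(s.get("classificationCode") for s in ddc):
        problems.append("missing DDC discipline with classificationCode")
    if not wikidata:
        problems.append("missing Wikidata keyword")
    if not gnd:
        problems.append("missing GND keyword")
    for s in wikidata + gnd:
        if not (s.get("valueURI") or s.get("classificationCode")):
            problems.append(f"keyword {s.text!r} lacks valueURI and classificationCode")
    return problems


if __name__ == "__main__":
    example = """<subjects>
      <subject xml:lang="en" subjectScheme="DDC" classificationCode="521">Celestial mechanics</subject>
      <subject xml:lang="en" subjectScheme="Wikidata QID" classificationCode="Q223776">gravity assist</subject>
      <subject xml:lang="en" subjectScheme="GND" classificationCode="1135686874">Gravity Assist</subject>
    </subjects>"""
    print(check_mandatory_subjects(example) or "all mandatory subjects present")
```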
Geotagging
Specifying the location via subject is mandatory, if applicable to the resource:
- Canonical source for geonames is the GeoNames Service [external link] (a registration for API access is necessary).
- See geoLocation section for a more detailed specification.
Additional subject annotations
- Additional subjects may be added.
- Specify the language of the subject.
- It is recommended to always qualify subjects by URL or scheme name. A good starting point to research existing schemes is BARTOC.org [external link] - Basic Register of Thesauri, Ontologies & Classifications. Unqualified subjects (not controlled by a controlled vocabulary, ontology or any other standard for the subject terms) are often useless for research data aggregators due to ambiguities.
Example
For this example a complete DataCite metadata file is available, see VerbaAlpina.
<subjects>
    <!-- mandatory -->
    <subject
        xml:lang="en"
        subjectScheme="DDC"
        classificationCode="410">Linguistics</subject>
    <subject
        xml:lang="en"
        subjectScheme="DDC"
        classificationCode="004">Data processing computer science</subject>
    <subject
        xml:lang="de"
        subjectScheme="GND"
        schemeURI="https://d-nb.info/gnd/"
        valueURI="https://d-nb.info/gnd/4740815-7"
        classificationCode="4740815-7">Chalet</subject>
    <subject
        xml:lang="en"
        subjectScheme="wikidata"
        schemeURI="https://www.wikidata.org/wiki/"
        valueURI="https://www.wikidata.org/wiki/Q136689"
        classificationCode="Q136689">chalet</subject>
    <subject
        xml:lang="fr"
        subjectScheme="wikidata"
        schemeURI="https://www.wikidata.org/wiki/"
        valueURI="https://www.wikidata.org/wiki/Lexeme:L643765"
        classificationCode="L643765">chalet</subject>
    <!-- optional -->
    <subject
        xml:lang="en"
        subjectScheme="Glottocode"
        schemeURI="https://glottolog.org/resource/languoid/id/"
        valueURI="https://glottolog.org/resource/languoid/id/high1286"
        classificationCode="high1286">High German</subject>
    <subject
        xml:lang="de"
        subjectScheme="geonames"
        schemeURI="http://www.geonames.org/"
        valueURI="http://www.geonames.org/2764958"
        classificationCode="2764958">Hall in Tirol</subject>
</subjects>
7 contributor [r]
DataCite documentation [external link]
- This field is recommended if the data are published with a free license.
- If the license specified via the rights field restricts the usage in a way that possibly necessitates interaction with the rights holder, a contributor of type “RightsHolder” must be specified. Examples of free licenses are CC0, CC BY or CC BY-SA; examples of non-free licenses are CC BY-NC or CC BY-ND.
- Consult the sections on how to specify a person and how to specify an institution.
- If contributors change over versions, the version metadata should only include the actual contributors of the updated version. A metadata set representing all versions of the dataset (including links to the versions) can include all contributors with the dates of participation, see how to handle different versions of the research data.
- Duplicate mentions between creator and contributor are unproblematic.
- If a person has multiple roles, it is recommended to identify the most important role of that person and select only one.
- Be as specific as possible (a “ProjectLeader” is also considered to be a “ProjectMember”, but “ProjectLeader” carries more information). Use generic role descriptions only when nothing else fits.
- If suitable, use the xml:lang attribute to indicate the language of the contributorName.
- The following roles are recommended:
Option | Description from DataCite standard (italics) and usage hints |
---|---|
ContactPerson | Person with knowledge of how to access, troubleshoot, or otherwise field issues related to the resource. |
DataCollector | Person/institution responsible for finding, gathering/collecting data under the guidelines of the author(s) or Principal Investigator (PI). |
DataCurator | Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data centre or repository. |
DataManager | Person or organization responsible for digital maintenance of the finished resource, e.g. migration to new hardware, software and security updates for servers, access rights management. |
Distributor | Institution responsible for dissemination of electronic or printed copies of the resource. The distributor is not necessarily also a hosting institution of a digital resource, e.g., if server hosting is outsourced but the distributor still organizes access to the resource. |
Editor | A person who oversees the details related to the publication format of the resource. |
HostingInstitution | Typically, the organisation allowing the resource to be available on the internet through the provision of its hardware/software/operating support. |
ProjectLeader | Person officially designated as head of project team or sub-project team instrumental in the work necessary to development of the resource. |
ProjectManager | Person officially designated as manager of a project. Project may consist of one or many project teams and sub-teams. |
ProjectMember | Person on the membership list of a designated project/project team. All persons with a contract in the context of the project which produced the resource. |
Researcher | A person involved in analyzing data or the results of an experiment or formal study. May indicate an intern or assistant to one of the authors who helped with research but who was not so “key” as to be listed as an author. |
ResearchGroup | Typically refers to a group of individuals within a lab, department or division that has a specifically defined focus of activity. |
RightsHolder | Person or institution owning or managing property rights, including intellectual property rights over the resource. Mandatory for non-free licenses; person or institution that owns the rights listed in field Rights. |
Sponsor | Organization or person that issued a contract or under the auspices of which a work has been printed, published, developed, etc. |
Supervisor | Designated administrator over one or more groups/teams working to produce a resource, or over one or more steps of a development process. We recommend using this role for PhD advisors of the creators who did not participate as creators or in other roles themselves. |
Translator | A person, organization, or automated system responsible for converting the content of a resource from one language into another, preserving its meaning and intended message. |
WorkPackageLeader | The Work Package Leader is responsible for ensuring the comprehensive contents, versioning, and availability of the Work Package during the development of the resource. |
Example
<contributors>
    <contributor contributorType="ProjectLeader">
        <contributorName nameType="Personal">Ludwig, Ralf</contributorName>
        <givenName>Ralf</givenName>
        <familyName>Ludwig</familyName>
        <nameIdentifier
            nameIdentifierScheme="ORCID"
            schemeURI="http://orcid.org/">0000-0002-4225-4098</nameIdentifier>
        <affiliation xml:lang="de">Department für Geographie; Ludwig-Maximilians-Universität München</affiliation>
    </contributor>
    <contributor contributorType="RightsHolder">
        <contributorName nameType="Personal">Štědronská, Markéta</contributorName>
        <givenName>Markéta</givenName>
        <familyName>Štědronská</familyName>
        <nameIdentifier
            nameIdentifierScheme="GND"
            schemeURI="https://d-nb.info/gnd/">141321350</nameIdentifier>
        <affiliation xml:lang="de">Institut für Musikwissenschaft; Universität Wien</affiliation>
    </contributor>
</contributors>
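Whether a RightsHolder is required depends on the license given in the rights field. The Python sketch below approximates the rule with SPDX short identifiers: any identifier containing an NC or ND component is treated as non-free, a small hypothetical allow-list marks free licenses, and every other license is conservatively treated as requiring a RightsHolder. The function names and the allow-list are our own; adjust them to your institution's policy.

```python
# Hypothetical allow-list of free licenses (SPDX short identifiers); extend as needed.
FREE_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "Apache-2.0"}


def requires_rights_holder(spdx_id: str) -> bool:
    """Return True if a contributor of type RightsHolder should be specified."""
    components = set(spdx_id.upper().split("-"))
    if components & {"NC", "ND"}:
        # NonCommercial and NoDerivatives variants restrict reuse.
        return True
    if spdx_id in FREE_LICENSES:
        return False
    # Unknown license: err on the side of naming a rights holder.
    return True


def check_contributors(spdx_id: str, contributor_types: list[str]) -> list[str]:
    """Report a missing RightsHolder for licenses that restrict reuse."""
    if requires_rights_holder(spdx_id) and "RightsHolder" not in contributor_types:
        return [f"license {spdx_id} requires a contributor of type RightsHolder"]
    return []


if __name__ == "__main__":
    print(check_contributors("CC-BY-NC-4.0", ["ProjectLeader"]))
    print(check_contributors("CC-BY-4.0", ["ProjectLeader"]))
```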
8 date [r]
DataCite documentation [external link]
- This field is recommended.
- It is recommended to provide dates and times according to the W3C Date and Time Formats [external link]. If a time is specified, always include the time zone.
- Time periods can be specified by specifying the start date and the end date separated by a slash (/).
- The following types should be filled out by the data producer:
- Collected: time range when the resource was arranged (not necessarily identical to the time range when the resource was created).
- Covered: date range that the resource content applies to or covers. (Example: A text corpus of newspaper articles about a historic event will cover a time span (associated with the event). The corpus can be collected over a different time span.)
- Created: first version of a resource; must not be identical to Updated.
- Updated: for a more recent version of the resource; must not be identical to Created.
- The following types are set by the publisher:
- Submitted: point in time when the data were received by the data publisher.
- Accepted: point in time when the data publisher accepts the data for publication.
- Issued: long format of the field publicationYear, point in time when a publisher publishes the data; should be set.
- Available: only use in the context of embargo periods (this is not recommended).
- Withdrawn: point in time when the publisher retracts the data publication.
- For dates describing the period the resource covers, use “Other” as dateType and add “coverage” as a description in the dateInformation attribute, see example below.
- It is recommended to use the free text attribute dateInformation for disambiguation, if multiple dates with the same type are specified.
- “Copyrighted” as a dateType should not be used.
Example
<dates>
    <date dateType="Created">2016</date>
    <date
        dateType="Other"
        dateInformation="coverage">2050-09-01T00:00:00+01:00/2050-09-30T23:59:59+01:00</date>
</dates>
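Python's datetime.isoformat produces strings in the W3C format used in the example above, provided the datetime carries a time zone. This sketch (function names are hypothetical) joins start and end with a slash and rejects times without a time zone; the UTC+01:00 offset in the usage example is only an illustration.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Union

DateOrDateTime = Union[date, datetime]


def w3c_value(value: DateOrDateTime) -> str:
    """Format a date or a time-zone-aware datetime as a W3C date/time string."""
    if isinstance(value, datetime) and value.tzinfo is None:
        raise ValueError("a time was specified without a time zone")
    return value.isoformat()


def w3c_range(start: DateOrDateTime, end: DateOrDateTime) -> str:
    """Join start and end with a slash to express a time period."""
    start_text, end_text = w3c_value(start), w3c_value(end)
    # Only compare values of the same kind (date with date, datetime with datetime).
    if type(start) is type(end) and end < start:
        raise ValueError("the end lies before the start")
    return f"{start_text}/{end_text}"


if __name__ == "__main__":
    utc_plus_one = timezone(timedelta(hours=1))
    print(w3c_range(datetime(2050, 9, 1, tzinfo=utc_plus_one),
                    datetime(2050, 9, 30, 23, 59, 59, tzinfo=utc_plus_one)))
    # prints 2050-09-01T00:00:00+01:00/2050-09-30T23:59:59+01:00
    print(w3c_range(date(2016, 1, 1), date(2016, 12, 31)))
```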
9 language [o]
DataCite documentation [external link]
- This field is optional.
- The field describes the main language of the resource, not of the metadata.
- Recommendation: use either the two-letter language codes from ISO 639-1 or the three-letter language codes from ISO 639-2 (listed on Wikipedia [external link]). Be advised: the three-letter codes are used in library systems.
Example
<language>en</language>
10 resourceType [m]
DataCite documentation [external link]
- This field is mandatory.
- DataCite allows various resource types.
- There are three groups of resources described by the metadata: objects and instruments, discursive text, and research data.
Decision tree to pick the right resourceTypeGeneral:
1. If you describe a physical object (biological sample, fragment of a meteorite) or an instrument (a book scanner, a microscope), use “PhysicalObject” or “Instrument”, respectively. If not, proceed with 2.
2. Decide whether the resource is data or discursive text (e.g. a journal article or analytical text). If it is discursive text, choose one of the following:
- Book
- BookChapter
- ConferencePaper
- ConferenceProceeding
- DataPaper
- Dissertation
- Journal
- JournalArticle
- OutputManagementPlan [Note: A data management plan is a special form of output management plan]
- PeerReview
- Preprint
- Report
- Standard
- StudyRegistration
If not, proceed with 3.
3. If the data submission contains heterogeneous data, consider publishing it in separate data publications or (less preferred) use “Collection”. If the data are homogeneous, proceed with 4.
4. If the data are movies, images or sound files, use “Audiovisual”, “Image” or “Sound”, respectively. If not, proceed with 5.
5. If the data are digital, interactive representations of real-world phenomena (e.g. trained models in the context of machine learning), use “Model”. If not, proceed with 6.
6. If the data are descriptions of a workflow (e.g. in the Common Workflow Language), use “Workflow”. If not, proceed with 7.
7. If the data are an interactive resource such as a virtual notebook, use “ComputationalNotebook”. If not, proceed with 8.
8. If the data are source code files (incl. configuration and build artefacts), use “Software”. If not, proceed with 9.
9. If the data have a fixed structure (e.g. table-like), use “Dataset”. If not, proceed with 10.
10. If the data are text files, use “Text”. If not, proceed with 11.
11. Check whether one of the following types is applicable:
- Award (Use this one if, for example, the resource is an entry in a Current Research Information System (CRIS) that details a Leibniz Prize awarded to a staff member.)
- Event (For example, for a conference or an award ceremony.)
- InteractiveResource (This type can be used for interactive tutorials in a learning management system or for certain websites.)
- Project (If, for example, a project is funded by DFG, the corresponding GEPRIS [external link] entry would be assigned the resourceTypeGeneral Project.)
- Service (For example, if a university IT center offers access to an LLM running on its servers, this would be a “Service”. Note that the LLM code itself would be “Software”.)
If not, proceed with 12.
12. Use “Other”.
Note: Only items with the resourceTypeGeneral “Dataset” will be included in the Google Dataset Search. All other types are currently not supported.
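For submission forms or scripts, the decision tree can be encoded as an ordered sequence of checks. This Python sketch is our own illustration: the Resource fields are hypothetical yes/no answers to the questions of the tree, and the first applicable answer wins, mirroring the “If not, proceed with” structure.

```python
from dataclasses import dataclass
from typing import Optional

DISCURSIVE_TYPES = {
    "Book", "BookChapter", "ConferencePaper", "ConferenceProceeding", "DataPaper",
    "Dissertation", "Journal", "JournalArticle", "OutputManagementPlan", "PeerReview",
    "Preprint", "Report", "Standard", "StudyRegistration",
}
MEDIA_TYPES = {"Audiovisual", "Image", "Sound"}
SPECIAL_TYPES = {"Award", "Event", "InteractiveResource", "Project", "Service"}


@dataclass
class Resource:
    """Hypothetical answers to the questions of the decision tree."""
    physical_object: bool = False
    instrument: bool = False
    discursive_type: Optional[str] = None  # one of DISCURSIVE_TYPES
    heterogeneous: bool = False
    media_type: Optional[str] = None       # one of MEDIA_TYPES
    model: bool = False
    workflow: bool = False
    notebook: bool = False
    source_code: bool = False
    fixed_structure: bool = False
    text_files: bool = False
    special_type: Optional[str] = None     # one of SPECIAL_TYPES


def resource_type_general(r: Resource) -> str:
    """Walk the decision tree in order and return the first matching resourceTypeGeneral."""
    if r.physical_object:
        return "PhysicalObject"
    if r.instrument:
        return "Instrument"
    if r.discursive_type is not None:
        if r.discursive_type not in DISCURSIVE_TYPES:
            raise ValueError(f"unsupported discursive type: {r.discursive_type}")
        return r.discursive_type
    if r.heterogeneous:
        return "Collection"  # less preferred: consider separate publications instead
    if r.media_type in MEDIA_TYPES:
        return r.media_type
    ordered_checks = [
        (r.model, "Model"),
        (r.workflow, "Workflow"),
        (r.notebook, "ComputationalNotebook"),
        (r.source_code, "Software"),
        (r.fixed_structure, "Dataset"),
        (r.text_files, "Text"),
    ]
    for applies, type_general in ordered_checks:
        if applies:
            return type_general
    if r.special_type in SPECIAL_TYPES:
        return r.special_type
    return "Other"


if __name__ == "__main__":
    print(resource_type_general(Resource(fixed_structure=True)))
    print(resource_type_general(Resource(discursive_type="OutputManagementPlan")))
```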
Examples
<resourceType resourceTypeGeneral="Dataset">Regional Climate Measurements</resourceType>

<resourceType resourceTypeGeneral="OutputManagementPlan">Data Management Plan</resourceType>
11 alternateIdentifier [o]
DataCite documentation [external link]
- This field is optional.
- These alternate IDs additionally identify the resource, meaning that it can also be found via these identifiers and distinguished from other resources by this ID.
- The alternateIdentifier can be a persistent, globally unique ID. However, the field may also be used for identifiers that are only unique and specific in the context of the research project (e.g. local identifiers or workspace identifiers) but not globally. Examples of alternate identifiers are sequence numbers, time stamps or database numbers. In contrast, the global identifier in the identifier field must be a DOI.
- The attribute alternateIdentifierType must be used to specify the type of the identifier.
Recommendation for alternateIdentifierType:
For common global identifiers, just specify the name of the identifier or its acronym. Examples of such identifiers are: ARK, arXiv, bibcode, CSTR, DOI, EAN13, EISSN, ePIC, Handle, IGSN, ISBN, ISSN, ISTC, LISSN, LSID, PMID, PURL, RRID, UPC, URL, URN, and w3id.
For other identifiers, we recommend first giving the origin of the ID:
- project-specific identifier: an ID that has meaning inside the project that created the data.
- application-specific identifier: an ID that has meaning in the context of an application that is used to process the data.
- institution-specific identifier: an ID that has meaning in the context of the institution that provided, funded or created the data.
and then adding, separated by a slash (/), the name of the identifier, if known. This way, even if the name of the ID is relatively obscure, the broader context of the ID can still be identified.
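The naming rule for alternateIdentifierType can be expressed as a small function. In this Python sketch (function name hypothetical), names from the list of common global identifiers are returned unchanged, and all other identifiers are prefixed with their origin, as in the VerbaAlpina example below.

```python
from typing import Optional

COMMON_GLOBAL_IDENTIFIERS = {
    "ARK", "arXiv", "bibcode", "CSTR", "DOI", "EAN13", "EISSN", "ePIC", "Handle", "IGSN",
    "ISBN", "ISSN", "ISTC", "LISSN", "LSID", "PMID", "PURL", "RRID", "UPC", "URL", "URN", "w3id",
}
ORIGINS = {"project", "application", "institution"}


def alternate_identifier_type(name: Optional[str] = None, origin: Optional[str] = None) -> str:
    """Build an alternateIdentifierType value following the recommendation of this guide."""
    if name in COMMON_GLOBAL_IDENTIFIERS:
        return name
    if origin not in ORIGINS:
        raise ValueError("origin must be 'project', 'application' or 'institution'")
    prefix = f"{origin}-specific identifier"
    # Append the identifier's name after a slash only if it is known.
    return f"{prefix}/{name}" if name else prefix


if __name__ == "__main__":
    print(alternate_identifier_type("VA-ID", "project"))
    print(alternate_identifier_type("ISBN"))
```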
Example
Each VerbaAlpina dataset is assigned an internal ID [external link] (a project-specific identifier) as well as a persistent LMU-UB ID [external link] (in short lmUB - an institution-specific identifier) by the data repository.
<alternateIdentifiers>
    <alternateIdentifier
        alternateIdentifierType="institution-specific identifier/lmUB">68fd5294-9077-3983-a20e-7f25c074c4c7</alternateIdentifier>
    <alternateIdentifier
        alternateIdentifierType="project-specific identifier/VA-ID">L91_v8</alternateIdentifier>
</alternateIdentifiers>
13 size [r]
DataCite documentation [external link]
- This field is recommended, whereas it is optional in the DataCite standard.
- This field is repeatable. Thus, different measures for the size / volume of the dataset can be given.
- If you make use of this field, always specify the size in bytes (denoted by ‘B’; note that a lower-case ‘b’ stands for bit). Preferred units are kB, MB, GB, TB etc. Separate number and unit with one space. The decimal separator must be the decimal point, e.g. 7.23 GB.
- If the data are compressed, specify the size of the compressed file/archive.
- If the data consist of several units (without using archival software), specify their combined size.
- Further information on the data size (e.g. runtime of an audio file or number of images) can be given in a separate size field as free text. Note that such information can also be given in the description.
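A byte count can be converted into the preferred notation with a small helper. This Python sketch assumes decimal (SI) prefixes, i.e. 1 kB = 1000 B, consistent with the lower-case “kB” above; the function name and the choice of two decimal places are our own.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]


def format_size(num_bytes: int, decimals: int = 2) -> str:
    """Format a byte count with decimal (SI) prefixes, e.g. 7230000000 -> '7.23 GB'."""
    if num_bytes < 0:
        raise ValueError("size must not be negative")
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            break
        value /= 1000
    if unit == "B":
        return f"{num_bytes} B"
    # One space between number and unit; Python's format always uses a decimal point.
    return f"{value:.{decimals}f} {unit}"


if __name__ == "__main__":
    # Combined size of several files (hypothetical byte counts).
    print(format_size(sum([4_000_000_000, 3_230_000_000])))
```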
Example
<sizes>
    <size>7.23 GB</size>
    <size>34 min (length audio file)</size>
</sizes>
14 format [o]
DataCite documentation [external link]
- This field is optional.
- Use the MIME type format as specified in RFC 2046 [external link]; possible values should be taken from the IANA list of Media Types [external link].
Specify in this order (skip if it does not apply):
- If files are compressed, append the MIME type of the compressed file to the MIME type of the uncompressed file using a “+” sign (e.g. text/xml+zip).
- If files are in an archive, specify the MIME type of the archive format, for example “application/tar”. This information is useful to determine in advance which software tools are needed to access the archived files.
- Specify each MIME type in a separate field, in alphabetical order, and do not repeat MIME types.
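The ordering rules can be applied programmatically. The Python sketch below (function name hypothetical) takes pairs of MIME type and optional compression suffix, appends the suffix with “+”, removes duplicates and sorts the result alphabetically; it does not check the values against the IANA list.

```python
from typing import Iterable, Optional, Tuple


def format_entries(files: Iterable[Tuple[str, Optional[str]]]) -> list[str]:
    """Return sorted, de-duplicated MIME types; a compression suffix is appended with '+'."""
    formats = {f"{mime}+{compression}" if compression else mime for mime, compression in files}
    return sorted(formats)


if __name__ == "__main__":
    files = [("text/csv", None), ("application/netcdf", None), ("text/plain", None),
             ("application/tar", "gzip"), ("text/csv", None)]
    for mime in format_entries(files):
        print(f"<format>{mime}</format>")
```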
Example
<formats>
    <format>application/netcdf</format>
    <format>application/tar+gzip</format>
    <format>text/csv</format>
    <format>text/plain</format>
</formats>
15 version [o]
DataCite documentation [external link]
- This field is optional.
- Note that this field refers to the version of the resource, not the version of the metadata.
- The versioning information is set according to the policies of the data provider (data publishers do not change/use this field).
- It is recommended to use semantic versioning [external link]. Up to three labels are supported (Major, Minor, Patch). Depending on the resource, only one or two labels might be needed.
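Version strings following this relaxed form of semantic versioning (one to three numeric labels) can be validated and compared with a few lines of Python. In this sketch (function name hypothetical), missing labels count as 0, so 4.2 and 4.2.0 are treated as equal; pre-release and build suffixes from the full semantic versioning specification are deliberately not accepted.

```python
import re

# One to three dot-separated numeric labels without leading zeros (Major[.Minor[.Patch]]).
VERSION_PATTERN = re.compile(r"(0|[1-9]\d*)(\.(0|[1-9]\d*)){0,2}")


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse '4', '4.2' or '4.2.1' into a comparable (major, minor, patch) tuple."""
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"not a Major[.Minor[.Patch]] version: {version!r}")
    labels = [int(label) for label in version.split(".")]
    # Missing labels count as 0, so '4.2' compares equal to '4.2.0'.
    return tuple(labels + [0] * (3 - len(labels)))


if __name__ == "__main__":
    print(parse_version("4.2"), parse_version("4.2.1"))
    print(parse_version("4.10") > parse_version("4.9"))  # numeric, not string, comparison
```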
Example 1 (two labels - Major.Minor)
<version>4.2</version>
Example 2 (three labels - Major.Minor.Patch)
<version>4.2.1</version>
16 rights [m]
DataCite documentation [external link]
- This field is mandatory, whereas the DataCite standard specifies it as optional.
- If applicable, rightsURI must be set.
- To avoid inconsistencies, assign only a single license to the described dataset or the described software code.
- It is not recommended to publish both research data and software code as part of a single publication (consider two separate publications, see resourceType).
Guidance for using a license:
- Recommendation: use a Creative Commons (CC) [external link] license for data and the Apache 2.0 license [external link] for software.
- Use the standardized short identifier list provided by SPDX [external link] to specify the license in the rightsIdentifier attribute.
- To ensure reusability, do not use CC licenses with the NC (NonCommercial) or ND (NoDerivatives) restriction (submissions with these restrictions are nevertheless accepted).
- For further license options consult Choose a license [external link] or the CC license helper [external link].
Example 1 (Dataset)
<rightsList>
  <rights
    xml:lang="en-US"
    schemeURI="https://spdx.org/licenses/"
    rightsIdentifierScheme="SPDX"
    rightsIdentifier="CC0-1.0"
    rightsURI="http://creativecommons.org/publicdomain/zero/1.0/">
    CC0 1.0</rights>
</rightsList>
Example 2 (Software)
<rightsList>
  <rights
    xml:lang="en-US"
    schemeURI="https://spdx.org/licenses/"
    rightsIdentifierScheme="SPDX"
    rightsIdentifier="Apache-2.0"
    rightsURI="https://www.apache.org/licenses/LICENSE-2.0">
    Apache License 2.0</rights>
</rightsList>
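A rightsList that follows these rules (a single license, SPDX identifier, rightsURI set) can be generated programmatically. The following Python sketch uses the standard xml.etree.ElementTree module; the function names has_nc_or_nd and build_rights_list are hypothetical. Since NC/ND licenses are discouraged but accepted, the sketch only issues a warning instead of rejecting them. The NC/ND test simply looks for those components in a CC identifier such as “CC-BY-NC-4.0”.

```python
import warnings
from xml.etree import ElementTree as ET

# ElementTree serializes attributes in this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def has_nc_or_nd(spdx_id: str) -> bool:
    """True if a Creative Commons SPDX identifier carries an NC or ND restriction."""
    parts = spdx_id.upper().split("-")
    return parts[0] == "CC" and ("NC" in parts or "ND" in parts)


def build_rights_list(spdx_id: str, name: str, rights_uri: str,
                      lang: str = "en-US") -> ET.Element:
    """Build a <rightsList> holding exactly one <rights> entry (a single license)."""
    if has_nc_or_nd(spdx_id):
        warnings.warn(f"{spdx_id}: NC/ND licenses limit reusability")
    rights_list = ET.Element("rightsList")
    rights = ET.SubElement(rights_list, "rights", {
        XML_LANG: lang,
        "schemeURI": "https://spdx.org/licenses/",
        "rightsIdentifierScheme": "SPDX",
        "rightsIdentifier": spdx_id,
        "rightsURI": rights_uri,
    })
    rights.text = name
    return rights_list


software = build_rights_list("Apache-2.0", "Apache License 2.0",
                             "https://www.apache.org/licenses/LICENSE-2.0")
print(ET.tostring(software, encoding="unicode"))
```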
17 description [m]
DataCite documentation [external link]
- This field is mandatory, whereas the DataCite standard only recommends it: there must be at least one entry of type “Abstract” in English.
- Always specify the language (xml:lang attribute) of each description.
- If there are descriptions in more than one language, the content may be different (no literal translation required).
- Each description has a limit of 300 words.
- A description of descriptionType “Methods” is optional. Best practice: use keywords from this controlled list, separated by commas.
- A description of descriptionType “TechnicalInfo” is optional. Best practice: use keywords from this controlled list, separated by commas. Additionally, data producers could consider creating a README file and linking it via the relatedIdentifier field.
- These types are not recommended:
- SeriesInformation (If needed, information on series title, volume, issue, or page number should be provided via the relatedItem field.)
- TableOfContents
- Other
Example
<descriptions>
  <description xml:lang="en" descriptionType="Abstract">
    The “Kritische Ausgabe der Werke von Richard Strauss”, a
    long-term editorial project, has been under way at the
    Institut für Musikwissenschaft of the Ludwig-Maximilians-Universität
    Munich since 2011; it is directed by ...</description>
  <description xml:lang="de" descriptionType="Abstract">
    Das Langzeit-Editionsprojekt „Kritische Ausgabe der Werke von
    Richard Strauss“ wird seit Februar 2011 unter der Leitung von
    Prof. Dr. Hartmut Schick am Institut für Musikwissenschaft der
    Ludwig-Maximilians-Universität München ...</description>
  <description xml:lang="en" descriptionType="Methods">
    digital editing, software/application development</description>
</descriptions>
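The description rules above can be checked automatically before submission. The following Python sketch is a hypothetical helper (check_descriptions), not part of the DataCite Metadata Generator; it assumes that words are counted by splitting on whitespace and that any language tag starting with “en” (e.g. “en”, “en-US”) counts as English.

```python
from typing import List
from xml.etree import ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MAX_WORDS = 300
NOT_RECOMMENDED = {"SeriesInformation", "TableOfContents", "Other"}


def check_descriptions(descriptions: ET.Element) -> List[str]:
    """Return the problems found in a <descriptions> element (empty list if none)."""
    problems = []
    has_english_abstract = False
    # "{*}" matches the tag with or without the DataCite namespace.
    for i, desc in enumerate(descriptions.findall("{*}description"), start=1):
        lang = desc.get(XML_LANG)
        dtype = desc.get("descriptionType")
        words = "".join(desc.itertext()).split()
        if not lang:
            problems.append(f"description {i}: xml:lang is missing")
        if dtype in NOT_RECOMMENDED:
            problems.append(f"description {i}: descriptionType {dtype} is not recommended")
        if len(words) > MAX_WORDS:
            problems.append(f"description {i}: {len(words)} words (limit {MAX_WORDS})")
        # Accept "en" as well as regional tags such as "en-US".
        if dtype == "Abstract" and lang and lang.lower().split("-")[0] == "en":
            has_english_abstract = True
    if not has_english_abstract:
        problems.append("no English description of type Abstract")
    return problems
```

An empty result means that none of these particular rules is violated; it does not validate the record against the full DataCite schema.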
18 geoLocation [r]
DataCite documentation [external link]
- This field is recommended where applicable.
- Describe the location of the resource itself (e.g. where an image was taken or where a sensor is located), not that of the related project or institute. If the resource has no such location, do not use this field for the project or institute instead.
- geoLocationPlace must be identical to the corresponding GeoNames entry in the subjects field; see the geotagging subsection.
- The canonical source for coordinates is the GeoNames Service [external link].
Examples
- geoLocationPlace and geoLocationPolygon:
<geoLocations>
  <geoLocation>
    <geoLocationPlace>Höslwang</geoLocationPlace>
    <geoLocationPolygon>
      <polygonPoint>
        <pointLatitude>47.9231796264648</pointLatitude>
        <pointLongitude>12.2860469818115</pointLongitude>
      </polygonPoint>
      <polygonPoint>
        <pointLatitude>47.9231796264648</pointLatitude>
        <pointLongitude>12.3512439727784</pointLongitude>
      </polygonPoint>
      <polygonPoint>
        <pointLatitude>47.9707412719727</pointLatitude>
        <pointLongitude>12.3512439727784</pointLongitude>
      </polygonPoint>
      <polygonPoint>
        <pointLatitude>47.9707412719727</pointLatitude>
        <pointLongitude>12.2860469818115</pointLongitude>
      </polygonPoint>
      <polygonPoint>
        <pointLatitude>47.9231796264648</pointLatitude>
        <pointLongitude>12.2860469818115</pointLongitude>
      </polygonPoint>
    </geoLocationPolygon>
  </geoLocation>
</geoLocations>
- geoLocationPlace, geoLocationPoint and geoLocationBox:
<geoLocations>
  <geoLocation>
    <geoLocationPlace>Hall in Tirol</geoLocationPlace>
    <geoLocationPoint>
      <pointLongitude>11.51667</pointLongitude>
      <pointLatitude>47.28333</pointLatitude>
    </geoLocationPoint>
    <geoLocationBox>
      <westBoundLongitude>11.4707803726196</westBoundLongitude>
      <eastBoundLongitude>11.5272636413574</eastBoundLongitude>
      <southBoundLatitude>47.2697830200196</southBoundLatitude>
      <northBoundLatitude>47.2893867492676</northBoundLatitude>
    </geoLocationBox>
  </geoLocation>
</geoLocations>
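Coordinates copied by hand are easy to get wrong, so a plausibility check can help. The following Python sketch (hypothetical helpers polygon_problems and box_problems) assumes, as we read the DataCite documentation, that a polygon needs at least four polygonPoints with the last one repeating the first. It further assumes that a box does not cross the 180° meridian, so its western bound must be smaller than its eastern bound; such a check would, for example, catch swapped west/east values.

```python
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def polygon_problems(points: List[LatLon]) -> List[str]:
    """Check a geoLocationPolygon: at least four points, closed ring."""
    problems = []
    if len(points) < 4:
        problems.append("a polygon needs at least 4 points")
    if points and points[0] != points[-1]:
        problems.append("the last polygonPoint must repeat the first one")
    return problems


def box_problems(west: float, east: float, south: float, north: float,
                 point: Optional[LatLon] = None) -> List[str]:
    """Check a geoLocationBox and, optionally, that a geoLocationPoint lies inside it."""
    problems = []
    if not south < north:
        problems.append("southBoundLatitude must be smaller than northBoundLatitude")
    # Assumption: the box does not cross the 180° meridian.
    if not west < east:
        problems.append("westBoundLongitude must be smaller than eastBoundLongitude")
    elif point is not None:
        lat, lon = point
        if not (south <= lat <= north and west <= lon <= east):
            problems.append("geoLocationPoint lies outside geoLocationBox")
    return problems
```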
19 fundingReference [o]
DataCite documentation [external link]
- This field is optional.
- This is the place to add information about the project and its funding.
- funderName is mandatory if fundingReference is used. For usage, see the FAQ “How should I specify an institution?”.
- Use CORDIS (EU) [external link], GEPRIS (DFG) [external link], FWF (Austria) [external link] and similar services to identify grants.
- awardTitle is the name of the grant, not the funding line or funding program.
Example
<fundingReferences>
  <fundingReference>
    <funderName>Deutsche Forschungsgemeinschaft (DFG)</funderName>
    <funderIdentifier funderIdentifierType="ROR">https://ror.org/018mejw64</funderIdentifier>
    <awardNumber
      awardURI="http://gepris.dfg.de/gepris/projekt/253900505">253900505</awardNumber>
    <awardTitle xml:lang="de">
      VerbaAlpina. Der alpine Kulturraum im Spiegel seiner
      Mehrsprachigkeit</awardTitle>
  </fundingReference>
</fundingReferences>
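Because funderName is the one mandatory subfield once fundingReference is used, a quick check can catch records where it was forgotten. The following Python sketch uses a hypothetical function name (check_funding_references) and works on metadata with or without the DataCite XML namespace.

```python
from typing import List
from xml.etree import ElementTree as ET


def check_funding_references(root: ET.Element) -> List[str]:
    """Report every fundingReference below `root` that lacks a non-empty funderName."""
    problems = []
    # "{*}" matches the tags with or without the DataCite namespace.
    for i, ref in enumerate(root.findall(".//{*}fundingReference"), start=1):
        name = ref.find("{*}funderName")
        if name is None or not (name.text or "").strip():
            problems.append(f"fundingReference {i}: funderName is missing")
    return problems
```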
C. Examples
Digital encyclopedia: “Bayerisches Musiker-Lexikon Online”
The Digital Encyclopedia of Bavarian Musicians (Bayerisches Musiker-Lexikon Online, BMLO) serves as a model project in musicology. The BMLO offers a digital biographical dictionary focusing on musicology; in addition, it enriches its entries on figures from the history of music in Bavaria with further information gathered from biographical literature, archives, libraries and digital collections. In this way, the BMLO forms the core of an interconnected, virtual cluster for music history. Currently, 24621 of a total of 27818 records are presented on the web. Also part of this semantic network are the Munich Dictionary of Music (Münchner Musiklexikon, MUK), which has served since 2010 as an encyclopedia of music-related corporate bodies with links to Munich, and LOCI, a geographic database for music, culture and history founded in 2012.
Meteorological project: “ClimEx”
The ClimEx project investigates the effects of climate change on meteorological and hydrological extreme events and their implications for water management in Bavaria and Québec. It comprises two novel aspects in particular:
An ensemble of 50 transient runs of the Canadian general circulation model CanESM2 (~200 km resolution) from 1950 to 2100, resulting in 7500 years of modelled climate. As each of these runs is initialized with only slightly altered starting conditions, the ensemble can be interpreted as (modelled) natural variability. CanESM2 then drives the regional climate model CRCM5 (~11 km resolution) over a domain that covers most of central Europe. Both models are internationally established and widely used in the climate science community.
A physically based hydrological model (WaSiM) is driven by this climate input for the whole of hydrological Bavaria at a very high temporal and spatial resolution (3 hours and 500 m) to investigate both climate change impacts and the natural variability of extreme events, especially floods.
ClimEx further strengthens the international collaboration between Bavaria and Québec, as research facilities, universities and public water agencies intensify their existing cooperation.
Volume as part of a series: “Discourses on Corruption”
The volume “Discourses on Corruption. Interdisciplinary and Intercultural Perspectives” is part of the series “Politics and Society in India and the Global South”, a collaboration between Sage Publications and the M.S. Merian – R. Tagore International Centre of Advanced Studies ‘Metamorphoses of the Political’ (ICAS:MP). ICAS:MP is an Indo-German research collaboration of six Indian and German institutions funded by the German Federal Ministry of Education and Research (BMBF). Located in New Delhi, ICAS:MP critically intervenes in global debates in the social sciences and humanities. Through case studies, this volume investigates corruption in the Global South (especially India and Brazil) and the West (especially Switzerland) to gain a more nuanced view of the phenomenon.
Article in conference proceedings: “High-Energy Physics”
The article summarizes the contents of a talk given at the Workshop on Exclusive Reactions at High Momentum Transfer, 21-24 May 2007, in Newport News, VA, United States. It deals with the radiative decay of delta baryons into nucleons and an on-shell photon. The form factors of the decay are determined using QCD light-cone sum rules and photon distribution amplitudes. The proceedings were published as an edited volume in print and electronic form. Two further works are closely related to the talk, a journal publication and a preprint version of the article; both are referenced in the metadata.
Critical edition (digital & print): “Richard Strauss Kritische Werkausgabe”
The “Kritische Ausgabe der Werke von Richard Strauss”, a long-term editorial project, has been under way at the Institut für Musikwissenschaft of the Ludwig-Maximilians-Universität Munich since 2011; it is directed by Prof. Dr. Hartmut Schick and supervised by a project committee and an advisory board constituted by the Bayerische Akademie der Wissenschaften. The project is part of the so-called Akademienprogramm, financed jointly by Germany’s federal government and federal states. It collaborates with the Richard-Strauss-Institut in Garmisch-Partenkirchen (which prepared the “Richard-Strauss-Quellenverzeichnis” between 2009 and 2012, funded by the Deutsche Forschungsgemeinschaft), with the IT-Gruppe Geisteswissenschaften at LMU Munich, and with the Richard-Strauss-Archiv in Garmisch-Partenkirchen, run by the composer’s family.
Digital lexicographical information system: “VerbaAlpina”
The project VerbaAlpina [external link], funded by the DFG from 2014 to 2023 as a long-term project [external link], was dedicated to documenting dialectal lexical variation in the Alpine region within conceptual domains typical of the region. A further, independent goal of the project was to use digital methods as extensively and consistently as possible. Problems arising from complete digitality were discussed, reflected upon and documented, and solutions were found where possible. Essentially, VerbaAlpina is a lexicographical information system that unites two traditionally mutually exclusive publication genres, the “dictionary” and the “language atlas”. The results of the VerbaAlpina project are available exclusively in electronic form. They consist mainly of highly granular, structured lexicographical data, explanatory texts of various kinds, and software components developed within the project. These digital outputs of VerbaAlpina were handled in accordance with the FAIR principles; as part of this, data were also transferred to the research data repository [external link] of the LMU University Library.
VerbaAlpina metadata: example full data set [external link]; example individual data set [external link]