Metadata are structured documentation describing your data, which makes the data easier to discover, including by search engines.
Metadata may exist both at a data object level and at a project or database level.
Though requirements differ per discipline, the list below gives general guidance on what information to include in a metadata form (based on the Dublin Core scheme). The more fields you fill in, the easier it will be to find your data(set).
1. Title
2. Creator of the data
3. Description of data objects
4. Format of data files in the dataset
5. Identifier (e.g. DOI)
6. Provenance and contextual information
7. Language and keywords
8. Related datasets and other resources, such as publications, websites, software
9. Temporal and spatial coverage (where relevant)
10. Rights and access information (including license)
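As a rough illustration of the list above, a metadata record can be kept as a simple key-value structure and checked for completeness before deposit. This is a minimal sketch: the field names loosely follow Dublin Core element names but are illustrative rather than an exact schema, `missing_fields` is a hypothetical helper, and all values (including the DOI) are placeholders.

```python
import json

# Illustrative field names, loosely following Dublin Core element names.
# They roughly mirror the ten items in the list above.
RECOMMENDED_FIELDS = [
    "title", "creator", "description", "format", "identifier",
    "provenance", "language", "subject", "relation", "coverage", "rights",
]


def missing_fields(record):
    """Return the recommended fields that are absent or empty in record."""
    return [field for field in RECOMMENDED_FIELDS if not record.get(field)]


# Placeholder values for a hypothetical dataset.
record = {
    "title": "Example survey dataset",
    "creator": ["A. Researcher"],
    "description": "Responses to a hypothetical survey, one row per respondent.",
    "format": ["text/csv"],
    "identifier": "doi:10.1234/example",  # placeholder, not a real DOI
    "language": "en",
    "subject": ["survey", "example"],      # keywords
    "rights": "CC-BY-4.0",
}

print(json.dumps(record, indent=2))
print("Still missing:", missing_fields(record))
```

In practice the repository's own metadata form determines the actual field names and which fields are mandatory; a check like this only helps you notice gaps early.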
Good to know:
– To make sure you are in line with the international standards regarding metadata content and format, it is recommended that you use a metadata scheme or standard.
– Here you can find an example of the Yoda metadata form.
Following community-endorsed standards for your metadata ensures that others can clearly understand and (re)use your data, which facilitates replication studies and meta-analyses.
To find discipline-specific metadata standards, see the Metadata Standards Catalog (bath.ac.uk).
Good to know:
When you submit your dataset to a (trusted) data repository, machine-readable metadata will be added according to the repository's chosen standard.
For example, DANS and DataverseNL use the Dublin Core metadata standard. At the UU, Yoda uses the DataCite metadata standard.
To increase findability, the (meta)data should be registered or indexed in a searchable resource. Depositing your data in a certified repository ensures not only that they are indexed but also that the (meta)data remain available and findable over the long term. Ideally, these repositories are themselves indexed by search engines (such as Google).
Some examples of repositories
Generic repositories: Figshare, Dryad, Zenodo
Domain-specific repositories: ioChem-BD, NoMaD Repository, HEP Data, BioModels Database, 3DEM, PRIDE
UU Institutional repository: Yoda
Good to know:
– The UU Data Repository Finder can help you choose a generic data repository.
– Check re3data.org to find discipline-specific repositories.
– The generic repository DataverseNL is available to all UU employees via their SolisID.
– To choose a repository, you may also follow the guidelines of your publisher. Some publishers provide lists of recommendations (e.g. Nature | Repository Guidance).
Metadata are valuable in and of themselves. When data cannot be made available (for example, due to privacy concerns) or are no longer available, the metadata should remain retrievable so that others can still discover and cite your work.
Moreover, even if the original data are missing, knowing of the existence of a dataset and tracking down people or publications associated with the original research can be valuable for others.
Read more on sharing metadata as an alternative to sharing personal data
In some cases new data are related to already existing data (e.g. a dataset builds on another dataset).
When additional data resources are needed to complete the data, or if contextual knowledge about the data is stored in a different dataset, the links between the datasets should be described and all related entities should be properly cited in the dataset metadata or documentation.
To let other researchers make use of your dataset, it is essential to clearly explain how, why and by whom the data have been created and processed.
This provenance information can consist of a description of the origin and processing of the data: how they were collected or created, processed and altered. If you are reusing data, you should add the citation of the original dataset (including, where available, a DOI or other persistent identifier).
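One way to picture this is a small provenance record that notes who collected the data, how they were processed, and which existing dataset they build on. The sketch below is only an illustration under stated assumptions: the keys `relatedIdentifier`, `relatedIdentifierType` and `relationType` are loosely modeled on the related-identifier concept in DataCite, the other keys and the helper `related_dataset` are hypothetical, and the DOI is a placeholder. The pattern is a loose syntax check and does not confirm that a DOI actually resolves.

```python
import re

# Loose syntactic check only: "10.", a numeric registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def related_dataset(doi, relation_type):
    """Build a link entry to another dataset, identified by its DOI."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-like string: {doi!r}")
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


# Hypothetical provenance record; all values are placeholders.
provenance = {
    "collected_by": "A. Researcher",
    "collection_method": "online questionnaire",
    "processing_steps": [
        "removed incomplete responses",
        "recoded free-text answers into categories",
    ],
    # The dataset this one builds on, cited by its persistent identifier.
    "related": [related_dataset("10.1234/original-data", "IsDerivedFrom")],
}

for step_number, step in enumerate(provenance["processing_steps"], start=1):
    print(step_number, step)
print(provenance["related"][0])
```

Recording the processing steps in order, next to the identifier of the source dataset, lets a later reader trace how the published data came about.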
Learn more about metadata on Metadata and Documentation | RDM Support UU