Science data – support for researchers

Quick Guide: Data Storage Clean Up

Why is this important?

With more and more digital data being generated in research, there is a growing need to ensure that data storage spaces are used efficiently. A regular cleanup of the project storage not only facilitates the findability, reproducibility and sustainability of the data but also helps keep data preservation costs manageable.

The goal of a data storage cleanup is to free up storage space and enhance data organization by removing redundant or unnecessary files, archiving important data and ensuring that the data kept in storage are compliant and well documented.

Quick Guide

The indications below offer general guidelines for starting to organize your project storage space.

Quick Guide Explained

Review and appraise the data in your project directory. Identify data files in active use and candidates for long-term retention or deletion.

 

            Redundant or temporary files

  • Are data files duplicated?
  • Are data files intermediate or temporary files that are irrelevant to the project result?
  • Are data files debugging, testing or placeholder data that are irrelevant to the project result?

If yes: you may consider removing these files from your storage.

            Format or usability issues

  • Are data stored in unreadable, broken or corrupt formats?
  • Are data incomplete or of low quality, rendering them unusable?
  • Are the data files incomprehensible due to a lack of documentation?

If yes: consider improving the file quality or adding documentation where possible.

(The support team can help you recover data from outdated file formats or help you set up research documentation.)

Where this is not possible, you may want to consider removing these files from your storage.
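The duplicate check in the first question above can be sketched in Python. This is a minimal illustration, not a prescribed tool: the function names `file_digest` and `find_duplicates` are hypothetical, and the sketch treats two files as duplicates only when their content is byte-identical. It reports candidates and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have byte-identical content."""
    # Group by size first, so only files sharing a size need to be hashed.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_digest[file_digest(path)].append(path)

    # Keep only digests shared by more than one file.
    return sorted(sorted(group) for group in by_digest.values() if len(group) > 1)
```

Calling `find_duplicates(Path("my_project"))` (with a directory of your own in place of the placeholder name) returns a list of groups. Review each group by hand before removing anything, since identical copies may be kept on purpose.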

            Legal aspects

  • Is storage of the data no longer required by policy?
    Check the data retention period. Also verify whether your funder or journal has specific requirements for data preservation.
  • Is storage or deletion of the data needed to comply with legal or contractual obligations?
    Check agreements with third parties (e.g. data sharing agreements, consortium or commercial collaborations, patent applications) for obligations regarding the data, as these might restrict the (long-term) storage of data.

Make sure you are retaining data for as long as required. Where no retention obligations apply, data might be a candidate for deletion.

Note that, even without obligations, data might still be relevant to preserve. Always check the scientific value or reuse potential of the data before considering deletion!

            Copyright and usage rights
            (data obtained from or with third parties)

  • Is storage of the data allowed by copyright/license?
  • What are the terms of use of the data and what are the provisions regarding (long-term) storage?
    Rights to store the data might be limited when you are a user of data mainly managed by others (e.g. data from public archives or agencies).
    Always check what your rights to the data are and whether (long-term) storage is allowed.

Only store data you are allowed to keep.
If storage/archiving of data is not allowed, you should consider removing these files from your storage or discussing further with the copyright holder/interested parties.

            Data protection

  • Are you storing personal data and are there GDPR considerations that limit the storage of data files?
    Check whether you have a valid legal basis to store the data. Consider data minimization criteria and keep an eye on how long data should be stored.
  • If working with personal or otherwise sensitive data: Are there ethical considerations that could limit the storage of data files?

If you are storing personal data that you should not be retaining, you may need to consider removing these files from your storage or taking an additional action, such as pseudonymising the data.

Always ensure you are storing personal (or otherwise sensitive) data in a secure environment and in a secure way.
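The retention-period check mentioned under legal aspects comes down to simple date arithmetic, sketched below in Python. The 10-year value is purely an assumption for illustration, as are the function names. The actual period and the date from which it counts (for example publication or project end) depend on your institution, funder and journal.

```python
from datetime import date

# Hypothetical retention period; look up the real value in the applicable policy.
RETENTION_YEARS = 10


def retention_end(start: date, years: int = RETENTION_YEARS) -> date:
    """Return the date on which the retention period ends."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return start.replace(year=start.year + years, day=28)


def retention_expired(start: date, today: date, years: int = RETENTION_YEARS) -> bool:
    """Return True if the retention period has ended on or before `today`."""
    return today >= retention_end(start, years)
```

An expired retention period only marks data as a candidate for deletion. The scientific value and reuse potential still need to be assessed before anything is removed.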

 

            Scientific value

  • Supports published or submitted findings
  • Necessary for reproducibility or validation of research
  • Unique, costly or hard-to-reproduce data

             Reuse potential

  • Potential secondary use and analysis for research
  • Teaching and learning use
  • Linked to tools or code that support reuse

            Other considerations

  • Policy, public or community value
  • Private or commercial value (e.g. a patent application)

 

 

Even when not required by policy or other obligations, data may be valuable to preserve.

Deciding which data are valuable to keep for the long term can be challenging. Each discipline will have tailored criteria regarding the value of data files. You might need to develop specific criteria for data selection within your group/division.

Are data to be archived?
VAULT IT!

Archive finalized, published or reproducible datasets.

 

Are data to be destroyed?
Destroy/delete safely

Remove unnecessary, irrelevant or non-compliant data.

  • Ensure you carry out any deletion or removal of data in a secure manner.
  • Use secure tools to ensure complete deletion where needed (sensitive data).
  • Where needed, document what was deleted, by whom and why (make sure you record the disposal of the data in alignment with disciplinary standards and responsible research guidelines).
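Documenting what was deleted, by whom and why can be as simple as appending a row to a log file. The Python sketch below assumes a CSV log with four hypothetical columns; the field names and the function `record_disposal` are illustrative and should be adapted to your disciplinary standards. It only records the disposal and does not perform the (secure) deletion itself.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log layout; adjust the columns to your own guidelines.
LOG_FIELDS = ["timestamp_utc", "path", "deleted_by", "reason"]


def record_disposal(log_file: Path, path: str, deleted_by: str, reason: str) -> None:
    """Append one disposal record to a CSV log, writing a header for a new log."""
    is_new = not log_file.exists() or log_file.stat().st_size == 0
    with log_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "path": path,
            "deleted_by": deleted_by,
            "reason": reason,
        })
```

A call such as `record_disposal(Path("disposal_log.csv"), "raw/run1.tmp", "jdoe", "temporary file")` adds one line per deleted item. Keeping the log alongside the project documentation makes later audits easier.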

 

You may need to revisit the data at a later stage to decide whether the data still meet the conditions to be kept in storage or to develop criteria for data selection within your discipline.

Perform data triage and cleanup periodically, or (a) at the end of a project phase, (b) after publication submission, or (c) before project closure or researcher departure.