Science data – support for researchers

SURF Data Archive

The SURF Data Archive (SDA) is a tape library for secure data archiving and long-term storage of large volumes of data that are not actively in use.

👤 For whom? This service is for researchers and research support staff.
🔑 Access: Access can be requested at the faculty via rdm-beta@uu.nl.
After initial admission with your SolisID, you log in to the data archive via Secure Shell (SSH) and manage your data from the command line.
🔐 Security: The archive is ISO 27001 certified, with libraries in two physically separate locations, in the municipalities of Amsterdam and Haarlemmermeer.
Always use secure protocols and encrypt sensitive data.
🧮 Costs: Storage is charged via WBS for a minimum usage period of 5 years and can be financed per year or up front. The rate is EUR 745 per 10 TB for 5 years (2025 rates).
🌱 Energy footprint: Approximately one tenth of the electricity consumption of regular disk-based network storage (0.44 watt per TB for mirrored storage).
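
As a worked example of the rate quoted above, the sketch below estimates the cost for a given volume. It assumes, without confirmation from the source, that usage is billed in whole 10 TB blocks; the function name `sda_cost_eur` is hypothetical.

```bash
# Hypothetical helper: estimate SDA cost from the 2025 rate quoted above.
# Assumption (not confirmed by the source): billing is per whole 10 TB block.
RATE_PER_BLOCK_EUR=745   # EUR per 10 TB for the 5-year minimum term
BLOCK_TB=10
TERM_YEARS=5

# Print the 5-year cost in EUR for a volume given in whole TB.
sda_cost_eur() {
  tb=$1
  blocks=$(( (tb + BLOCK_TB - 1) / BLOCK_TB ))   # round up to whole blocks
  echo $(( blocks * RATE_PER_BLOCK_EUR ))
}

total=$(sda_cost_eur 25)
echo "25 TB for ${TERM_YEARS} years: EUR ${total} (EUR $(( total / TERM_YEARS )) per year)"
```

Under these assumptions, 25 TB rounds up to three blocks: 3 × 745 = EUR 2235 for five years, or EUR 447 per year.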

 

REQUEST SURF Data Archive

Important! SDA is designed for long-term storage of large volumes of static research data (data not requiring frequent access).
For storing active (volatile) data during your project, consider alternative storage solutions.
 

 

 

Data management in SDA

Key considerations

🗂️ File bundling:
SDA is optimized for high-volume storage. Archiving data in SDA requires an average file size above 1 GB. You can use dmftar to bundle smaller files.
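
A minimal sketch of checking the average file size and bundling a directory of small files before upload. It uses plain tar as a local stand-in, since dmftar is only available where SURF provides it. The directory and file names are made up for illustration, and 1 GB is taken here to mean 10^9 bytes.

```bash
set -eu

MIN_AVG_BYTES=1000000000   # 1 GB threshold from the text, assumed to be 10^9 bytes

# Print the average size in bytes of all regular files under directory $1.
avg_file_size_bytes() {
  find "$1" -type f -exec sh -c 'for f; do wc -c < "$f"; done' sh {} + |
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# Tiny demo data set (hypothetical names) in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/run01"
printf 'a,b\n1,2\n' > "$demo/run01/part1.csv"
printf 'a,b\n3,4\n5,6\n' > "$demo/run01/part2.csv"

avg=$(avg_file_size_bytes "$demo/run01")
echo "average file size: $avg bytes"

if [ "$avg" -lt "$MIN_AVG_BYTES" ]; then
  # Average is below the threshold: bundle the directory into one archive file.
  tar -cf "$demo/run01.tar" -C "$demo" run01
  echo "bundled into $demo/run01.tar"
fi
```

On the archive itself, dmftar takes the place of tar. To the best of my knowledge the invocation has the form `dmftar -c -f run01.dmftar run01`, but this should be checked against SURF's current documentation.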
📦 Data retrieval:
SDA is optimized for infrequent access, so make sure your files do not need to be retrieved regularly. Before retrieving data from the archive, you must first stage it (bring it back online from tape); expect some latency for this step.
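
Staging on the archive is typically done with the DMF commands `dmls` and `dmget`, which exist only on the archive's login node. The sketch below is an assumption about a typical workflow, not SURF's documented procedure. The wrapper name `stage_files` and the example path are hypothetical.

```bash
# Stage files from tape to disk before reading or downloading them.
# Usage: stage_files FILE...
stage_files() {
  if ! command -v dmget >/dev/null 2>&1; then
    echo "dmget not found: run this on the archive login node" >&2
    return 1
  fi
  dmls -l "$@"     # list the files together with their current DMF state
  dmget -a "$@"    # request staging for all listed files in one call
  dmls -l "$@"     # list again to check the state after staging
}

# Example (hypothetical path):
#   stage_files /project/path/run01.tar
```

Passing all files to a single call, rather than looping over them one by one, gives the tape system the chance to order the reads itself.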

📃 Data documentation:
Include metadata or documentation files to ensure the data remains usable and FAIR-compliant. Use clear naming conventions for folders and files, and indicate in the metadata whether there is a defined retention period for archiving.
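
One possible way to follow this advice is to store a short README and a checksum manifest next to the data. The sketch below is illustrative only: the README fields and file names are hypothetical, not a SURF or UU template, and `sha256sum` assumes GNU coreutils (on macOS, `shasum -a 256` is the usual equivalent).

```bash
set -eu

# Demo data set in a temporary directory (hypothetical content).
dataset=$(mktemp -d)
printf 'sample,value\nA,1\n' > "$dataset/measurements.csv"

# Minimal README; the fields are examples, not a prescribed schema.
cat > "$dataset/README.txt" <<'EOF'
Title: (fill in)
Contact: (fill in)
Description: (what the files contain and how they were produced)
Retention period: (fill in, or "none defined")
EOF

# Checksum manifest for every file except the manifest itself,
# so the data can be verified after a later retrieval.
(
  cd "$dataset"
  find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + | sort -k 2 > MANIFEST.sha256
  sha256sum -c MANIFEST.sha256
)
```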

🔩 Data transfer tools:
SDA supports several data transfer protocols (SSH, SFTP, rsync, FTP(E)S, GridFTP) that work on both Linux and Windows. Files must be compatible for transfer with these protocols. Note also that data transfers are performed via command-line interfaces over SSH, so familiarity with these tools is necessary.

SDA is designed for long-term storage of large volumes of static research data. Here’s a concise guide on which files are suitable for storage on SDA:

Suitable Files for SDA:

  1. Static Research Data – Data that is no longer actively modified, such as finalized datasets from completed projects or project phases.
  2. Large-Scale Datasets – Files or datasets that are large in size (terabytes to petabytes), as SDA is optimized for high-volume storage.
  3. FAIR-Compliant Data – Data that aligns with FAIR principles (Findable, Accessible, Interoperable, Reusable), such as well-documented datasets with metadata for research sharing or publication.
  4. Non-Sensitive or Anonymized Data – Data that is anonymized or approved for long-term archiving, ensuring compliance with privacy regulations (e.g., GDPR).
  5. Formats for Longevity – Files in stable, widely supported formats to ensure future accessibility.

Files NOT Suitable for SDA:

  1. Active/Volatile Data
  2. Small or Temporary Files – SDA is not cost-effective for small datasets or short-term storage. Use other solutions for files under a few terabytes unless part of a larger archive.
  3. Sensitive Data Without Proper Handling – Unanonymized personal data or sensitive information requiring strict access controls may need additional measures or alternative storage (e.g., Yoda for sensitive data management).
  4. Backups – SDA is an archive, not a backup system. It doesn’t provide automated backups, so users must maintain their own backups elsewhere.

Where to Find More Guidance:

 
The SDA host is archive.surfsara.nl
SDA supports secure file transfer protocols for data upload and retrieval.

1. Choose a Secure Connection Protocol

    • SFTP: Secure File Transfer Protocol, widely used for secure data transfers.
    • rsync: Useful for synchronizing files or transferring large datasets efficiently.
    • GridFTP: Suitable for high-performance data transfers, often used in scientific research (e.g., for LHC data).

2. Set Up Your Environment

    • SSH Access: You’ll need an SSH client (e.g., OpenSSH on Linux/macOS or PuTTY on Windows) to connect to the Data Archive’s servers.
    • Encryption for Sensitive Data: If your data is sensitive, encrypt it before transfer using tools like GPG or OpenSSL. This ensures that even system administrators cannot access the data’s contents.
    • File Preparation: Ensure files meet the archive’s requirements:
        • File sizes should ideally be between 1 GB and 200 GB.
        • Bundle small files (<1 GB) into larger archive files using tools like dmftar.
        • Split very large files (>1 TB) into smaller chunks for optimal performance.
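
The encryption and splitting steps above can be sketched as follows. This is a minimal local demonstration on a 5 MiB stand-in file, assuming OpenSSL 1.1.1 or newer (for `-pbkdf2`). The file names, passphrase handling and chunk size are illustrative assumptions; for real data you would choose a chunk size within the recommended range.

```bash
set -eu

work=$(mktemp -d)

# Stand-in for a large data file: 5 MiB of random bytes.
head -c 5242880 /dev/urandom > "$work/data.bin"

# Demo passphrase file; in practice keep the passphrase outside the archive.
printf 'example-passphrase-change-me' > "$work/pass.txt"

# Encrypt before transfer (symmetric AES-256, key derived with PBKDF2).
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in "$work/data.bin" -out "$work/data.bin.enc" -pass "file:$work/pass.txt"

# Split the encrypted file into 2 MiB chunks (demo size only).
split -b 2097152 "$work/data.bin.enc" "$work/data.bin.enc.part-"

# Round-trip check: rejoin the chunks, decrypt, and compare with the original.
cat "$work"/data.bin.enc.part-* > "$work/rejoined.enc"
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in "$work/rejoined.enc" -out "$work/restored.bin" -pass "file:$work/pass.txt"
cmp "$work/data.bin" "$work/restored.bin" && echo "round trip OK"
```

Encrypting before splitting means each chunk is useless on its own, but every chunk is needed to restore the file, so a checksum per chunk is worth storing alongside.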

3. Obtain Server Details:

    • SURF will provide you with the hostname, port, and credentials for the Data Archive server after your account is set up.

4. Connect to the Data Archive:

SFTP:

    sftp username@archive.surfsara.nl

After logging in, use commands like put or get to upload or download files.
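
For repeated uploads, the put commands can be collected in a batch file and passed to `sftp -b`. The sketch below only writes the batch file and prints the command instead of running it. The user name, remote path and file names are placeholders, and the host is the one this page names as the SDA host.

```bash
set -eu

ARCHIVE_HOST="archive.surfsara.nl"   # SDA host as stated on this page
ARCHIVE_USER="username"              # placeholder
REMOTE_DIR="/project/path"           # placeholder remote directory

# Write the SFTP commands to a batch file (file names are hypothetical).
batch=$(mktemp)
cat > "$batch" <<EOF
-mkdir $REMOTE_DIR/run01
cd $REMOTE_DIR/run01
put run01.tar
put MANIFEST.sha256
ls -l
EOF

# Print the command rather than running it; remove 'echo' to execute.
echo sftp -b "$batch" "$ARCHIVE_USER@$ARCHIVE_HOST"
```

The leading `-` on `mkdir` tells sftp to continue if the directory already exists. Batch mode cannot prompt for a password, so it needs key-based authentication.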

SCP:

    scp /local/path/to/file username@archive.surfsara.nl:/project/path/

rsync:

    rsync -av /local/path/to/file username@archive.surfsara.nl:/project/path/
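
Large transfers can be interrupted, so one option is to wrap rsync in a small retry loop and add `--partial`, which keeps partially transferred files so the next attempt can reuse them. The `retry` helper and the `RETRY_DELAY` variable below are hypothetical names introduced for this sketch, not part of rsync or of SURF's tooling.

```bash
# Run a command up to N times, pausing between failed attempts.
# Usage: retry N command [args...]
retry() {
  attempts=$1
  shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "${RETRY_DELAY:-5}"   # seconds between attempts
  done
}

# Harmless local demonstration of the helper:
retry 3 test -d / && echo "demo command succeeded"

# Intended use (not executed here; paths and user name are placeholders):
#   retry 5 rsync -av --partial /local/path/to/file username@archive.surfsara.nl:/project/path/
```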

GridFTP: Requires specific client software like Globus Toolkit. SURF provides documentation for GridFTP setup upon request.

Authentication: Use your SURF-provided credentials or SSH keys for authentication. Ensure your SSH key is added to the server’s authorized keys if required.
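
A sketch of preparing key-based authentication with OpenSSH. It generates a demo key pair in a temporary directory rather than in `~/.ssh`, and writes an example client configuration. The alias `sda`, the key file name and the user name are assumptions. How the public key is registered with the archive is not described in the source and depends on SURF's procedure.

```bash
set -eu

# Temporary directory as a stand-in for ~/.ssh, so no real keys are touched.
demo_home=$(mktemp -d)
key="$demo_home/id_ed25519_sda"

# Generate an Ed25519 key pair. The empty passphrase (-N "") is for the demo
# only; protect a real key with a passphrase.
if command -v ssh-keygen >/dev/null 2>&1; then
  ssh-keygen -q -t ed25519 -N "" -C "sda-archive" -f "$key"
fi

# Example client configuration with a short alias for the archive host.
cat > "$demo_home/config" <<EOF
Host sda
    HostName archive.surfsara.nl
    User username
    IdentityFile $key
EOF

echo "public key to register: $key.pub"
echo "connect with: ssh -F $demo_home/config sda"
```

With such an entry in your own `~/.ssh/config`, `ssh sda`, `sftp sda` and `rsync ... sda:/project/path/` all pick up the same host, user and key.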

Where to find more guidance: