Data Management

Data management services in the EGI infrastructure

Overview

The data management services of EGI comprise two groups of services:

  • Services that provide data management capabilities to enhance the raw storage available in the EGI infrastructure
  • Specialized services that offer advanced organisation of data during ongoing research projects, in an integrated environment that combines data management with a digital lab notebook

The EGI data management services offer both application programming interfaces (APIs) and command-line interfaces (CLIs) that are integrated with the advanced EGI services and platforms (such as development environments, machine learning, or cloud orchestrators) and can be accessed from most compute services.

Generic data management

These higher-level data management services are available to researchers:

  • EGI Rucio is tailored to medium-sized and large scientific collaborations, allowing users to organise, manage, and access their data at scale. Data can be distributed across heterogeneous data centers at widely distributed locations.
  • EGI DataHub is a high-performance data management solution that offers unified data access across multiple types of underlying storage, allowing users to share data, collaborate, and easily perform computations on the stored data.

Specialized data management

The following specialized data management services are also available:

  • EGI Data Transfer is a low-level service to move data from one Grid or Object storage to another. It is used internally by Rucio to schedule transfers based on the data policies defined by the users.
  • openRDM combines a FAIR data management platform, an Electronic Laboratory Notebook (ELN), and an Inventory Management System, providing a complete overview of workflows and information, from initial data generation to data analysis and publication.

Use-cases for storing and managing research data

Depending on the compute services they use and the use-cases being addressed, users might need to choose different data services to store, access, and manage data.

User       | Data storage                                             | Data management (optional)
Cloud user | Block and Object storage                                 | DataHub
HTC user   | Grid storage                                             | Rucio
HPC user   | High-performance parallel file systems or Object storage | DataHub or Rucio
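
The table above can also be read as a simple lookup from user type to candidate services. The following is a minimal sketch in Python that encodes only the rows of the table. The names `DataOptions`, `RECOMMENDATIONS`, and `recommend` are hypothetical, chosen for illustration, and are not part of any EGI API or client library.

```python
from typing import Dict, NamedTuple, Tuple


class DataOptions(NamedTuple):
    """Storage and (optional) data management choices for one user type."""
    storage: Tuple[str, ...]
    management: Tuple[str, ...]


# Hypothetical lookup table that mirrors the use-case table in this section.
# It does not query any EGI service; it only restates the documented mapping.
RECOMMENDATIONS: Dict[str, DataOptions] = {
    "cloud": DataOptions(
        storage=("Block storage", "Object storage"),
        management=("DataHub",),
    ),
    "htc": DataOptions(
        storage=("Grid storage",),
        management=("Rucio",),
    ),
    "hpc": DataOptions(
        storage=("High-performance parallel file systems", "Object storage"),
        management=("DataHub", "Rucio"),
    ),
}


def recommend(user_type: str) -> DataOptions:
    """Return the documented options for 'cloud', 'htc', or 'hpc' users."""
    key = user_type.strip().lower()
    if key not in RECOMMENDATIONS:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown user type {user_type!r}; expected one of: {known}")
    return RECOMMENDATIONS[key]


if __name__ == "__main__":
    options = recommend("HTC")
    print("Storage:", ", ".join(options.storage))
    print("Data management (optional):", ", ".join(options.management))
```

The data management entry is optional in every row, so a caller of this sketch may ignore the `management` field and use the listed storage directly.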

The following sections offer detailed descriptions for each data management service.


Next topics:
EGI DataHub

Discover, manage, and replicate data with EGI DataHub

EGI Data Transfer

Very large data transfers in the EGI infrastructure

Rucio

Organise and access data at scale with Rucio

openRDM

Organise data in research projects with openRDM

Last modified February 23, 2022 by Levente Farkas : Update user documentation structure (#414)