Grid Storage

Grid Storage offered by EGI HTC providers

What is it?

Grid storage enables storage of files in a fault-tolerant and scalable environment, and sharing them with distributed teams. Your data can be accessed through multiple protocols, and can be replicated across different providers to increase fault tolerance. Grid storage gives you complete control over what data you share, and with whom you share it.

The main features of grid storage:

  • Access highly scalable storage from anywhere
  • Control the data you share
  • Organise your data using a flexible, hierarchical structure

Grid storage file access is based on the GridFTP and WebDAV/HTTP protocols, together with XRootD and legacy SRM (which is being deprecated at some of the endpoints).

Several grid storage implementations are available in the EGI Infrastructure.

Endpoint discovery

The grid storage endpoints that are available to a user’s Virtual Organizations are discoverable via the EGI Information System (BDII).

The lcg-infosites command can be used to obtain VO-specific information on existing grid storage endpoints, using the following syntax:

$ lcg-infosites --vo voname -[v] -f [site name] [option(s)] [-h| --help] [--is BDII]

For example, to list the Storage Elements (SEs) available to the biomed VO, we could issue the following command:

$ lcg-infosites --vo biomed se

 Avail Space(kB)  Used Space(kB)  Type  SE
    280375465082             n.a  SRM
     10995116266              11  SRM
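
The space columns in this listing are plain numbers in kB, so they are easy to post-process. The following is a minimal sketch, not part of the lcg-infosites tooling: the helper name total_avail_kb is made up for illustration, and it assumes the output layout shown above (a header line followed by rows whose first column is the available space).

```shell
#!/bin/sh
# Hypothetical helper: sum the first column (available space, kB) of
# `lcg-infosites ... se` output read from stdin. The header line and rows
# whose first field is not a plain number are skipped.
total_avail_kb() {
    awk '$1 ~ /^[0-9]+$/ { sum += $1 } END { printf "%.0f\n", sum }'
}

# The two rows from the example above, fed in as sample input.
total_avail_kb <<'EOF'
 Avail Space(kB)  Used Space(kB)  Type  SE
    280375465082             n.a  SRM
     10995116266              11  SRM
EOF
```

Against a live information system the same helper could be used as a filter, for example by piping the output of lcg-infosites --vo biomed se into total_avail_kb.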

Access from the command-line

Access to grid storage via a command-line interface (CLI) requires users to obtain a valid X.509 user VOMS proxy. Please refer to the Check-in documentation for more information.
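
Because every command below fails without a valid proxy, it can help to check the remaining proxy lifetime before starting a long transfer. This is a rough sketch under a few assumptions: the VOMS client tools (voms-proxy-info, voms-proxy-init) are installed, the one-hour threshold is arbitrary, and the helper name proxy_has_time is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if SECONDS_LEFT ($1) is at least MIN_SECONDS ($2).
proxy_has_time() {
    [ "${1:-0}" -ge "${2:-3600}" ]
}

# Query the remaining proxy lifetime only if the VOMS clients are installed.
if command -v voms-proxy-info >/dev/null 2>&1; then
    left=$(voms-proxy-info -timeleft 2>/dev/null) || left=0
    if proxy_has_time "${left:-0}" 3600; then
        echo "proxy valid for ${left}s"
    else
        echo "proxy missing or expiring soon; renew it with voms-proxy-init --voms <vo>"
    fi
else
    echo "voms-proxy-info not found; install the VOMS clients first"
fi
```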

The CLI most widely used to access grid storage is gfal2, which is available for installation on both RHEL- and Debian-compatible systems.

In particular, gfal2 provides an abstraction layer on top of several storage protocols (XRootD, WebDAV, SRM, gsiftp, etc.), offering a convenient API that can be used over different protocols.
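
In practice, the abstraction means that the same gfal2 command is used for every protocol and only the URL scheme changes. The sketch below illustrates this with placeholder endpoints: the host se.example.org and its paths are made up, the helper url_scheme is hypothetical, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the URL scheme (the part before "://").
url_scheme() {
    printf '%s\n' "${1%%://*}"
}

# Placeholder endpoints; only the scheme differs between protocols.
for url in \
    gsiftp://se.example.org/data/biomed \
    davs://se.example.org/data/biomed \
    root://se.example.org//data/biomed \
    srm://se.example.org/data/biomed
do
    # Print the command instead of running it, since the host is made up.
    echo "$(url_scheme "$url"): gfal-ls $url"
done
```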

The gfal2 CLI can be installed as follows (for RHEL compatible systems):

$ yum install gfal2-util gfal2-all

where gfal2-all will install all the plug-ins (to deal with all the available protocols).
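
After installation, a quick check that the gfal2 commands used in this section are on the PATH can save some debugging later. This is only a sketch; missing_tools is a made-up helper name and the list of commands is limited to those shown in this section.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gfal-ls gfal-mkdir gfal-copy gfal-rm)
if [ -z "$missing" ]; then
    echo "all gfal2 commands found"
else
    # Unquoted on purpose so the names are joined on one line.
    echo "missing:" $missing
fi
```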

Below you can find examples of the usual commands needed to access storage via gfal2. For a complete list of available commands, and the guide on how to use them, please refer to the gfal2-util documentation.

List files on a given endpoint

$ gfal-ls gsi

Create a folder

$ gfal-mkdir gsi

Copy a local file

$ gfal-copy test.json gsi
Copying file:///root/Documents/test.json   [DONE]  after 0s

Copy files between storages

$ gfal-copy gsi gsi
Copying gsi   [DONE]  after 3s

Download a file to a local folder

$ gfal-copy gsi /tmp
Copying gsi   [DONE]  after 0s

Delete a file

$ gfal-rm gsi
gsi      DELETED
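
The individual commands above can be chained into a small round trip (create a folder, upload, list, download, delete). The following is an illustrative sketch rather than a recommended workflow: the endpoint gsiftp://se.example.org/data/biomed/demo is a placeholder, the run and roundtrip helpers are invented for this example, and by default the script only prints the commands (set DRY_RUN=0 to execute them with a valid proxy).

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholder endpoint and path; replace with a Storage Element of your VO.
SE_URL="${SE_URL:-gsiftp://se.example.org/data/biomed/demo}"

# Each step runs only if the previous one succeeded.
roundtrip() {
    run gfal-mkdir -p "$SE_URL" &&
    run gfal-copy test.json "$SE_URL/test.json" &&
    run gfal-ls -l "$SE_URL" &&
    run gfal-copy "$SE_URL/test.json" /tmp/test.json &&
    run gfal-rm "$SE_URL/test.json"
}

roundtrip
```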

Access via EGI Data Transfer

The EGI Data Transfer service provides mechanisms to optimize the transfer of files between EGI Online Storage endpoints. Both a graphical user interface (GUI) and command-line interfaces (CLI) are available to perform bulk movement of data. Please check out the related documentation for more information.

Integration with Data Management frameworks

Most of the time, grid storage access is hidden from users by the integration with the Data Management Frameworks (DMFs) used by Collaborations and Experiments.

For example, EGI Workload Manager provides a way to efficiently access grid storage endpoints in order to read and store files, and to catalogue the existing files and related metadata.

When running computation via the EGI Workload Manager, users do not actually access the storage directly. However, users can retrieve the output of the computation once it has been stored on the grid.