DataHub Use-Cases
Use-cases for EGI DataHub
EGI DataHub is a high-performance data management solution that offers unified data access across globally distributed environments and multiple types of underlying storage. It allows researchers to share, collaborate and perform computations on the stored data easily.
Users can bring data close to their community or to the compute facilities they use, in order to exploit it efficiently. This is as simple as selecting which (subset of the) data should be available at which supporting provider.
The main features of DataHub are:
EGI DataHub supports multiple access policies:
Data replication in EGI DataHub may take place either on-demand or automatically. Replication uses a file catalogue to enable tracking of logical and physical copies of data.
The following concepts (components) will help you understand how EGI DataHub works.
Space: Virtual volume where users organize their data. A space is supported by one or more Oneproviders that provide the actual storage resources.
Oneprovider: Data management component deployed in the data centres, provisioning data and managing transfers. A Oneprovider is typically deployed at a site near the local storage resources, and can access them over multiple connectors (e.g. CEPH, POSIX). A default one is operated for EGI by CYFRONET.
Onezone: Central component for federating providers. It takes care of authentication, authorization and other management tasks (like space creation). EGI DataHub is a Onezone instance.
EGI DataHub: The central Onezone instance of the EGI Federation. Single Sign-On (SSO) with all the connected storage providers (Oneproviders) is guaranteed through EGI Check-in.
Oneclient: Client application providing access to the spaces through a Linux FUSE mount point (local POSIX access), as if they were part of the local file system. Oneclient can be used from VMs, containers, desktops, etc.
Using the EGI DataHub web interface it's possible to manage the space.
Using Oneclient it's possible to mount a space locally and access it over a POSIX interface, working with files as if they were stored locally. A file's blocks are downloaded on demand.
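Because a mounted space behaves like a local directory, ordinary file I/O is enough to work with it. The following is a minimal sketch, not an official example: the mount point `~/datahub` and the file name `example.dat` are hypothetical, and the helper `read_range` is introduced here only for illustration. It reads a byte range from a file, which on a Oneclient mount would only need the blocks covering that range.

```python
import os


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at `offset` using plain file I/O."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


if __name__ == "__main__":
    # Hypothetical mount point and file name; adjust to your own Oneclient mount.
    mount_point = os.path.expanduser("~/datahub")
    sample = os.path.join(mount_point, "PLAYGROUND", "example.dat")
    if os.path.exists(sample):
        # Read 4 KiB starting 1 KiB into the file.
        print(len(read_range(sample, 1024, 4096)))
```

Nothing in this code is specific to Onedata: any tool or library that works on local files can be pointed at the mount point in the same way.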
In Onedata, file distribution is done on a block basis: blocks are replicated on the fly as they are needed, and it's also possible to control the replication between Oneproviders explicitly.
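The idea of block-based distribution can be illustrated with a toy model. This is a sketch under stated assumptions, not Onedata's actual implementation: the fixed 1 MiB block size and the function names `blocks_for_range` and `missing_blocks` are made up for the example. Given a requested byte range and the set of blocks already present at a provider, it computes which blocks would still have to be fetched.

```python
from typing import List, Set

BLOCK_SIZE = 1024 * 1024  # Assumed block size (1 MiB), for illustration only.


def blocks_for_range(offset: int, length: int,
                     block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indexes of blocks overlapping [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def missing_blocks(offset: int, length: int, local_blocks: Set[int],
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the blocks needed for the range that are not available locally."""
    needed = blocks_for_range(offset, length, block_size)
    return [index for index in needed if index not in local_blocks]


if __name__ == "__main__":
    # The range spans blocks 1..3; block 1 is already local.
    print(missing_blocks(1_500_000, 2_000_000, {1}))  # prints [2, 3]
```

In this model a read only triggers a transfer for the blocks it touches, so a small read from a very large remote file stays cheap.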
Three different formats of metadata can be attached to files: basic (key-value), JSON and RDF. The metadata can be managed using the Web interface and the APIs. It's also possible to create indexes and query them.
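To make the three formats concrete, here is a small sketch of what each kind of metadata might look like for a single file. All keys, values and URIs are invented for the example, the RDF is shown in RDF/XML serialization as an assumption, and no DataHub API is called: the code only builds and sanity-checks the documents locally.

```python
import json
import xml.etree.ElementTree as ET

# Basic metadata: flat key-value pairs (hypothetical keys and values).
basic_metadata = {"project": "demo", "instrument": "sensor-42"}

# JSON metadata: an arbitrary nested document (hypothetical content).
json_metadata = {
    "experiment": {"id": 17, "tags": ["calibration", "raw"]},
    "valid": True,
}

# RDF metadata, written as RDF/XML (hypothetical resource and title).
rdf_metadata = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/dataset/1">
    <dc:title>Example dataset</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def is_flat_key_value(metadata: dict) -> bool:
    """Check that every key and value is a plain string (no nesting)."""
    return all(isinstance(key, str) and isinstance(value, str)
               for key, value in metadata.items())


if __name__ == "__main__":
    print(is_flat_key_value(basic_metadata))   # flat pairs
    print(json.dumps(json_metadata, indent=2))  # serialized JSON document
    print(ET.fromstring(rdf_metadata).tag)      # root element of the RDF/XML
```

The basic format suits simple tagging, while JSON and RDF allow structured descriptions that are better suited to indexing and querying.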
It's possible to view the popularity of a file and manage smart caching.
The EGI DataHub Web interface can be accessed by any user authenticated via EGI Check-in. Authenticated users have access to the PLAYGROUND space, with a 30 GB quota, where they can test some of the available features. Users also have read-only access to the open-datasets space, where some example datasets are stored.
To access data via Oneclient or the API, please check the related documentation.
Advanced users wishing to install their own Oneprovider can check the dedicated installation instructions.