Object Storage

Object Storage offered by EGI Cloud providers

What is it?

Object storage is a standalone service that stores data as individual objects, organized into containers. It is highly scalable, reliable, fast, and inexpensive. Its simple web services interface can be used to store and retrieve any amount of data, at any time, from anywhere on the web.

The main features of object storage:

  • Storage containers and objects have unique URLs, which can be used to access, manage, and share them.
  • Data can be accessed from anywhere (e.g. from VMs running in the EGI Cloud or at another cloud provider, or from any browser or laptop), using standard HTTP requests to a REST API.
  • Access can be public or can be restricted using access control lists.
  • There is virtually no limit to the amount of data you can store, and only the space used is accounted for.

Concepts

To use object storage effectively, you need to understand the following key concepts and terminology:

Storage containers

Storage containers (aka buckets) are the fundamental holders of data. Every object is stored in a storage container. You can store any number of objects in a storage container.

Storage containers have a unique name and act as the root folders of the storage space.

Each storage container has a unique URL (that includes the name) by which anyone can refer to it.

Objects

Objects are the fundamental entities stored in object storage. Objects consist of object data and metadata. The data portion is opaque to object storage. The metadata is a set of name-value pairs that describe the object. These include some default metadata, such as the date last modified, and standard HTTP metadata, such as Content-Type. You can also specify custom metadata at the time the object is stored.
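To illustrate how metadata travels with an object, the sketch below builds the HTTP headers for an upload request. It assumes the Swift convention of prefixing custom object metadata with X-Object-Meta-; the helper name metadata_headers is hypothetical and not part of any client library.

```python
def metadata_headers(custom, content_type="application/octet-stream"):
    """Build HTTP headers carrying standard and custom object metadata.

    Assumption: custom metadata uses the Swift header prefix
    "X-Object-Meta-"; other object stores use different prefixes.
    """
    headers = {"Content-Type": content_type}
    for name, value in custom.items():
        headers[f"X-Object-Meta-{name}"] = str(value)
    return headers


headers = metadata_headers({"key1": "value2"}, content_type="text/plain")
print(headers)
```

Default metadata, such as the date last modified, is set by the service itself and is not sent by the client.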

An object is uniquely identified within a storage container by a key (name) and a version.

Each object has a unique URL by which anyone can refer to it. The URL is based on the storage container’s URL and includes the key and, optionally, the version.
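As a minimal sketch of how an object URL derives from the container URL, the helper below appends the percent-encoded key and, optionally, a version. The function name object_url, the example host, and the version-id query parameter (a Swift-style assumption) are illustrative only; check how your provider exposes versions.

```python
from urllib.parse import quote, urlencode


def object_url(container_url, key, version=None):
    """Return the URL of an object inside a storage container.

    The key is percent-encoded, keeping "/" so that keys that look like
    paths stay readable. The "version-id" query parameter is an
    assumption, not something every provider supports.
    """
    url = container_url.rstrip("/") + "/" + quote(key, safe="/")
    if version is not None:
        url += "?" + urlencode({"version-id": version})
    return url


# Hypothetical container URL, for illustration only.
base = "https://object-store.example.org/v1/AUTH_project/test-egi"
print(object_url(base, "file1.txt"))
print(object_url(base, "reports/2024 summary.txt", version="3"))
```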

Permissions

Storage containers and objects can be shared by sharing their URLs. However, access to a storage container or to an object is controlled by access control lists (ACLs). When a request is received against a resource, object storage checks the corresponding ACL to verify that the requester has the necessary access permissions.
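As an illustration, with the Swift API a container's read ACL is set through a request header. The sketch below only builds that header. It assumes the Swift header X-Container-Read, where .r:* grants anonymous read access, .rlistings allows listing, and other grantees are written as project_id:user_id. The helper name read_acl_header is hypothetical, and providers may support only part of this syntax.

```python
def read_acl_header(public=False, grantees=()):
    """Build the header that sets who may read a storage container.

    Assumptions: Swift-style ACL elements; ".r:*" means any requester,
    ".rlistings" allows listing the container, and each grantee is a
    "project_id:user_id" string.
    """
    elements = [".r:*", ".rlistings"] if public else []
    elements.extend(grantees)
    return {"X-Container-Read": ",".join(elements)}


print(read_acl_header(public=True))
print(read_acl_header(grantees=["project1:user1"]))
```

An empty value removes the read ACL, so that only users with a role on the owning project keep access.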

Usage from your application

The object storage in the EGI Cloud is offered via OpenStack deployments that implement the Swift service.

Users can manage object storage using the OpenStack Horizon dashboard of a provider or from the command line. More advanced usage includes access via the S3 protocol, via the OpenStack Object Store API, or using the EGI Data Transfer service.

Access from the command-line

Multiple command-line interfaces (CLIs) are available to manage object storage:

  • The OpenStack CLI
  • The FedCloud Client is a high-level CLI for interaction with the EGI Federated Cloud (recommended)
  • The Swift CLI has some advanced features that are not available through the OpenStack CLI

Access with the FedCloud CLI

The main FedCloud commands for managing storage containers and storage objects are described below.

List storage containers

For example, to access the Swift endpoint at IFCA-LCG2 via the Pilot VO (vo.access.egi.eu) and list the available storage containers, use the FedCloud command below:

To avoid passing the site, VO, etc. each time, you can use FedCloud CLI environment variables to set them once and reuse them with each command invocation.

$ export EGI_SITE=IFCA-LCG2
$ export EGI_VO=vo.access.egi.eu
$ fedcloud openstack container list --site $EGI_SITE
+------------------+
| Name             |
+------------------+
| test-egi         |
+------------------+

In the Windows Command Prompt, the equivalent commands are:

> set EGI_SITE=IN2P3-IRES
> set EGI_VO=vo.access.egi.eu
> fedcloud openstack container list --site %EGI_SITE%
+------------------+
| Name             |
+------------------+
| test-egi         |
+------------------+

In PowerShell, the equivalent commands are:

> $Env:EGI_SITE="IN2P3-IRES"
> $Env:EGI_VO="vo.access.egi.eu"
> fedcloud openstack container list --site $Env:EGI_SITE
+------------------+
| Name             |
+------------------+
| test-egi         |
+------------------+

Create new storage container

To create a new storage container named test-egi, use the following FedCloud command:

$ fedcloud openstack container create test-egi
+---------+-----------+------------------------------------------------------+
| account | container | x-trans-id                                           |
+---------+-----------+------------------------------------------------------+
| v1      | test-egi  | tx000000000000000000afc-005f845160-2bb3ed4-RegionOne |
+---------+-----------+------------------------------------------------------+

Create new object by uploading a file

To upload a file as a new object into a storage container named test-egi, use the following FedCloud command:

$ fedcloud openstack object create test-egi file1.txt
+-----------+-----------+----------------------------------+
| object    | container | etag                             |
+-----------+-----------+----------------------------------+
| file1.txt | test-egi  | 5bbf5a52328e7439ae6e719dfe712200 |
+-----------+-----------+----------------------------------+
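The etag column can serve as an integrity check. The sketch below assumes that the provider reports the MD5 checksum of the content as the etag, which is the usual behaviour for objects uploaded in a single request but not for segmented or multipart uploads. The helper names are hypothetical.

```python
import hashlib


def md5_hex(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_etag(path, etag):
    """Compare a local file with the etag reported after an upload."""
    with open(path, "rb") as f:
        # Some services wrap the etag in double quotes; strip them.
        return md5_hex(f) == etag.strip('"').lower()


# Example call, using the etag from the output above:
# matches_etag("file1.txt", "5bbf5a52328e7439ae6e719dfe712200")
```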

List objects in a storage container

To list the objects in the storage container named test-egi, use the FedCloud command below:

$ fedcloud openstack object list test-egi
+-----------+
| Name      |
+-----------+
| file1.txt |
+-----------+

Download (the content of) an object

To download an object named file1.txt located in the storage container test-egi and save its content to a local file, use the FedCloud command below:

$ fedcloud openstack object save test-egi file1.txt

Add metadata to an object

You can add or update object metadata, which is stored as key-value pairs among the object properties. For example, to add a property named key1 with the value value2 to an object named file1.txt located in the storage container named test-egi, use the FedCloud command below:

$ fedcloud openstack object set \
      --property key1=value2 test-egi file1.txt

Remove metadata from an object

You can also remove metadata from objects. For example, to remove the property named key1 from the object named file1.txt located in the storage container named test-egi, use the FedCloud command below:

$ fedcloud openstack object unset \
      --property key1 test-egi file1.txt

Remove an object from a storage container

To delete an object named file1.txt from the storage container test-egi, use the following FedCloud command:

$ fedcloud openstack object delete test-egi file1.txt

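Remove an entire storage container

To delete a storage container, including all objects in it, use the FedCloud command below. The --recursive option deletes the objects in the container first; without it, only an empty container can be deleted.

$ fedcloud openstack container delete --recursive test-egi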

Access via Rclone

Rclone is a command-line program to manage files on cloud storage. This section explains how to use Rclone to interact with the OpenStack Swift service available in the EGI Federated Cloud.

As a prerequisite, we need to configure the following environment variables: OS_AUTH_URL, OS_AUTH_TOKEN, OS_STORAGE_URL. Use the FedCloud Client to get their values:

# explore sites with swift storage
$ fedcloud endpoint list --service-type org.openstack.swift --site ALL_SITES

# get OS_AUTH_URL
$ fedcloud openstack --site <site> --vo <virtual-organisation> catalog show keystone

# get OS_AUTH_TOKEN
$ fedcloud openstack --site <site> --vo <virtual-organisation> token issue \
  -c id \
  -f value

# get OS_STORAGE_URL for your site and Virtual Organisation
$ fedcloud openstack --site <site> --vo <virtual-organisation> catalog show swift

Now configure rclone to work with the environment variables:

$ rclone config create egiswift swift env_auth true

Finally, export the values obtained above and check that you have access to Swift:

$ export OS_AUTH_TOKEN=<token>
$ export OS_AUTH_URL=<keystone-url>
$ export OS_STORAGE_URL=<swift-url>
$ rclone lsd egiswift:

For more information, please see Rclone documentation for Swift.
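The same token and storage URL can also be used directly against the OpenStack Object Store API. As a minimal sketch, assuming the values obtained above, the code below builds a request that lists the storage containers of the account: a GET on the storage URL, authenticated with the X-Auth-Token header. The function names are hypothetical.

```python
import json
import urllib.request


def container_list_request(storage_url, token):
    """Build (but do not send) a request listing storage containers.

    A GET on the account URL returns the containers; "format=json"
    asks for a JSON response instead of plain text.
    """
    url = storage_url.rstrip("/") + "?format=json"
    return urllib.request.Request(url, headers={"X-Auth-Token": token})


def list_containers(storage_url, token):
    """Send the request and return the container names."""
    request = container_list_request(storage_url, token)
    with urllib.request.urlopen(request) as response:
        return [entry["name"] for entry in json.load(response)]


# Example call, with the variables exported as shown above:
# import os
# list_containers(os.environ["OS_STORAGE_URL"], os.environ["OS_AUTH_TOKEN"])
```

Tokens expire, so a long-running script needs to request a new one from time to time.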

Access via the S3 protocol

The OpenStack Swift service is compatible with the S3 protocol, so when properly configured it can be accessed like any other S3-compatible object store.

To access the storage via S3, an EGI Federated Cloud site administrator needs to create an access key and a secret key and associate them with your EGI credentials. Clients can then use these keys to access the storage.

AWS CLI

The AWS CLI can be used to manage object storage that exposes an S3 interface.

First, configure the access and secret keys:

$ aws configure

The AWS CLI then offers many commands to list and create buckets and objects, for example:

$ aws s3 ls --no-sign-request \
  --endpoint-url https://object-store.cloud.muni.cz \
  s3://test-egi-public
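For a public bucket, the unsigned listing above is a plain HTTP GET that returns XML. The sketch below shows, under a few assumptions, how such a response could be read with the Python standard library: path-style addressing (endpoint/bucket), the S3 ListObjectsV2 response format, and the standard S3 XML namespace. The function names and the sample response are illustrative.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_text):
    """Extract object keys from an S3 ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_public_bucket(endpoint_url, bucket):
    """List a public bucket anonymously (first page of results only)."""
    url = f"{endpoint_url.rstrip('/')}/{bucket}?list-type=2"
    with urllib.request.urlopen(url) as response:
        return parse_listing(response.read())


# Hand-written sample response, so the parsing can be tried offline.
sample = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>test-egi-public</Name>"
    "<Contents><Key>file1.txt</Key></Contents>"
    "</ListBucketResult>"
)
print(parse_listing(sample))
```

Private buckets require signed requests, which is what the AWS CLI and other S3 clients take care of.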

MinIO Client

The MinIO Client (mc) supports filesystems and Amazon S3 compatible cloud storage services.

It offers a modern alternative to UNIX commands like ls and cat, for example:

# key and secret are not mandatory in case of public buckets
$ ./mc alias set cesnet https://object-store.cloud.muni.cz

$ ./mc ls cesnet/test-egi-public

$ ./mc cat cesnet/test-egi-public/file1.txt

Davix

The Davix Client, developed at CERN for RHEL and Debian environments, is another alternative for working with S3-compatible object storage.

For example, to list containers/objects via the S3 protocol, use the command:

$ davix-ls --s3accesskey 'access' --s3secretkey 'secret' \
  --s3alternate s3s://s3.cl2.du.cesnet.cz/<bucket-name>

davix-get, davix-put and davix-del are also available to download, store and delete objects from the storage.

Access via Python

Programmatic access to object storage via S3 is also important, for instance for interactive computing with EGI Notebooks.

In Python, for instance, S3Fs provides a practical Pythonic file interface to S3.

The top-level class S3FileSystem holds connection information and allows typical file-system style operations like cp, mv, ls, du, glob, etc., as well as put/get of local files to/from S3.

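import s3fs

# Connect anonymously (public bucket) to the S3-compatible endpoint.
fs = s3fs.S3FileSystem(
    anon=True,
    client_kwargs={"endpoint_url": "https://object-store.cloud.muni.cz"},
)

# List the objects in the bucket.
print(fs.ls("s3://test-egi-public"))

# Read the content of one object.
s3path = "s3://test-egi-public/file1.txt"
with fs.open(s3path, "rb") as f:
    print(f.read())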

There is a good collection of examples on the S3Fs GitHub repository.

Access via EGI Data Transfer

The EGI Data Transfer service can move files to and from object storage that is compatible with the S3 protocol. You will have to upload the access keys to the EGI Data Transfer service, which can then generate properly signed URLs for the objects in the storage.

You can then refer to this tutorial to see how to transfer to/from an object storage endpoint.