DataHub API

The Application Programming Interfaces of EGI DataHub

Most operations in EGI DataHub can be performed using one of the Onedata Application Programming Interfaces (APIs).

Getting an API access token

Tokens can be generated either from the EGI DataHub (Onezone) interface, as documented in Generating tokens for using Oneclient or APIs, or with a command-line call, as documented below.

Bear in mind that a single API token can be used with the Onezone, Oneprovider and other Onedata APIs.

The CLIENT_ID and REFRESH_TOKEN can be retrieved using the EGI Check-in Token Portal. See the Check-in documentation for more information.

CLIENT_ID=<CLIENT_ID>
REFRESH_TOKEN=<REFRESH_TOKEN>
# Retrieving an OIDC token from Check-in
curl -X POST \
  -d "client_id=$CLIENT_ID&grant_type=refresh_token&refresh_token=$REFRESH_TOKEN&scope=openid%20email%20profile%20eduperson_entitlement" \
  'https://aai.egi.eu/oidc/token' | python -m json.tool;
# Token is in the access_token field of the response
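
To avoid copying the token by hand, the access_token field can be captured directly into OIDC_TOKEN. The following is a minimal sketch, assuming the response is a JSON object with a string access_token field. extract_access_token is a hypothetical helper name, and the sample response is made up for illustration.

```shell
# Print the access_token field of a JSON token response read from stdin.
extract_access_token() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.access_token'
  else
    # Fallback without jq: assumes the token value contains no escaped quotes
    sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
  fi
}

# Illustration with a made-up response; in practice, pipe the output of the
# curl call to https://aai.egi.eu/oidc/token into extract_access_token instead.
sample='{"access_token": "abc.def.ghi", "expires_in": 3600}'
OIDC_TOKEN=$(printf '%s' "$sample" | extract_access_token)
echo "$OIDC_TOKEN"
```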

The following variables should be set:

  • OIDC_TOKEN: OpenID Connect access token.
  • ONEZONE_HOST: base URL of the Onezone host, including the https:// scheme (to use the Onezone API).
ONEZONE_HOST=https://datahub.egi.eu
OIDC_TOKEN=<OIDC_ACCESS_TOKEN>
curl -H "X-Auth-Token: egi:$OIDC_TOKEN" -X POST \
  -H 'Content-type: application/json'  \
  "$ONEZONE_HOST/api/v3/onezone/user/tokens/named" -d '{ 
  "name": "REST and CDMI access token", 
  "type": { 
    "accessToken": {} 
  }, 
  "caveats": [ 
    { 
      "type": "interface", 
      "interface": "rest" 
    } 
  ] 
}'

Data access via CDMI and REST API

The example commands below show how to access DataHub files and folders via CDMI and the REST API from the command line.

For more information, please check the Onedata CDMI documentation and the Onedata Oneprovider REST API documentation.

Common configuration

Follow the instructions above to get an API access token, then configure the environment variables:

export DATAHUB_TOKEN=<DATAHUB_ACCESS_TOKEN>
export ONEPROVIDER_HOST=plg-cyfronet-01.datahub.egi.eu
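
An empty or unset variable typically leads to confusing authentication or URL errors, so it can help to check the configuration first. The following is a minimal POSIX shell sketch. require_vars is a hypothetical helper name, not part of Onedata.

```shell
# Report every named variable that is unset or empty.
# The body runs in a subshell, so the helper variables do not leak and
# "exit" only ends the subshell, giving the function its return status.
require_vars() (
  missing=0
  for name in "$@"; do
    # Indirect lookup; only pass trusted variable names to this function
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example: check the variables used by the CDMI and REST API examples
require_vars DATAHUB_TOKEN ONEPROVIDER_HOST \
  || echo "Set the variables above before running the curl commands" >&2
```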

Having jq installed is useful for better formatting of the JSON output.

CDMI

Configure the CDMI specification version header, which is passed in some operations.

export CDMI_VSN_HEADER='X-CDMI-Specification-Version: 1.1.1'

The following examples show how to list a folder and how to download and upload a file using CDMI:

# List files in a folder
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  -H "$CDMI_VSN_HEADER" \
  "https://$ONEPROVIDER_HOST/cdmi/PLAYGROUND/?children" | jq .

# Download "helloworld.txt" from DataHub to "downloadtest.txt" on your computer
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  "https://$ONEPROVIDER_HOST/cdmi/PLAYGROUND/helloworld.txt" \
  -o downloadtest.txt

# Upload "helloworld.txt" from your computer to "uploadtest.txt" on DataHub
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  -H "$CDMI_VSN_HEADER" \
  -X PUT "https://$ONEPROVIDER_HOST/cdmi/PLAYGROUND/uploadtest.txt" \
  -T helloworld.txt
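
Space and file names that contain spaces or other reserved characters have to be percent-encoded when placed in a URL path (for instance, a space named EGI Foundation becomes EGI%20Foundation). The following is a minimal POSIX shell sketch, assuming ASCII-only names. urlencode_path is a hypothetical helper name, not part of Onedata.

```shell
# Percent-encode a path for use in a URL, keeping "/" and unreserved characters.
# Assumes ASCII input; the body runs in a subshell so variables do not leak.
urlencode_path() (
  LC_ALL=C
  str=$1
  out=''
  while [ -n "$str" ]; do
    rest=${str#?}          # everything after the first character
    ch=${str%"$rest"}      # the first character itself
    case $ch in
      [A-Za-z0-9._~/-]) out="$out$ch" ;;
      *) out="$out$(printf '%%%02X' "'$ch")" ;;
    esac
    str=$rest
  done
  printf '%s\n' "$out"
)

# Usage with the CDMI listing call:
#   curl -H "X-Auth-Token: $DATAHUB_TOKEN" -H "$CDMI_VSN_HEADER" \
#     "https://$ONEPROVIDER_HOST/cdmi/$(urlencode_path 'EGI Foundation/')?children"
```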

REST API

The following examples show how to list a folder and how to download and upload a file using the REST API:

# Get base folder ID
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  -X POST "https://$ONEPROVIDER_HOST/api/v3/oneprovider/lookup-file-id/PLAYGROUND"

# Add the folder ID to an environment variable
export DIR_ID=<ID_FROM_PREVIOUS_COMMAND>

# List files inside the folder with DIR_ID
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  -X GET "https://$ONEPROVIDER_HOST/api/v3/oneprovider/data/$DIR_ID/children" \
  | jq .

# Add the ID of the file that you want to download
export FILE_ID=<ID_FROM_PREVIOUS_COMMAND>

# Download file with FILE_ID from DataHub to "helloworld.txt" on your computer
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  -X GET "https://$ONEPROVIDER_HOST/api/v3/oneprovider/data/$FILE_ID/content" \
  -o helloworld.txt

# Upload "helloworld.txt" on your local computer to "uploadtest.txt" on DataHub
# --data-binary sends the file as-is (-d would strip newlines from it)
curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
  -X POST \
  "https://$ONEPROVIDER_HOST/api/v3/oneprovider/data/$DIR_ID/children?name=uploadtest.txt" \
  -H "Content-Type: application/octet-stream" --data-binary "@helloworld.txt"

Data access from Python

If your application is written in Python, please check the documentation for the OnedataFS Python library.

Testing the API with the REST client

A Docker container with clients acting as wrappers around the API calls is available: onedata/rest-cli. It is convenient for discovering and testing the Onezone and Oneprovider APIs.

docker run -it onedata/rest-cli
# Exporting env for Onezone API
export ONEZONE_HOST=https://datahub.egi.eu
export ONEZONE_API_KEY=<ACCESS_TOKEN>
# Checking current user
onezone-rest-cli getCurrentUser | jq '.'
# Listing all accessible spaces
onezone-rest-cli listEffectiveUserSpaces | jq '.'

docker run -it onedata/rest-cli
# Exporting env for Oneprovider API
export ONEPROVIDER_HOST=https://plg-cyfronet-01.datahub.egi.eu
export ONEPROVIDER_API_KEY=<ACCESS_TOKEN>
# Listing all spaces supported by the Oneprovider
oneprovider-rest-cli getAllSpaces | jq '.'
# Listing content of a space
oneprovider-rest-cli listFiles path='EGI Foundation/'
oneprovider-rest-cli listFiles path='EGI Foundation/CS3_dataset'

Printing the raw REST calls of a wrapped command

The raw REST call behind a wrapped command can be printed as a curl command using the --dry-run switch.

docker run -it onedata/rest-cli
export ONEZONE_HOST=https://datahub.egi.eu
export ONEZONE_API_KEY=<ACCESS_TOKEN>
# Listing all accessible spaces
onezone-rest-cli listEffectiveUserSpaces | jq '.'
# Printing the curl command without running it
onezone-rest-cli listEffectiveUserSpaces --dry-run

Working with PID / Handle

It’s possible to mint a Persistent Identifier (PID) for a space or a subdirectory of a space using a handle service (like Handle.net) that is registered in the Onezone (EGI DataHub).

Once done, accessing the PID using its URL will redirect to the Onedata share, from which the files can be retrieved.

Prerequisites: access to a Handle service registered in the Onezone. See the Handle Service API documentation for details on registering a new Handle service, or ask a Onezone administrator to authorize you to use a Handle service that is already registered in the Onezone.

The following variables should be set:

  • API_ACCESS_TOKEN: Onedata API access token.
  • ONEZONE_HOST: base URL of the Onezone host, including the https:// scheme (to use the Onezone API).
  • ONEPROVIDER_HOST: base URL of the Oneprovider host, including the https:// scheme (to use the Oneprovider API).
# Getting the IDs of the available Handle Services
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEZONE_HOST/api/v3/onezone/user/handle_services"
HANDLE_SERVICE=<HANDLE_SERVICE_ID>

# Getting details about a specific Handle service
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEZONE_HOST/api/v3/onezone/user/handle_services/$HANDLE_SERVICE"

# Listing all spaces
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEZONE_HOST/api/v3/onezone/user/effective_spaces/" | jq '.'

# Displaying details of a space
SPACE_ID=<SPACE_ID>
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEZONE_HOST/api/v3/onezone/spaces/$SPACE_ID" | jq '.'

# Listing content of a space
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEPROVIDER_HOST/api/v3/oneprovider/files/EGI%20Foundation/" | jq '.'

# Creating a share of a subdirectory of a space
DIR_ID_TO_SHARE=<DIR_ID>
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  -X POST -H 'Content-Type: application/json' \
  -d '{"name": "input"}' \
  "$ONEPROVIDER_HOST/api/v3/oneprovider/shares-id/$DIR_ID_TO_SHARE" | jq '.'

# Displaying the share
SHARE_ID=<SHARE_ID>
curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
   "$ONEZONE_HOST/api/v3/onezone/shares/$SHARE_ID" | jq '.'

# Registering a handle
# Proper Dublin Core metadata is required
# It can be created using https://nsteffel.github.io/dublin_core_generator/generator_nq.html
cat metadata.xml
# Escape double quotes and replace newlines with spaces
METADATA=$(cat metadata.xml | sed 's/"/\\"/g' | tr '\n' ' ')
# On handle creation, the created handle is provided in the Location header
curl -D - --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  -H "Content-type: application/json" -X POST \
  -d '{"handleServiceId": "'"$HANDLE_SERVICE"'", "resourceType": "Share", "resourceId": "'"$SHARE_ID"'", "metadata": "'"$METADATA"'"}' \
  "$ONEZONE_HOST/api/v3/onezone/user/handles"

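The metadata preparation in the handle registration step escapes double quotes only, so a backslash in metadata.xml would still produce an invalid JSON body. The following is a slightly more defensive sketch, assuming the same metadata.xml input and no control characters other than newlines and tabs. json_escape and build_handle_payload are hypothetical helper names, not part of Onedata.

```shell
# Escape a text stream for embedding in a JSON string:
# backslashes first, then double quotes; newlines and tabs become spaces.
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | tr '\n\t' '  '
}

# Print the JSON body for the handle registration request.
# $1: handle service ID, $2: share ID, $3: path to the Dublin Core XML file
build_handle_payload() {
  printf '{"handleServiceId": "%s", "resourceType": "Share", "resourceId": "%s", "metadata": "%s"}\n' \
    "$1" "$2" "$(json_escape < "$3")"
}

# Usage (same endpoint and headers as the registration request):
#   build_handle_payload "$HANDLE_SERVICE" "$SHARE_ID" metadata.xml > payload.json
#   curl -D - --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
#     -H "Content-type: application/json" -X POST \
#     -d @payload.json "$ONEZONE_HOST/api/v3/onezone/user/handles"
```
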
# Listing handles
curl --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEZONE_HOST/api/v3/onezone/user/handles"

# Displaying a handle
HANDLE_ID=<HANDLE_ID>
curl --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
  "$ONEZONE_HOST/api/v3/onezone/user/handles/$HANDLE_ID"