DataHub API
Most operations in EGI DataHub can be performed using one of the Onedata Application Programming Interfaces (APIs).
Important
In order to access the Onedata APIs, an access token is required. See below for instructions on how to generate one.
Getting an API access token
Tokens can be generated either from the EGI DataHub (Onezone) interface, as documented in Generating tokens for using Oneclient or APIs, or with a command-line call, as documented below.
Bear in mind that a single API token can be used with the Onezone, Oneprovider and other Onedata APIs.
It’s possible to retrieve the CLIENT_ID and REFRESH_TOKEN using the EGI Check-in Token Portal. See the Check-in documentation for more information.
$ CLIENT_ID=<CLIENT_ID>
$ REFRESH_TOKEN=<REFRESH_TOKEN>
# Retrieving an OIDC token from Check-in
$ curl -X POST \
-d "client_id=$CLIENT_ID&grant_type=refresh_token&refresh_token=$REFRESH_TOKEN&scope=openid%20email%20profile%20eduperson_entitlement" \
'https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/token' | python -m json.tool;
# Token is in the access_token field of the response
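To avoid copying the token by hand, the access_token field can be extracted from the response and stored directly in a variable. The following is a minimal sketch, assuming python3 is available on the machine; the helper name extract_access_token is hypothetical and not part of any EGI or Onedata tool.

```shell
# Hypothetical helper: read a JSON object on stdin and print its
# "access_token" field. Assumes python3 is installed.
extract_access_token() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["access_token"])'
}

# Usage (same request as above, with curl's progress output silenced):
# OIDC_TOKEN=$(curl -sS -X POST \
#   -d "client_id=$CLIENT_ID&grant_type=refresh_token&refresh_token=$REFRESH_TOKEN&scope=openid%20email%20profile%20eduperson_entitlement" \
#   'https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/token' \
#   | extract_access_token)
```

If the request fails, the response has no access_token field and the helper exits with an error, so the variable stays empty rather than holding a partial value.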
The following variables should be set:
OIDC_TOKEN
: OpenID Connect access token.
ONEZONE_HOST
: name or IP of the Onezone host (to use Onezone API).
$ ONEZONE_HOST=https://datahub.egi.eu
$ OIDC_TOKEN=<OIDC_ACCESS_TOKEN>
$ curl -H "X-Auth-Token: egi:$OIDC_TOKEN" -X POST \
-H 'Content-type: application/json' \
"$ONEZONE_HOST/api/v3/onezone/user/tokens/named" \
-d '{
  "name": "REST and CDMI access token",
  "type": {
    "accessToken": {}
  },
  "caveats": [
    {
      "type": "interface",
      "interface": "rest"
    }
  ]
}'
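A token can carry more than one caveat. The sketch below builds the same request body with an additional caveat that limits the token's lifetime. It assumes that Onezone accepts a caveat of type time with a validUntil field holding a Unix timestamp; please check this against the token documentation of your Onezone version before relying on it. The function name token_request_body is hypothetical.

```shell
# Hypothetical helper: print a JSON request body for a named access token
# restricted to the REST interface and valid until the given Unix timestamp.
# Assumption: the "time" caveat with a "validUntil" field is supported.
# The name must not contain double quotes or backslashes (no escaping is done).
token_request_body() {
  name=$1
  valid_until=$2
  printf '{"name": "%s", "type": {"accessToken": {}}, "caveats": [{"type": "interface", "interface": "rest"}, {"type": "time", "validUntil": %d}]}\n' \
    "$name" "$valid_until"
}

# Usage: request a token valid for one hour from now
# token_request_body "short-lived token" "$(( $(date +%s) + 3600 ))" > body.json
# curl -H "X-Auth-Token: egi:$OIDC_TOKEN" -X POST \
#   -H 'Content-type: application/json' \
#   "$ONEZONE_HOST/api/v3/onezone/user/tokens/named" -d @body.json
```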
Data access via CDMI and REST API
The example commands below show how to access DataHub files and folders via the CDMI and REST APIs from the command line.
For more information, please check the Onedata CDMI documentation and the Onedata Oneprovider REST API documentation.
Common configuration
Follow the instructions above to get an API access token, then configure the environment variables:
$ export DATAHUB_TOKEN=<DATAHUB_ACCESS_TOKEN>
$ export ONEPROVIDER_HOST=plg-cyfronet-01.datahub.egi.eu
Having jq installed is useful for better formatting of the JSON output.
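Every command in the following sections depends on these variables, and an empty variable typically shows up only as a confusing authentication or URL error. A small guard can catch this early. This is a minimal POSIX shell sketch; the function name require_vars is hypothetical.

```shell
# Hypothetical helper: fail early if any of the named variables is unset
# or empty. Prints one message per missing variable on stderr.
require_vars() {
  missing=0
  for var_name in "$@"; do
    # Indirect lookup of the variable whose name is stored in var_name
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
      echo "error: $var_name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_vars DATAHUB_TOKEN ONEPROVIDER_HOST || exit 1
```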
CDMI
Configure a header to be passed in some operations.
$ export CDMI_VSN_HEADER='X-CDMI-Specification-Version: 1.1.1'
The following examples show how to list a folder and how to download and upload a file using CDMI:
# List files in a folder
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
-H "$CDMI_VSN_HEADER" \
"https://$ONEPROVIDER_HOST/cdmi/PLAYGROUND/?children" | jq .
# Download "helloworld.txt" from DataHub to "downloadtest.txt" on your computer
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
"https://$ONEPROVIDER_HOST/cdmi/PLAYGROUND/helloworld.txt" \
-o downloadtest.txt
# Upload "helloworld.txt" from your computer to "uploadtest.txt" on DataHub
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
-H "$CDMI_VSN_HEADER" \
-X PUT "https://$ONEPROVIDER_HOST/cdmi/PLAYGROUND/uploadtest.txt" \
-T helloworld.txt
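Space and file names that contain spaces or other special characters have to be percent-encoded when they are used in a URL path (for example, EGI Foundation becomes EGI%20Foundation). The following is a minimal POSIX shell sketch of such an encoder. The function name urlencode_path is hypothetical, and the sketch is only intended for ASCII names; for other characters a dedicated tool is a safer choice.

```shell
# Hypothetical helper: percent-encode a path for use in a URL.
# Letters, digits, ".", "_", "~", "-" and "/" are kept as-is; every other
# character is replaced by %XX. Intended for ASCII input only.
urlencode_path() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    remainder=${rest#?}          # everything after the first character
    c=${rest%"$remainder"}       # the first character itself
    case "$c" in
      [a-zA-Z0-9._~/-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    rest=$remainder
  done
  printf '%s\n' "$out"
}

# Usage: list a folder whose name contains a space
# curl -H "X-Auth-Token: $DATAHUB_TOKEN" -H "$CDMI_VSN_HEADER" \
#   "https://$ONEPROVIDER_HOST/cdmi/$(urlencode_path 'EGI Foundation/')?children" | jq .
```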
REST API
The following examples show how to list a folder and how to download and upload a file using the REST API:
# Get base folder ID
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
-X POST "https://$ONEPROVIDER_HOST/api/v3/oneprovider/lookup-file-id/PLAYGROUND"
# Add the folder ID to an environment variable
$ export DIR_ID=<ID_FROM_PREVIOUS_COMMAND>
# List files inside the folder with DIR_ID
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
-X GET "https://$ONEPROVIDER_HOST/api/v3/oneprovider/data/$DIR_ID/children" \
| jq .
# Add the ID of the file that you want to download
$ export FILE_ID=<ID_FROM_PREVIOUS_COMMAND>
# Download file with FILE_ID from DataHub to "helloworld.txt" on your computer
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
-X GET "https://$ONEPROVIDER_HOST/api/v3/oneprovider/data/$FILE_ID/content" \
-o helloworld.txt
# Upload "helloworld.txt" on your local computer to "uploadtest.txt" on DataHub
$ curl -H "X-Auth-Token: $DATAHUB_TOKEN" \
-X POST \
"https://$ONEPROVIDER_HOST/api/v3/oneprovider/data/$DIR_ID/children?name=uploadtest.txt" \
-H "Content-Type: application/octet-stream" -d "@helloworld.txt"
Data access from Python
If your application is written in Python, please check the documentation for the OnedataFS Python library.
Testing the API with the REST client
A Docker container with clients acting as wrappers around the API calls is available: onedata/rest-cli. It's very convenient for discovering and testing the Onezone and Oneprovider APIs.
$ docker run -it onedata/rest-cli
# Exporting env for Onezone API
$ export ONEZONE_HOST=https://datahub.egi.eu
$ export ONEZONE_API_KEY=<ACCESS_TOKEN>
# Checking current user
$ onezone-rest-cli getCurrentUser | jq '.'
# Listing all accessible spaces
$ onezone-rest-cli listEffectiveUserSpaces | jq '.'
$ docker run -it onedata/rest-cli
# Exporting env for Oneprovider API
$ export ONEPROVIDER_HOST=https://plg-cyfronet-01.datahub.egi.eu
$ export ONEPROVIDER_API_KEY=<ACCESS_TOKEN>
# Listing all spaces supported by the Oneprovider
$ oneprovider-rest-cli getAllSpaces | jq '.'
# Listing content of a space
$ oneprovider-rest-cli listFiles path='EGI Foundation/'
$ oneprovider-rest-cli listFiles path='EGI Foundation/CS3_dataset'
Printing the raw REST calls of a wrapped command
Raw REST calls (used with curl) can be printed using the --dry-run switch.
$ docker run -it onedata/rest-cli
$ export ONEZONE_HOST=https://datahub.egi.eu
$ export ONEZONE_API_KEY=<ACCESS_TOKEN>
# Listing all accessible spaces
$ onezone-rest-cli listEffectiveUserSpaces | jq '.'
# Printing the curl command without running it
$ onezone-rest-cli listEffectiveUserSpaces --dry-run
Working with PID / Handle
It’s possible to mint a Persistent Identifier (PID) for a space or a subdirectory of a space using a handle service (like Handle.net) that is registered in the Onezone (EGI DataHub).
Once this is done, accessing the PID through its URL redirects to the Onedata share, from which the files can be retrieved.
Prerequisites: access to a Handle service registered in the Onezone. See the Handle Service API documentation for instructions on registering a new Handle service, or ask a Onezone administrator to authorize you to use an existing Handle service already registered in the Onezone.
The following variables should be set:
API_ACCESS_TOKEN
: Onedata API access token.
ONEZONE_HOST
: name or IP of the Onezone host (to use Onezone API).
ONEPROVIDER_HOST
: name or IP of the Oneprovider host (to use Oneprovider API).
# Getting the IDs of the available Handle Services
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/user/handle_services"
$ HANDLE_SERVICE=<HANDLE_SERVICE_ID>
# Getting details about a specific Handle service
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/user/handle_services/$HANDLE_SERVICE"
# Listing all spaces
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/user/effective_spaces/" | jq '.'
# Displaying details of a space
$ SPACE_ID=<SPACE_ID>
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/spaces/$SPACE_ID" | jq '.'
# Listing content of a space
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEPROVIDER_HOST/api/v3/oneprovider/files/EGI%20Foundation/" | jq '.'
# Creating a share of a subdirectory of a space
$ DIR_ID_TO_SHARE=<DIR_ID>
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
-X POST -H 'Content-Type: application/json' \
-d '{"name": "input"}' \
"$ONEPROVIDER_HOST/api/v3/oneprovider/shares-id/$DIR_ID_TO_SHARE" | jq '.'
# Displaying the share
$ SHARE_ID=<SHARE_ID>
$ curl -sS --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/shares/$SHARE_ID" | jq '.'
# Registering a handle
# Proper Dublin Core metadata is required
# It can be created using https://nsteffel.github.io/dublin_core_generator/generator_nq.html
$ cat metadata.xml
# Escape double quotes and replace line breaks with spaces
$ METADATA=$(cat metadata.xml | sed 's/"/\\"/g' | tr '\n' ' ')
# On creation, the new handle is provided in the Location header
$ curl -D - --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
-H "Content-type: application/json" -X POST \
-d '{"handleServiceId": "'"$HANDLE_SERVICE"'", "resourceType": "Share", "resourceId": "'"$SHARE_ID"'", "metadata": "'"$METADATA"'"}' \
"$ONEZONE_HOST/api/v3/onezone/user/handles"
# Listing handles
$ curl --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/user/handles"
# Displaying a handle
$ HANDLE_ID=<HANDLE_ID>
$ curl --tlsv1.2 -H "X-Auth-Token: $API_ACCESS_TOKEN" \
"$ONEZONE_HOST/api/v3/onezone/user/handles/$HANDLE_ID"
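The sed-based escaping used for the metadata above only handles double quotes and line breaks, so metadata containing backslashes or tabs would produce invalid JSON. A more defensive variant is to let a JSON encoder build the payload and to read the new handle URL from the Location header programmatically. The following is a minimal sketch, assuming python3 is installed; the function names build_handle_payload and location_from_headers are hypothetical, and treating the last path segment of the Location URL as the handle ID is an assumption. Unlike the sed variant, line breaks in the metadata are preserved (encoded as \n) rather than replaced by spaces.

```shell
# Hypothetical helper: print the JSON payload for handle registration.
# Arguments: handle service ID, share ID, path to the Dublin Core XML file.
# json.dumps escapes quotes, backslashes and line breaks in the metadata.
build_handle_payload() {
  python3 -c '
import json, sys

service_id, share_id, metadata_path = sys.argv[1:4]
with open(metadata_path, encoding="utf-8") as f:
    metadata = f.read()
print(json.dumps({
    "handleServiceId": service_id,
    "resourceType": "Share",
    "resourceId": share_id,
    "metadata": metadata,
}))
' "$1" "$2" "$3"
}

# Hypothetical helper: read HTTP response headers (as printed by curl -D -)
# on stdin and print the value of the first Location header.
location_from_headers() {
  tr -d '\r' | awk '!found && tolower($1) == "location:" { print $2; found = 1 }'
}

# Usage:
# build_handle_payload "$HANDLE_SERVICE" "$SHARE_ID" metadata.xml > payload.json
# HANDLE_URL=$(curl -sS -D - -o /dev/null --tlsv1.2 \
#   -H "X-Auth-Token: $API_ACCESS_TOKEN" \
#   -H "Content-type: application/json" -X POST \
#   -d @payload.json "$ONEZONE_HOST/api/v3/onezone/user/handles" \
#   | location_from_headers)
# HANDLE_ID=${HANDLE_URL##*/}   # assumption: the ID is the last path segment
```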
Subscribe to file events
The following example shows how to subscribe to DataHub notifications on file events, which are described in detail in the official documentation, Subscribe to file events:
$ curl -N -H "X-Auth-Token: $TOKEN" \
-X POST "https://$ONEPROVIDER_HOST/api/v3/oneprovider/changes/metadata/$SPACE_ID" \
-H "Content-Type: application/json" -d "@./changes_req.json"
This requires the following permission to be set, either for groups or for single users.
For a single user, one way to grant it is to select “Effective members”, search for the required user in the user list, and choose “Make an owner”. The user will then have admin privileges in addition to the one required. As this might not be the desired configuration, it is enough to remove all the unwanted permissions afterwards, e.g. make them the same as those of the VO to which the user belongs, and leave as an extra only the permission shown in the screenshot.