Administering Rucio
Within Rucio there are several levels of administrators. There are the super admins, which are the staff that runs Multi-VO Rucio. Then there are virtual organisation (VO) specific admins that will look after the day-to-day operations of thier VO. Below are some of the tasks that VO admins will need to do to set up and maintain their VO.
Creating Accounts, Identities, and Quotas
To add new users within your VO, you will need to communicate with Rucio as the VO admin. Then using the rucio-admin commands, create a new account and add identities to the account. The account is a username with no permissions, or authentication methods. The identities bind authentication methods and permissions to the account. The account you want to create identities for is input as an argument. Accounts will have different permissions and access (such as how much data they can store on a particular RSE).
CLI Example
$ rucio-admin account add \
--type USER \
--email jdoe@email.com jdoe
Added new account: jdoe
$ rucio-admin identity add \
--account jdoe \
--type USER \
--id userjdoe \
--email jdoe@email.com \
--password jdoepass
Added new identity to account: userjdoe-jdoe
$ rucio-admin account set-limits jdoe storagesite1 100GB
Set account limit for account jdoe on RSE storagesite1: 100.000 GB
Python Client Example
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> CLIENT.add_account('jdoe', 'USER', 'jdoe@email.com')
True
>>> CLIENT.add_identity('jdoe', 'USER', 'jdoe@email.com')
True
>>> CLIENT.set_account_limit('jdoe', 'storagesite1', 107374182400, 'global')
True
Creating RSE(s)
Rucio Storage Elements (RSEs) are how Rucio represents the physical storage available to your VO. As with many aspects of Rucio there are a lot of optional attributes that can be set for an RSE, but as a minimum a protocol for transfers need to be added before it can be used.
Creating RSE(s) CLI Example
$ rucio-admin rse add NEW_RSE
Added new deterministic RSE: NEW_RSE
$ rucio-admin rse add-protocol \
--hostname test.org \
--scheme gsiftp \
--prefix '/filepath/rucio/' \
--port 8443 NEW_RSE \
--domain-json '{
"wan": {
"read": 1,
"write": 1,
"third_party_copy": 0,
"delete": 1
},
"lan": {
"read": 1,
"write": 1,
"third_party_copy": 0,
"delete": 1
}
}'
Creating RSE(s) Python Client Example
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> CLIENT.add_rse('NEW_RSE')
True
>>> CLIENT.add_protocol(
'NEW_RSE', {
'hostname': 'test.org',
'scheme': 'gsiftp',
'prefix': '/filepath/rucio/',
'port': 8443,
'impl': 'rucio.rse.protocols.gfalv2.Default',
'domain': {
"wan": {
"read": 1,
"write": 1,
"third_party_copy": 0,
"delete": 1
},
"lan": {
"read": 1,
"write": 1,
"third_party_copy": 0,
"delete": 1
}
}
}
)
True
Updating RSE Protocols
On occasion, it may be necessary to change or update an RSE protocol. Unlike
settings (rucio-admin rse update
) or attributes
(rucio-admin rse set-attribute
), there isn’t a direct CLI function for
changing a protocol. It would therefore be necessary to remove
(rucio-admin rse delete-protocol
) and then add
(rucio-admin rse add-protocol
) it again using different information.
Alternatively, the Python client has additional functionality to directly update
or swap the priority of RSE protocols. For example to update the impl
without
changing anything else (the data
argument is used to update the protocol, with
the other settings used to specify the protocol to change):
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> CLIENT.update_protocols(
rse='NEW_RSE',
scheme='gsiftp',
data={
'impl': rucio.rse.protocols.gfal.Default'
},
hostname='test.org',
port=8433
)
True
To swap the priority of two protocols for the third party copy operation:
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> CLIENT.swap_protocols(
rse='NEW_RSE',
domain='wan',
operation='third_party_copy',
scheme_a='gsiftp',
scheme_b='root'
)
True
It’s also worth noting that when an RSE is deleted using
rucio-admin rse delete
, the entry remains in the database. This “soft”
deletion means that attempting to add a new RSE with the same name as a deleted
RSE will fail. This is due to the RSE not having a unique name/VO combination.
In practice it is therefore better to update a badly configured RSE rather than
attempting to delete and re-add it. However, if the latter method is preferred,
it is possible manually rename the deleted RSE in the database (as there are no
foreign key constraints on its name, just the ID and VO) so that the old name
can be re-used.
Basic Usage
This section covers some of the basic Rucio functions that can be run once the VO has accounts and RSEs set up. As with the setup, there are many options that won’t be covered here. For more information refer to either the main documentation or the help for the function in question.
Daemons
Most operations in Rucio (such as transfers, deletions, rule evaluation) require one or more of the daemons to be running in order to take effect. For a multi-VO instance, these should be running for all VOs already. However, on new VO’s joining Rucio some updating of the daemons will be neccessary.
If it seems like it is not quite right please contact the Rucio team through GGUS.
Uploading Data
In Rucio files and their replicas are represented by Data IDentifiers (DIDs), which are composed of a scope and name. Furthermore, multiple files can be attached to a dataset, which in turn can be attached to a container (which can be attached to another container and so on). Datasets and containers are also represented by DIDs.
Scopes are always associated with a particular Rucio account, and must be added
to Rucio using an admin account. If no scope is provided when uploading, Rucio
will default to user.<account>
, but this still needs to have been added by an
admin.
Once a file has been uploaded via the CLI or Python client, it can then be attached to a dataset. It’s worth noting that by default, some Rucio commands will not list files, only datasets.
Uploading Data CLI Example
Assuming the file test.txt
exists locally:
$ rucio-admin scope add --account root --scope user.root
Added new scope to account: user.root-root
$ rucio upload --rse NEW_RSE test.txt
2020-08-14 15:28:15,059 INFO Preparing upload for file test.txt
2020-08-14 15:28:15,235 INFO Successfully added replica in Rucio catalogue at NEW_RSE
2020-08-14 15:28:15,334 INFO Successfully added replication rule at NEW_RSE
2020-08-14 15:28:15,579 INFO Trying upload with mock to NEW_RSE
2020-08-14 15:28:15,579 295 INFO Trying upload with mock to NEW_RSE
2020-08-14 15:28:15,580 INFO Successful upload of temporary file. mock://test.org:123/filepath/rucio/user/root/46/6b/test.txt.rucio.upload
2020-08-14 15:28:15,580 295 INFO Successful upload of temporary file. mock://test.org:123/filepath/rucio/user/root/46/6b/test.txt.rucio.upload
2020-08-14 15:28:15,580 INFO Successfully uploaded file test.txt
2020-08-14 15:28:15,580 295 INFO Successfully uploaded file test.txt
2020-08-14 15:28:15,583 295 DEBUG Starting new HTTPS connection (1): rucio:443
2020-08-14 15:28:15,598 295 DEBUG https://rucio:443 "POST /traces/ HTTP/1.1" 201 7
2020-08-14 15:28:15,662 295 DEBUG https://rucio:443 "PUT /replicas HTTP/1.1" 200 0
$ rucio list-dids user.root:test.txt --filter type=ALL
+--------------------+--------------+
| SCOPE:NAME | [DID TYPE] |
|--------------------+--------------|
| user.root:test.txt | FILE |
+--------------------+--------------+
$ rucio add-dataset user.root:test_dataset
Added user.root:test_dataset
$ rucio attach user.root:test_dataset user.root:test.txt
DIDs successfully attached to user.root:test_dataset
$ rucio list-content user.root:test_dataset
+--------------------+--------------+
| SCOPE:NAME | [DID TYPE] |
|--------------------+--------------|
| user.root:test.txt | FILE |
+--------------------+--------------+
Uploading Data Python Client Example
Assuming the file test.txt
exists locally:
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> CLIENT.add_scope('root', 'user.root')
True
>>> from rucio.client.uploadclient import UploadClient
>>> UPLOAD_CLIENT = UploadClient()
>>> UPLOAD_CLIENT.upload([{'path': 'test.txt', 'rse': 'NEW_RSE'}])
2020-08-14 14:47:31,147 8431 DEBUG Starting new HTTPS connection (1): rucio:443
2020-08-14 14:47:31,166 8431 DEBUG https://rucio:443 "POST /traces/ HTTP/1.1" 201 7
2020-08-14 14:47:31,224 8431 DEBUG https://rucio:443 "PUT /replicas HTTP/1.1" 200 None
0
>>> list(CLIENT.list_dids('user.root', {}, type='all'))
[u'test.txt']
>>> CLIENT.add_dataset('user.root', 'test_dataset')
True
>>> CLIENT.attach_dids('user.root', 'test_dataset', [{'scope': 'user.root', 'name': 'test.txt'}])
>>> list(CLIENT.list_content('user.root', 'test_dataset'))
[{u'adler32': u'00000001', u'name': u'test.txt', u'bytes': 0, u'scope': u'user.root', u'type': u'FILE', u'md5': u'd41d8cd98f00b204e9800998ecf8427e'}]
Adding Replication Rules
Once a DID exists within the Rucio catalogue, replicas of that file, dataset or collection are created and maintained by Replication rules. By uploading a file to a particular RSE, a replication rule is created for that file, however rules can also be added for existing DIDs. As a minimum an RSE and number of copies must be specified, but further options such as lifetime of the rule and selecting RSEs based on user set attributes are also possible.
Adding Replication Rules CLI Example
$ rucio list-rules --account root
ID ACCOUNT SCOPE:NAME STATE[OK/REPL/STUCK] RSE_EXPRESSION COPIES EXPIRES (UTC) CREATED (UTC)
-------------------------------- --------- ------------------ ---------------------- ---------------- -------- --------------- -------------------
991f9ace7ed74cad989efde90b6a23c5 root user.root:test.txt OK[1/0/0] NEW_RSE 1 2020-08-14 15:28:15
$ rucio add-rule user.root:test_dataset 1 NEW_RSE
bd51b767ef524878bb3cc68db16d2374
$ rucio list-rules --account root
ID ACCOUNT SCOPE:NAME STATE[OK/REPL/STUCK] RSE_EXPRESSION COPIES EXPIRES (UTC) CREATED (UTC)
-------------------------------- --------- ---------------------- ---------------------- ---------------- -------- --------------- -------------------
991f9ace7ed74cad989efde90b6a23c5 root user.root:test.txt OK[1/0/0] NEW_RSE 1 2020-08-14 15:28:15
bd51b767ef524878bb3cc68db16d2374 root user.root:test_dataset OK[1/0/0] NEW_RSE 1 2020-08-14 15:47:15
Adding Replication Rules Python Client Example
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> list(CLIENT.list_account_rules('root'))
[{u'locks_ok_cnt': 1, u'source_replica_expression': None, u'locks_stuck_cnt': 0, u'purge_replicas': False, u'rse_expression': u'NEW_RSE', u'updated_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'meta': None,
u'child_rule_id': None, u'id': u'991f9ace7ed74cad989efde90b6a23c5', u'ignore_account_limit': False, u'error': None, u'weight': None, u'locks_replicating_cnt': 0, u'notification': u'NO', u'copies': 1, u'comments': None,
u'split_container': False, u'priority': 3, u'state': u'OK', u'scope': u'user.root', u'subscription_id': None, u'stuck_at': None, u'ignore_availability': False, u'eol_at': None, u'expires_at': None, u'did_type': u'FILE',
u'account': u'root', u'locked': False, u'name': u'test.txt', u'created_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'activity': u'User Subscriptions', u'grouping': u'DATASET'}]
>>> CLIENT.add_replication_rule([{'scope': 'user.root', 'name': 'test_dataset'}], 1, 'NEW_RSE')
[u'76b262b45dca4e769221224e1ccf5c7a']
>>> list(CLIENT.list_account_rules('root'))
[{u'locks_ok_cnt': 1, u'source_replica_expression': None, u'locks_stuck_cnt': 0, u'purge_replicas': False, u'rse_expression': u'NEW_RSE', u'updated_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'meta': None,
u'child_rule_id': None, u'id': u'991f9ace7ed74cad989efde90b6a23c5', u'ignore_account_limit': False, u'error': None, u'weight': None, u'locks_replicating_cnt': 0, u'notification': u'NO', u'copies': 1, u'comments': None,
u'split_container': False, u'priority': 3, u'state': u'OK', u'scope': u'user.root', u'subscription_id': None, u'stuck_at': None, u'ignore_availability': False, u'eol_at': None, u'expires_at': None, u'did_type': u'FILE',
u'account': u'root', u'locked': False, u'name': u'test.txt', u'created_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'activity': u'User Subscriptions', u'grouping': u'DATASET'}, {u'locks_ok_cnt': 1,
u'source_replica_expression': None, u'locks_stuck_cnt': 0, u'purge_replicas': False, u'rse_expression': u'NEW_RSE', u'updated_at': datetime.datetime(2020, 8, 14, 15, 47, 15), u'meta': None, u'child_rule_id': None, u'id':
u'bd51b767ef524878bb3cc68db16d2374', u'ignore_account_limit': False, u'error': None, u'weight': None, u'locks_replicating_cnt': 0, u'notification': u'NO', u'copies': 1, u'comments': None, u'split_container': False, u'priority':
3, u'state': u'OK', u'scope': u'user.root', u'subscription_id': None, u'stuck_at': None, u'ignore_availability': False, u'eol_at': None, u'expires_at': None, u'did_type': u'DATASET', u'account': u'root', u'locked': False,
u'name': u'test_dataset', u'created_at': datetime.datetime(2020, 8, 14, 15, 47, 15), u'activity': u'User Subscriptions', u'grouping': u'DATASET'}]
Multi-VO Features
From a users perspective, whether the instance is multi or single VO should not change any functionality. Furthermore, depending on the client setup, VO does not need to be provided. There are however, some occasions when an optional argument for the VO can be given in a multi-VO instance.
Swapping VOs
Just like how an identity can be associated with (and used to authenticate
against) multiple accounts, the same identity can be used for accounts at more
than one VO. Account and identity can be retrieved from the config file if
present, and the VO set there will be used (unless the environment variable
RUCIO_VO
is also set, in which case the latter takes precedent). Both will be
ignored however if the VO is passed as an optional argument in the CLI or Python
client. Using this optional argument allows a user to quickly run commands on a
different VO they have access to.
Swapping VOs CLI Example
$ rucio whoami
status : ACTIVE
account : jdoes_abc_account
account_type : USER
created_at : 2020-08-07T08:27:29
updated_at : 2020-08-07T08:27:29
suspended_at : None
deleted_at : None
email : N/A
$ rucio --vo xyz whoami
status : ACTIVE
account : jdoes_xyz_account
account_type : USER
created_at : 2020-08-11T12:13:58
updated_at : 2020-08-11T12:13:58
suspended_at : None
deleted_at : None
email : N/A
Swapping VOs Python Client Example
>>> from rucio.client import Client
>>> CLIENT = Client()
>>> CLIENT.whoami()
{u'status': u'ACTIVE', u'account': u'jdoes_abc_account', u'account_type': u'USER', u'created_at': u'2020-08-07T08:27:29', u'updated_at': u'2020-08-07T08:27:29', u'suspended_at': None, u'deleted_at': None, u'email': u'N/A'}
>>> CLIENT_XYX = Client(vo='xyz')
>>> CLIENT_XYZ.whoami()
{u'status': u'ACTIVE', u'account': u'jdoes_xyz_account', u'account_type': u'USER', u'created_at': u'2020-08-11T12:13:58', u'updated_at': u'2020-08-11T12:13:58', u'suspended_at': None, u'deleted_at': None, u'email': u'N/A'}