Workload Manager

The EGI Workload Manager

What is the EGI Workload Manager?

The EGI Workload Manager is a service provided to the EGI community to efficiently manage and distribute computing workloads on the EGI infrastructure. The service, based on the DIRAC technology, is configured to support a number of HTC and cloud resource pools from the EGI Federation. This pool of computing resources can be easily extended and customized to support the needs of new scientific communities. In the LHCb experiment the service has proven production scalability up to peaks of more than 100.000 concurrently running jobs. WeNMR, the structural biology community, uses the service for a number of community services. The community reported an improvement of jobs submission in the infrastructure from previous 70% to 99% with the EGI Workload Manager service. The delivery of the service is coordinated by the EGI Foundation and operated by IN2P3 on resources provided by CYFRONET.

Main Features

The EGI Workload Manager:

  • Maximizes usage efficiency by choosing appropriately computing and storage resources on real-time.

  • Provides a large-scale distributed environment to manage and handle data storage, data movement, accessing and processing.

  • Handles job submission and workload distribution in a transparent way.

  • Improves the general job throughput compared with native management of EGI grid or cloud computing resources.

  • Offers pilot-based task scheduling method, that submits pilot jobs to resources to check the execution environment before to start the user’s jobs. From a technical standpoint, the user’s job description is delivered to the pilot, which prepares its execution environment and executes the user application. The pilot-based scheduling feature solves many problems of using heterogeneity and unstable distributed computing resources.

  • Includes easy extensions to customize the environment checks to address the needs of a particular community. Users can choose appropriately computing and storage resources maximising their usage efficiency for particular user requirements.

  • Handles different storage supporting both cloud and grid capacity.

  • Provides a user-friendly interface that allows users to choose among different DIRAC services.

Target User Groups

The service suits for the established Virtual Organization communities, long tail of users, SMEs and Industry

  • EGI and EGI Federation participants
  • Research communities

Architecture

The EGI Workload Manager service is a cluster of DIRAC services running on EGI resources (HTC, CLOUD, HPC) supporting multi-VO. All the DIRAC services are at or above TRL8. The main service components include:

  • Workload Management System (WMS) architecture is composed of multiple loosely coupled components working together in a collaborative manner with the help of a common Configuration Services ensuring reliable service discovery functionality. The modular architecture allows to easily incorporate new types of computing resources as well as new task scheduling algorithms in response to evolving user requirements. DIRAC services can run on multiple geographically distributed servers which increases the overall reliability and excellent scalability properties.

  • REST server providing language neutral interface to DIRAC service.

  • Web portal provides simple and intuitive access to most of the DIRAC functionalities including management of computing tasks and distributed data. It also has a modular architecture designed specifically to allow easy extension for the needs of particular applications.

The DIRAC Web portal

The DIRAC Web portal

How to access the EGI Workload Manager service

There are several options to access the service:

  1. Members of a scientific community whose resources pool is already configured in the EGI Workload Manager instance > can use the EGI Workload Manager web portal to access the service, or use DIRAC Client.
  2. Individual researchers who want to do some number crunching for a limited period of time, with a reasonable (not too high) number of CPUs > can use the catch-all VO resource pool (vo.access.egi.eu). Submit a request through the EGI Marketplace selecting:
    Compute > Workload Manager from the top menu
  3. Representatives of a community who want to try DIRAC and EGI > Same as #2.
  4. Representative of a community who wants to request DIRAC for the community’s own resource pool > Submit a request through the EGI Marketplace selecting
    Compute > Workload Manager from the top menu

Getting Started

Submit a service order via the Marketplace

User can request access to the service submitting a service-order to use the EGI HTC service directly from the EOSC or the EGI Marketplace:

EOSC Marketplace: https://marketplace.eosc-portal.eu/services/egi-workload-manager

EGI Marketplace: https://marketplace.egi.eu/compute/73-workload-manager.html

Service orders are usually handled within 5 working days by the EGI User Support Team on shift.

Before starting

Apply for your user credentials

DIRAC uses X.509 certificates to identify and authenticate users. These certificates are delivered to each individual by trusted certification authorities. If you have a personal certificate issued by a EUGridPMA-certified authority you can use it for this tutorial. Otherwise refer to the information available in this section, to obtain a certificate. Your certificate may take a few days to be delivered, so please ask for your certificate well in advance and in any case, before the tutorial starts.

Install your credentials

Your personal certificate is usually delivered to you via a web site and is automatically loaded in your browser. You need to export it from the browser and put it in the appropriate format for DIRAC to use. This is a one-time operation. Please follow the instructions in detailed in VOMS documentation page to export and in install your certificate.

Send your certificate’s subject to the DIRAC team

In order to configure the DIRAC server so that you gets registered as a user, the team needs to know your certificate’s subject.

Please use the command below on any Unix machine and send its output to
dirac-support <AT> mailman.egi.eu

openssl x509 -in $HOME/.globus/usercert.pem -subject -noout

The EGI Workload Manager Web Portal

To access the EGI Workload Manager open a web browser to: https://dirac.egi.eu/DIRAC/

The EGI Workload Manager service Web portal

The EGI Workload Manager service Web portal

  • If you are a new user, you can see the welcome page where you can find links to user documentations.

  • VO options: you can switch to different VOs that you have membership.

  • Log In options: the service supports both X.509, Certificate and Check-in log-in.

  • View options: allow to choose either desktop or tabs layout.

  • Menu: a list of tools that enable the selected VO.

Upload Proxy

Before submitting your job, you need to upload your Proxy. Login to the portal. Go to:
Menu > Tools > Proxy Upload, enter your certificates .p12 file and the passphrase, click Upload.

The wizard to upload the .p12 proxy certificate The wizard to upload the .p12 proxy certificate

The wizard to upload the .p12 proxy certificate

Job Submission

Go to:
Menu > Tools > Job Launchpad. First check the Proxy Status, click it until it shows Valid in green color.

In the Job Launchpad, you can select your jobs from the list; add parameters, indicating the output Sandbox location.

Now, select Helloworld from the job list, and click Submit, you just launch your very first job to the EGI HTC cluster.

Submit a job with the Job Launchpad

Submit a job with the Job Launchpad

Monitor Job status

Go to:
Menu > Applications > Job Monitor.
The left panel gives all kinds of search options for your jobs. Set your search criteria, and click Submit, the jobs will list on the right panel.
Try the various options to view different information about the jobs.

Monitor the job execution with the Job Monitor panel

Monitor the job execution with the Job Monitor panel

Get Results from Sandbox

Once the job has been successfully processed, the Status of the job will change to green. Right click the job, select:
Sandbox > Get Output file(s), you can get the result file(s).

Full User Guide for DIRAC Web Portal

For further instructions, please refer to DIRAC Web Portal Guide

The DIRAC client tool

The easiest way to install the client is via Docker Container. If you have a Docker client installed in your machine, install the DIRAC CLI as follows:

docker run -it -v $HOME:$HOME -e HOME=$HOME diracgrid/client:egi

Once the client software is installed, it should be configured in order to access the EGI Workload Manager service:

source /opt/dirac/bashrc

To proceed further a temporary proxy of the user certificate should be created. This is necessary to get information from the central Configuration Service:

$ dirac-proxy-init -x
Generating proxy...
Enter Certificate password:
...

Now the client can be configured to work in conjunction with the EGI Workload Manager service:

$ dirac-configure defaults-egi.cfg Executing:
/home/jdoe/DIRAC/DIRAC/Core/scripts/dirac-configure.py defaults-egi.cfg


Checking DIRAC installation at "/home/jdoe/DIRAC" Created vomsdir file
/home/jdoe/DIRAC/etc/grid-security/vomsdir/vo.formation.idgrilles.fr/cclcgvomsli01.in2p3.fr.lsc
[..]
Created vomsdir file
/home/jdoe/DIRAC/etc/grid-security/vomsdir/fedcloud.egi.eu/voms2.grid.cesnet.cz.lsc
Created vomses file `/home/jdoe/DIRAC/etc/grid-security/vomses/fedcloud.egi.eu`

Generate the proxy containing the credentials of your VO. Specify the VO in the --group option:

In this example, we are going to use the resources allocated for the WeNMR project.

$ dirac-proxy-init --debug --group wenmr_user -U --rfc
$ dirac-proxy-init --debug --group wenmr_user -U --rfc
Generating proxy...
Enter Certificate password:
Contacting CS...
Checking DN /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe
jdoe@egi.eu
Username is jdoe
Creating proxy for jdoe@wenmr_user
(/DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe
jdoe@egi.eu)
Requested adding a VOMS extension but no VOMS attribute defined for group wenmr_user
Uploading proxy for wenmr_user...
Uploading wenmr_user proxy to ProxyManager...
Loading user proxy Uploading proxy on-the-fly
Cert file /home/jdoe/.globus/usercert.pem Key file
/home/jdoe/.globus/userkey.pem
Loading cert and key User credentials loaded
Uploading...  Proxy uploaded Proxy generated:
subject      : /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe jdoe@egi.eu/CN=0123456789
issuer       : /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe jdoe@egi.eu
identity     : /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe jdoe@egi.eu
timeleft     : 23:59:58
DIRAC group  : wenmr_user
rfc          : True
path         : /tmp/x509up_u0
username     : jdoe
properties   : LimitedDelegation, GenericPilot, Pilot, NormalUser

Proxies uploaded: DN
| Group               | Until (GMT) /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting
EGI/CN=Jane Doe jdoe@egi.eu | access.egi.eu_user  | 2021/09/14 23:54
/DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe
jdoe@egi.eu | fedcloud_user       | 2021/09/14 23:54
/DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe
jdoe@egi.eu | access.egi.eu_admin | 2021/09/14 23:54
/DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe
jdoe@egi.eu | wenmr_user          | 2021/09/14 23:54

As a result of this command a user proxy with the same validity period of the certificate is uploaded to the DIRAC ProxyManager service.

For checking the details of you proxy, run the following command:

$ dirac-proxy-info
subject      : /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe jdoe@egi.eu/CN=0123456789
issuer       : /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe jdoe@egi.eu
identity     : /DC=org/DC=terena/DC=tcs/C=NL/O=Stichting EGI/CN=Jane Doe jdoe@egi.eu
timeleft     : 23:59:26
DIRAC group  : wenmr_user
rfc          : True
path         : /tmp/x509up_u0
username     : jdoe
properties   : LimitedDelegation, GenericPilot, Pilot, NormalUser

Managing simple jobs

DIRAC commandsNote
dirac-wms-job-statusTo check the status of a job
dirac-wms-job-deleteTo delete a job
dirac-wms-job-logging-infoTo retrieve history of transitions for a DIRAC job
dirac-wms-job-get-outputTo retrieve the job output
dirac-wms-job-submitTo submit a job

DIRAC commands

Have a look at the official command reference documentation for the complete list of the Workload Management commands.

In general, you can submit jobs, check their status, and retrieve the output. For example:

Create a simple JDL file (test.jdl) to submit the job:

[ JobName = "Simple_Job"; Executable = "/bin/ls"; Arguments = "-ltr";
StdOutput = "StdOut"; StdError = "StdErr"; OutputSandbox = {"StdOut","StdErr"};
]

Submit the job:

dirac-wms-job-submit test.jdl JobID = 53755998

Check the job status:

$ dirac-wms-job-status 53755998 JobID=23844073 Status=Waiting;
MinorStatus=Pilot Agent Submission; Site=ANY;
$ dirac-wms-job-status 53755998 JobID=53755998 Status=Done;
MinorStatus=Execution Completed; Site=EGI.NIKHEF.nl; Site=EGI.HG-08-Okeanos.gr;

Retrieve the outputs of the job (when the status is Done):

$ dirac-wms-job-get-output --Dir joboutput/ 53755998 Job output sandbox
retrieved in joboutput/53755998/

Jobs with Input Sandbox and Output Sandbox

In most cases the job input data or executable files are available locally and should be transferred to the grid to run the job. In this case the InputSandbox attribute can be used to move the files together with the job.

Create the InputAndOuputSandbox.jdl

JobName    = "InputAndOuputSandbox";
Executable = "testJob.sh";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"testJob.sh"};
OutputSandbox = {"StdOut","StdErr"};

Create a simple shell script (testJob.sh)

#!/bin/bash
/bin/hostname
/bin/date
/bin/ls -la

After creation of JDL file the next step is to submit the job, using the command:

dirac-wms-job-submit InputAndOuputSandbox.jdl JobID = XXXXXXXX

More details

Technical Support


Last modified November 19, 2020: Missing check-in docs (#121) (bd67e34)