The following guide is intended for researchers who want to use ECAS, a complete environment for data analysis experiments, in the EGI cloud.
ECAS (ENES Climate Analytics Service) is part of the EOSC-hub service catalog and aims to enable scientific end-users to perform data analysis experiments on large volumes of multidimensional data.
It relies on Ophidia, a data analytics framework for eScience that provides declarative, server-side, and parallel data analysis, together with an internal storage model able to efficiently handle multidimensional data and a hierarchical data organization for managing large data volumes ("datacubes"). It also relies on JupyterHub, which gives users access to ready-to-use computational environments and resources.
Thanks to the Elastic Cloud Compute Cluster (EC3) platform, operated by the Polytechnic University of Valencia (UPV), researchers can rely on the EGI Cloud Compute service to scale up to larger simulations without worrying about the complexity of the underlying infrastructure.
This guide will show how to configure and deploy an ECAS cluster on the EGI Infrastructure using EC3, and how to access its services through the Ophidia terminal and JupyterHub.
In the latest release of the EC3 platform, tailored to support the EGI Applications on Demand (AoD) service, a new Ansible recipe is available for researchers interested in deploying an ECAS cluster on the EGI Infrastructure. The next sections provide details on how to configure and deploy an ECAS cluster on EGI resources.
To configure and deploy a Virtual Elastic Cluster using EC3, access the EC3 platform front page and click on the "Deploy your cluster" link as shown in the figure below:
A wizard will guide you through the cluster configuration process. The general wizard steps include selecting the cluster type, the provider endpoint, the operating system image, the instance details, and the cluster's size and name. The list of providers supporting the vo.access.egi.eu VO is dynamically retrieved from the EGI Applications Database using its REST APIs. When the front-end node of the cluster has been successfully deployed, you will be notified with the credentials to access it via SSH.
The cluster details are available by clicking on the "Manage your deployed clusters" link on the front page:
To access the front-end of the cluster, set restrictive permissions on the private key and log in via SSH:

[fabrizio@MBP EC3]$ chmod 600 key.pem
[fabrizio@MBP EC3]$ ssh -i key.pem cloudadm@134.158.151.218
Last login: Mon Nov 18 11:37:29 2019 from torito.i3m.upv.es
[cloudadm@oph-server ~]$ sudo su -
[root@oph-server ~]#
Both the front-end and the worker nodes are configured by Ansible. This process usually takes some time. You can monitor the status of the cluster configuration using the is_cluster_ready command-line tool:
[root@oph-server ~]# is_cluster_ready
Cluster is still configuring.
The cluster is successfully configured when the command returns the following message:
[root@oph-server ~]# is_cluster_ready
Cluster configured!
Since SLURM is used as the workload manager, you can check the status of the worker nodes with the sinfo command, which provides information about SLURM nodes and partitions:
[root@oph-server ~]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 down* oph-io2
debug* up infinite 1 idle oph-io1
ECAS provides two different ways to access its scientific ecosystem: the Ophidia client (oph_term) and JupyterHub.

Run the Ophidia terminal as the ophuser user.
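For example, from the front-end shell (a minimal sketch; the prompt strings are illustrative):

[root@oph-server ~]# su - ophuser
[ophuser@oph-server ~]$ oph_term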
The default connection parameters are already defined as environment variables in the ophuser's .bashrc file:
export OPH_SERVER_HOST="127.0.0.1"
export OPH_SERVER_PORT="11732"
export OPH_PASSWD="abcd"
export OPH_USER="oph-test"
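Since these variables are read at startup, running oph_term with no arguments should be equivalent to passing the connection options explicitly. Assuming the standard Ophidia terminal options (-H host, -P port, -u user, -p password):

[ophuser@oph-server ~]$ oph_term -H $OPH_SERVER_HOST -P $OPH_SERVER_PORT -u $OPH_USER -p $OPH_PASSWD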
Create an empty container and a new datacube with random data and dimensions.
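A minimal sketch using the oph_createcontainer and oph_randcube operators; the container name, dimensions, and sizes below are purely illustrative:

oph_createcontainer container=test;dim=lat|lon|time;dim_type=double|double|double;hierarchy=oph_base|oph_base|oph_time;
oph_randcube container=test;nfrag=16;ntuple=10;measure=example_measure;measure_type=double;exp_ndim=2;dim=lat|lon|time;concept_level=c|c|d;dim_size=16|10|10;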
Now you can submit your first data transformation: let's reduce the whole datacube to a single value per grid point, using the average along the time dimension.
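For instance, with the oph_reduce operator (no explicit cube argument is needed, since the terminal defaults to the last output datacube):

oph_reduce operation=avg;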
Let's have a look at the environment by listing the datacubes and containers in the session.
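One way is the oph_list operator, where level=2 shows both containers and datacubes:

oph_list level=2;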
By default, the Ophidia terminal will use the PID of the last output datacube, so you can use the oph_explorecube operator to visualize the first 100 values.
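For example, assuming the operator's limit_filter parameter controls how many values are shown:

oph_explorecube limit_filter=100;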
For further details about the Ophidia operators, please refer to the official documentation.
To access the Jupyter interface, open the browser at https://<YOUR_CLUSTER_IP>:443/jupyter and log in to the system using the username and password specified in the jupyterhub_config.py configuration file (see the c.Authenticator.whitelist and c.DummyAuthenticator.password lines) located in the /root folder.
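A quick way to look these values up from the front-end shell (a sketch based on the file location given above):

[root@oph-server ~]# grep -E "whitelist|password" /root/jupyterhub_config.py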
From JupyterHub in ECAS you can do several things, such as developing and running notebooks that interact with the Ophidia framework (e.g. through the PyOphidia Python bindings).
To get started with the ECAS environment capabilities, open the ECAS_Basics.ipynb notebook available under the notebooks/ folder in the home directory.