Architecture

Internal Service Architecture

The EGI Notebooks service relies on the following technologies to provide its functionality:

  • JupyterHub with a custom EGI Check-in OAuth authenticator, configured to spawn pods on Kubernetes.
  • Kubernetes as the container orchestration platform, running on top of EGI Cloud resources. Within the service it is in charge of managing the allocated resources and providing the right abstraction to deploy the containers that build the service. Resources are provided by EGI Federated Cloud providers, including persistent storage for users' notebooks.
  • A Certificate Authority (CA) to issue recognised certificates for the HTTPS server.
  • Prometheus for monitoring resource consumption.
  • Specific EGI hooks for monitoring, accounting and backup.
  • VO-specific storage, big data facilities, or any other tools that can be plugged into the notebooks environment can be added to community-specific instances.

Kubernetes

A Kubernetes (k8s) cluster deployed into a resource provider is in charge of managing the containers that will provide the service. On this cluster there are:

  • 1 master node that manages the whole cluster
  • Support for a load balancer or, alternatively, 1 or more edge nodes with a public IP and a corresponding public DNS name (e.g. notebooks.egi.eu), where a k8s ingress HTTP reverse proxy redirects user requests to the other components of the service. The HTTP server has a valid certificate from a CA recognised by most browsers (e.g. Let's Encrypt).
  • 1 or more nodes that host the JupyterHub server and the notebook servers where users run their notebooks. The hub is deployed using the JupyterHub helm charts. These nodes should have enough capacity to run as many concurrent user notebooks as needed. The main constraint is usually memory.
  • Support for Kubernetes PersistentVolumeClaims for storing the persistent folders. The default EGI Notebooks installation uses NFS, but any other volume type with ReadWriteOnce capabilities can be used.
  • A Prometheus installation that monitors resource usage so that accounting records can be generated.

All communication with the user goes via HTTPS, and the service only needs a publicly accessible entry point (a public IP with a resolvable name).

Monitoring and accounting are provided by hooking into the respective monitoring and accounting EGI services.

There are no specific hardware requirements and the whole environment can run on commodity virtual machines.
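
Since memory is usually the main constraint, a rough capacity estimate can help when sizing the notebook nodes. The following is only a back-of-the-envelope sketch: the function names, the node size, and the amount of memory reserved for system components are hypothetical, and the per-notebook guarantee is taken as roughly 512 MiB to match the sample helm configuration.

```python
import math


def max_notebooks_per_node(node_memory_gib: float, reserved_gib: float,
                           guarantee_mib: int) -> int:
    """Return how many notebook pods fit on one node by memory guarantee.

    reserved_gib is an assumed amount of memory kept aside for the
    operating system, the kubelet and other cluster components.
    """
    usable_mib = (node_memory_gib - reserved_gib) * 1024
    if usable_mib <= 0:
        return 0
    return int(usable_mib // guarantee_mib)


def nodes_needed(concurrent_users: int, notebooks_per_node: int) -> int:
    """Return the number of nodes required for the given concurrency."""
    if notebooks_per_node <= 0:
        raise ValueError("a node must be able to host at least one notebook")
    return math.ceil(concurrent_users / notebooks_per_node)


if __name__ == "__main__":
    # Hypothetical 16 GiB nodes with 2 GiB reserved, 512 MiB per notebook.
    per_node = max_notebooks_per_node(16, 2, 512)
    print(per_node, nodes_needed(100, per_node))
```

The estimate is based on the memory guarantee rather than the limit, so notebooks that grow towards their limit can still put a node under memory pressure.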

EGI Customisations

EGI Notebooks is deployed as a set of customisations of the JupyterHub helm charts.


Authentication

EGI Check-in can be easily configured as an OAuth 2.0 provider for JupyterHub's oauthenticator. See below a sample configuration for the helm chart using the Check-in production environment:

hub:
  extraEnv:
    OAUTH2_AUTHORIZE_URL: https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/auth
    OAUTH2_TOKEN_URL: https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/token
    OAUTH_CALLBACK_URL: https://<your host>/hub/oauth_callback

auth:
  type: custom
  custom:
    className: oauthenticator.generic.GenericOAuthenticator
    config:
      login_service: "EGI Check-in"
      client_id: "<your client id>"
      client_secret: "<your client secret>"
      oauth_callback_url: "https://<your host>/hub/oauth_callback"
      username_key: "sub"
      token_url: "https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/token"
      userdata_url: "https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/userinfo"
      scope: ["openid", "profile", "email", "eduperson_scoped_affiliation", "eduperson_entitlement"]
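
To illustrate what the scope list and callback URL above are used for, here is a minimal sketch of how a standard OAuth 2.0 authorization-code request URL is assembled from those values. It is not the oauthenticator implementation; the function name and the example client id, host and state value are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Authorization endpoint and scopes taken from the configuration above.
AUTHORIZE_URL = "https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/auth"
SCOPES = ["openid", "profile", "email",
          "eduperson_scoped_affiliation", "eduperson_entitlement"]


def build_authorize_url(client_id: str, callback_url: str, state: str) -> str:
    """Build the URL the browser is redirected to in order to log in."""
    params = {
        "response_type": "code",      # authorization-code flow
        "client_id": client_id,
        "redirect_uri": callback_url,
        "scope": " ".join(SCOPES),    # scopes are space-separated
        "state": state,               # opaque value echoed back by the provider
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical client id and host, for illustration only.
    print(build_authorize_url(
        "my-client-id",
        "https://notebooks.example.org/hub/oauth_callback",
        "random-state",
    ))
```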

To simplify the configuration and to add refresh capabilities to the credentials, we have created a new EGI Check-in authenticator that can be configured as follows:

auth:
  state:
    enabled: true
    cryptoKey: <some unique crypto key>
  type: custom
  custom:
    className: oauthenticator.egicheckin.EGICheckinAuthenticator
    config:
      client_id: "<your client id>"
      client_secret: "<your client secret>"
      oauth_callback_url: "https://<your host>/hub/oauth_callback"
      scope:
      - openid
      - profile
      - email
      - offline_access
      - eduperson_scoped_affiliation
      - eduperson_entitlement

The auth.state configuration enables storing the users' refresh tokens, which are then used to obtain up-to-date valid credentials as needed.
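
As an illustration of what a stored refresh token makes possible, the sketch below builds a standard OAuth 2.0 refresh_token grant request against the Check-in production token endpoint shown earlier. It is an assumption about the general flow, not the EGICheckinAuthenticator code: the function name is hypothetical, the request is only constructed and not sent, and sending the client credentials in the form body is an assumption (a provider may require HTTP Basic authentication instead).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Token endpoint of the Check-in production environment (from the sample above).
TOKEN_URL = "https://aai.egi.eu/auth/realms/egi/protocol/openid-connect/token"


def build_refresh_request(client_id: str, client_secret: str,
                          refresh_token: str) -> Request:
    """Build (but do not send) a refresh_token grant request."""
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        # Assumption: client credentials are passed in the form body.
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_refresh_request("<your client id>", "<your client secret>",
                                "<stored refresh token>")
    print(req.get_method(), req.full_url)
```

Requesting the offline_access scope, as in the configuration above, is what asks the provider to issue a refresh token in the first place.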

Accounting

The accounting module generates VM-like accounting records for each notebook started at the service. It is available as a helm chart that can be deployed in the same namespace as the JupyterHub chart. The only configuration the chart needs is an IGTF-recognised certificate for the host registered in GOCDB as accounting.

ssm:
  hostcert: |-
        <hostcert>
  hostkey: |-
        <hostkey>

Monitoring

Monitoring is performed by trying to execute a user notebook every hour. This is accomplished by registering a new service in the hub that has admin permissions. Monitoring is then deployed as a helm chart that must be deployed in the same namespace as the JupyterHub chart. Configuration of JupyterHub must include this section:

hub:
  services:
    status:
       url: "http://status-web/"
       admin: true
       apiToken: "<a unique API token>"

Likewise the monitoring chart is configured as follows:

service:
  api_token: "<same API token as above>"

Docker images

Our service relies on custom images for the hub and the single-user notebooks. Dockerfiles are available at the EGI Notebooks images git repository, and the images are automatically built for every commit pushed to the repository and published under eginotebooks on Docker Hub.

Hub image

Builds from the JupyterHub k8s-hub image and adds:

  • EGI and D4Science authenticators
  • EGISpawner
  • EGI look and feel for the login page

Single-user image

Builds from the Jupyter datascience-notebook image and adds a wide range of libraries requested by users of the service. We are currently looking into alternatives for better managing this image, with CVMFS as a possible solution.

Sample helm configuration

If you want to build your own EGI Notebooks instance, you can start from the following sample configuration and adapt it to your needs by setting:

  • Secret tokens (for proxy.secretToken, hub.services.status.api_token, auth.state.cryptoKey). They can be generated with openssl rand -hex 32.
  • A valid hostname (<your notebooks host> below) that resolves to your Kubernetes Ingress.
  • Valid EGI Check-in client credentials. These can be obtained by creating a new Service for the demo instance of Check-in through the EGI Federation Registry. When moving to the EGI Check-in production environment, make sure to remove the hub.extraEnv.EGICHECKIN_HOST variable.
---
proxy:
  secretToken: "<some secret>"
  service:
    type: NodePort

ingress:
  enabled: true
  annotations:
    kubernetes.io/tls-acme: "true"
  hosts: [<your notebooks host>]
  tls:
  - hosts:
    - <your notebooks host>
    secretName: acme-tls-notebooks

singleuser:
  storage:
    capacity: 1Gi
    dynamic:
      pvcNameTemplate: claim-{userid}{servername}
      volumeNameTemplate: vol-{userid}{servername}
      storageAccessModes: ["ReadWriteMany"]
  memory:
    limit: 1G
    guarantee: 512M
  cpu:
    limit: 2
    guarantee: .02
  defaultUrl: "/lab"
  image:
    name: eginotebooks/single-user
    tag: c1b2a2a

hub:
  image:
    name: eginotebooks/hub
    tag: c1b2a2a
  extraConfig:
    enable-lab: |-
      c.KubeSpawner.cmd = ['jupyter-labhub']
    volume-handling: |-
      from egispawner.spawner import EGISpawner
      c.JupyterHub.spawner_class = EGISpawner
  extraEnv:
    JUPYTER_ENABLE_LAB: 1
    EGICHECKIN_HOST: aai-demo.egi.eu
  services:
    status:
       url: "http://status-web/"
       admin: true
       api_token: "<monitor token>"

auth:
  type: custom
  state:
    enabled: true
    cryptoKey: "<a unique crypto key>"
  admin:
    access: true
    users: [<list of EGI Check-in users with admin powers>]
  custom:
    className: oauthenticator.egicheckin.EGICheckinAuthenticator
    config:
      client_id: "<your egi checkin_client_id>"
      client_secret: "<your egi checkin_client_secret>"
      oauth_callback_url: "https://<your notebooks host>/hub/oauth_callback"
      enable_auth_state: true
      scope:
      - openid
      - profile
      - email
      - offline_access
      - eduperson_scoped_affiliation
      - eduperson_entitlement
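
If openssl is not at hand, the secret tokens needed by the sample configuration can also be generated with Python's standard library. This is a minimal sketch equivalent in output format to openssl rand -hex 32; the function name is hypothetical.

```python
import secrets


def generate_secret_token(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically strong randomness as hex."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # One independent token per setting; do not reuse the same value.
    for setting in ("proxy.secretToken",
                    "hub.services.status.api_token",
                    "auth.state.cryptoKey"):
        print(f"{setting}: {generate_secret_token()}")
```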