Custom reproducible computing environments for notebooks

What is it?

Binder allows the re-creation of a custom computing environment for reproducible execution of notebooks (and potentially many other types of applications). Users who create their own notebooks in EGI Notebooks to analyze data can easily create a shareable link for those notebooks in the form of a GitHub repository. With this link, anyone can then reproduce the same data analysis in the EGI Binder service.

The service builds on BinderHub, an open source tool that builds Docker images from a Git repository and makes them available through your browser.

EGI Binder offers a service similar to the publicly accessible site. However, EGI Binder has the following additional features:

  • Access with academic user accounts: log in via Check-in, which is connected to eduGAIN and social media accounts.
  • Access to scalable storage: selected storage spaces of EGI DataHub are directly available under the datahub folder, simplifying the access to shared data from Binder notebooks.
  • Guaranteed capacity: environments are guaranteed 2 GB of RAM and can use up to 4 GB.
  • Persistent sessions: There is no hard limit on the session time per user, although sessions will be shut down automatically after 1 hour of inactivity (see session limitations at the public service).
  • Access to the rest of EGI services: a personal access token is available in the Binder session to interact with the rest of the EGI infrastructure.
  • Community Binder environments: User communities can have their own customized Binder service instance from EGI, with extra features as requested (such as access to GPUs or integration with community-specific data repositories and services). EGI offers consultancy, supports the setup of these instances, and provides operational oversight for them.

Reproducible research

Binder facilitates the sharing and reproducibility of digital data analysis:

  1. Users can define their computational analysis in the EGI Notebooks service.
  2. Once the notebook is ready for publishing, it can be shared in a GitHub repository.
  3. Optionally, users can use the Zenodo-GitHub integration to generate DOIs that can be cited in publications and discovered by fellow researchers.
  4. Anyone can use the link to the GitHub repository or Zenodo DOI to reproduce the computational analysis in EGI Binder.

Reproducible research flow

Access to the service

EGI’s Binder has the same access conditions as the centrally operated Notebooks service from EGI. Before using the service, you need to have an EGI account and be a member of one of the supported resource pools (also known as Virtual Organisations). Follow the instructions on the EGI Binder login page to get access.

Creating a Binder repository

Binder starts from a code repository that contains the code or notebook you’d like to run and a set of configuration files that specify the exact computational environment your code needs.

Binder then creates a reproducible container using repo2docker, and generates a user session to interact with the container in the browser.

The configuration for building the container supports specifying conda environments; installing Python, R and Julia environments; installing additional OS packages; and even complete custom Dockerfiles to bring any application to the system. The code repository can be hosted on popular Git hosting platforms like GitHub and GitLab, and can also be referenced with a DOI from Zenodo, FigShare or Dataverse. You can learn more about configuring your repository for Binder in the Binder user documentation.
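As a quick check before sharing a repository, the following minimal sketch lists which build configuration files are present in a local checkout. It is only an illustration: the function name `find_binder_config` is hypothetical, the file list covers just the common repo2docker file names matching the options above (it is not exhaustive), and it assumes the repo2docker convention that a `binder` or `.binder` folder, when present, takes precedence over the repository root.

```python
from pathlib import Path

# Common configuration file names recognised by repo2docker (not exhaustive).
KNOWN_CONFIG_FILES = (
    "environment.yml",   # conda environment
    "requirements.txt",  # Python packages installed with pip
    "install.R",         # R packages
    "Project.toml",      # Julia environment
    "apt.txt",           # additional OS packages
    "Dockerfile",        # complete custom image
)

# Directories checked in order; the first one that exists is used.
CONFIG_DIRS = ("binder", ".binder", ".")


def find_binder_config(repo_root):
    """Return the known configuration files found in a local checkout.

    Paths are relative to repo_root and use forward slashes.
    """
    root = Path(repo_root)
    for name in CONFIG_DIRS:
        candidate = root / name
        if candidate.is_dir():
            found = [
                (Path(name) / f).as_posix()
                for f in KNOWN_CONFIG_FILES
                if (candidate / f).is_file()
            ]
            return sorted(found)
    return []


if __name__ == "__main__":
    # Inspect the current directory, assumed to be a repository checkout.
    for path in find_binder_config("."):
        print(path)
```

An empty result does not necessarily mean the build will fail, since a default environment may still be built, but it is a hint that the environment your code needs is not yet pinned in the repository.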

You can start by forking the EGI-Federation/binder-example GitHub repository for creating your own reproducible environment. To run this directly on EGI’s Binder click on the button below:


You can create such a link to share your notebooks from the Binder interface: as shown in the screenshot below, copy the URL displayed while the build is in progress.

Binder link
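Such links can also be composed by hand or in a script. The sketch below assumes the usual BinderHub URL layout (`/v2/gh/<owner>/<repo>/<ref>` for GitHub and `/v2/zenodo/<DOI>` for Zenodo) and the `labpath` query parameter for opening a file in JupyterLab; your deployment may differ, so treat the link shown in the Binder interface as the authoritative one. The host `binder.example.org`, the function names, and the DOI in the usage example are placeholders, not real EGI values.

```python
from urllib.parse import quote, urlencode


def github_launch_url(hub_url, repo, ref, labpath=None):
    """Build a launch link for a GitHub repository.

    hub_url: base URL of the BinderHub deployment (placeholder in this sketch).
    repo:    "owner/name" of the GitHub repository.
    ref:     branch, tag or commit to build.
    labpath: optional file to open when the session starts (assumed parameter).
    """
    url = "/".join(
        [hub_url.rstrip("/"), "v2", "gh", repo.strip("/"), quote(ref, safe="")]
    )
    if labpath:
        url += "?" + urlencode({"labpath": labpath})
    return url


def zenodo_launch_url(hub_url, doi):
    """Build a launch link for a repository archived under a Zenodo DOI."""
    return "/".join([hub_url.rstrip("/"), "v2", "zenodo", doi.strip("/")])


if __name__ == "__main__":
    hub = "https://binder.example.org"  # placeholder, not the EGI Binder URL
    # "HEAD" is assumed here to resolve to the repository's default branch.
    print(github_launch_url(hub, "EGI-Federation/binder-example", "HEAD"))
    # Placeholder DOI, for illustration only.
    print(zenodo_launch_url(hub, "10.5281/zenodo.0000000"))
```

Pinning `ref` to a tag or commit instead of a branch gives a link that keeps pointing at the same version of the analysis, which is usually what you want when citing it.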

The binder-examples organisation on GitHub contains more sample repositories for common configurations that can help you get started.

Accessing data

Your notebooks running in Binder have outgoing internet connectivity, so you can connect to external services to bring data in for analysis or to deposit the notebooks' output.

Every session that you start will also provide access to your spaces in the DataHub under a folder named datahub. Only those spaces configured to be mounted locally will be made available automatically. Check the documentation on DataHub support in the Notebooks service for more information.
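From inside a notebook, the mounted spaces can be inspected like any other directory. The following is a minimal sketch under two assumptions that are not confirmed here: that the datahub folder is reachable relative to the notebook's working directory, and that each mounted space appears as a subdirectory of it. The function name `list_datahub_spaces` is hypothetical.

```python
from pathlib import Path


def list_datahub_spaces(datahub_dir="datahub"):
    """Return the names of the directories found under the datahub folder.

    Returns an empty list if the folder does not exist, for example when
    the code runs outside a Binder session.
    """
    root = Path(datahub_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    spaces = list_datahub_spaces()
    if not spaces:
        print("No spaces found under ./datahub")
    for space in spaces:
        print(space)
```

If the list is empty inside a session, the space may not be configured to be mounted locally, as noted above.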