Data Management in Notebooks
Every user of the EGI Notebooks catch-all instance has a 20GB persistent home to store any notebooks and associated data. The content of this home directory will be kept even if your notebook server is stopped (which can happen if there is no activity for more than 1 hour). Modifications to the notebooks environment outside the home directory are not kept (e.g. installation of libraries). If you need those changes to persist, let us know via a GGUS ticket to the Notebooks Support Unit. You can also ask for increasing the 20GB home via ticket.
Import notebooks into your workspace
The Notebooks service default environment includes nbgitpuller, an extension to sync a git repository one-way to a local path. You can generate a shareable link by filling in the nbgitpuller link generator with your git repository.
Alternatively, you can also use Binder for providing a link to notebooks and their computing environment.
Getting data in/out
Your notebooks have outgoing internet connectivity so you can connect to any external service to bring data in for analysis. As with input data, you can connect to any external service to deposit the notebooks output.
This is convenient for smaller datasets but not practical for larger ones, for those cases we can offer integration with several data services. These are not enabled in the catch-all instance but can be made available on demand.
EGI DataHub provides a scalable distributed data infrastructure. It offers a tight integration with Jupyter and notebooks with specific drivers that make the DataHub Spaces accessible from any notebook.
Whenever you log into the service, supported DataHub spaces will be available
datahub folder. If you need support for any additional space, please
open a ticket in GGUS to add it.
Alternatively, you can also use the
fs-onedatafs library from your
code. For convenience, the
ONEPROVIDER_HOST environment variable will point to
the default oneprovider for the Notebooks and the
variable will contain a valid access token for the service.
from fs.onedatafs import OnedataFS # create the OnedataFS driver using defaults from env odfs = OnedataFS(os.environ['ONEPROVIDER_HOST'], os.environ['ONECLIENT_ACCESS_TOKEN'], force_direct_io=True) # use it to open a file f = odfs.open("<datahub file path>")
ONECLIENT_ACCESS_TOKEN variables are obtained as
part of the login process and made available in the notebooks environment
automatically. You can also specify a different oneprovider host if needed.
EUDAT B2DROP is a low-barrier, user-friendly and trustworthy storage environment which allows users to synchronise their active data across different desktops and to easily share this data with peers. EUDAT offers a free public instance of B2DROP for any researcher with a 20 GB quota.
The data on B2DROP can be synchronised with EGI Notebooks so you can share content between the two services. This offers an easy-to-use storage and compute platform for the long-tail of science.
Here is how you can get them synchronised. First, make sure
you have access to B2DROP. Then, configure
app username and
app password on B2DROP’s
Now, back to EGI Notebooks, click on the
B2DROP connection drop-down
menu when you start your session:
app username and
app password created previously, with
the option to save them for future logins:
You will see a
b2drop folder in the list of folders (left panel) of the
EGI Notebooks that is synchronised with the content on
D4Science VREs provide a shared workspace via a
EGI Notebooks embedded in D4Science VREs will automatically show the user’s
workspace at the
workspace directory. You can browse and use as any regular
The Notebooks service can enable shared folders for users, either in read-only
or read-write mode. These are specially meant for community instances for easing
the sharing of data between all the users of the service. In the catch-all
datasets directory serves as an example of such feature.
We are open for integration with other services for facilitating the access to
input and output data. Please contact
support _at_ egi.eu with your request so
we can investigate the best way to support your needs.