<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Documentation – DataHub</title><link>/providers/datahub/</link><description>Recent content in DataHub on Documentation</description><generator>Hugo -- gohugo.io</generator><atom:link href="/providers/datahub/index.xml" rel="self" type="application/rss+xml"/><item><title>Providers: DataHub OneProvider</title><link>/providers/datahub/oneprovider/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/providers/datahub/oneprovider/</guid><description>
&lt;p>This documentation covers how to install and configure a OneData OneProvider in
order to join a new or existing EGI DataHub space. In particular two types of
installations are available, depending if the provider wants to support the
space with an empty storage or if existing data should be exposed via the
Oneprovider.&lt;/p>
&lt;h2 id="requirements-for-production-installation">Requirements for production installation&lt;/h2>
&lt;ul>
&lt;li>Oneprovider
&lt;ul>
&lt;li>RAM: 32GB&lt;/li>
&lt;li>CPU: 8 vCPU&lt;/li>
&lt;li>Disk: 50GB SSD&lt;/li>
&lt;li>To be adjusted for the dataset and usage scenario&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>For high Input/Output operations per second (IOPS)
&lt;ul>
&lt;li>High performance backend storage (CEPH)&lt;/li>
&lt;li>Low latency network&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="network-requirements">Network Requirements&lt;/h3>
&lt;ul>
&lt;li>The following ports need to be open on the local and site firewall:
&lt;ul>
&lt;li>80, 443, 9443, 6665 (for data transfer)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;div class="alert alert-warning" role="alert">
&lt;h4 class="alert-heading">Warning&lt;/h4>
The Oneprovider installation includes also a Memcached server running on
port 11211. Please ensure that this port and other unused ports are not open to
the internet by setting up proper local firewall rules.
&lt;/div>
&lt;h2 id="installation-and-attach-empty-storage-to-the-egi-datahub">Installation and attach empty storage to the EGI DataHub&lt;/h2>
&lt;p>The installation of a new Oneprovider is performed using the &lt;code>onedatify&lt;/code>
installation script which will setup the components using Docker and
Docker-compose.&lt;/p>
&lt;p>This simple installation script can be generated from the EGI DataHub interface.&lt;/p>
&lt;p>Firstly, you need to login to the EGI DataHub and using the &lt;code>Data&lt;/code> menu you
either select an existing space or create a new one.&lt;/p>
&lt;p>Secondly, you can select on the space menu the &lt;code>Providers&lt;/code> section and click on
the &lt;code>Add Support&lt;/code> button on the top right corner.&lt;/p>
&lt;p>&lt;img src="add-support-oneprovider.png" alt="image">&lt;/p>
&lt;p>You should then select on the page the tab: &lt;code>Deploy your own provider&lt;/code>&lt;/p>
&lt;p>In order to obtain the &lt;code>PROVIDER_REGISTRATION_TOKEN&lt;/code> that is part of the
preconfigured command, you need to contact the
&lt;a href="https://helpdesk.ggus.eu/">EGI HelpDesk&lt;/a>, selecting DataHub as Group.&lt;/p>
&lt;p>Please include in the request the following info:&lt;/p>
&lt;ul>
&lt;li>Use case for the installation of a Oneprovider connected to the EGI DataHub&lt;/li>
&lt;li>Name and the email of the Oneprovider admin&lt;/li>
&lt;li>domain name of the Oneprovider&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="onedatify-oneprovider.png" alt="image">&lt;/p>
&lt;h3 id="run-the-command-on-the-host">Run the command on the host&lt;/h3>
&lt;p>With the obtained &lt;code>PROVIDER_REGISTRATION_TOKEN&lt;/code>, paste the copied command in the
terminal on the Oneprovider machine as superuser.&lt;/p>
&lt;p>If necessary, the Onedatify script will ask for permission to install all
necessary dependencies including Docker and Docker Compose.&lt;/p>
&lt;p>After the dependency installation is complete, the script will ask several
questions and suggest default settings:&lt;/p>
&lt;p>&lt;img src="onedatify_step_1.png" alt="image">&lt;/p>
&lt;p>The progress can be monitored on a separate terminal using the following
command:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>onedatify logs
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>After the deployment is complete, a message will be shown, including connection
details for the administration panel of the Oneprovider:&lt;/p>
&lt;p>&lt;img src="onedatify_step_5.png" alt="image">&lt;/p>
&lt;p>This administration panel at port &lt;code>9443&lt;/code> can be used to administer the
Oneprovider.&lt;/p>
&lt;h2 id="installation-and-expose-existing-data-to-the-egi-datahub">Installation and expose existing data to the EGI DataHub&lt;/h2>
&lt;p>The installation of a new OneProvider to expose existing datasets to an EGI
DataHub space is similar to the installation with an empty storage.&lt;/p>
&lt;p>When adding support to an existing or new space you should select from the EGI
DataHub user interface the tab : &lt;code>Expose Existing dataset&lt;/code>.&lt;/p>
&lt;p>Please refer to the steps described in the
&lt;a href="#installation-and-attach-empty-storage-to-the-egi-datahub">Installation and attach empty storage to the EGI DataHub&lt;/a>
section to request a &lt;code>PROVIDER_REGISTRATION_TOKEN&lt;/code>.&lt;/p>
&lt;p>Once obtained, you will have to copy the command already configured with the
correct parameters for the Onezone (&lt;code>datahub.egi.eu&lt;/code>) and the space to join.&lt;/p>
&lt;p>&lt;img src="onedatify-oneprovider-expose.png" alt="image">&lt;/p>
&lt;h3 id="run-the-command-on-the-host-1">Run the command on the host&lt;/h3>
&lt;p>Paste the copied command in the terminal on the Oneprovider machine as
superuser, and follow the instructions as for the case of an empty storage.&lt;/p>
&lt;p>The only difference is that at the end of the installation and configuration
process the existing files will be automatically imported to the OneProvider.&lt;/p>
&lt;p>You can monitor the import activity from the administration panel at port 9443.&lt;/p>
&lt;p>&lt;img src="onedatify_step_6.png" alt="image">&lt;/p>
&lt;h2 id="additional-configuration-for-egi-datahub">Additional configuration for EGI DataHub&lt;/h2>
&lt;p>After completing the installation it might be necessary to add some specific
configuration depending on the use case.&lt;/p>
&lt;h3 id="storage-import">Storage Import&lt;/h3>
&lt;p>Storage import function is used to import files located on a storage and not
added or modified directly by DataHub API, client or web interface. For example
if another application access the storage directly as, for example in an NFS
share. The file registration process creates the necessary metadata so that
files that are added, removed or modified directly on the storage are reflected
and accessible in the corresponding space. It is possible to configure the
storage to automatically detect changes made on the storage using the continuous
scan option or by manually triggering scans.&lt;/p>
&lt;p>In general, there are three basic space usage scenarios to be considered before
deciding to enable the automatic scan of the storage or to perform a single
scan:&lt;/p>
&lt;ol>
&lt;li>An initially empty space, modified only via DataHub -&amp;gt; no need to enable the
auto-scan or to perform a manual scan.&lt;/li>
&lt;li>A space with some initial data that has been imported once and then the space
is only modified via DataHub -&amp;gt; In this case the auto scan is not needed
however it needs to somehow guard the storage from external modifications
otherwise it will desynchronise with the space. E.g.: all changes should be done
trough API, DataHub client or the web interface.&lt;/li>
&lt;li>A space that exposes a dataset from a storage that later is still dynamically
changing outside of DataHub (when it is assumed that some modification can
happen at some point) -&amp;gt; in this case continuous scan is needed.&lt;/li>
&lt;/ol>
&lt;p>Every storage import scan causes additional load on the Oneprovider. If the
space is large and changes very often outside of Onedata, it might have a
visible impact on the overall performance - but we are talking millions of files
and hundreds of changes per second. Otherwise, having the continuous scan run in
the background at the default value, with interval of 60 seconds should not
cause any issues. Depending on the specific needs however, if the loads becomes
too high or if the changes need to be applied timely it is possible to increase
or decrease this interval respectively.&lt;/p>
&lt;p>In the following screenshot is shown how to access the configuration for the
storage import:&lt;/p>
&lt;p>&lt;img src="storage-import-01.png" alt="image">&lt;/p>
&lt;p>This configuration page is located under the &amp;ldquo;CLUSTER&amp;rdquo; tab in the main page.
There we should select the cluster that is providing the space that we need to
configure, Spaces and select the specific space that need to be configured. In
the example shown, the &amp;ldquo;Auto storage import configuration&amp;rdquo; menu has been
expanded and continuos scan enabled.&lt;/p></description></item><item><title>Providers: DataHub WebDAV</title><link>/providers/datahub/webdav/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/providers/datahub/webdav/</guid><description>
&lt;p>This documentation covers how to configure a OneData OneProvider with a WebDAV backend as an existing storage. This configuration has been tested with Nextcloud and XrootD.&lt;/p>
&lt;h2 id="requirements">Requirements&lt;/h2>
&lt;ul>
&lt;li>An existing installation of DataHub and one provider&lt;/li>
&lt;li>An existing installation of WebDAV to be used as backend storage with access credentials&lt;/li>
&lt;/ul>
&lt;h3 id="network-requirements">Network Requirements&lt;/h3>
&lt;ul>
&lt;li>The WebDAV storage should be accessible from the OneProvider installation
&lt;ul>
&lt;li>443 port should be available for access and data transfer&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="configuration-of-webdav">Configuration of WebDAV&lt;/h2>
&lt;p>In this example an installation of Nextcloud has been used. In particular, once logged in the web interface we need to check the WebDAV section on the &amp;ldquo;File settings&amp;rdquo;. The &amp;ldquo;Files settings&amp;rdquo; section can be accessed from the bottom lef of the web interface.&lt;/p>
&lt;p>&lt;img src="Nextcloud-File-Settings.png" alt="image">&lt;/p>
&lt;p>This opens the Files settings where we can check and copy the URL to access the WebDAV interface as show in the following screenshot:&lt;/p>
&lt;p>&lt;img src="Nextcloud-Webdav-conf.png" alt="image">&lt;/p>
&lt;h2 id="configuration-of-oneprovider">Configuration of OneProvider&lt;/h2>
&lt;p>On the DataHub interface on the &amp;ldquo;Storage backends&amp;rdquo; section of the Cluster configuration associated to the Space that we want to configure click on the &amp;ldquo;Add storage backend&amp;rdquo;. This will open the following configuration:&lt;/p>
&lt;p>&lt;img src="OneProvider-Storage_Backends.png" alt="image">&lt;/p>
&lt;p>Where we can select the &amp;ldquo;Type&amp;rdquo; as &amp;ldquo;WebDAV&amp;rdquo;, the endpoint as the URL copied from Nextcloud in the previous steps and the credential to access NextCloud as &amp;ldquo;Credential type&amp;rdquo; basic and &amp;ldquo;login:password&amp;rdquo;&lt;/p>
&lt;div class="alert alert-warning" role="alert">
&lt;h4 class="alert-heading">Warning&lt;/h4>
&lt;p>At this stage if we try to add the storage backend we will see the following error:&lt;/p>
&lt;p>&lt;img src="Readonly-WebDAV.png" alt="image">&lt;/p>
&lt;p>This is because of a limitation on the WebDAV implementation of Nextcloud. In particular it does not have &amp;ldquo;Range write support&amp;rdquo; while OneData, by design supports only this way of writing as it is expected to write large amount of data. Selecting the &amp;ldquo;Read only&amp;rdquo; option allow to complete the configuration. Going back to the Files section we can then browse the files present on the imported WebDAV storage.&lt;/p>
&lt;/div>
&lt;p>A similar procedure can be used and has been tested for XrootD however in this case as the Nextcloud tests, the &amp;ldquo;Range write support&amp;rdquo; was not available on the instance tested and the readonly access was possible.&lt;/p></description></item></channel></rss>