Querying the Information System
High Throughput Compute (HTC) is a computing paradigm that focuses on the efficient execution of a large number of loosely coupled tasks (e.g. data analysis jobs). HTC systems execute independent tasks that can be individually scheduled on many different computing resources, across multiple administrative boundaries. Users submit these tasks to the infrastructure as jobs. After a job has been scheduled and executed, its output can be collected from the service(s) that executed it.
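The paradigm described above can be illustrated with a minimal local sketch. It is not EGI code and does not use any EGI API: a thread pool stands in for the distributed computing resources, and the names `analyse` and `run_jobs` are hypothetical, chosen only for this illustration. What it shows is the shape of the workflow: independent tasks are submitted as jobs, may finish in any order, and their outputs are collected afterwards.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(chunk_id: int) -> int:
    """Hypothetical independent task: sum of squares over one data chunk."""
    start = chunk_id * 1000
    return sum(n * n for n in range(start, start + 1000))


def run_jobs(chunk_ids, max_workers: int = 4) -> dict:
    """Submit one job per chunk and collect each output as it completes.

    The thread pool is an assumption standing in for remote compute
    resources; tasks share no state, so they can run in any order.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted job (future) back to the chunk it processes.
        futures = {pool.submit(analyse, c): c for c in chunk_ids}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    outputs = run_jobs(range(8))
    for chunk_id in sorted(outputs):
        print(chunk_id, outputs[chunk_id])
```

Because the tasks do not communicate with each other, the number of workers can be changed without touching the task code, which is the property that lets HTC workloads spread across many resource centres.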
The target customers for EGI High Throughput Compute are research communities that need to share, store, process, and produce large sets of data. Typically, their research collaborations involve organisations across Europe and the world. Some may already have local resources (e.g. at universities or research institutions) that can only be accessed by local users in accordance with the respective organisation’s access policies.
In the case of local compute resources, researchers can request access to the local compute cluster from their IT department. However, when researchers join collaborations that need to share their research activities, data collections, and repositories, they need homogeneous and coordinated operation of compute resources that are not uniformly accessible. In addition, many research collaborations now generate large amounts of data, and managing such data volumes is time-consuming and error-prone.
The EGI High Throughput Compute service provides access to compute resources and offers a set of high-level tools for managing large amounts of data collaboratively (e.g. authorization and access control can be regulated centrally by the research collaboration, data can be uniformly distributed in the EGI Cloud, etc.).
EGI High Throughput Compute provides easy, uniform access to shared computing and data services of EGI service providers. Most software deployed in the distributed resource centres is based on open standards and open source middleware services.
The main features of the EGI High Throughput Compute are:
The EGI High Throughput Compute infrastructure is the federation of grid resources provided by EGI providers. Its aim is to securely share the distributed IT resources that are part of the EGI Cloud. It comprises:
The key components of the EGI High Throughput Compute architecture are:
Access to HTC resources in the EGI infrastructure is based on X.509 certificates and Virtual Organisations (VOs).
VOs are fully managed by research communities, allowing communities to manage their users and grant access to their services and resources. This means communities can either own their resources and use EGI services to share (federate) them, or can use the resources available in the EGI infrastructure for their scientific needs.
Before users can access EGI HTC services, they have to:
If you are interested in using the command line to submit jobs directly to Compute Elements, there is a tutorial on HTC job submission.