This section covers the basics you need to know to use our cluster systems:
If you are completely new to HPC on cluster systems, we recommend our trainings for HPC beginners.
Here are answers to the most important basic questions:
What is an HPC cluster?
An HPC cluster system consists of many server computers (nodes) that are connected via a very fast network and operated in a data center.
The server computers are built to execute computationally intensive research applications.
The servers are shared between many users.
We operate multiple clusters with different hardware configurations.
More information about
How is a cluster structured?
Most nodes in a cluster are compute nodes, where the compute jobs are executed.
The compute nodes are grouped into "partitions" depending on the hardware configuration.
For example, there are compute nodes with particularly large memory or additional acceleration cards such as GPUs and FPGAs.
The login nodes in a cluster are used to prepare jobs and copy data, but not to execute computations.
More information about
What hardware is available in the cluster?
How do I interact with the SLURM job scheduler?
You use command line commands to submit jobs to SLURM, check on the current job status or cancel jobs.
The easiest way to submit a job is to create a job script.
A job script is a normal shell script that contains the commands to execute in the job, plus additional SLURM arguments that specify the resources to be allocated.
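To make the layout of a job script concrete, here is a minimal sketch in Python that renders one as text. The `#SBATCH` lines carry the SLURM arguments and the remaining lines are ordinary shell commands. The partition name `normal`, the program `./my_program` and its input file are placeholders chosen for illustration, not names from our clusters.

```python
def render_job_script(job_name, partition, time_limit, ntasks, commands):
    """Return the text of a SLURM job script as a single string."""
    lines = [
        "#!/bin/bash",
        # Lines starting with #SBATCH are read by sbatch as resource arguments.
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --ntasks={ntasks}",
        "",
    ]
    # Everything after the #SBATCH header is a normal shell script.
    lines.extend(commands)
    return "\n".join(lines) + "\n"


script = render_job_script(
    job_name="example",
    partition="normal",                   # hypothetical partition name
    time_limit="00:10:00",                # hours:minutes:seconds
    ntasks=1,
    commands=["./my_program input.dat"],  # hypothetical program and input
)
print(script)
```

Saved to a file such as `job.sh`, a script like this would be submitted with `sbatch job.sh`.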
More about
What is the typical workflow when using a cluster system?
Typically, a cluster user follows these steps:
Develop the research application on a local computer and test it.
Measure the runtime of the program with a small workload and estimate the runtime for a large workload.
Use the login node on the cluster to edit job scripts, upload/download data and prepare the job.
Use the login node to submit jobs to the SLURM scheduler and check on the job status.
Wait for the job to finish.
Use the login node to look at the job's log file and result data. Copy the result data back to the local computer.
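The runtime estimate mentioned in the steps above can be sketched as follows. This is only an illustration: it assumes the runtime grows like `size**exponent` (linear by default), which may not hold for your application, and `workload` is a stand-in for the real program.

```python
import time


def measure_runtime(func, *args):
    """Run func(*args) once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def extrapolate_runtime(small_seconds, small_size, large_size, exponent=1.0):
    """Scale a measured runtime to a larger workload.

    Assumes runtime grows like size**exponent (1.0 means linear scaling).
    """
    return small_seconds * (large_size / small_size) ** exponent


def workload(n):
    """Stand-in for the research application: sums n squares."""
    return sum(i * i for i in range(n))


small_size = 100_000
small_seconds = measure_runtime(workload, small_size)
estimate = extrapolate_runtime(small_seconds, small_size, large_size=100_000_000)
print(f"measured {small_seconds:.4f} s, estimated {estimate:.1f} s for the large run")
```

An estimate like this is a starting point for the time limit you request for the job; it is usually sensible to add a safety margin on top.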
How many resources am I allowed to use?
Every user belongs to a project that has a resource allocation (contingent).
All resources you use are deducted from the project's allocation.
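As a rough illustration of how usage reduces a project's contingent, the sketch below assumes that usage is counted in core-hours (cores multiplied by runtime in hours). The actual accounting unit and rules on our clusters may differ, and the job list and budget are made up for the example.

```python
def job_cost_core_hours(cores, runtime_hours):
    """Cost of one job, assuming usage is counted in core-hours."""
    return cores * runtime_hours


def remaining_budget(budget_core_hours, jobs):
    """Subtract the cost of each (cores, runtime_hours) job from the budget."""
    used = sum(job_cost_core_hours(cores, hours) for cores, hours in jobs)
    return budget_core_hours - used


# Hypothetical jobs: (cores, runtime in hours)
jobs = [(16, 2.0), (64, 0.5)]
remaining = remaining_budget(1000.0, jobs)
print(remaining)  # 936.0
```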
More information about
How can I get access to the compute resources?
The SLURM scheduler manages the allocation of the compute resources for the cluster users.
To run a program on a compute node, you need to specify what resources you need, the runtime and the command that is executed on the compute node.
This is called a job.
Then the scheduler creates a plan for when to run the job on which compute nodes.
Since the cluster is shared by many users, your job may be delayed until enough resources are available.
The scheduler also assigns a priority to each job to calculate the order of jobs in the schedule.
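The interplay of requested resources, runtime and priority can be illustrated with a toy model. This is not SLURM's actual algorithm: the sketch simply considers waiting jobs in priority order and starts each one as soon as enough nodes are free. The job names, sizes and priorities are invented.

```python
import heapq


def plan_schedule(jobs, total_nodes):
    """Return {job name: start time} for a toy priority schedule.

    jobs: list of dicts with "name", "nodes", "runtime", "priority".
    Higher-priority jobs are considered first; a job starts as soon as
    enough nodes are free.
    """
    waiting = sorted(jobs, key=lambda job: -job["priority"])
    running = []  # min-heap of (finish time, nodes held)
    free_nodes = total_nodes
    now = 0
    starts = {}
    while waiting:
        # Start every waiting job that fits, in priority order.
        still_waiting = []
        for job in waiting:
            if job["nodes"] <= free_nodes:
                starts[job["name"]] = now
                free_nodes -= job["nodes"]
                heapq.heappush(running, (now + job["runtime"], job["nodes"]))
            else:
                still_waiting.append(job)
        waiting = still_waiting
        if waiting:
            if not running:
                raise ValueError("a job requests more nodes than the cluster has")
            # Advance time to the next job completion and free its nodes.
            now, nodes = heapq.heappop(running)
            free_nodes += nodes
    return starts


example_jobs = [
    {"name": "A", "nodes": 3, "runtime": 10, "priority": 5},
    {"name": "B", "nodes": 2, "runtime": 5, "priority": 9},
    {"name": "C", "nodes": 1, "runtime": 2, "priority": 1},
]
plan = plan_schedule(example_jobs, total_nodes=4)
print(plan)  # {'B': 0, 'C': 0, 'A': 5}
```

In this example, job A has to wait until time 5 because jobs B and C occupy the nodes it needs, which mirrors the delay described above.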
More about
What software is available on the cluster?
We are running a Linux operating system on all cluster nodes.
We have installed commonly used research software, libraries, and utilities.
If you are missing common software, you can contact us.
You get access to the software by using the environment "module" system.
After loading the particular environment module of a program, your shell will find the program executable.
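Environment modules typically work by adjusting environment variables such as `PATH`. The sketch below imitates that effect in Python on a Linux system: a program in a directory outside `PATH` is not found until that directory is prepended. The tool name and the temporary install directory are hypothetical stand-ins; on the cluster, the module system makes this change for you.

```python
import os
import shutil
import stat
import tempfile

TOOL_NAME = "hypothetical_cluster_tool"  # placeholder program name


def prepend_to_path(directory, path):
    """Return a PATH string with directory placed in front."""
    return directory + os.pathsep + path if path else directory


with tempfile.TemporaryDirectory() as install_dir:
    # Create a stand-in executable in a hypothetical install directory.
    program = os.path.join(install_dir, TOOL_NAME)
    with open(program, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")
    os.chmod(program, os.stat(program).st_mode | stat.S_IXUSR)

    current_path = os.environ.get("PATH", "")
    # Before "loading": the install directory is not on PATH.
    before = shutil.which(TOOL_NAME, path=current_path)
    # After "loading": the install directory has been prepended to PATH.
    after = shutil.which(TOOL_NAME, path=prepend_to_path(install_dir, current_path))

print("before:", before)
print("after: ", after)
```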
More information about
What directories are available for data storage on the cluster?
Each user has access to a personal home directory and to group directories, depending on the projects they are part of.
The different directories are meant to be used in different situations.
More information about
How can I access the cluster?
Once you have an account, you can access the cluster via VPN.
Then you can get a remote shell via SSH or a web frontend via JupyterHub.
More about
How can I edit files on the cluster?
Via the remote shell, you can use command line editors such as vim or nano.
Using the JupyterHub, you can also edit files via your browser.
You can also use the remote development feature of Visual Studio Code.
More about
How can I transfer files to/from the cluster?
Here you can find information about data transfer to/from the cluster.
Can I use graphical user interfaces on the cluster?
Most software is command line only.
If you only want to edit files, you can use the Visual Studio Code remote development feature or JupyterHub.
If you want to execute a GUI application on the cluster, you can use X-forwarding to tunnel the graphics over SSH or the noVNC feature of JupyterHub.
This is rather uncommon, and we discourage the use of GUI applications on the cluster.
More about