If you prefer to learn about job submission in a course with hands-on sessions, please consider our HPC-Introduction courses.
Summary
- The PC2 cluster systems use SLURM as a scheduler/workload manager.
- Sharing of compute nodes among jobs is enabled.
- CPU cores, memory, and GPUs are allocated exclusively to jobs.
- Start your MPI jobs with srun and not with mpirun or mpiexec.
- You only see your own compute jobs. If you are the project administrator of a compute-time project, you see all jobs of the project.
Cluster and Usage Overview
Jobs
You can either run batch jobs or interactive jobs.
Batch Jobs
In a batch job you set up a bash script that contains the commands that you want to execute in the job, e.g.
#!/bin/bash
echo "Hello World"
It is recommended to include the parameters for the job in the job script. For SLURM these lines start with #SBATCH. Because they are comments, i.e. they start with #, they are ignored by the normal bash shell but are read and interpreted by SLURM.
Important parameters are:

Line | Mandatory | Meaning |
---|---|---|
#SBATCH -t TIMELIMIT | yes | specify the time limit of your job. Acceptable time formats for TIMELIMIT include MINUTES, HOURS:MINUTES:SECONDS and DAYS-HOURS:MINUTES:SECONDS. |
#SBATCH -N NODES | no, default is 1 | use at least the number of NODES nodes for the job |
#SBATCH -n NTASKS | no, default is 1 | run NTASKS tasks, e.g. MPI ranks |
#SBATCH -J NAME | no, default is the file name of the job script | specify the name NAME of the job |
#SBATCH -p PARTITION | no, default is the normal partition | submit the job to the partition PARTITION |
#SBATCH -A PROJECT | not if you are only a member of one compute-time project | specify the compute-time project PROJECT the job is accounted to |
#SBATCH -q QOS | no, default is the default QoS of your project | use the QoS QOS |
#SBATCH --mail-type MAILTYPE | no, default is NONE | specify at which events you want a mail notification, e.g. BEGIN, END, FAIL or ALL |
#SBATCH --mail-user MAILADDRESS | no | specify the mail address that should receive the mail notifications |
Overall, a practical example great_calculation.sh could look like this:
#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -N 2
#SBATCH -n 10
#SBATCH -J "great calculation"
#SBATCH -p normal
# run your application here
Many more options can be found in the man page of sbatch at https://slurm.schedmd.com/sbatch.html or by running man sbatch on the command line on the cluster.
Submitting Batch Jobs
Once you have a job script, e.g. great_calculation.sh, you can submit it to the workload manager with the command
sbatch great_calculation.sh
If everything went well, sbatch will return a job ID, which is a unique integer number that identifies your job. Your job is now queued for execution.
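A minimal sketch of what a submission might look like (the job ID shown is just an example):

```
$ sbatch great_calculation.sh
Submitted batch job 1234567
```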
Monitor the State of Your Job
To monitor the state of your jobs, please have a look at https://uni-paderborn.atlassian.net/wiki/spaces/PC2DOK/pages/12944358/How+to+Monitor+Your+Jobs.
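For a quick look on the command line, squeue lists your jobs; a minimal sketch (the job ID, name, and node list are just examples, and the exact output format may differ):

```
$ squeue -u $USER
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
1234567    normal great_ca username  R   0:42      2 node[0001-0002]
```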
Stopping Batch Jobs
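A queued or running job can be cancelled with scancel and its job ID; a minimal sketch (the job ID is just an example):

```
# cancel a single job by its job ID
$ scancel 1234567

# cancel all of your own jobs
$ scancel -u $USER
```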
Running Jobs with Parallel Calculations, i.e. MPI
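As stated in the summary, MPI programs should be started with srun and not with mpirun or mpiexec. A minimal sketch of an MPI job script, assuming a hypothetical executable ./my_mpi_program (any module loads your program needs are omitted):

```bash
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 2
#SBATCH -n 10

# srun inherits the allocation from SLURM and starts the 10 MPI ranks
# across the 2 allocated nodes
srun ./my_mpi_program
```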
Interactive Jobs
In an interactive job you type the commands to be executed yourself in real time. Interactive jobs are not recommended on an HPC cluster because you usually don't know beforehand when your job is going to be started. Details on interactive jobs can be found in Interactive Jobs.
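If you do need an interactive session, a minimal sketch could look like the following; the exact recommended invocation and options are described in Interactive Jobs:

```
# request one task on one node for 30 minutes and open a shell on the compute node
$ srun -t 0:30:00 -N 1 -n 1 --pty bash
```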
Using Nodes Exclusively
Compute nodes on our clusters can be shared by multiple compute jobs. Please note that the requested number of CPU cores, memory, and GPUs are always allocated exclusively for a job. That means that if multiple jobs run on a compute node, they will not share the same CPU cores, memory, or GPUs.
If you want to use a complete node exclusively for your job, i.e. you don’t want other people’s jobs to use CPU cores or memory that your job hasn’t allocated, then you can add
#SBATCH --exclusive
to your job script. Be aware that you then have to “pay“ for the whole node with your compute-time contingent even if you didn’t allocate all cores or memory for your job.
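A minimal sketch of a job script header that requests a full node exclusively (the time limit and program name are just examples):

```bash
#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -N 1
#SBATCH --exclusive

# no other job can run on this node while the job is active
./my_program
```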
Using GPUs
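In SLURM, GPUs are typically requested with the --gres option; a minimal sketch (the GPU count is just an example, and the available GPU types and partitions depend on the cluster, so please check the cluster-specific documentation):

```bash
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --gres=gpu:1     # request one GPU; a type can be given as gpu:TYPE:COUNT

# the allocated GPU(s) are exposed to the job, e.g. via CUDA_VISIBLE_DEVICES
nvidia-smi
```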
Daily work commands are:
- sbatch - submit a batch script to SLURM
- srun - run a parallel job
- scancel - signal or cancel a job under the control of SLURM
- squeue - information about running jobs
- sinfo - information about the partitions and nodes
- scluster - information about currently allocated, free and offlined nodes
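Typical invocations of these commands might look like this (the options and the job ID are just examples):

```
$ sbatch great_calculation.sh   # submit a job script
$ squeue -u $USER               # list your queued and running jobs
$ scancel 1234567               # cancel a job by its ID
$ sinfo                         # show partitions and node states
$ scluster                      # overview of allocated, free and offlined nodes
```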