...
If you prefer to learn about job submission in a course with hands-on sessions, please consider our HPC-Introduction courses.
Summary
- The PC2 cluster systems use SLURM as the scheduler/workload manager.
- Sharing of compute nodes among jobs is enabled.
- CPU cores, memory, and GPUs are allocated exclusively to jobs.
- Start your MPI jobs with `srun` and not `mpirun` or `mpiexec`.
- You only see your own compute jobs. If you are the project administrator of a compute-time project, you see all jobs of the project.
Cluster and Usage Overview
Jobs
You can either run batch jobs or interactive jobs.
Batch Jobs
In a batch job you set up a bash script that contains the commands you want to execute, e.g.
```bash
#!/bin/bash
echo "Hello World"
```
It is recommended to include parameters for the job in the job script. For SLURM these lines start with `#SBATCH`. Because they are comments, i.e. they start with `#`, they are ignored by the normal bash shell but are read and interpreted by SLURM.
Important parameters are:

| Parameter | Mandatory | Meaning |
|---|---|---|
| `-t, --time=<time>` | yes | Specify the time limit of your job. Acceptable time formats include `minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours` and `days-hours:minutes:seconds`. |
| `-N, --nodes=<count>` | no, default is 1 | Use at least this number of nodes. |
| `-n, --ntasks=<count>` | no, default is 1 | Run this number of tasks (e.g. MPI ranks). |
| `-J, --job-name=<name>` | no, default is the file name of the job script | Specify the name of the job. |
| `-p, --partition=<partition>` | no, default is the normal partition | Submit the job to the given partition. |
| `-A, --account=<project>` | not if you are a member of only one compute-time project | Specify the compute-time project whose contingent should be used. |
| `-q, --qos=<qos>` | no, default is the default QoS of your project | Use the given QoS (quality of service) for the job. |
| `--mail-type=<type>` | no, default is `NONE` | Specify at which events you want a mail notification, e.g. `BEGIN`, `END`, `FAIL` or `ALL`. |
| `--mail-user=<mail address>` | no | Specify the mail address that should receive the mail notifications. |
A practical example `great_calculation.sh` could then look like this:
```bash
#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -N 2
#SBATCH -n 10
#SBATCH -J "great calculation"
#SBATCH -p normal

# run your application here
```
Many more options can be found in the man page of sbatch at https://slurm.schedmd.com/sbatch.html or by running `man sbatch` on the command line on the cluster.
Submitting Batch Jobs
Once you have a job script, e.g. `great_calculation.sh`, you can submit it to the workload manager with the command `sbatch great_calculation.sh`.
If everything went well, it returns a job ID, a unique integer number that identifies your job. Your job is now queued for execution.
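For illustration, a submission could look like this; the job ID in the output is only a placeholder:

```bash
$ sbatch great_calculation.sh
Submitted batch job 1234567
```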
Monitor the State of Your Job
To monitor the state of your jobs, please have a look at https://uni-paderborn.atlassian.net/wiki/spaces/PC2DOK/pages/12944358/How+to+Monitor+Your+Jobs.
Stopping Batch Jobs
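A queued or running job can be cancelled with `scancel` and its job ID. A minimal sketch; the job ID below is only a placeholder:

```bash
scancel 1234567
```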
Running Jobs with Parallel Calculations, i.e. MPI
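As stated in the summary, MPI programs should be started with `srun` and not with `mpirun` or `mpiexec`. A minimal sketch of an MPI job script; the application name `./my_mpi_app` is a placeholder:

```bash
#!/bin/bash
#SBATCH -t 0:30:00
#SBATCH -N 2
#SBATCH -n 10

# srun starts one MPI rank per requested task
srun ./my_mpi_app
```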
Interactive Jobs
In an interactive job you type the commands to execute yourself in real time. Interactive jobs are not recommended on an HPC cluster because you usually don't know beforehand when your job will start. Details on interactive jobs can be found in Interactive Jobs.
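If you do need an interactive session, a minimal sketch is to request resources with `srun` and a pseudo-terminal; the time limit and task count here are only examples:

```bash
srun -n 1 -t 0:30:00 --pty bash
```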
Using Nodes Exclusively
Compute nodes on our clusters can be shared by multiple compute jobs. Please note that the requested number of CPU cores, the memory, and the GPUs are always allocated exclusively to a job. That means that if multiple jobs run on a compute node, they will not share the same CPU cores, memory, or GPUs.
If you want to use a complete node exclusively for your job, i.e. you don’t want other people’s jobs to use CPU cores or memory that your job hasn’t allocated, then you can add
#SBATCH --exclusive
to your job script. Be aware that you then have to “pay” for the whole node from your compute-time contingent even if you didn’t allocate all cores or memory for your job.
Using GPUs
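GPUs are requested from SLURM like other resources. A minimal sketch of a GPU job script; the partition name `gpu` and the request of a single unnamed GPU are assumptions, please check the documentation of the respective cluster for the exact partition and GPU types:

```bash
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 1
#SBATCH -p gpu          # assumed name of a GPU partition, check the cluster documentation
#SBATCH --gres=gpu:1    # request one GPU as a generic resource

# run your GPU application here
```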
Commands for daily work are:
- `sbatch` - submit a batch script to SLURM
- `srun` - run a parallel job
- `scancel` - signal or cancel a job under the control of SLURM
- `squeue` - information about running jobs
- `sinfo` - information about the partitions and nodes
- `scluster` - information about currently allocated, free, and offlined nodes
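For illustration, a few typical invocations; the job ID is only a placeholder:

```bash
squeue -u $USER    # show your own jobs
sinfo              # show partitions and node states
scancel 1234567    # cancel the job with this job ID
```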