Quality-of-Service (QoS) and Job Priorities
Summary
The priority of your jobs is determined mainly by your project's cluster usage (CPU-core-hours and GPU/FPGA-hours) relative to its granted contingent.
Important tools:
pc2status: view your projects and their compute-resource usage, quotas, and your usage of the express-priority
squeue_pretty: view priorities of your pending jobs
spredict: show the estimated start time of your pending jobs
sprio: show how different factors contribute to the overall priority of your pending jobs
You can also use the express-priority or the fpgasynthesis-priority, or request an increase in resources.
Job Priorities
Usage and Contingents
You can view your compute-time projects, their cluster usage, and other interesting information with the command-line tool pc2status. The main quantity that determines the priority of your jobs is the project's cluster usage (CPU-core-hours and GPU/FPGA-hours) relative to the granted contingent.
Project Usage U30: The 30-day project usage is the sum of the project usage in the last 30 days plus the resources that the currently running jobs of the project would consume assuming that they run till their time limit.
Project Usage U60: The 60-day project usage is the sum of the project usage in the last 60 days plus the resources that the currently running jobs of the project would consume assuming that they run till their time limit. U60-U30 is the usage in the period between 60 days and 30 days ago.
Contingent C30: The 30-day contingent for the individual resources (CPU-Core-hours, GPU/FPGA-hours) is the total granted compute resources for 30 days, i.e. (total granted)/(project duration in days)*(30). If the project has not been running for 30 days yet or the remaining time is less than 30 days then the C30 is lowered accordingly.
Contingent C90: The 90-day contingent for the individual resources (CPU-Core-hours, GPU/FPGA-hours) is the sum of:
the C30,
half of max(0,(C30-(U60-U30))), i.e. half of the remaining C30 contingent from 30 days ago,
and half the C30, i.e. half of the future C30 contingent
Examples:
A project started 5 days ago but hasn't used any resources yet: C30=(total granted)/(project duration in days)*5, C90=C30+0.5*C30=1.5*C30
A project that
has been running for more than 60 days and will run for more than 30 days in the future,
hasn't used any resources in the last 30 days,
and has used half of the C30 in the time period between 30 days and 60 days ago,
has C90=C30+1/2*(C30-1/2*C30)+1/2*C30=1.75*C30
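The C90 definition can be evaluated with a few lines of Python. This is a minimal sketch, not the cluster's actual implementation: the function name c90 and the flag had_previous_window are hypothetical. The flag encodes the assumption, taken from the first example, that a project without a contingent 30 to 60 days ago gets no carry-over term.

```python
def c90(c30: float, u30: float, u60: float, had_previous_window: bool = True) -> float:
    """Sketch of the 90-day contingent as defined in the text.

    c30: 30-day contingent
    u30, u60: 30-day and 60-day project usage
    had_previous_window: assumption; False if the project did not yet
        exist in the period between 60 and 30 days ago
    """
    previous_usage = u60 - u30  # usage between 60 and 30 days ago
    if had_previous_window:
        carry_over = 0.5 * max(0.0, c30 - previous_usage)
    else:
        carry_over = 0.0
    future_share = 0.5 * c30
    return c30 + carry_over + future_share


# New project without a previous window and without usage
print(c90(100.0, 0.0, 0.0, had_previous_window=False))  # 150.0

# Long-running project, idle for 30 days, half of C30 used before that
print(c90(100.0, 0.0, 50.0))  # 175.0
```

Because of the max(0, ...) term, overusing the contingent in the earlier window never lowers C90 below 1.5*C30.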
Priority
The priority is an integer; higher values are better. You can view the priority of your pending jobs with squeue_pretty.
The priority is computed as:
Job_priority = 500000 * (qos_factor)
+ 50000 * (partition_factor)
+ 35000 * (age_factor)
+ 15000 * (job_size_factor)
The qos_factor depends on the QoS level of a job.
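To get a feeling for the weights, the formula can be evaluated directly. The following is a sketch under assumptions: the function name job_priority is hypothetical, and rounding to the nearest integer is assumed because the text only states that the priority is an integer.

```python
def job_priority(qos_factor: float, partition_factor: float,
                 age_factor: float, job_size_factor: float) -> int:
    """Weighted sum of the four factors, each expected in the range 0..1."""
    value = (500_000 * qos_factor
             + 50_000 * partition_factor
             + 35_000 * age_factor
             + 15_000 * job_size_factor)
    return round(value)  # assumption: rounded to the nearest integer


# Job with QoS factor 0.6 that has waited the full 10 days
waited = job_priority(0.6, 0.0, 1.0, 0.0)

# Freshly submitted job with QoS factor 0.8
fresh = job_priority(0.8, 0.0, 0.0, 0.0)

print(waited, fresh)  # 335000 400000
```

The weights of the partition, age, and job-size factors add up to 100000, which is exactly what a 0.2 step in the QoS factor is worth. The QoS level therefore dominates the ordering of pending jobs.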
Quality-of-Service (QoS): qos_factor
The following QoS depend on the project usage. Only one of them is active for a project at a time.
QoS name | Usage | QoS factor |
---|---|---|
cont | The project has used no more than its C30 contingent in the last 30 days, i.e. U30<=C30. | 0.6 |
lowcont | C30<U30<=C90 | 0.4 |
nocont | U30>C90 or total usage>granted contingent | 0.2 |
suspended | project is not active, i.e. expired, locked or hasn’t started yet | 0 |
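The selection of the usage-dependent QoS can be written as a short function. This is only an illustration of the table: the function name usage_qos and its arguments are hypothetical, and the order of the checks is an assumption that is consistent with the conditions listed.

```python
def usage_qos(u30: float, c30: float, c90: float,
              total_usage: float, total_granted: float,
              active: bool = True) -> tuple[str, float]:
    """Return (QoS name, QoS factor) for a project according to the table."""
    if not active:
        # expired, locked, or not started yet
        return ("suspended", 0.0)
    if u30 > c90 or total_usage > total_granted:
        return ("nocont", 0.2)
    if u30 > c30:
        # here C30 < U30 <= C90
        return ("lowcont", 0.4)
    return ("cont", 0.6)


print(usage_qos(u30=80, c30=100, c90=175, total_usage=500, total_granted=1200))
# ('cont', 0.6)
print(usage_qos(u30=120, c30=100, c90=175, total_usage=500, total_granted=1200))
# ('lowcont', 0.4)
```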
There are also special QoS that can be chosen by the user for jobs:
QoS name | Usage | Restrictions and quotas | QoS factor | Slurm limits |
---|---|---|---|---|
test | for test projects | at most 2 running jobs and at most 2 submitted jobs per user | 0.8 | maxSubmitJobs per user = 2, maxRunningJobs per user = 2 |
express | urgent jobs with high priority | 1000 CPU-core-hours and 30 GPU/FPGA-hours per user and month | 0.8 | maxSubmitJobs per user = 100 |
fpgasynthesis | FPGA synthesis for FPGA projects | only the partitions normal and largemem are allowed, only single-node jobs, at most 10 running jobs per user | 0.8 | maxRunningJobs per user = 10 |
eaccess | for employees of Paderborn University (see HPC Easy Access), no formal compute-time project needed | currently 500 CPU-core-hours plus 5 GPU-hours per user and month | 0.6 | maxSubmitJobs per user = 100 |
devel | for GPU testing and development (DGX node on Noctua2) | | | |
Partition Factor: partition_factor
The partition_factor is 0 for most partitions. Only the fpga-partition has a factor of 1. This is done to allow more freedom in possible future overlapping of partitions.
Waiting Time: age_factor
The age_factor depends on the waiting time of a job. A job that was just submitted has an age_factor of 0. While the job is pending the age_factor grows linearly till it reaches a value of 1 after 10 days of waiting.
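As a sketch, the linear growth with a cap after 10 days looks as follows. The function name and the use of fractional days are assumptions.

```python
def age_factor(waiting_days: float) -> float:
    """Grows linearly from 0 at submission to 1 after 10 days, then stays at 1."""
    return min(1.0, max(0.0, waiting_days / 10.0))


print(age_factor(0))   # 0.0
print(age_factor(5))   # 0.5
print(age_factor(14))  # 1.0
```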
Job Size: job_size_factor
The job_size_factor prioritizes larger jobs. The value between 0 and 1 is given by requested resources divided by the total resources available in the cluster. A full-system job has a value of 1.
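A minimal sketch of this ratio is shown below. The text does not say in which unit resources are counted, so counting nodes is an assumption, as are the function name and the example cluster size.

```python
def job_size_factor(requested_nodes: int, total_nodes: int) -> float:
    """Requested share of the cluster, capped at 1 for a full-system job."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return min(1.0, requested_nodes / total_nodes)


# Hypothetical cluster with 1000 nodes
print(job_size_factor(250, 1000))   # 0.25
print(job_size_factor(1000, 1000))  # 1.0
```

With a weight of 15000 in the priority formula, this factor is the smallest of the four contributions.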
How to get a Higher Priority
Increasing the Project Contingent
If you either temporarily or permanently need a higher project contingent, then the PI of a project can request it from the Resource Allocation Board of PC2 via mail to pc2-support@uni-paderborn.de. Please include a justification for why you need the increase. Normal and large projects can be increased by at most 25 % over the initially granted total contingent.
express-Priority
The express-QoS (#SBATCH -q express) gives your jobs a very high priority. Each user has a monthly quota listed in the table above. If this quota is exceeded, no jobs can be submitted with the express-priority until the quota is refreshed at the beginning of the next month.
You can only use the partitions that you could use via the regular priority in a compute-time project. Thus, if you don't have access to GPUs in a compute-time project, you also can't access them via the express-priority.
fpgasynthesis-Priority (Noctua 2 only)
Projects that have been granted access to FPGAs also have access to the fpgasynthesis-QoS (#SBATCH -q fpgasynthesis). Using it gives their FPGA bitstream-synthesis jobs a high priority. The only allowed partitions are normal and largemem. Only single-node jobs are allowed, with at most 10 running jobs per user.