Quality-of-Service (QoS) and Job Priorities

Summary

The main quantity that determines the priority of your jobs is your cluster usage, in terms of CPU-core-hours and GPU/FPGA-hours, relative to the granted contingent.

Important tools:

  • pc2status: view your projects and their compute-resource usage, quotas, and your usage of the express-priority

  • squeue_pretty: view priorities of your pending jobs

  • spredict: show the estimated start time of your pending jobs

  • sprio: show how different factors contribute to the overall priority of your pending jobs

  • Be aware of the options to use the express-priority or the fpgasynthesis-priority, or to request an increase of your contingent.

Job Priorities

Usage and Contingents

You can view your compute-time projects, their cluster usage, and other useful information with the command-line tool pc2status. The main quantity that determines the priority of your jobs is your cluster usage, in terms of CPU-core-hours and GPU/FPGA-hours, relative to the granted contingent.

Project Usage U30: The 30-day project usage is the sum of the project usage in the last 30 days plus the resources that the currently running jobs of the project would consume assuming that they run until their time limit.

Project Usage U60: The 60-day project usage is the sum of the project usage in the last 60 days plus the resources that the currently running jobs of the project would consume assuming that they run until their time limit. U60-U30 is the usage in the period between 60 and 30 days ago.

 

Contingent C30: The 30-day contingent for the individual resources (CPU-Core-hours, GPU/FPGA-hours) is the total granted compute resources for 30 days, i.e. (total granted)/(project duration in days)*30. If the project has not been running for 30 days yet, or the remaining project time is less than 30 days, then C30 is lowered accordingly.

Contingent C90: The 90-day contingent for the individual resources (CPU-Core-hours, GPU/FPGA-hours) is the sum of:

  • the C30,

  • half of max(0,(C30-(U60-U30))), i.e. half of the remaining C30 contingent from 30 days ago,

  • and half the C30, i.e. half of the future C30 contingent

Examples:

  • A project started 5 days ago and hasn’t used any resources yet: C30=(total granted)/(project duration in days)*5, C90=C30+0.5*C30=1.5*C30

  • If a project

    • has been running for more than 60 days and will run for more than 30 days in the future,

    • hasn’t used any resources in the last 30 days,

    • and has used half of the C30 in the period between 60 and 30 days ago,

    • then C90=C30+1/2*(C30-1/2*C30)+1/2*C30=1.75*C30
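
To make the arithmetic concrete, the following shell sketch reproduces the C30/C90 calculation for a project that has been running for more than 30 days and still has more than 30 days left (all numbers are made up; use the values that pc2status reports for your project):

    # All numbers are made up for illustration.
    total_granted=600000   # granted CPU-core-hours for the whole project
    project_days=365       # project duration in days
    U30=5000               # usage in the last 30 days incl. running jobs up to their time limit
    U60=15000              # usage in the last 60 days incl. running jobs up to their time limit

    awk -v tg="$total_granted" -v pd="$project_days" -v u30="$U30" -v u60="$U60" 'BEGIN {
        c30  = tg / pd * 30                  # 30-day share of the granted contingent
        left = c30 - (u60 - u30)             # remaining C30 contingent from 30 days ago
        if (left < 0) left = 0               # max(0, ...)
        c90  = c30 + 0.5 * left + 0.5 * c30  # C30 + half of the leftover + half of the future C30
        printf "C30 = %.0f core-h, C90 = %.0f core-h\n", c30, c90
    }'
    # -> C30 = 49315 core-h, C90 = 93630 core-h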

Priority

The priority is an integer; the higher, the better. You can view the priority of your pending jobs with squeue_pretty.

The priority is computed as:

Job_priority = 500000 * QoS_factor
             +  50000 * partition_factor
             +  35000 * age_factor
             +  15000 * job_size_factor

The QoS_factor depends on the QoS-level of a job.

Quality-of-Service (QoS): QoS_factor

The following QoS depend on the project usage. Only one of them is active for a project at a time.

  • cont (QoS-factor 0.6): the project has used less than its C30 contingent in the last 30 days, i.e. U30<=C30.

  • lowcont (QoS-factor 0.4): C30<U30<=C90.

  • nocont (QoS-factor 0.2): U30>C90 or the total usage exceeds the granted contingent.

  • suspended (QoS-factor 0): the project is not active, i.e. it has expired, is locked, or hasn’t started yet.
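
As a rough illustration, the following shell sketch maps a project's 30-day usage to one of these QoS levels (the numbers are made up, and the check of the total usage against the granted contingent as well as the suspended state are left out):

    # Made-up numbers; pc2status shows the real values for your project.
    U30=52000; C30=50000; C90=85000

    awk -v u30="$U30" -v c30="$C30" -v c90="$C90" 'BEGIN {
        qos = "nocont  (QoS-factor 0.2)"              # default: over the 90-day contingent
        if (u30 <= c90) qos = "lowcont (QoS-factor 0.4)"
        if (u30 <= c30) qos = "cont    (QoS-factor 0.6)"
        print "QoS: " qos
    }'
    # -> QoS: lowcont (QoS-factor 0.4)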

 

There are also special QoS that can be chosen by the user for jobs:

  • test (QoS-factor 0.8): for test projects; at most 2 running jobs and at most 2 submitted jobs per user.

  • express (QoS-factor 0.8): urgent jobs with high priority; limited to 1000 CPU-Core-h and 30 GPU/FPGA-hours per user and month.

  • fpgasynthesis (QoS-factor 0.8): FPGA synthesis for FPGA projects; allowed partitions are normal and largemem, only single-node jobs, at most 10 running jobs per user.

  • eaccess (QoS-factor 0.6): for employees of Paderborn University (see HPC Easy Access), no formal compute-time project needed; currently 500 CPU-Core-h plus 5 GPU-hours per user and month.

Partition Factor: partition_factor

The partition_factor is 0 for most partitions; only the fpga-partition has a factor of 1. This is done to leave more freedom for possible future overlapping of partitions.

Waiting Time: age_factor

The age_factor depends on the waiting time of a job. A job that was just submitted has an age_factor of 0. While the job is pending, the age_factor grows linearly until it reaches a value of 1 after 10 days of waiting.

Job Size: job_size_factor

The job_size_factor prioritizes larger jobs. Its value, between 0 and 1, is the ratio of the requested resources to the total resources available in the cluster; a full-system job has a value of 1.
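
Putting the factors together, the following shell sketch shows how the weighted sum from the formula above turns into a priority value (all factor values are made up):

    # All factor values are made up for illustration.
    qos_factor=0.6            # e.g. QoS "cont"
    partition_factor=0        # 0 for most partitions, 1 only for the fpga partition
    waiting_days=2            # the job has been pending for 2 days
    size_fraction=0.05        # requested resources / total cluster resources

    awk -v q="$qos_factor" -v p="$partition_factor" -v w="$waiting_days" -v s="$size_fraction" 'BEGIN {
        age = w / 10                      # age_factor grows linearly with the waiting time ...
        if (age > 1) age = 1              # ... and is capped at 1 after 10 days
        prio = 500000 * q + 50000 * p + 35000 * age + 15000 * s
        printf "job priority = %d\n", prio
    }'
    # -> job priority = 307750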

 

How to get a Higher Priority

Increasing the Project Contingent

If you need a higher project contingent, either temporarily or permanently, the PI of the project can request it from the Resource Allocation Board of PC2 via mail to pc2-support@uni-paderborn.de. Please include a justification for why you need the increase. Normal and large projects can be increased by at most 25 % over the initially granted total contingent.

express-Priority

The express-QoS (#SBATCH -q express) gives your jobs a very high priority. Each user has the monthly quota listed above. If this quota is exceeded, no jobs can be submitted with the express-priority until the quota is refreshed at the beginning of the next month.

You can only use the partitions that you could also use via the regular priority in a compute-time project. Thus, if you don't have access to GPUs in a compute-time project, you also can’t access them via the express-priority.
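
A minimal job script using the express-priority could look like the following sketch; the partition, resource requests, and program are placeholders that you have to adapt to your project:

    #!/bin/bash
    #SBATCH -q express        # submit with the express QoS
    #SBATCH -p normal         # example partition; use one your project has access to
    #SBATCH -t 01:00:00       # keep the time limit short, the monthly express quota is limited
    #SBATCH -N 1
    #SBATCH -n 16

    srun ./my_application     # placeholder for your actual program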

fpgasynthesis-Priority (Noctua 2 only)

Projects that have been granted access to FPGAs also have access to the fpgasynthesis-QoS (#SBATCH -q fpgasynthesis). Using it gives their FPGA bitstream-synthesis jobs a high priority. The only allowed partitions are normal and largemem, only single-node jobs are allowed, and there can be at most 10 running jobs per user.
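
A bitstream-synthesis job script could look like the following sketch; the time limit and the synthesis command are placeholders:

    #!/bin/bash
    #SBATCH -q fpgasynthesis  # high priority for FPGA bitstream synthesis
    #SBATCH -p normal         # only normal and largemem are allowed with this QoS
    #SBATCH -N 1              # only single-node jobs are allowed
    #SBATCH -t 12:00:00       # placeholder; synthesis runs can take many hours

    ./run_synthesis.sh        # placeholder for your actual synthesis command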