# Slurm Resource Limits
There is no such thing as a free lunch at ZIH systems. Since compute nodes are operated in multi-user mode by default, jobs of several users can run at the same time on the very same node, sharing resources such as memory (but not CPU cores). On the other hand, a higher throughput can be achieved with smaller jobs. Thus, restrictions w.r.t. Memory Limits, Runtime Limits and GPU-Based Limits have to be respected when submitting jobs.
## Runtime Limits
**Runtime limits on login nodes**
There is a time limit of 600 seconds set for processes on login nodes. Each process running longer than this limit is automatically killed with the message `CPU time limit exceeded`. The login nodes are shared resources between all users of ZIH systems and thus need to remain available; they cannot be used for production runs. Please submit extensive application runs to the compute nodes using the batch system.
**Runtime limits are enforced.**
A job is canceled as soon as it exceeds its requested limit. Currently, the maximum runtime limit is 7 days. Please find the current runtime limits for all clusters below in the Slurm Resource Limits Table.
Shorter jobs come with multiple advantages:
- lower risk of loss of computing time,
- shorter waiting time for scheduling,
- higher job fluctuation; thus, jobs with high priorities may start faster.
To bring down the percentage of long-running jobs, we restrict the number of cores available to jobs longer than 2 days to approximately 50% and to jobs longer than 24 hours to approximately 75% of the total number of cores. (These numbers are subject to change.) As best practice, we advise a runtime of about 8 hours.
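As a minimal sketch (the values and the application name are placeholders, not requirements), such a time limit can be requested in a job script like this:

```bash
#!/bin/bash
#SBATCH --time=08:00:00       # requested walltime (hh:mm:ss); the job is canceled when it is exceeded
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

srun ./my_application         # placeholder for your actual program
```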
Please always try to make a good estimation of your needed time limit.
For this, you can use a command line like this to compare the requested time limit with the elapsed time for your completed jobs that started after a given date:
```console
marie@login$ sacct -X -S 2021-01-01 -E now --format=start,JobID,jobname,elapsed,timelimit -s COMPLETED
```
Instead of running one long job, you should split it up into a chain job. Even applications that are not capable of checkpoint/restart can be adapted. Please refer to the section Checkpoint/Restart for further documentation.
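A minimal sketch of such a chain, assuming two hypothetical job scripts `part1.sbatch` and `part2.sbatch` that each save and later resume the application state:

```bash
# Submit the first part and capture its job ID (--parsable prints only the ID)
jobid=$(sbatch --parsable part1.sbatch)

# The second part starts only after the first one has finished successfully
sbatch --dependency=afterok:${jobid} part2.sbatch
```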
## Memory Limits
**Memory limits are enforced.**
Jobs which exceed their per-node memory limit are killed automatically by the batch system.
Memory requirements for your job can be specified via the `sbatch`/`srun` parameter `--mem-per-cpu=<MB>` or `--mem=<MB>` (which is "memory per node"). The default limit, regardless of the partition the job runs on, is quite low at 300 MB per CPU. If you need more memory, you need to request it.
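For example, a job script could raise the request above the 300 MB default as sketched below (the values and the application name are only placeholders):

```bash
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000    # 2000 MB per CPU instead of the 300 MB default
# Alternatively, request memory for the whole node, e.g. #SBATCH --mem=8000

srun ./my_application         # placeholder for your actual program
```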
ZIH systems comprise different sets of nodes with different amounts of installed memory, which affects where your job may run. To achieve the shortest possible waiting time for your jobs, you should be aware of the limits shown in the Slurm Resource Limits Table.
## GPU-Based Limits
When you are running jobs on the GPU nodes of Alpha Centauri or Capella, CPU cores and memory are limited to ensure sufficient resources for GPU jobs. You can specify the amount of memory and the number of CPUs you need, up to a maximum that scales with the number of GPUs you request.
Review the following limits to avoid errors or resource limit violations in your Slurm job scripts.
**Rules on Alpha Centauri and Capella**

- You need to explicitly specify the number of nodes (via `-N, --nodes=<N>`).
- You can request
    - any number of CPUs (via `--cpus-per-task`, `--cpus-per-gpu`, or other options),
    - any amount of memory (via `--mem` or `--mem-per-gpu`),

  ... but Slurm will enforce a cap depending on how many GPUs you ask for. If you go over one of the defined limits, your job will be rejected. The limits are depicted in the following tables.
When requesting resources on Alpha Centauri, you can specify up to 6 CPUs and 123750 MB of memory per GPU, resulting in these limits:
| Requested GPUs | Max CPUs | Max Memory [MB] |
|---|---|---|
| 1 | 6 | 123750 |
| 2 | 12 | 247500 |
| 3 | 18 | 371250 |
| 4 | 24 | 495000 |
| 5 | 30 | 618750 |
| 6 | 36 | 742500 |
| 7 | 40 | 866250 |
| 8 | 46 | 990000 |
When requesting resources on Capella, you can specify up to 14 CPUs and 193250 MB of memory per GPU, resulting in these limits:
| Requested GPUs | Max CPUs | Max Memory [MB] |
|---|---|---|
| 1 | 14 | 193250 |
| 2 | 28 | 386500 |
| 3 | 42 | 579750 |
| 4 | 56 | 773000 |
Always calculate your memory and CPU needs based on your actual application requirements, but make sure you're within the GPU-based limits.
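A minimal sketch that stays within the Alpha Centauri limits above, assuming a hypothetical single-GPU application:

```bash
#!/bin/bash
#SBATCH --nodes=1             # the number of nodes must be given explicitly
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1          # one GPU ...
#SBATCH --cpus-per-task=6     # ... allows at most 6 CPU cores on Alpha Centauri
#SBATCH --mem=123750          # ... and at most 123750 MB of memory

srun ./my_gpu_application     # placeholder for your actual program
```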
**Error messages**
Any job that exceeds the above limits will be rejected with a corresponding error message, e.g.
```console
marie@login.alpha$ srun --nodes=1 --gres=gpu:1 --tasks=1 --cpus-per-task=7 --test-only
srun: lua: Non-exclusive jobs on Alpha Centauri are allowed to request at most 6 CPU cores per requested GPU to allow a fair sharing of resources within a single node.
srun: error: cli_filter plugin terminated with error
```
## Slurm Resource Limits Table
The physically installed memory might differ from the amount available for Slurm jobs. These limits are generally smaller than the physically installed sizes. One reason is so-called diskless compute nodes, i.e., nodes without additional local drives. On these nodes, the operating system and other components reside in the main memory, lowering the memory available for jobs. The amount of memory reserved for system operation might vary slightly over time. The following table depicts the resource limits for all our HPC systems.
| HPC System | Partition | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per (SMT) Core [in MB] | GPUs per Node | Cores per GPU | Job Max Time [in days] |
|---|---|---|---|---|---|---|---|---|---|
| Capella | capella | 142 | 56 | 1 | 752,520 | 13,437 | 4 | 14 | 7 |
| Capella | capella | 12 | 56 | 1 | 1,526,520 | 27,259 | 4 | 14 | 7 |
| Capella | capella-interactive | 2 | 32 | 1 | 452,520 | 14,141 | 28 | 1 | 12:00:00 |
| Barnard | barnard | 630 | 104 | 2 | 500,000 | 2,403 | - | - | 7 |
| Barnard | barnard | 90 | 104 | 2 | 985,000 | 4,735 | - | - | 7 |
| Alpha | alpha | 37 | 46 | 2 | 990,000 | 10,760 | 8 | 5 | 7 |
| Julia | julia | 1 | 644 | 1 | 35,050,976 | 54,426 | - | - | 7 |
| Romeo | romeo | 186 | 128 | 2 | 505,000 | 1,972 | - | - | 7 |
Alpha Centauri, Barnard and Romeo (systems with multiple threads per core) have Simultaneous Multithreading (SMT) enabled. You can request these additional threads using the Slurm option `--hint=multithread` or by setting the environment variable `SLURM_HINT=multithread`. Besides the usage of threads to speed up computations, the memory of the other threads is allocated implicitly too, and you will always get *Memory per Core* × *number of threads* as a memory pledge.
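A minimal sketch of a job script that requests these additional hardware threads (the values and the application name are placeholders):

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # counted in SMT threads when multithreading is requested
#SBATCH --hint=multithread    # use both hardware threads of each physical core

srun ./my_application         # placeholder for your actual program
```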
## QOS Resource Limits
In addition to the physical limits of the installed hardware, the GPU clusters Alpha Centauri and Capella have additional resource limits applied to every Slurm job via the QOS (Quality of Service) `normal`. The purpose of these limits is to prevent single users from occupying too many GPUs for too long. The QOS limits are intended to provide a fair distribution of computing time among all users.
You can list these limits for the QOS `normal` on the GPU clusters Alpha Centauri and Capella using the following `sacctmgr show qos` command line:
```console
marie@login.alpha$ sacctmgr show qos normal format=name,flags,maxwall,MaxTRESPU%-40
      Name                Flags     MaxWall MaxTRESPU
---------- -------------------- ----------- ----------------------------------------
    normal          DenyOnLimit  7-00:00:00 cpu=1184,gres/gpu=98,mem=12210000M
```

```console
marie@login.capella$ sacctmgr show qos normal format=name,flags,maxwall,MaxTRESPU%-40
      Name                Flags     MaxWall MaxTRESPU
---------- -------------------- ----------- ----------------------------------------
    normal          DenyOnLimit  7-00:00:00 cpu=3976,gres/gpu=284,mem=54883000M
```
The meaning of these settings:

- `MaxWall`: Maximum time limit that can be requested for a job.
- `MaxTRESPU`: Maximum number of CPUs and GPUs and maximum amount of memory a user can request in the QOS `normal`.
- `DenyOnLimit`: Jobs are rejected at submission time if they do not conform to the limits.