GPU Cluster Capella¶
Acceptance phase
The cluster Capella is currently in the acceptance phase, i.e., interruptions, reboots without notice, and node failures are possible. Furthermore, the system's configuration might be adjusted further.
Do not yet move your "production" to Capella, but feel free to test it using moderately sized workloads. Please read this page carefully to understand what you need to adapt in your existing workflows w.r.t. filesystems, software and modules, and batch jobs.
We highly appreciate your hints and would be pleased to receive your comments and experiences regarding its operation via e-mail to hpc-support@tu-dresden.de using the subject Capella.
Please understand that our current priority is the acceptance, configuration, and rollout of the system. Consequently, we are unable to address any support requests at this time.
Overview¶
The multi-GPU cluster Capella
has been installed for AI-related computations and traditional
HPC simulations. Capella is fully integrated into the ZIH HPC infrastructure.
Therefore, its usage is similar to that of the other clusters.
Hardware Specifications¶
The hardware specification is documented on the page HPC Resources.
Access and Login Nodes¶
You use the login nodes login[1-2].capella.hpc.tu-dresden.de to access the cluster Capella from the campus (or VPN) network.
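For example, a connection from the campus or VPN network might look like this (replace <zih-login> with your ZIH username; either login node works):

```bash
# Log in to one of the two Capella login nodes via SSH
ssh <zih-login>@login1.capella.hpc.tu-dresden.de
```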
In order to verify the SSH fingerprints of the login nodes, please refer to the page
Key Fingerprints.
On the login nodes you have access to the same filesystems and software stack as on the compute nodes. GPUs are not available there.
In the subsections Filesystems and Software and Modules we provide further information on these two topics.
Filesystems¶
As with all other clusters, your /home directory is also available on Capella.
For convenience, the filesystems horse and walrus are also accessible. Please note that the filesystem horse is not to be used as the working filesystem on the cluster Capella.
With Capella comes the new filesystem cat, designed to meet the high I/O requirements of AI and ML workflows. It is a WEKAio filesystem mounted under /data/cat. It is only available on the cluster Capella and the Datamover nodes.
Main working filesystem is cat
The filesystem cat should be used as the main working filesystem and has to be used with workspaces. Workspaces on the filesystem cat can only be created on the login and compute nodes (see the sketch below), not on the other clusters, since cat is not available there. All other filesystems (/home, /software, /data/horse, /data/walrus, etc.) are nevertheless also available.
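As an illustration, allocating a workspace on cat from a Capella login node might look like the following sketch; the workspace name and duration are placeholders:

```bash
# Allocate a workspace named "my_experiment" on the cat filesystem for 30 days (example values)
ws_allocate -F cat my_experiment 30

# List your existing workspaces
ws_list
```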
Data transfer to and from /data/cat
Please utilize the new filesystem cat as the working filesystem on Capella. It has limited capacity, so we advise you to only hold hot data on cat.
To transfer input and result data from and to the filesystems horse and walrus, respectively, you will need to use the Datamover nodes. Regardless of the direction of transfer, you should pack your data into archives for the transfer, e.g., using the dttar command as sketched below.
Do not invoke data transfers to the filesystems horse and walrus from the login nodes. Both login nodes are part of the cluster; failures, reboots, and other work might affect your data transfer, resulting in data corruption.
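A hedged sketch of such an archive transfer using dttar; the workspace paths below are placeholders and the direction (cat to walrus) is only an example:

```bash
# Pack results from a cat workspace into an archive on walrus via the Datamover (paths are placeholders)
dttar -cvf /data/walrus/ws/<user>-archive/results.tar -C /data/cat/ws/<user>-experiment results
```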
Software and Modules¶
The most straightforward method for utilizing the software is through the well-known
module system.
All software available from the module system has been specifically built for the cluster Capella, i.e., with optimizations for the Zen 4 (Genoa) microarchitecture and CUDA support enabled.
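For example, searching for and loading software via the module system might look like this; the module name and version are placeholders and may differ on Capella:

```bash
# Search for available versions of a package (example name)
module spider PyTorch

# Load a specific version (example only; use a version reported by module spider)
module load PyTorch/2.1.2
```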
Python Virtual Environments¶
Virtual environments allow you to install
additional Python packages and create an isolated runtime environment. We recommend using
venv
for this purpose.
Virtual environments in workspaces
We recommend using workspaces for your virtual environments, for example as sketched below.
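A minimal sketch, assuming you have already allocated a workspace on cat; the module name and workspace path are placeholders:

```bash
# Load a Python module (version depends on the installed module tree)
module load Python

# Create and activate a virtual environment inside the workspace
python -m venv /data/cat/ws/<user>-python-env/env
source /data/cat/ws/<user>-python-env/env/bin/activate

# Install additional packages into the isolated environment
pip install --upgrade pip
```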
Batch System¶
The batch system Slurm may be used as usual. Please refer to the page Batch System Slurm for detailed information. In addition, the page Job Examples provides examples on GPU allocation with Slurm.
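As a sketch, a single-GPU batch job could look like the following; the module name, resource values, and script name are placeholders, not Capella-specific recommendations:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --nodes=1
#SBATCH --gres=gpu:1           # request one GPU (example)
#SBATCH --cpus-per-task=14     # example value; stay within the per-node limits
#SBATCH --time=04:00:00        # well below the current 24 hour limit
#SBATCH --output=slurm-%j.out

module load PyTorch            # example module; adjust to your software
srun python train.py           # train.py is a placeholder for your application
```

Submit the job file with sbatch <jobfile>.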
You can find out about upcoming reservations (e.g., for acceptance benchmarks) via sinfo -T.
Acceptance has priority, so your reservation requests cannot currently be considered.
Slurm limits and job runtime
Although each compute node is equipped with 64 CPU cores in total, only a maximum of 56 can be requested via Slurm (cf. Slurm Resource Limits Table).
The maximum runtime of jobs and interactive sessions is currently 24 hours. However, to allow for greater fluctuation in testing, please make the jobs shorter if possible. You can use Chain Jobs to split a long-running job exceeding the batch queue's limits into parts and chain these parts. Applications with built-in checkpoint-restart functionality are very suitable for this approach! If your application provides checkpoint-restart, please use /data/cat for temporary data. Remove these data afterwards!
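One way to realize such a chain is Slurm's job dependencies; a minimal sketch, assuming a job script part.sbatch that writes a checkpoint and resumes from it:

```bash
# Submit the first part and capture its job ID
jobid=$(sbatch --parsable part.sbatch)

# Submit the next part so that it starts only after the previous one completed successfully
sbatch --dependency=afterok:${jobid} part.sbatch
```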
The partition capella-interactive can be used for your small tests and the compilation of software. You need to add #SBATCH --partition=capella-interactive to your job file, or pass --partition=capella-interactive on the sbatch, srun, or salloc command line, to address this partition (see the example below). The partition's configuration might be adapted during the acceptance phase. You can get the current settings via scontrol show partitions capella-interactive.
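For instance, a short interactive session on this partition might be requested as follows (the resource values are examples only):

```bash
# Interactive shell on the capella-interactive partition (example values)
srun --partition=capella-interactive --time=01:00:00 --cpus-per-task=4 --pty bash
```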