SMP Cluster Julia

Overview

The HPE Superdome Flex is a large shared-memory node. It is especially well suited for data-intensive applications, for example processing extremely large data sets entirely in main memory or on very fast NVMe storage.

Hardware Details

A brief overview of the hardware specification is documented on the page HPC Resources.

Note

Julia was partitioned at the end of October 2024. A quarter of the hardware resources (CPUs and memory) is now operated exclusively for the DZA. The numbers below represent the remaining available resources.

Processor
  • 24 x Intel Xeon Platinum 8276M (28 cores) @ 2.2 GHz
  • 24 NUMA domains, one on each socket
System Memory

A large shared memory of 36 TiB is the main feature of Julia. Keep in mind that Julia presents a single node only logically: memory on distant sockets has to be accessed via the inter-socket interconnect. Measured performance may therefore deviate considerably from the ideal peak values.

  • 24 sockets with 6 memory channels each
  • 2 x 128 GiB DDR4-2933 SECDED ECC-LRDIMMs per channel, 288 DIMMs total (Part No. M386AAG40MMB-CVF)
  • Theoretical peak bandwidth across all sockets: 3378.8 GB/s
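
The peak bandwidth figure follows directly from the configuration above, assuming the standard 8-byte (64-bit) data bus per DDR4 channel:

    24 sockets x 6 channels x 2933 MT/s x 8 B = 3378.8 GB/s

In practice, a single socket only reaches its local theoretical share of roughly 140.8 GB/s, and remote accesses are further limited by the interconnect.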
Local Storage

No node-local storage in the sense of a /tmp filesystem is available on Julia. Instead, 256 TiB of NVMe devices are installed. This fast NVMe storage is available to all projects at /data/nvme/<projectname>. A quota of 100 GB per project is set on this NVMe storage.

A project's quota can be increased, or dedicated volumes of up to the full capacity can be set up, if a more detailed proposal is sent to hpc-support@tu-dresden.de explaining how this unique combination of large shared memory and NVMe storage can speed up the project's computations.
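
A minimal sketch of how the NVMe storage might be used around a run; the project name placeholder is taken from the path above, while the input, output, and application names are purely illustrative:

    # Check the available space on the project's NVMe volume.
    df -h /data/nvme/<projectname>

    # Stage the input data onto the fast NVMe storage before the run ...
    cp -r /path/to/input /data/nvme/<projectname>/input

    # ... run the application against it ...
    ./app --input /data/nvme/<projectname>/input --output /data/nvme/<projectname>/output

    # ... and copy the results back to permanent storage afterwards.
    cp -r /data/nvme/<projectname>/output /path/to/results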

Networking

Julia features six Mellanox ConnectX-6 InfiniBand devices for networking. They operate at 4x EDR (100 Gb/s, i.e. 12.5 GB/s each), providing the system with 600 Gb/s (75 GB/s) of aggregate network connectivity.

System Architecture

The system is based on HPE Superdome Flex servers.

Architecture output of hwloc-ls
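
The topology can be inspected directly on the system with standard tools (these commands are generic, not specific to Julia):

    # Print the full hardware topology: sockets, NUMA nodes, caches, cores.
    hwloc-ls

    # Show the NUMA nodes, their CPUs and memory sizes, and the distance matrix.
    numactl --hardware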

Operating System

Julia runs Rocky Linux version 8.9.

Hints for Usage

  • Granularity should be a socket (28 cores)
  • Can be used for OpenMP applications with large memory demands (see the sketch after this list)
  • To use Open MPI, it is necessary to export the following environment variables so that Open MPI uses shared memory instead of InfiniBand for message transport:
    export OMPI_MCA_pml=ob1
    export OMPI_MCA_mtl=^mxm
  • Set I_MPI_FABRICS=shm so that Intel MPI uses shared memory only and does not consider the InfiniBand devices at all
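
A minimal sketch of a batch script following these hints for a pure OpenMP run pinned to one socket; it assumes a Slurm batch system, and the memory request and the binary ./app are illustrative placeholders:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=28   # one socket = one NUMA domain on Julia
    #SBATCH --mem=1024G          # adjust to the actual memory demand

    # Keep all threads on a single socket so memory accesses stay in the local NUMA domain.
    export OMP_NUM_THREADS=28
    export OMP_PLACES=sockets
    export OMP_PROC_BIND=close

    srun ./app

For MPI jobs, combine the resource request with the Open MPI or Intel MPI settings from the list above.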
