The hardware structure of the DIFA-OPH computing cluster is summarised in the table below, which lists all the compute nodes currently available together with their hostnames, their resources (number of cores, available RAM, and GPUs), and their access policies.
In particular, bldNN nodes are part of the BladeRunner island, mtxNN nodes are part of the Matrix island, and gpuNN nodes are part of the GPU island.
The nodes associated with the OPH project are open to all users, while nodes associated with other individual projects may be subject to access restrictions. Such restrictions are indicated in the last column of the table.
Nodes | vCPUs/RAM | GPUs | Project (PI) | Access type (expiry) |
---|---|---|---|---|
bld[01-02] | 24 / 64G | - | OPH | Shared |
bld[03-04] | 32 / 64G | - | OPH | Shared |
bld05 | 32 / 128G | - | OPH | Teaching |
bld[15-16] | 16 / 24G | - | OPH | Shared |
bld[17-18] | 32 / 64G | - | OPH | Shared |
mtx[00-15] | 56 / 256G | - | OPH | Shared |
mtx[16-19] | 112 / 512G | - | ERC-Astero (Miglio) | |
mtx20 | 112 / 1T | - | OPH (Di Sabatino) | Shared |
mtx[21-22] | 192 / 1T | - | SLIDE (Righi) | Reserved (2025-10) |
mtx[23-25] | 112 / 512G | - | OPH (Marinacci) | Shared |
mtx26 | 112 / 512G | - | CAN (Bellini) | Reserved (2026-04) |
mtx27 | 112 / 512G | - | FFHiggsTop (Peraro) | Reserved (2026-04) |
mtx[28-29] | 112 / 1.5T | - | ||
mtx30 | 64 / 1T | - | Trigger (Di Sabatino) | Reserved (2027-02) |
mtx[31-32] | 64 / 1T | - | EcoGal (Testi) | Reserved (2027-08) |
mtx[33-34] | 192 / 1.5T | - | ||
mtx[35-36] | 192 / 512G | - | RED-CARDINAL (Belli) | Reserved (2028-04) |
mtx[37-40] | 192 / 512G | - | ELSA (Talia) | Reserved (2026-12) |
gpu00 | 64 / 1T | 2xA100 | VEO (Remondini) | Reserved (2026-04) |
gpu[01-02] | 112 / 1T | 4xH100 | EcoGal (Testi) | Reserved (2027-08) |
gpu03 | 112 / 1T | 4xH100 | ELSA (Talia) | Reserved (2026-12) |
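To check what a given node actually advertises to the scheduler, you can query Slurm directly from a login node; the commands below are standard Slurm tools, and mtx00 is just one example node taken from the table above:

```bash
# Node-oriented listing with CPUs, memory and state for every node:
sinfo -N -l

# Full definition of a single node, e.g. one of the Matrix nodes:
scontrol show node mtx00
```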
Resources are nodes, CPUs, GPUs, RAM and time. You'll have to select the resources you need for the job. Do not overestimate too much or you'll be “billed” too much, but do not underestimate either or your job won't be able to complete. When a job completes, you receive a mail with the seff output: this should help a lot in optimizing future requests.
Nodes are grouped into partitions. DO NOT specify node names or partitions unless directed to do so by the tech staff; a minimal request is sketched below.
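As an illustration only, a batch script might request resources along these lines; the job name, resource figures, and program name are placeholders, not recommendations for any particular workload:

```bash
#!/bin/bash
#SBATCH --job-name=my-analysis     # placeholder job name
#SBATCH --ntasks=1                 # a single task
#SBATCH --cpus-per-task=8          # cores for that task
#SBATCH --mem=16G                  # total memory for the job
#SBATCH --time=06:00:00            # wall-clock limit (hh:mm:ss)
# Note: no --nodelist and no --partition, as requested above.

srun ./my_program                  # placeholder executable
```

After the job ends, seff <jobid> (the output mailed to you on completion) reports the achieved CPU and memory efficiency, which is the main input for tuning these numbers.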
To select a (set of) node(s) suitable for your job, use constraints. These include:
Some nodes are reserved for specific projects (see table above). To be able to use them you have to be explicitly allowed by the project manager (i.e. added to the project group via the DSA interface). Once you're in the allowed group (check with `id`) you can submit jobs specifying `--reservation=prj-…`.
Project | Manager | AD group (DSA) | OPH group (`id`) | Reservation to use |
---|---|---|---|---|
CAN | Bellini | Str04109.13664-OPH-CAN | OPH-res-CAN | prj-can |
ECOGAL | Testi | Str04109.13664-OPH-ECOGAL | OPH-res-ECOGAL | prj-ecogal |
ELSA | Talia | Str04109.13664-OPH-ELSA | OPH-res-ELSA | prj-elsa |
FFHiggsTop | Peraro | Str04109.13664-OPH-FFHiggsTop | OPH-res-FFHiggsTop | prj-ffhiggstop |
RedCardinal | Belli | Str04109.13664-OPH-RedCardinal | OPH-res-RedCardinal | prj-redcardinal |
SLIDE | Righi | Str04109.13664-OPH-SLIDE | OPH-res-SLIDE | prj-slide |
Trigger | DiSabatino | Str04109.13664-OPH-Trigger | OPH-res-Trigger | prj-trigger |
VEO | Remondini | Str04109.13664-OPH-VEO | OPH-res-VEO | prj-veo |
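For example, a user who has been added to the ELSA project group could verify the membership and then submit against the matching reservation; the group and reservation names come from the table above, while the script name and resource requests are placeholders:

```bash
# Check that the project's OPH group shows up among your groups:
id | grep OPH-res-ELSA

# Submit a job against the project's reservation (placeholder script and resources):
sbatch --reservation=prj-elsa --cpus-per-task=16 --mem=32G --time=12:00:00 my_job.sh
```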
What other clusters call “queue”, Slurm calls QoS.
By default all jobs are queued as “--qos=normal”.
Each QoS offers different features:
QoS | Max runtime | Priority | Note |
---|---|---|---|
normal | 24h | standard | Default |
debug | 15 min | high | Max 2 nodes, 1 job per user, not billed |
long | 72h | low | |
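Switching QoS is just a matter of adding the corresponding option at submission time; the script names and time limits below are placeholders chosen within the limits of each QoS:

```bash
# Quick test in the high-priority, not-billed debug QoS (15-minute limit):
sbatch --qos=debug --time=00:10:00 test_job.sh

# Longer production run in the low-priority long QoS (up to 72 hours):
sbatch --qos=long --time=48:00:00 production_job.sh
```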
Storage is detailed on its own page. It's important to select the correct storage for your intended use.