oph:cluster:resources

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:resources [2023/08/10 09:30] – "Forbid" use of nodenames and partitions, recommend constraints, other minor changes diego.zuccato@unibo.it
oph:cluster:resources [2025/02/04 05:56] (current version) – [General informations] diego.zuccato@unibo.it
Line 3: Line 3:
 The hardware structure of the DIFA-OPH computing cluster is summarised in the table below, listing all the compute nodes currently available with their individual hostnames, their specific resources (number of cores and available RAM), and their access policies.
  
-In particular, blue nodes (with names ''bldNN'') are part of the **BladeRunner island**, purple nodes (with names ''mtxNN'') are part of the **Matrix island**, and orange nodes (with names ''gpuNN'') are part of the **GPU island**.
+In particular, bldNN nodes are part of the **BladeRunner island**, mtxNN nodes are part of the **Matrix island**, and gpuNN nodes are part of the **GPU island**.
  
-**The nodes associated with the OPH project are open to all users** while nodes associated with other individual projects may be subject to **access restrictions**. Such restrictions are indicated by different colors in the last column of the table
+**The nodes associated with the OPH project are open to all users**, while nodes associated with other individual projects may be subject to **access restrictions**. Such restrictions are indicated in the last column of the table.
- +
-  *** GREEN: Shared nodes**, the access is available for all users +
-  *** ORANGE: Shared nodes**, with regular weekly reservations for teaching purposes during which the nodes can be accessed only by students of specific DIFA courses for their laboratory activities +
-  *** RED: Reserved nodes**, the access is restricted to users explicitly authorised by the corresponding project PI for a given period of time +
- +
-{{ :oph:cluster:cluster_structure.001.jpeg?nolink |}}+
  
 +  * **Shared** nodes: access is available to all users.
 +  * **Teaching** nodes: reserved for teaching purposes; they can be accessed only by students of specific DIFA courses for their laboratory activities.
 +  * **Reserved** nodes: access is restricted to users explicitly authorised by the corresponding project PI until the given **exp**iration date.
 +<WRAP width=100%>
 +^ Nodes      ^ vCPUs/RAM   ^ GPUs   ^ Project (PI)          ^ Access type (exp)  ^
 +| bld[01-02] | 24 / 64G    | -      | OPH                   | Shared             |
 +| bld[03-04] | 32 / 64G    | -      | OPH                   | Shared             |
 +| bld05      | 32 / 128G   | -      | OPH                   | Shared             |
 +| bld[15-16] | 16 / 24G    | -      | OPH                   | Teaching           |
 +| bld[17-18] | 32 / 64G    | -      | OPH                   | Shared             |
 +| mtx[00-15] | 56 / 256G   | -      | OPH                   | Shared             |
 +| mtx[16-19] | 112 / 512G  | -      | ERC-Astero (Miglio)   | <del>Reserved (2024-09)</del> Shared |
 +| mtx20      | 112 / 1T    | -      | OPH (Di Sabatino)     | Shared             |
 +| mtx[21-22] | 192 / 1T    | -      | SLIDE (Righi)         | Reserved (2025-10) |
 +| mtx[23-25] | 112 / 512G  | -      | OPH (Marinacci)       | Shared             |
 +| mtx26      | 112 / 512G  | -      | CAN (Bellini)         | Reserved (2026-04) |
 +| mtx27      | 112 / 512G  | -      | FFHiggsTop (Peraro)   | Reserved (2026-04) |
 +| mtx[28-29] | 112 / 1.5T  | -      | :::                   | :::                | 
 +| mtx30      | 64 / 1T     | -      | Trigger (Di Sabatino) | Reserved (2027-02) |  
 +| mtx[31-32] | 64 / 1T     | -      | EcoGal (Testi)        | Reserved (2027-08) |  
 +| mtx[33-34] | 192 / 1.5T  | -      | :::                   | :::                |
 +| mtx[35-36] | 192 / 512G  | -      | RED-CARDINAL (Belli)  | Reserved (2028-04) |  
 +| mtx[37-40] | 192 / 512G  | -      | ELSA (Talia)          | Reserved (2026-12) |
 +| gpu00      | 64 / 1T     | 2xA100 | VEO (Remondini)       | Reserved (2026-04) |
 +| gpu[01-02] | 112 / 1T    | 4xH100 | EcoGal (Testi)        | Reserved (2027-08) |  
 +| gpu03      | 112 / 1T    | 4xH100 | ELSA (Talia)          | Reserved (2026-12) |
 +</WRAP>
 ====== Computing Resources ======
  
Line 18: Line 39:
  
 Nodes are grouped into partitions. **DO NOT** specify either node names or partitions unless directed to do so by tech staff.
 +
 +===== Selecting nodes =====
  
 To select a node (or set of nodes) suitable for your job, **use constraints**. These include:
Linea 31: Linea 54:
   * **gpu**: require a GPU-equipped node
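
As a rough illustration of the rule above, the sketch below checks a job header for the options this page asks users to avoid. It assumes the job requests a GPU node through ''--constraint=gpu'' (a standard ''sbatch'' option, with the ''gpu'' constraint from the list above). The function name ''uses_only_constraints'' and the sample header are hypothetical, and the check only recognises long options and short options followed by whitespace.

```shell
#!/usr/bin/env bash
# Succeed only if the job script on stdin has no #SBATCH line that pins
# node names (--nodelist / -w) or partitions (--partition / -p).
# Hypothetical helper, not part of Slurm or of the cluster tooling.
uses_only_constraints() {
  ! grep -Eq '^#SBATCH[[:space:]]+(--nodelist|--partition|-w|-p)([=[:space:]]|$)'
}

# Hypothetical job header: selects a GPU node through the 'gpu' constraint.
job_header='#SBATCH --constraint=gpu
#SBATCH --ntasks=1'

if printf '%s\n' "$job_header" | uses_only_constraints; then
  echo "header OK: submit with sbatch"
else
  echo "header pins nodes or partitions: remove those options"
fi
```

The same request can be made on the command line, e.g. ''sbatch --constraint=gpu job.sh'', where ''job.sh'' is a placeholder script name.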
  
-Some nodes are reserved for specific projects (see table above). To be able to use 'em you have to be explicitly allowed by project manager (= added to the project group via DSA interface). Once you're in the allowed group (check with ''id'') you can submit jobs specifying ''--reservation=prj-***''.
+===== Reserved nodes =====
 + 
 +Some nodes are reserved for specific projects (see table above). To use them, you must be explicitly authorised by the project manager, i.e. added to the project group via the DSA interface. Once you are in the allowed group (check with ''id''), you can submit jobs specifying ''--reservation=prj-...''.
 + 
 +^ Project     ^ Manager    ^ AD group (DSA)                 ^ OPH group (''id'') ^ Reservation to use ^ 
 +| CAN         | Bellini    | Str04109.13664-OPH-CAN         | OPH-res-CAN         | prj-can            | 
 +| ECOGAL      | Testi      | Str04109.13664-OPH-ECOGAL      | OPH-res-ECOGAL      | prj-ecogal         | 
 +| ELSA        | Talia      | Str04109.13664-OPH-ELSA        | OPH-res-ELSA        | prj-elsa           | 
 +| FFHiggsTop  | Peraro     | Str04109.13664-OPH-FFHiggsTop  | OPH-res-FFHiggsTop  | prj-ffhiggstop     | 
 +| RedCardinal | Belli      | Str04109.13664-OPH-RedCardinal | OPH-res-RedCardinal | prj-redcardinal    | 
 +| SLIDE       | Righi      | Str04109.13664-OPH-SLIDE       | OPH-res-SLIDE       | prj-slide          | 
 +| Trigger     | DiSabatino | Str04109.13664-OPH-Trigger     | OPH-res-Trigger     | prj-trigger        | 
 +| VEO         | Remondini  | Str04109.13664-OPH-VEO         | OPH-res-VEO         | prj-veo            | 
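
The check-then-submit sequence described above could look roughly like the sketch below, shown for the CAN project. The group and reservation names come from the table; the helper ''has_group'' and the script name ''job.sh'' are hypothetical, and the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Return success if the space-separated group list in $2 contains
# the group name $1 as an exact match. Hypothetical helper.
has_group() {
  local wanted="$1" groups="$2" g
  for g in $groups; do
    [ "$g" = "$wanted" ] && return 0
  done
  return 1
}

# 'id -Gn' prints the names of all groups the current user belongs to.
if has_group "OPH-res-CAN" "$(id -Gn)"; then
  # job.sh is a placeholder for your own job script.
  echo "OK, submit with: sbatch --reservation=prj-can job.sh"
else
  echo "Not in OPH-res-CAN: ask the project manager to add you via DSA"
fi
```

A new group membership usually becomes visible to ''id'' only in a fresh login session.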
 + 
 +===== Quality of Service (QoS) ===== 
 + 
 +What other clusters call a "queue", Slurm calls a QoS (Quality of Service). 
 + 
 +By default, all jobs are queued with ''--qos=normal''.
 + 
 +Each QoS offers different features: 
 + 
 +^ QoS    ^ Max runtime ^ Priority ^ Note ^ 
 +| normal |  24h        | standard | Default | 
 +| debug  |  15'        | high     | Max 2 nodes, 1 job per user, **not billed** | 
 +| long   |  72h        | low      | | 
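
One way to read the table is to pick the first QoS whose maximum runtime covers the job, as in the sketch below. The limits are copied from the table; the function ''pick_qos'' and the script name ''job.sh'' are hypothetical. The sketch ignores the other ''debug'' restrictions (max 2 nodes, 1 job per user), so treat its answer as a starting point only.

```shell
#!/usr/bin/env bash
# Print the first QoS from the table above whose max runtime covers the
# requested wall time (in minutes); fail if nothing fits.
# Hypothetical helper, limits copied from the table.
pick_qos() {
  local minutes="$1"
  if   [ "$minutes" -le 15 ];   then echo debug    # 15 minutes
  elif [ "$minutes" -le 1440 ]; then echo normal   # 24 hours
  elif [ "$minutes" -le 4320 ]; then echo long     # 72 hours
  else
    echo "no QoS allows ${minutes} minutes" >&2
    return 1
  fi
}

# Example: a 30-hour job (1800 minutes); job.sh is a placeholder script name.
qos="$(pick_qos 1800)" && echo "sbatch --qos=${qos} --time=30:00:00 job.sh"
```

Here ''--qos'' and ''--time'' are standard ''sbatch'' options. The printed command is meant to be reviewed before it is run.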
 + 
 +====== Storage Resources ====== 
 + 
 +Storage is detailed [[oph:cluster:storage|on its own page]].
  
-^ Project    ^ Manager   ^ AD group (DSA)                ^ OPH group (''id'') ^ Reservation to use ^
+It's important to select the correct storage for your intended use.
-| CAN        | Bellini   | Str04109.13664-OPH-CAN        | OPH-res-CAN        | prj-can | +
-| ERC_astero | Miglio    | Str04109.13664-OPH-erc_astero | OPH-res-erc_astero | prj-erc_astero | +
-| FFHiggsTop | Peraro    | Str04109.13664-OPH-FFHiggsTop | OPH-res-FFHiggsTop | prj-ffhiggstop | +
-| SLIDE      | Righi     | Str04109.13664-OPH-SLIDE      | OPH-res-SLIDE      | prj-slide | +
-| VEO        | Remondini | Str04109.13664-OPH-VEO        | OPH-res-VEO        | prj-veo |+
  
oph/cluster/resources.1691659853.txt.gz · Last modified: 2023/08/10 09:30 by diego.zuccato@unibo.it
