oph:cluster:messages

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:messages [2026/04/24 06:13] – [2026-03-30] diego.zuccato@unibo.it
oph:cluster:messages [2026/06/11 09:47] (current version) – [2026-06-11] GPUs must be requested for the job to be able to use them diego.zuccato@unibo.it
Line 5:
 <WRAP center round info>
 To report issues, please write **only** to difa.csi@unibo.it with a clear description of the problem ("my job doesn't work" is __not__ a clear description, but it is what we usually receive...), including the jobID, the misbehaving node(s), steps to reproduce, etc.
 +</WRAP>
 +
 +<WRAP center round alert>
 +Remember that bld[15-16] are reserved for courses during the day. Jobs launched on these nodes outside a lab lesson will be terminated without further notice. If you need to run jobs to prepare for an exam, just add:
 +  #SBATCH --exclude=bld[15-16]
 +to your job script.
 </WRAP>
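
For context, here is a minimal sketch of where the exclude line might sit in a complete job script. The job name, the time limit, and the final echo are illustrative assumptions, not site requirements. Only the `--exclude` directive comes from the notice above.

```shell
#!/bin/bash
#SBATCH --job-name=exam-prep     # hypothetical job name
#SBATCH --time=00:10:00          # hypothetical time limit
#SBATCH --exclude=bld[15-16]     # from the notice: keep the job off the course nodes

# SLURM_JOB_NODELIST is set by Slurm inside an allocation; the fallback text
# only appears when the script is run by hand outside Slurm.
node_list="${SLURM_JOB_NODELIST:-not running under Slurm}"
echo "Job is running on: ${node_list}"
```

The `#SBATCH` lines have to come before the first executable command, because sbatch stops reading directives at that point.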
  
 ===== 2026 =====
 +
 +==== 2026-06-11 ====
 +
 +GPU nodes are being reconfigured: when requesting a GPU node you must **also** specify --gpus=N to have N GPUs assigned to your job; otherwise the job cannot use them. Other restrictions still apply, including allocation by socket (max 2 jobs per node).
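
A minimal sketch of a job script that requests one GPU this way. The job name and time limit are hypothetical, and the directive that selects a GPU node (partition, constraint, or similar) is left out because the notice does not say which one this cluster uses. The final check assumes Slurm exports `CUDA_VISIBLE_DEVICES` for the GPUs it assigns, which is typical but depends on the site configuration.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-check     # hypothetical job name
#SBATCH --time=00:05:00          # hypothetical time limit
#SBATCH --gpus=1                 # N=1: ask Slurm to assign one GPU to this job
# The directive that selects a GPU node is site-specific and intentionally
# omitted from this sketch; add it as you did before the reconfiguration.

# Assumption: Slurm exports CUDA_VISIBLE_DEVICES for the GPUs it assigned.
# Without --gpus (or outside an allocation) the variable is usually unset.
assigned_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${assigned_gpus}"
```

If the job reports `none` even though it landed on a GPU node, the most likely cause is a missing `--gpus=N` request.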
 +
 +==== 2026-05-19 ====
 +
 +The AC is now OK; the cluster has already been resumed.
 +
 +==== 2026-05-06 ====
 +
 +We have started resuming some nodes. The biggest AC unit is still broken, but the others have been fixed and are currently working. Hopefully we will not have to shut down again.
 +
 +==== 2026-05-05 ====
 +
 +The server room is experiencing overtemperature due to a failed AC: many (not all) nodes are being drained and will be resumed ASAP.
 +
  
 ==== 2026-03-30 ====
oph/cluster/messages.1777011197.txt.gz · Last modified: by diego.zuccato@unibo.it
