oph:cluster:messages

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:messages [2026/03/23 14:30] – [2026-03-23] (restored notice deleted by mistake) diego.zuccato@unibo.it
oph:cluster:messages [2026/06/11 09:47] (current version) – [2026-06-11] GPUs must be requested for the job to be able to use them diego.zuccato@unibo.it
Line 5:
 <WRAP center round info>
 To report issues, please write **only** to difa.csi@unibo.it with a clear description of the problem ("my job doesn't work" is __not__ a clear description, but it is what we usually receive...), including jobID, misbehaving node(s), steps to reproduce, etc.
 +</WRAP>
 +
 +<WRAP center round alert>
 +Remember that bld[15-16] are reserved for courses during the day. Jobs launched while not in a lab lesson will be terminated without further notice. If you need to run jobs to prepare an exam, just add:
 +  #SBATCH --exclude=bld[15-16]
 +to your job script.
 </WRAP>
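For example, a complete job script that keeps itself off the course nodes could look like this minimal sketch. Only the `--exclude=bld[15-16]` line comes from the notice; the job name, task count, time limit and the body are illustrative placeholders.

```shell
#!/bin/bash
# Sketch of an exam-preparation job: everything except --exclude is illustrative.
#SBATCH --job-name=exam-prep
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
# Keep the job away from bld15 and bld16, which are reserved for lab lessons.
#SBATCH --exclude=bld[15-16]

# SLURMD_NODENAME is set by Slurm inside a job; fall back to uname when run by hand.
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "Running on ${node}"
```

The same option can also be given at submission time, e.g. `sbatch --exclude='bld[15-16]' job.sh`; quoting the node list stops the shell from treating the brackets as a glob pattern.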
  
 ===== 2026 =====
 +
 +==== 2026-06-11 ====
 +
 +Reconfiguring GPU nodes: when requesting a GPU node you **also** have to specify --gpus=N to have N GPUs assigned to your job. Other restrictions still apply, including allocation by socket (max 2 jobs per node).
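A GPU job script might then request its GPUs explicitly, as in this minimal sketch. The partition name `gpu` is a hypothetical placeholder (use whatever you already specify to land on a GPU node), and the script assumes Slurm's usual GPU handling, where the assigned devices are exported in `CUDA_VISIBLE_DEVICES`.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-example
# Hypothetical partition name: replace with whatever you use to get a GPU node.
#SBATCH --partition=gpu
# Without --gpus=N the job lands on a GPU node but has no GPU assigned to it.
#SBATCH --gpus=2
#SBATCH --time=01:00:00

# Assumption: with GPUs configured as GRES, Slurm lists the assigned devices in
# CUDA_VISIBLE_DEVICES; outside a job (or with no GPUs requested) it is usually unset.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${gpus}"
```

The request can equally be made on the command line, e.g. `sbatch --gpus=2 job.sh`.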
 +
 +==== 2026-05-19 ====
 +
 +The AC is now OK and the cluster has already been resumed.
 +
 +==== 2026-05-06 ====
 +
 +Started resuming some nodes. The largest air conditioner is still broken, but the others have been repaired and are currently working. We hope we won't have to shut down again.
 +
 +==== 2026-05-05 ====
 +
 +The server room is overheating due to a failed AC unit: many (not all) nodes are being drained and will be resumed ASAP.
 +
  
 ==== 2026-03-30 ====
-<WRAP center round important> 
 Possible (hopefully unlikely) service interruption due to removal of the electrical bypass installed on 25/12.
-</WRAP> 
  
 In case of emergency, the cluster will be shut down without further notice between 08.00 and 09.00 and reopened as soon as possible.
oph/cluster/messages.1774276218.txt.gz · Last modified by diego.zuccato@unibo.it
