oph:cluster:messages

Differences

This page shows the differences between the selected revision and the current version of the page.

oph:cluster:messages [2026/03/23 14:28] – [2026-03-30] Electrical maintenance (planned intervention) diego.zuccato@unibo.it
oph:cluster:messages [2026/05/05 07:50] (current version) – Overtemperature warning diego.zuccato@unibo.it
Line 5:
 <WRAP center round info>
 To report issues, please write **only** to difa.csi@unibo.it with a clear description of the problem ("my job doesn't work" is __not__ a clear description, but it is what we usually receive...), including jobID, misbehaving node(s), steps to reproduce, etc.
 +</WRAP>
 +
 +<WRAP center round alert>
 +Remember that bld[15-16] are reserved for courses during the day. Jobs launched on them outside a lab lesson will be terminated without further notice. If you need to run jobs to prepare for an exam, just add:
 +  #SBATCH --exclude=bld[15-16]
 +to your job script.
 </WRAP>
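
As an illustration of the notice above, here is a minimal sketch of a job script that keeps a job off the course-reserved nodes. Only the `--exclude=bld[15-16]` line comes from the notice; the job name, task count, and time limit are hypothetical placeholders to adapt to your own job.

```shell
#!/bin/bash
#SBATCH --job-name=exam-prep     # hypothetical job name
#SBATCH --ntasks=1               # hypothetical resource request
#SBATCH --time=00:10:00          # hypothetical time limit
#SBATCH --exclude=bld[15-16]     # from the notice: stay off the course-reserved nodes

# Slurm sets SLURM_JOB_NODELIST inside a job; fall back to the local
# host name (or "unknown") when the script is run by hand for testing.
node="${SLURM_JOB_NODELIST:-${HOSTNAME:-unknown}}"
echo "Running on: ${node}"
```

Submitting this file with `sbatch` should let the scheduler place the job on any eligible node other than bld15 and bld16.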
  
 ===== 2026 =====
  
-==== 2026-03-30 ====
+==== 2026-04-05 ====
 <WRAP center round important>
-Possible (hopefully unlikely) service interruption due to removal of electrical bypass installed on 25/12.
+The server room is experiencing overtemperature due to a failed AC: many (not all) nodes are being drained and will be resumed ASAP.
 </WRAP>
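
To see which nodes are affected by a drain like the one announced above, something along these lines may help. It is only a sketch: it assumes the Slurm client tools are on your `PATH` (as on a cluster frontend), and the helper name `list_unavailable_nodes` is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: show nodes that are down or drained, with the reason
# recorded by the administrators.
list_unavailable_nodes() {
    if command -v sinfo >/dev/null 2>&1; then
        # -R (--list-reasons) lists the reason, user, timestamp and node list
        sinfo -R
    else
        # Not on a machine with Slurm installed: report it instead of failing
        echo "sinfo not found: run this on a cluster frontend"
    fi
}

list_unavailable_nodes
```

Nodes that have been resumed no longer appear in the output, so re-running the command shows when the affected nodes are available again.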
 +
 +
 +==== 2026-03-30 ====
 +Possible (hopefully unlikely) service interruption due to removal of electrical bypass installed on 25/12.
  
 In case of emergency, the cluster will be shut down without further notice between 08.00 and 09.00 and reopened as soon as possible.
 +
 +==== 2026-03-23 ====
 +Planned network interruption around 06.00. The network should be unreachable for about 5 minutes, but the AD integration (the one that lets you authenticate against bastion-nav and ophfe*) could have issues restarting afterwards and will be checked at about 07.45.
 +
 +Active connections and file transfers will be dropped. Running jobs should continue working unless they're using remote resources.
 +
 ==== 2026-03-10 ====
 <del>Possible problems due to electrical maintenance. Cluster might power off without further warning, even if we've been reassured that won't happen (yeah... sure... hope it's not just like the last time...).
oph/cluster/messages.1774276084.txt.gz · Last modified by diego.zuccato@unibo.it
