oph:cluster:messages

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:messages [2026/03/10 06:26] – [2026-03-10] diego.zuccato@unibo.it
oph:cluster:messages [2026/05/06 12:51] (current version) – [2026-04-05] Partial resume diego.zuccato@unibo.it
Line 5:
 <WRAP center round info>
 To report issues, please write **only** to difa.csi@unibo.it, including a clear description of the problem ("my job doesn't work" is __not__ a clear description, but it is what we usually receive...) along with the jobID, the misbehaving node(s), steps to reproduce, etc.
 +</WRAP>
 +
 +<WRAP center round alert>
 +Remember that bld[15-16] are reserved for courses during the day. Jobs launched on them outside a lab lesson will be terminated without further notice. If you need to run jobs to prepare for an exam, just add:
 +  #SBATCH --exclude=bld[15-16]
 +to your job script.
 </WRAP>
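
As a rough sketch of where that directive goes, a minimal batch script might look like the one below. Only the `--exclude` line comes from the notice; the job name, the time limit, and the final `echo` are illustrative assumptions and not site recommendations. `SLURM_JOB_ID` and `SLURMD_NODENAME` are environment variables that Slurm sets inside a running job.

```shell
#!/bin/bash
# Sketch of a Slurm batch script. Every value except the --exclude line is a
# placeholder assumption.

# Hypothetical job name.
#SBATCH --job-name=exam-prep
# Hypothetical wall-time limit.
#SBATCH --time=00:10:00
# From the notice: keep the job off the nodes reserved for courses.
#SBATCH --exclude=bld[15-16]

# Inside a job Slurm provides these variables; the fallbacks only matter when
# the script is run outside Slurm as a dry run.
job_id="${SLURM_JOB_ID:-none}"
node="${SLURMD_NODENAME:-$(uname -n)}"

# Logging both makes it easy to quote the jobID and node in an issue report.
echo "job=${job_id} node=${node}"
```

Saved under a name of your choice (for example the hypothetical `exam-prep.sh`), the script would be submitted with `sbatch exam-prep.sh`. The logged jobID and node name are the details the reporting note above asks for.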
  
 ===== 2026 =====
  
-==== 2026-03-10 ====
-<WRAP center round important>
-Possible problems due to electrical maintenance. Cluster might power off without further warning, even if we've been reassured that won't happen (yeah... sure... hope it's not just like the last time...).
-</WRAP>
 +==== 2026-04-06 ====
 +
 +Started resuming some nodes. The largest air conditioner is still broken, but the others have been fixed and are currently working. Hopefully we won't have to shut down again.
 + 
 +==== 2026-04-05 ==== 
 + 
 +The server room is experiencing overtemperature due to a failed AC: many (not all) nodes are being drained and will be resumed ASAP. 
 + 
 + 
 +==== 2026-03-30 ==== 
 +Possible (hopefully unlikely) service interruption due to the removal of the electrical bypass installed on 25/12.
 + 
 +In case of emergency, the cluster will be shut down without further notice between 08.00 and 09.00 and reopened as soon as possible. 
 + 
 +==== 2026-03-23 ==== 
 +Planned network interruption around 06.00. The network should be unreachable for about 5 minutes, but the AD integration (which lets you authenticate against bastion-nav and ophfe*) may have trouble restarting afterwards and will be checked at about 07.45.
 + 
 +Active connections and file transfers will be dropped. Running jobs should continue working unless they're using remote resources. 
 + 
 +==== 2026-03-10 ==== 
 +<del>Possible problems due to electrical maintenance. Cluster might power off without further warning, even if we've been reassured that won't happen (yeah... sure... hope it's not just like the last time...).
 +</del> No unplanned shutdown this time.
  
 ==== 2026-01-12 ====
oph/cluster/messages.1773123966.txt.gz · Last modified: by diego.zuccato@unibo.it
