
oph:cluster:messages

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:messages [2025/12/03 05:35] – [2025-12-02] diego.zuccato@unibo.it
oph:cluster:messages [2026/05/05 07:50] (current version) – Overtemperature warning diego.zuccato@unibo.it
Line 7: Line 7:
 </WRAP> </WRAP>
  
 +<WRAP center round alert>
 +Remember that bld[15-16] are reserved for courses during the day. Jobs running on these nodes outside a lab lesson will be terminated without further notice. If you need to run jobs to prepare for an exam, just add:
 +  #SBATCH --exclude=bld[15-16]
 +to your job script.
 +</WRAP>
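
As a minimal sketch of where the exclusion line goes, a complete job script might look like the one below. Only the ''--exclude'' line comes from the notice above; the job name, the time limit and the final command are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical job name and time limit, shown only to give the script a shape.
#SBATCH --job-name=exam-prep
#SBATCH --time=00:10:00
# Keep the job off the nodes reserved for courses.
#SBATCH --exclude=bld[15-16]

# SLURMD_NODENAME is set by Slurm inside a batch job; fall back to the
# local hostname so the script also runs outside Slurm.
node="${SLURMD_NODENAME:-${HOSTNAME:-unknown}}"
echo "Running on ${node}"
```

The same option can also be passed on the command line, for example ''sbatch --exclude="bld[15-16]" job.sh'' (''job.sh'' is a hypothetical script name); the quotes keep the shell from interpreting the brackets.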
 +
 +===== 2026 =====
 +
 +==== 2026-04-05 ====
 +
 +<WRAP center round important>
 +The server room is overheating due to a failed AC unit: many (not all) nodes are being drained and will be resumed as soon as possible.
 +</WRAP>
 +
 +
 +==== 2026-03-30 ====
 +Possible (hopefully unlikely) service interruption due to the removal of the electrical bypass installed on 2025-12-25.
 +
 +In case of emergency, the cluster will be shut down without further notice between 08:00 and 09:00 and brought back online as soon as possible.
 +
 +==== 2026-03-23 ====
 +Planned network interruption at around 06:00. The network should be unreachable for about 5 minutes, but the AD integration (the one that lets you authenticate against bastion-nav and ophfe*) could have issues restarting afterwards and will be checked at about 07:45.
 +
 +Active connections and file transfers will be dropped. Running jobs should continue working unless they're using remote resources.
 +
 +==== 2026-03-10 ====
 +<del>Possible problems due to electrical maintenance. The cluster might power off without further warning, even though we've been reassured that won't happen (yeah... sure... hope it's not just like the last time...).
 +</del> No unplanned shutdown this time.
 +
 +==== 2026-01-12 ====
 +  * /archive is writable again from the frontends and bld18
 +
 +==== 2026-01-07 ====
 +  * Partially recovered from the emergency shutdown on Christmas Day. /archive is currently read-only and some data may not be accessible; we're investigating the issue
  
 ===== 2025 ===== ===== 2025 =====
 +
 +==== 2025-12-25 ====
 +  * Emergency shutdown: the data center temperature was too high (>50°C), which could cause long-lasting issues
 +
 +==== 2025-12-19 ====
 +  * Maintenance shutdown. The cluster will be unavailable until 2025-12-23 (if all goes well).
  
 ==== 2025-12-02 ==== ==== 2025-12-02 ====
oph/cluster/messages.1764740101.txt.gz · Last modified: by diego.zuccato@unibo.it
