oph:cluster:messages

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:messages [2025/08/22 12:16] – [2025-08-22] work started diego.zuccato@unibo.it
oph:cluster:messages [2026/01/12 12:05] (current version) – [2026-01-12] diego.zuccato@unibo.it
Line 7:
 </WRAP>
  
 +===== 2026 =====
 +
 +==== 2026-01-12 ====
 +  * /archive is writable again from the frontends and bld18
 +
 +==== 2026-01-07 ====
 +  * Partially recovered from the emergency shutdown at Christmas. /archive is currently read-only and some data may not be accessible; we're investigating the issue
  
 ===== 2025 =====
 +
 +==== 2025-12-25 ====
 +  * Emergency shutdown: the data center temperature is too high (>50°C), which could cause long-lasting issues
 +
 +==== 2025-12-19 ====
 +  * Maintenance shutdown. The cluster will be unavailable until 2025-12-23 (if all goes well).
 +
 +==== 2025-12-02 ====
 +  * Users from STUDENTI can't reach the cluster. We're working to determine the cause and will fix it as soon as possible.
 +
 +==== 2025-10-15 ====
 +
 +  * **/scratch is full**: please delete unneeded data and archive the rest. If the situation doesn't improve within a couple of days, we'll have to run the enforcement script that deletes older data: **DATA DELETED BY THE SCRIPT CANNOT BE RECOVERED!** See (again) [[oph:cluster:storage#scratch|the storage page]] for more info.
 +
 +==== 2025-08-25 ====
 +  * All nodes except bld17 (which was already down) should be fully operational, but we cannot verify dataset consistency: please **check your datasets/results**, especially the ones you were working on at the time of the blackout, and **double-check** (or discard and recreate) the ones you were writing to!
  
 ==== 2025-08-22 ====
   * Started power-on. Some disks are corrupt and require some work to be recovered.
 +  * [22:42 GMT+1] The cluster is //mostly// operational; the nodes that are down will be fixed in the coming days
 ==== 2025-08-21 ====
   * The power line is still unreliable: deferring power-on until tomorrow morning
oph/cluster/messages.1755865006.txt.gz · Last modified: by diego.zuccato@unibo.it