oph:cluster:messages
Differences
These are the differences between the selected revision and the current version of the page.
| oph:cluster:messages [2026/03/10 06:26] – [2026-03-10] diego.zuccato@unibo.it | oph:cluster:messages [2026/05/06 12:51] (current version) – [2026-04-05] Partial resume diego.zuccato@unibo.it | ||
|---|---|---|---|
| Line 5: | Line 5: | ||
| <WRAP center round info> | <WRAP center round info> | ||
| To report issues, please write **only** to difa.csi@unibo.it including a clear description of the problem ("my job doesn' | To report issues, please write **only** to difa.csi@unibo.it including a clear description of the problem ("my job doesn' | ||
| + | </ | ||
| + | |||
| + | <WRAP center round alert> | ||
| + | Remember that bld[15-16] are reserved for courses during the day. Jobs launched on them outside a lab lesson will be terminated without further notice. If you need to run jobs to prepare for an exam, just add: | ||
| + | #SBATCH --exclude=bld[15-16] | ||
| + | to your job script. | ||
| </ | </ | ||
| ===== 2026 ===== | ===== 2026 ===== | ||
| - | ==== 2026-03-10 ==== | + | ==== 2026-04-06 ==== |
| - | <WRAP center round important> | + | |
| - | Possible problems due to electrical maintenance. The cluster might power off without further warning, even though we've been reassured that won't happen (yeah... sure... hope it's not just like the last time...). | + | |
| - | </WRAP> | + | Started resuming some nodes. The biggest air conditioner is still broken, but the others have been fixed and are currently working. Hopefully we won't have to shut down again. |
| + | |||
| + | ==== 2026-04-05 ==== | ||
| + | |||
| + | The server room is experiencing overtemperature due to a failed AC: many (not all) nodes are being drained and will be resumed ASAP. | ||
| + | |||
| + | |||
| + | ==== 2026-03-30 ==== | ||
| + | Possible (hopefully unlikely) service interruption due to removal of electrical bypass installed on 25/12. | ||
| + | |||
| + | In case of emergency, the cluster will be shut down without further notice between 08.00 and 09.00 and reopened as soon as possible. | ||
| + | |||
| + | ==== 2026-03-23 ==== | ||
| + | Planned network interruption around 06.00. The network should remain unreachable for about 5 minutes, but the AD integration (the one that lets you authenticate against bastion-nav and ophfe*) could have issues restarting afterwards and will be checked at about 07.45. | ||
| + | |||
| + | Active connections and file transfers will be dropped. Running jobs should continue working unless they' | ||
| + | |||
| + | ==== 2026-03-10 ==== | ||
| + | < | ||
| + | </del> No unplanned shutdown this time | ||
| ==== 2026-01-12 ==== | ==== 2026-01-12 ==== | ||
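
As a rough illustration of the `--exclude` directive mentioned in the alert above, here is a minimal sketch of a job script, assuming a standard Slurm `sbatch` setup. The job name and time limit are hypothetical placeholders, not values taken from this page; only the `--exclude=bld[15-16]` line comes from the notice.

```shell
#!/bin/bash
# Hypothetical job name and time limit, chosen only for illustration.
#SBATCH --job-name=exam-prep
#SBATCH --time=00:10:00
# Directive quoted in the notice: keep the job off the nodes reserved for courses.
#SBATCH --exclude=bld[15-16]

# Slurm sets SLURMD_NODENAME inside a job; fall back to `uname -n`
# so the script also runs outside the scheduler.
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "Running on ${node}"
```

The `#SBATCH` lines are plain comments to the shell and are only interpreted by `sbatch`, so they must appear before the first executable command in the script.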
oph/cluster/messages.1773123966.txt.gz · Last modified: by diego.zuccato@unibo.it
