Login messages

Newer messages at the top.

2024-06-03

  • switched /scratch from GlusterFS to BeeGFS (SSD-backed):
    • can now be used for jobs as a (faster) replacement for /home/temp
    • data from old /scratch is now under /scratch/archive (transfer is still proceeding)
    • do not write under /scratch/archive
    • auto-deletion is not active (yet)
  • mtx03 is having hardware issues and will be down for some time

2024-05-28

  • ophfe3 is currently reserved for file transfer, login is not allowed: you can use ophfe1 and ophfe2
  • Planned maintenance starting on 2024-06-03T08:00:00 (details in the following items):

    • data is currently being copied from /scratch to the new temporary storage that will be under /scratch/archive
    • planned maintenance should be completed within 2h of the start, so the cluster will be usable on 2024-06-03 at 10:00 with the new mounts
    • this maintenance only affects the /scratch area: /home is not touched (but the sooner you migrate your data from /home/temp/DDD to /scratch/DDD, the sooner your jobs get sped up; see the example command below)
    • if you don't find something from the current /scratch/DDD in the new /scratch/archive/DDD, you can write to difa.csi@unibo.it to ask for an extra transfer (a sync would happen anyway, if you can wait some days)
    • /archive (the current /scratch) will be empty for some time, then it will undergo its own migration to a different architecture and /scratch/archive will be moved there

2024-05-24

  • New filesystem layout is being tested on OPHFE3. What was /scratch is now mounted on /archive and the new (fast) scratch area is mounted on /scratch

      Please do not use /scratch on ophfe3 or consider your data gone!
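
A minimal migration sketch for the /home/temp → /scratch move mentioned in the 2024-05-28 items above, assuming rsync is available on the login nodes; DDD is the placeholder directory name used in the message, so substitute your own path:

    # copy the tree, preserving permissions, timestamps and hard links (paths are examples)
    rsync -aHv --progress /home/temp/DDD/ /scratch/DDD/
    # a second run of the same command picks up anything that changed in the meantime;
    # remove the old copy only after checking that the transfer is complete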

2024-05-06

  • /home/temp is temporarily unavailable: it overfilled (again) and crashed, and is now under maintenance. Another 4TB were added, but it will crash again when overfilled :!:

2024-04-16

  • ophfe2 is now reinstalled on new HW; please test it and report issues (if any)

2024-04-09

  • Direct login to frontend nodes has been disabled; use the Bastion service. Nothing changed for connections from the internal network (wired or AlmaWifi).

  • ophfe2 is going to be reinstalled: please leave it free ASAP; once reinstalled, it will change its IP address from 50.177 to 50.72 .

2024-04-05

  • Deployed a new authorization config: please promptly report any slowdowns or other problems to difa.csi@unibo.it .
  • Bastion (137.204.50.15) is already working and direct access to ophfe* nodes is being phased out. Usually you only need to add “-J name.surname@137.204.50.15” to the ssh command you're using (see the example at the end of this page).

2024-03-12

  • New frontend available: a new frontend can be reached at 137.204.50.71
  • The frontend at 137.204.50.177 is now deprecated and will be removed soon, to be replaced by a newer one at 137.204.50.72

2024-02-21

  • 11:50 Outage resolved.
  • 06:30 Cluster operation is currently stopped due to a slurmctld error (the daemon is not listening for network connections). I'm working to resolve the outage ASAP.

2023-11-10

Old (pre-August) backups of data in /scratch will be deleted on 2023-11-27. If you have very important data, verify it before the deadline.

2023-10-20

Tentatively re-enabled read/write mode for /home/temp. Archive and delete old data before starting new writes!

2023-10-13

/home/temp filesystem is currently offline for technical issues. Trying to reactivate readonly access ASAP.
11:05 UPDATE: /home/temp is now available in readonly mode. Please archive the data you need to keep, the filesystem will be wiped soon.

2023-08-10

New login node available: ophfe3 (137.204.50.73) is now usable. slurmtop is now in the path, so there is no need to specify /home/software/utils/ .

2023-08-01

:!: VSCode is bringing the login node to a halt. Use it on your client and transfer the files.

older

/scratch is now available, but readonly and only from the login nodes. Verify the archived data: usually you'll find a file.tar for every directory. Associated with that file you may also find:

  • file-extra.tar, with files that were deleted only from one replica (they prevented removing “empty” folders). You can probably safely delete these.
  • file-extra2.tar, which contains files that are (most likely, but not always) dupes of the ones in file.tar

Backups will be deleted before 2023-12-20 (precise date TBD), so be sure to verify your data ASAP (a verification sketch is at the end of this page): once the backups are deleted there will be no way to recover.
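
A minimal connection sketch for the Bastion setup described in the 2024-04-05 and 2024-04-09 items, assuming your account is name.surname and you want to reach ophfe3 (137.204.50.73); adjust the user name and target frontend to your case:

    # one-off: jump through the Bastion host, then land on the frontend
    ssh -J name.surname@137.204.50.15 name.surname@137.204.50.73

    # equivalent ~/.ssh/config entry, so that "ssh ophfe3" works directly
    Host ophfe3
        HostName 137.204.50.73
        User name.surname
        ProxyJump name.surname@137.204.50.15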

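A minimal check of the archived data described in the “older” item, assuming the file.tar archives are reachable from a login node; the restore-check directory name is just an example:

    # list the archive contents without extracting anything
    tar -tvf file.tar

    # extract into a separate directory for inspection, then compare with what you expect
    mkdir -p ~/restore-check
    tar -xvf file.tar -C ~/restore-check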