====== oph:cluster:messages ======
Newer messages at the top.
<WRAP center round info>
To report issues, please write **only** to difa.csi@unibo.it, including a clear description of the problem ("my job doesn'…").
</WRAP>
===== 2025 =====

==== 2025-03-31 ====

  * …

==== 2025-03-27 ====

  * …

==== 2025-01-13 ====

  * /archive is now **read-only** to avoid potential data loss during the cluster move
===== 2024 =====

==== 2024-12-18 ====

  * Power sources are redundant again. There shouldn't …

==== 2024-12-17 ====

  * Possibly unstable power source: the rotary unit failed yesterday and is being worked on by CNAF. Some nodes powered off and have been restored. Uptime is currently not guaranteed in case of blackout.
==== 2024-11-04 ====

  * Started cleanup of /home/work : everything outside the sector folders is being moved into the (hopefully) correct folder. This might bring a sector folder over quota; just clean up (delete unneeded files, move the rest to /scratch or /…)
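If a sector folder goes over quota, a quick way to spot what is using the space is ''du'' sorted by size. A minimal sketch; the paths are placeholders (a throwaway demo tree is created so the commands run anywhere), so point it at your own sector folder instead:

```shell
# Demo tree standing in for a sector folder (placeholder, made up for illustration)
dir=$(mktemp -d)
mkdir -p "$dir/big" "$dir/small"
head -c 1048576 /dev/zero > "$dir/big/blob"   # 1 MiB file so "big" dominates

# Largest subdirectories first: prime candidates to delete or move elsewhere
du -k --max-depth=1 "$dir" | sort -rn | head
```

The first line of the output is the folder's total; the lines below it rank the subdirectories to clean up first.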
+ | |||
+ | ==== 2024-10-30 ==== | ||
+ | |||
+ | * < | ||
+ | * mtx18 detected a problem with a DIMM module: powered off for HW checkup | ||
+ | * /home/temp is being decommissioned: | ||
+ | |||
+ | ==== 2024-10-22 ==== | ||
+ | |||
+ | * :!: <wrap hi> | ||
+ | |||
+ | ==== 2024-10-17 ==== | ||
+ | |||
+ | * Started transfer from /home/temp to /scratch of all the remaining files: / | ||
+ | * <wrap hi> | ||
+ | |||
+ | ==== 2024-10-03 ==== | ||
+ | |||
+ | * test e-mail | ||
+ | |||
+ | ==== 2024-09-16 ==== | ||
+ | |||
+ | * Completed data copy from / | ||
+ | * /archive is now available readwrite from frontends and <wrap hi> | ||
+ | * remember that quota on /archive is both about data size (20TB per sector) and inodes (max 10k files/dirs per sector)< | ||
+ | A "disk full" error means that one of the two limits have been reached. | ||
+ | </ | ||
+ | * the data size can be checked by using 'ls -lh' on the parent directory, number of files requires a '' | ||
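Since the inode quota counts files and directories alike, checking it means walking the whole tree. A sketch with placeholder paths (a demo tree is created here so the snippet is runnable; substitute your actual /archive sector directory):

```shell
# Stand-in for an /archive sector directory (placeholder; use your real path)
sector=$(mktemp -d)
mkdir -p "$sector/run1"
touch "$sector/run1/out.dat" "$sector/notes.txt"

# Total size of the tree (what counts against the 20 TB limit)
du -sh "$sector"

# Inode usage: every file AND directory counts against the 10k limit
find "$sector" | wc -l
```

If either number is at its limit, writes will fail with "disk full" even when the other limit still has headroom.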
+ | |||
+ | ==== 2024-09-10 ==== | ||
+ | |||
+ | * /scratch should now be stable (< | ||
+ | * / | ||
+ | |||
+ | ==== 2024-09-09 ==== | ||
+ | |||
+ | * /scratch is temporarily unavailable: | ||
+ | |||
+ | ==== 2024-08-19 ==== | ||
+ | |||
+ | * /home/temp reactivated: | ||
+ | * still testing /archive, it should become available soon (aiming to have it ready this week if no new issues arise) | ||
+ | |||
+ | ==== 2024-06-10 ==== | ||
+ | |||
+ | * new nodes for ECOGAL, RED-CARDINAL and ELSA projects are now available (mtx[33-40]), | ||
+ | * ophfe3 is now available for regular use | ||
+ | |||
+ | ==== 2024-06-10 ==== | ||
+ | |||
+ | * mtx03 is now OK | ||
+ | * all data from old /scratch should now be available under / | ||
+ | |||
+ | ==== 2024-06-03 ==== | ||
+ | |||
+ | * switched /scratch from GlusterFS to BeeGFS (SSD-backed): | ||
+ | * can now be used for jobs as a (faster) replacement for / | ||
+ | * data from old /scratch is now under / | ||
+ | * **do not write** under / | ||
+ | * auto-deletion in not active (yet) | ||
+ | * mtx03 is having hardware issues and will be down for some time | ||
+ | |||
+ | ==== 2024-05-28 ==== | ||
+ | * ophfe3 is currently reserved for file transfer, login is not allowed: you can use ophfe1 and ophfe2 | ||
+ | * <WRAP round important> | ||
+ | * data is currently being copied from /scratch to the new temporary storage that will be under / | ||
+ | * planned maintenance should be completed in 2h from the start, so the cluster will be useable on 2024-06-03 at 10am with new mounts | ||
+ | * this maintenance only affects /scratch area: /home is not touched (but the sooner you migrate your data from / | ||
+ | * if you don't find something from current / | ||
+ | * /archive (the current /scratch) will be **empty** for some time, then it will undergo its own migration to a different architecture and / | ||
+ | |||
+ | ==== 2024-05-24 ==== | ||
+ | |||
+ | * New filesystem layout is being **__tested__ on OPHFE3**. What was /scratch is now mounted on /archive and the new (fast) scratch area is mounted in /scratch | ||
+ | <WRAP center round important 60%> | ||
+ | Please do not use /scratch on ophfe3 or consider your data **gone**! | ||
+ | </ | ||
+ | |||
+ | ==== 2024-05-06 ==== | ||
+ | |||
+ | * /home/temp temporarily unavailable: | ||
+ | |||
+ | ==== 2024-04-16 ==== | ||
+ | |||
+ | * ophfe2 is now reinstalled on new HW; please test it and report issues (if any) | ||
+ | |||
+ | ==== 2024-04-09 ==== | ||
+ | |||
+ | * <WRAP important> | ||
+ | Direct login to frontend nodes have been disabled, [[oph: | ||
+ | </ | ||
+ | * ophfe2 is going to be reinstalled: | ||
+ | |||
+ | ==== 2024-04-05 ==== | ||
+ | |||
+ | * Deployed new authorization config: please promptly report eventual slowdowns or other problems to difa.csi@unibo.it . | ||
+ | * Bastion (137.204.50.15) is already working and direct access to ophfe* nodes is being phased out. Usually you only need to add "-J name.surname@137.204.50.15" | ||
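The ''-J'' jump can also be made permanent in ''~/.ssh/config'' so a plain ''ssh oph'' goes through the bastion automatically. A sketch only: the host aliases and ''name.surname'' are placeholders, and the frontend address is the 137.204.50.72 one from the 2024-03-12 entry (adjust to the frontend you actually use):

```
# ~/.ssh/config sketch; aliases and user name are placeholders
Host ophbastion
    HostName 137.204.50.15
    User name.surname

Host oph
    HostName 137.204.50.72
    User name.surname
    ProxyJump ophbastion
```

After that, ''ssh oph'' (and ''scp''/''rsync'' to ''oph:'') tunnel through the bastion transparently.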
+ | |||
+ | ==== 2024-03-12 ==== | ||
+ | |||
+ | * **New frontend available**: | ||
+ | * Frontend at 137.204.50.177 is now **deprecated** and will be removed soon, to be replaced by a newer one at 137.204.50.72 | ||
+ | |||
+ | ==== 2024-02-21 ==== | ||
+ | |||
+ | * 11.50 Outage resolved. | ||
+ | * 06:30 Cluster operation is currently stopped due to a slurmctld error (daemon is not listening to network connections). I'm working to try to resolve the outage ASAP. | ||
+ | |||
+ | ===== 2023 ===== | ||
+ | |||
+ | ==== 2023-11-10 ==== | ||
+ | |||
+ | <wrap alert> | ||
+ | ** Old (pre-august) __backups__ of data in /scratch will be deleted on 2023-11-27 . ** | ||
+ | |||
+ | If you have very important data, verify before the deadline. | ||
+ | </ | ||
+ | |||
==== 2023-10-20 ====

Tentatively re-enabled read/write mode for /home/temp . **Archive and delete** old data before starting new writes!
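Archiving before deleting can be as simple as a verified tarball. A sketch with placeholder paths (a demo tree stands in for your old /home/temp data so the commands are runnable anywhere):

```shell
# Demo data standing in for old /home/temp files (placeholder paths)
src=$(mktemp -d)
echo "old results" > "$src/run.log"

# Archive the tree, list the tarball to verify it, and only then delete the originals
tar -czf "$src.tar.gz" -C "$src" .
tar -tzf "$src.tar.gz" | grep -q run.log && rm -rf "$src"
```

Listing the archive before removing the source is the cheap safety net: if the ''tar -tzf'' check fails, nothing gets deleted.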
==== 2023-10-13 ====

/home/temp filesystem is currently offline due to technical issues. Trying to reactivate **readonly** access ASAP.
… filesystem will be **wiped** soon.
==== 2023-08-10 ====

New login node available: ophfe3 (137.204.50.73) is now usable.

slurmtop is now in the PATH, so there is no need to specify /…
==== 2023-08-01 ====

:!: VSCode is bringing the login node to a halt. Use it on your client and transfer the files.
===== older (undated) =====

/scratch is now available, but readonly and only from the login nodes.
oph/cluster/messages.1697794288.txt.gz · Last modified: 2023/10/20 09:31 by diego.zuccato@unibo.it