oph:cluster:messages
Differences
These are the differences between the selected revision and the current version of the page.
oph:cluster:messages [2024/10/22 05:17] – [2024-10-22] diego.zuccato@unibo.it | oph:cluster:messages [2025/03/31 12:59] (current version) – [2025-03-31] diego.zuccato@unibo.it | ||
---|---|---|---|
Line 3: | Line 3: | ||
Newer messages at the top. | Newer messages at the top. | ||
- | === 2024-10-22 === | + | <WRAP center round info> |
+ | To report issues, please write **only** to difa.csi@unibo.it including a clear description of the problem ("my job doesn' | ||
+ | </WRAP> | ||
+ | |||
+ | |||
+ | ===== 2025 ===== | ||
+ | |||
+ | ==== 2025-03-31 ==== | ||
+ | * < | ||
+ | |||
+ | ==== 2025-03-27 ==== | ||
+ | |||
+ | * < | ||
+ | |||
+ | ==== 2025-01-13 ==== | ||
+ | |||
+ | * /archive is now **read-only** to avoid potential data loss during the cluster move | ||
+ | |||
+ | ===== 2024 ===== | ||
+ | |||
+ | ==== 2024-12-18 ==== | ||
+ | |||
+ | * Power sources are redundant again. There shouldn' | ||
+ | |||
+ | ==== 2024-12-17 ==== | ||
+ | |||
+ | * Possibly unstable power source: the rotary unit failed yesterday and is being worked on by CNAF. Some nodes powered off and have been restored. Uptime is currently not guaranteed in case of a blackout. | ||
+ | |||
+ | ==== 2024-11-04 ==== | ||
+ | |||
+ | * Started cleanup of /home/work: everything outside the sector folders is being moved into the (hopefully) correct folder. This might bring the sector folder over quota; just clean up (delete unneeded, move to /scratch or / | ||
+ | |||
+ | ==== 2024-10-30 ==== | ||
+ | |||
+ | * < | ||
+ | * mtx18 detected a problem with a DIMM module: powered off for HW checkup | ||
+ | * /home/temp is being decommissioned: | ||
+ | |||
+ | ==== 2024-10-22 ==== | ||
* :!: <wrap hi> | * :!: <wrap hi> | ||
- | === 2024-10-17 === | + | ==== 2024-10-17 ==== | ||
* Started transferring all the remaining files from /home/temp to /scratch: / | * Started transferring all the remaining files from /home/temp to /scratch: / | ||
* <wrap hi> | * <wrap hi> | ||
- | === 2024-10-03 === | + | ==== 2024-10-03 ==== | ||
* test e-mail | * test e-mail | ||
- | === 2024-09-16 === | + | ==== 2024-09-16 ==== |
* Completed data copy from / | * Completed data copy from / | ||
* /archive is now available read-write from frontends and <wrap hi> | * /archive is now available read-write from frontends and <wrap hi> | ||
Line 24: | Line 63: | ||
* the data size can be checked by using 'ls -lh' on the parent directory, while the number of files requires a '' | * the data size can be checked by using 'ls -lh' on the parent directory, while the number of files requires a '' | ||
- | + | ==== 2024-09-10 ==== | |
- | === 2024-09-10 === | + | |
* /scratch should now be stable (< | * /scratch should now be stable (< | ||
* / | * / | ||
- | === 2024-09-09 === | + | ==== 2024-09-09 ==== | ||
* /scratch is temporarily unavailable: | * /scratch is temporarily unavailable: | ||
| | ||
- | === 2024-08-19 === | + | ==== 2024-08-19 ==== |
* /home/temp reactivated: | * /home/temp reactivated: | ||
* still testing /archive, it should become available soon (aiming to have it ready this week if no new issues arise) | * still testing /archive, it should become available soon (aiming to have it ready this week if no new issues arise) | ||
- | === 2024-06-10 === | + | ==== 2024-06-10 ==== |
* new nodes for ECOGAL, RED-CARDINAL and ELSA projects are now available (mtx[33-40]), | * new nodes for ECOGAL, RED-CARDINAL and ELSA projects are now available (mtx[33-40]), | ||
* ophfe3 is now available for regular use | * ophfe3 is now available for regular use | ||
- | === 2024-06-10 === | + | |
+ | ==== 2024-06-10 ==== | ||
* mtx03 is now OK | * mtx03 is now OK | ||
* all data from old /scratch should now be available under / | * all data from old /scratch should now be available under / | ||
- | === 2024-06-03 === | + | ==== 2024-06-03 ==== |
* switched /scratch from GlusterFS to BeeGFS (SSD-backed): | * switched /scratch from GlusterFS to BeeGFS (SSD-backed): | ||
* can now be used for jobs as a (faster) replacement for /home/temp | * can now be used for jobs as a (faster) replacement for /home/temp | ||
Line 53: | Line 96: | ||
* mtx03 is having hardware issues and will be down for some time | * mtx03 is having hardware issues and will be down for some time | ||
- | === 2024-05-28 === | + | ==== 2024-05-28 ==== | ||
* ophfe3 is currently reserved for file transfer, login is not allowed: you can use ophfe1 and ophfe2 | * ophfe3 is currently reserved for file transfer, login is not allowed: you can use ophfe1 and ophfe2 | ||
* <WRAP round important> | * <WRAP round important> | ||
Line 62: | Line 105: | ||
* /archive (the current /scratch) will be **empty** for some time, then it will undergo its own migration to a different architecture and / | * /archive (the current /scratch) will be **empty** for some time, then it will undergo its own migration to a different architecture and / | ||
- | === 2024-05-24 === | + | ==== 2024-05-24 ==== |
* New filesystem layout is being **__tested__ on OPHFE3**. What was /scratch is now mounted on /archive and the new (fast) scratch area is mounted in /scratch | * New filesystem layout is being **__tested__ on OPHFE3**. What was /scratch is now mounted on /archive and the new (fast) scratch area is mounted in /scratch | ||
<WRAP center round important 60%> | <WRAP center round important 60%> | ||
Line 68: | Line 112: | ||
</WRAP> | </WRAP> | ||
+ | ==== 2024-05-06 ==== | ||
- | === 2024-05-06 === | ||
* /home/temp temporarily unavailable: | * /home/temp temporarily unavailable: | ||
- | === 2024-04-16 === | + | ==== 2024-04-16 ==== |
* ophfe2 has now been reinstalled on new HW; please test it and report issues (if any) | * ophfe2 has now been reinstalled on new HW; please test it and report issues (if any) | ||
- | === 2024-04-09 === | + | ==== 2024-04-09 ==== |
* <WRAP important> | * <WRAP important> | ||
Direct login to frontend nodes has been disabled, [[oph: | Direct login to frontend nodes has been disabled, [[oph: | ||
</WRAP> | </WRAP> | ||
* ophfe2 is going to be reinstalled: | * ophfe2 is going to be reinstalled: | ||
- | === 2024-04-05 === | + | |
+ | ==== 2024-04-05 ==== | ||
* Deployed new authorization config: please promptly report any slowdowns or other problems to difa.csi@unibo.it. | * Deployed new authorization config: please promptly report any slowdowns or other problems to difa.csi@unibo.it. | ||
* Bastion (137.204.50.15) is already working and direct access to ophfe* nodes is being phased out. Usually you only need to add "-J name.surname@137.204.50.15" | * Bastion (137.204.50.15) is already working and direct access to ophfe* nodes is being phased out. Usually you only need to add "-J name.surname@137.204.50.15" | ||
- | === 2024-03-12 === | + | ==== 2024-03-12 ==== | ||
* **New frontend available**: | * **New frontend available**: | ||
* Frontend at 137.204.50.177 is now **deprecated** and will be removed soon, to be replaced by a newer one at 137.204.50.72 | * Frontend at 137.204.50.177 is now **deprecated** and will be removed soon, to be replaced by a newer one at 137.204.50.72 | ||
- | === 2024-02-21 === | + | ==== 2024-02-21 ==== | ||
* 11:50 Outage resolved. | * 11:50 Outage resolved. | ||
* 06:30 Cluster operation is currently stopped due to a slurmctld error (the daemon is not listening for network connections). I'm working to resolve the outage ASAP. | * 06:30 Cluster operation is currently stopped due to a slurmctld error (the daemon is not listening for network connections). I'm working to resolve the outage ASAP. | ||
+ | ===== 2023 ===== | ||
- | === 2023-11-10 === | + | ==== 2023-11-10 ==== | ||
<wrap alert> | <wrap alert> | ||
Line 103: | Line 152: | ||
</ | </ | ||
- | === 2023-10-20 === | + | ==== 2023-10-20 ==== | ||
Tentatively re-enabled read/write mode for /home/temp. **Archive and delete** old data before starting new writes! | Tentatively re-enabled read/write mode for /home/temp. **Archive and delete** old data before starting new writes! | ||
- | === 2023-10-13 === | + | ==== 2023-10-13 ==== | ||
The /home/temp filesystem is currently offline due to technical issues. Trying to reactivate **read-only** access ASAP. | The /home/temp filesystem is currently offline due to technical issues. Trying to reactivate **read-only** access ASAP. | ||
Line 115: | Line 164: | ||
filesystem will be **wiped** soon. | filesystem will be **wiped** soon. | ||
</ | </ | ||
- | === 2023-08-10 === | + | |
+ | ==== 2023-08-10 ==== | ||
New login node available: ophfe3 (137.204.50.73) is now usable. | New login node available: ophfe3 (137.204.50.73) is now usable. | ||
slurmtop is now in the PATH, so there is no need to specify / | slurmtop is now in the PATH, so there is no need to specify / | ||
- | === 2023-08-01 === | + | ==== 2023-08-01 ==== | ||
:!: VSCode is bringing the login node to a halt. Use it on your own client and transfer the files. | :!: VSCode is bringing the login node to a halt. Use it on your own client and transfer the files. | ||
- | === older === | + | ===== older (undated) ===== |
/scratch is now available, but read-only and only from the login nodes. | /scratch is now available, but read-only and only from the login nodes. |
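
The bastion note (2024-04-05) and the VSCode note (2023-08-01) both come down to reaching a frontend through 137.204.50.15 from your own machine. Below is a minimal sketch, assuming OpenSSH's `-J` (ProxyJump) option and `rsync` are installed on your client. The helpers only **print** commands and never connect anywhere. `name.surname` is a placeholder for your username, 137.204.50.73 (ophfe3, from the 2023-08-10 note) is just an example target, and the helper names (`ssh_cmd`, `rsync_cmd`, `ssh_config_stanza`), the `ophfe3` alias, the `/path/on/cluster/` destination and the use of the same username on bastion and frontend are all hypothetical.

```shell
#!/bin/sh
# Sketch only: prints ssh/rsync commands that hop through the bastion host.
# Hypothetical (not from the cluster docs): helper names, the "ophfe3" alias,
# same username on bastion and frontend, example paths.

BASTION="137.204.50.15"   # bastion address from the 2024-04-05 note

# ssh_cmd USER FRONTEND
# Print an interactive login command: ssh -J USER@bastion USER@FRONTEND
ssh_cmd() {
    printf 'ssh -J %s@%s %s@%s\n' "$1" "$BASTION" "$1" "$2"
}

# rsync_cmd USER FRONTEND SRC DEST
# Print an rsync command copying SRC from your client to DEST on the frontend,
# with the ssh transport going through the bastion (e.g. after editing files
# locally in VSCode instead of on the login node).
rsync_cmd() {
    printf 'rsync -av -e "ssh -J %s@%s" %s %s@%s:%s\n' \
        "$1" "$BASTION" "$3" "$1" "$2" "$4"
}

# ssh_config_stanza USER FRONTEND ALIAS
# Print a ~/.ssh/config entry so that a plain "ssh ALIAS" uses the bastion.
ssh_config_stanza() {
    printf 'Host %s\n    HostName %s\n    User %s\n    ProxyJump %s@%s\n' \
        "$3" "$2" "$1" "$1" "$BASTION"
}
```

For example, `ssh_cmd name.surname 137.204.50.73` prints `ssh -J name.surname@137.204.50.15 name.surname@137.204.50.73`. Appending the `ssh_config_stanza` output to `~/.ssh/config` saves typing `-J` every time; printing instead of executing keeps the sketch safe to try without network access.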
oph/cluster/messages.1729574221.txt.gz · Last modified: 2024/10/22 05:17 by diego.zuccato@unibo.it