====== Login messages ======
Newer messages at the top.
To report issues, please write **only** to difa.csi@unibo.it with a clear description of the problem ("my job doesn't work" is __not__ a clear description, but it is what we usually receive...), including the jobID, the misbehaving node(s), steps to reproduce, etc.
===== 2024 =====
==== 2024-11-04 ====
* Started cleanup of /home/work : everything outside the sector folders is being moved into the (hopefully) correct folder. This might bring a sector folder over quota -- just clean up (delete unneeded files, move data to /scratch or /archive)!
==== 2024-10-30 ====
* Slurm is experiencing an unexpected misbehaviour and does not accept job submissions: we're already working on it. //UPDATE//: reverted a problematic change, more tests required.
* mtx18 detected a problem with a DIMM module: powered off for HW checkup
* /home/temp is being decommissioned: data have already been moved to /scratch
==== 2024-10-22 ====
* :!: Tomorrow nodes mtx[30-40] will be temporarily shut down for maintenance; jobs running on those nodes will be aborted without further notice
==== 2024-10-17 ====
* Started transfer from /home/temp to /scratch of all the remaining files: /home/temp/sector/name.surname -> /scratch/sector/name.surname/name.surname (note the doubling of name.surname to avoid clashes with existing files)
* automatic deletion from /scratch is being worked on: all files older than **40 days** will be deleted automatically very soon, so archive what you need to keep! There will be **__no way to recover deleted files__**!
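As a reference, here is a minimal sketch to list which of your files would fall under the 40-day rule (the path is a placeholder, GNU find is assumed on the frontends, and "older" is assumed to mean modification time -- the actual deletion policy may differ):
<code bash>
# List files older than 40 days under your /scratch folder (example path;
# age is assumed to be modification time -- the actual policy may differ).
find /scratch/SECTOR/name.surname -type f -mtime +40 -printf '%TY-%Tm-%Td %p\n' | sort
</code>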
==== 2024-10-03 ====
* test e-mail
==== 2024-09-16 ====
* Completed data copy from /scratch/archive to /archive
* /archive is now available readwrite from frontends and readonly from nodes
* remember that quota on /archive covers both data size (20TB per sector) and inodes (max 10k files/dirs per sector): a "disk full" error means that one of the two limits has been reached.
* the data size can be checked with ''ls -lh'' on the parent directory; the number of files requires a ''find path/to/dir | wc -l'' (:!: slow :!:)
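For convenience, a hedged example of checking both limits on a sector folder (''du -sh'' is suggested here as an alternative for the cumulative size; the path is a placeholder):
<code bash>
# Data size used by a sector folder (20TB limit) -- du walks the whole tree, so it can be slow:
du -sh /archive/SECTOR
# Number of files and directories (10k inode limit) -- also slow:
find /archive/SECTOR | wc -l
</code>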
==== 2024-09-10 ====
* /scratch should now be stable (hopefully) UPDATE: NOPE, only partially accessible
* /scratch/archive is now offline to allow migration to the new storage
==== 2024-09-09 ====
* /scratch is temporarily unavailable: we are working to restore the access. //UPDATE//: the issue has been fixed now so /scratch is usable again.
==== 2024-08-19 ====
* /home/temp reactivated: it crashes when overfilled! => stop using /home/temp and migrate your data to /scratch ASAP
* still testing /archive, it should become available soon (aiming to have it ready this week if no new issues arise)
==== 2024-06-10 ====
* new nodes for ECOGAL, RED-CARDINAL and ELSA projects are now available (mtx[33-40]), but they are currently without InfiniBand connection
* ophfe3 is now available for regular use
* mtx03 is now OK
* all data from the old /scratch should now be available under /scratch**/archive**, including many files that were previously unavailable or even believed deleted: check for duplicates/unneeded files and clean up! Currently there's no quota, but **quota will be enforced when moving data to /archive**!
==== 2024-06-03 ====
* switched /scratch from GlusterFS to BeeGFS (SSD-backed):
* can now be used for jobs as a (faster) replacement for /home/temp
* data from old /scratch is now under /scratch/archive (transfer is still proceeding)
* **do not write** under /scratch/archive
* auto-deletion is not active (yet)
* mtx03 is having hardware issues and will be down for some time
==== 2024-05-28 ====
* ophfe3 is currently reserved for file transfer, login is not allowed: you can use ophfe1 and ophfe2
* Planned maintenance starting on 2024-06-03T08:00:00 : details in the following items
* data is currently being copied from /scratch to the new temporary storage that will be under /scratch/archive
* planned maintenance should be completed within 2h from the start, so the cluster will be usable on 2024-06-03 at 10am with the new mounts
* this maintenance only affects the /scratch area: /home is not touched (but the sooner you migrate your data from /home/temp/DDD to /scratch/DDD, the sooner your jobs get sped up; see the transfer sketch after this list)
* if you don't find something from the current /scratch/DDD in the new /scratch/archive/DDD, you can write to difa.csi@unibo.it to ask for an extra transfer (a sync would happen anyway, if you can wait a few days)
* /archive (the current /scratch) will be **empty** for some time, then it will undergo its own migration to a different architecture and /scratch/archive will be moved there
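A possible way to do the /home/temp -> /scratch migration mentioned above (a sketch only: DDD is a placeholder for your sector/user path, and rsync is assumed to be available on the frontends):
<code bash>
# Copy data preserving permissions and timestamps; safe to re-run, it only transfers differences.
rsync -a --progress /home/temp/DDD/ /scratch/DDD/
# Only after verifying the copy, free the space on /home/temp:
# rm -rf /home/temp/DDD
</code>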
==== 2024-05-24 ====
* New filesystem layout is being **__tested__ on OPHFE3**. What was /scratch is now mounted on /archive and the new (fast) scratch area is mounted on /scratch.
Please do not use /scratch on ophfe3, or consider your data **gone**!
==== 2024-05-06 ====
* /home/temp temporarily unavailable: overfilled (again) and crashed. Now under maintenance. Added another 4TB but **it will crash again when overfilled** :!:
==== 2024-04-16 ====
* ophfe2 is now reinstalled on new HW; please test it and report issues (if any)
==== 2024-04-09 ====
* Direct login to the frontend nodes has been disabled, [[oph:cluster:access#step_1connecting_to_the_cluster|use the Bastion service]]; nothing changed for connections from the internal network (wired or AlmaWifi)
* ophfe2 is going to be reinstalled: please leave it free ASAP; once reinstalled it will change its IP address from 50.177 to 50.72 .
==== 2024-04-05 ====
* Deployed new authorization config: please promptly report any slowdowns or other problems to difa.csi@unibo.it .
* Bastion (137.204.50.15) is already working and direct access to the ophfe* nodes is being phased out. Usually you only need to add ''-J name.surname@137.204.50.15'' to the ssh command you're using.
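For example (the frontend shown is the one at 137.204.50.71; adapt user and target host to your case):
<code bash>
# One-off connection through the Bastion:
ssh -J name.surname@137.204.50.15 name.surname@137.204.50.71

# Or configure it once in ~/.ssh/config (the "oph" alias is just an example):
# Host oph
#     HostName  137.204.50.71
#     User      name.surname
#     ProxyJump name.surname@137.204.50.15
</code>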
==== 2024-03-12 ====
* **New frontend available**: a new frontend can be reached at 137.204.50.71
* Frontend at 137.204.50.177 is now **deprecated** and will be removed soon, to be replaced by a newer one at 137.204.50.72
==== 2024-02-21 ====
* 11:50 Outage resolved.
* 06:30 Cluster operation is currently stopped due to a slurmctld error (the daemon is not listening for network connections). I'm working to resolve the outage ASAP.
===== 2023 =====
==== 2023-11-10 ====
** Old (pre-August) __backups__ of data in /scratch will be deleted on 2023-11-27 . **
If you have very important data, verify it before the deadline.
==== 2023-10-20 ====
Tentatively re-enabled read/write mode for /home/temp . **Archive and delete** old data before starting new writes!
==== 2023-10-13 ====
/home/temp filesystem is currently offline due to technical issues. Trying to reactivate **readonly** access ASAP.
**11:05 UPDATE:** /home/temp is now available in readonly mode. Please **[[oph:cluster:archiver|archive]]** the data you need to keep, the filesystem will be **wiped** soon.
==== 2023-08-10 ====
New login node available: ophfe3 (137.204.50.73) is now usable.
slurmtop is now in the PATH, so there is no need to specify /home/software/utils/ .
==== 2023-08-01 ====
:!: VSCode is bringing the login node to a halt. Run it on your own client and transfer the files instead.
===== older (undated) =====
/scratch is now available, but readonly and only from the login nodes.
**Verify the archived data.** Usually you'll find a file.tar for every directory.
Associated with that file you may also find:
* //file-extra.tar// with files that were deleted from only one replica (they prevented removing "empty" folders). **Probably** you can safely delete these.
* //file-extra2.tar// contains files that are (most **likely**, but not always) dupes of the ones in file.tar
Backups will be deleted before 2023-12-20 (precise date TBD), so be sure to **__verify your data ASAP__**: once the backups are deleted there will be **no way to recover them**.
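One possible way to spot-check an archive before the backups are deleted (GNU tar assumed; file.tar is the per-directory archive mentioned above):
<code bash>
# List the archive contents without extracting:
tar -tvf file.tar | less
# Compare the archive against the files currently on disk (run it from the directory
# the paths inside the archive are relative to); differences, if any, are printed:
tar -df file.tar
</code>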