====== The Frontend ======
+ | |||
+ | The Frontend is the node you connect to remotely. Its primary function is to allow remote access | ||
+ | |||
+ | <WRAP center round important 60%> | ||
+ | If an executable must necessarily be tested on the Frontend, the responsible user must actively monitor the job and be sure that it is not active for more than a few seconds. | ||
+ | </ | ||
+ | |||
+ | That includes heavy IDEs((Integrated Development Environments)) (VScode, just to cite a name). If you're used to an IDE, use it on your client and just transfer the resulting files to the frontend. If it's worth using, it supports this workflow. | ||
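
For instance, a sketch of this workflow with ''rsync'' (the frontend hostname and the paths are placeholders to be adapted to your own setup):

<code bash>
# Edit the project locally with your IDE, then copy only the resulting files
# to the frontend. Hostname and paths are examples: adapt them to your account.
rsync -av --exclude '.git' ./myproject/ name.surname@frontend.example.unibo.it:~/myproject/
</code>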
+ | |||
+ | To better enforce the fair use of the frontend, the memory (RAM) usage is limited to 1GB per user. | ||
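
As a quick, unofficial check, you can sum the resident memory currently used by your own processes on the frontend:

<code bash>
# Total resident memory (RSS) of all your processes, reported in MB
ps -u "$USER" -o rss= | awk '{sum+=$1} END {printf "%.0f MB\n", sum/1024}'
</code>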
+ | |||
+ | |||
+ | ====== | ||
+ | |||
+ | To execute serial or parallel code, it is necessary to use the [[https:// | ||
+ | |||
+ | For each job, it is necessary to specify via a batch script the required resources (e.g. number of nodes, number of processors, memory, execution time) and, optionally, any other constraints (e.g. a group of nodes). Optionally, other parameters may also be indicated | ||
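
To see which partitions, nodes and features are actually available before choosing these constraints, ''sinfo'' can be used, for example:

<code bash>
# Overview of partitions: node count, CPUs and memory per node, available features
sinfo -o "%P %.6D %c %m %f"
</code>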
+ | |||
+ | |||
+ | ===== Submission via script ===== | ||
+ | |||
+ | Although it is possible to provide job submission information to the WorkLoad Manager via command line parameters, it is usually preferred to create a bash script (job script) that contains the information permanently. | ||
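
For comparison, a purely command-line submission could look like the following sketch (the executable name is just a placeholder); all the options shown later in the job script can be passed this way:

<code bash>
# One-shot submission without a job script: all options on the command line,
# the command to run is passed through --wrap (executable name is an example)
sbatch --ntasks=56 --mem-per-cpu=2G --time=01:00:00 --constraint=matrix \
       --wrap="mpirun -np 56 ./myExecutable"
</code>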
+ | |||
+ | The job script is ideally divided into three sections: | ||
+ | * The header, consisting of commented text in which information and notes useful to the user but ignored by the system are given (the syntax of the comments is ''# | ||
+ | * The Slurm settings, in which instructions for launching the actual job are specified (the syntax of the instructions is ''# | ||
+ | * The module loading and code execution, the structure of which varies according to the particular software each user is using. | ||
+ | |||
+ | Below is an example job script for parallel computing: | ||
<code bash runParallel.sh>
#!/bin/bash
# --------------------------------------------------------------------------- #
# runParallel.sh
#
# Example job script for the OPH cluster - Bologna
#
# License
# This is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Author
# Carlo Cintolesi
#
# Application
# slurm workload manager
#
# Usage
# run a job:       sbatch --time hh:mm:ss runParallel.sh
# check processes: squeue
# kill a job:      scancel <jobID>
#
# Description
# Run job on the new cluster of OPH with SLURM
#
# --------------------------------------------------------------------------- #
# SLURM setup
# --------------------------------------------------------------------------- #
+ | |||
+ | #- (1) [optional] Choose the account of your research group | ||
+ | ##SBATCH --account=oph | ||
+ | ##SBATCH --reservation=prj-can ## Use the node reserved for CAN project | ||
+ | ##SBATCH --qos=normal | ||
+ | ## and ' | ||
+ | |||
+ | #- (2) Select the subcluster partition to work on (optional), | ||
+ | # the number of tasks to be used (or specify the number of nodes and tasks), | ||
+ | # and the RAM memory available for each node | ||
+ | #- | ||
+ | #SBATCH --constraint=matrix | ||
+ | ##SBATCH --constraint=blade | ||
+ | #SBATCH --ntasks=56 | ||
+ | ##SBATCH --nodes=2 | ||
+ | ##SBATCH --tasks-per-node=28 ## number of tasks per node (multiple of 28) | ||
+ | #SBATCH --mem-per-cpu=2G | ||
+ | |||
+ | #- (3) Set the name of the job, the log and error files, | ||
+ | # define the email address for communications (just UniBo) | ||
+ | #- | ||
+ | #SBATCH --job-name=" | ||
+ | #SBATCH --output=%N_%j.out | ||
+ | #SBATCH --error=%N_%j.err | ||
+ | #SBATCH --mail-type=ALL | ||
+ | #SBATCH --mail-user=nome.cognome@unibo.it | ||
+ | |||
+ | # --------------------------------------------------------------------------- # | ||
+ | # Modules setup and applications run | ||
+ | # --------------------------------------------------------------------------- # | ||
+ | |||
+ | #- (4) Modules to be load | ||
+ | #- | ||
+ | module load mpi/ | ||
+ | |||
+ | #- (5) Run the job: just an example. | ||
+ | # Note that the number of processors "-np 56" must be equal to --ntasks=56 | ||
+ | #- | ||
+ | mpirun -np 56 ./ | ||
+ | |||
+ | # ------------------------------------------------------------------------end # | ||
+ | </ | ||
+ | |||
+ | It is possible to use several job steps (several lines that launch executables such as '' | ||
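
A minimal sketch of such a section with two job steps (the executable names are only illustrative):

<code bash>
# Two job steps inside the same allocation, executed one after the other
srun --ntasks=56 ./preProcessing      # hypothetical pre-processing step
srun --ntasks=56 ./mainSolver         # hypothetical main computation
</code>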
+ | |||
+ | To allocate the resource request in the job script by the WorkLoad Manager, the command must be executed: | ||
+ | |||
+ | sbatch --time hh:mm:ss runParallel.sh [other parameters] | ||
+ | |||
+ | <WRAP center round info> | ||
+ | Estimating the value to use for '' | ||
+ | </ | ||
<WRAP center round tip>
''--time'' can also be set inside the job script with an ''#SBATCH --time=hh:mm:ss'' directive.

While ''sbatch'' command-line options override the corresponding ''#SBATCH'' directives, keeping ''--time'' on the command line makes it easy to adjust it at every submission.
</WRAP>
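
One possible way to calibrate ''--time'' is to look at how long your recent jobs actually took, for example with ''sacct'':

<code bash>
# Elapsed time vs. requested time limit of your jobs from the last 7 days
sacct -u "$USER" --starttime=now-7days \
      --format=JobID,JobName%20,Elapsed,Timelimit,State
</code>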
+ | |||
+ | For the management of running jobs, please refer to section "Job Management" | ||
+ | |||
+ | |||
+ | ===== ' | ||
+ | |||
+ | Sometimes you have to run some heavy tasks (unsuitable for the frontend) that require interactivity. For example to compile a complex program that requires you to answer some questions, or to create a container. | ||
+ | |||
+ | You have to first request a node allocation, either by sbatch (as above, possibly with ' | ||
  salloc -N 1 --cpus-per-task=... --time=... --mem=... --constraint=blade
''salloc'' will pause while waiting for the requested resources, so be prepared. It also tells you the value of $JOBID to be used in the following steps.
+ | |||
+ | Then you can connect your terminal to the running job via: | ||
+ | srun --pty --overlap --jobid $JOBID bash | ||
+ | that gives you a new shell on the first allocated node for $JOBID (just like SSH-ing a node with the resources you asked for). | ||
+ | |||
+ | Once you're done, remember to call: | ||
+ | scancel $JOBID | ||
+ | to release resources for other users. | ||
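
Putting the three steps together, an interactive session could look like this sketch (the resource values are only an example; replace <JOBID> with the ID reported by ''salloc''):

<code bash>
# 1. request the allocation (blocks until the resources are granted)
salloc -N 1 --cpus-per-task=4 --time=02:00:00 --mem=8G --constraint=blade

# 2. open a shell on the allocated node, using the job ID printed by salloc
srun --pty --overlap --jobid <JOBID> bash

# 3. when finished, exit the shell and release the allocation
scancel <JOBID>
</code>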
+ | |||
+ | ===== Job Management ===== | ||
+ | |||
+ | Once a job has been sent to the WorkLoad Manager via the command '' | ||
+ | |||
+ | * ''/ | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | Other management functions for the job and the accounting issue include the following: | ||
+ | |||
+ | * ''/ | ||
+ | * '' |