oph:cluster:jobs

Differences

These are the differences between the selected revision and the current version of the page.

oph:cluster:jobs [2023/08/10 23:54] carlo.cintolesi@unibo.it
oph:cluster:jobs [2024/11/18 09:08] (current version) diego.zuccato@unibo.it
Line 10: Line 10:
  
 To better enforce the fair use of the frontend, the memory (RAM) usage is limited to 1GB per user.
 +
 +
 ====== Run a Job ======
  
Line 22: Line 24:
  
 The job script is ideally divided into three sections:
-  * The header, consisting of commented text in which information and notes useful to the user but ignored by the system are given (the syntax of the comments is #text-for-user...);
+  * The header, consisting of commented text in which information and notes useful to the user but ignored by the system are given (the syntax of the comments is ''#text-for-user...'');
-  * The Slurm settings, in which instructions for launching the actual job are specified (the syntax of the instructions is #SBATCH --option);
+  * The Slurm settings, in which instructions for launching the actual job are specified (the syntax of the instructions is ''#SBATCH --option'');
   * The module loading and code execution, the structure of which varies according to the particular software each user is using.
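
The three sections can be sketched as follows. This is a minimal illustration, not the cluster's reference script: the resource values, the module name ''my-software'' and the executable ''./my_program'' are hypothetical placeholders. Slurm directives are written as ''#SBATCH'' lines and must come before the first executable command. The snippet only writes the script to a file and checks its syntax; nothing is submitted.

```shell
# Write a sketch job script; every value below is a placeholder.
cat > job_sketch.sh <<'EOF'
#!/bin/bash
# --- Header: notes for the user, ignored by the system ---
# Example job: runs a hypothetical program on 4 tasks.

# --- Slurm settings: read by sbatch ---
#SBATCH --job-name=sketch
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
#SBATCH --mem=4G
#SBATCH --output=sketch-%j.log

# --- Module loading and code execution ---
module load my-software     # hypothetical module name
srun ./my_program           # hypothetical executable
EOF

# Check the shell syntax without running anything.
bash -n job_sketch.sh && echo "job_sketch.sh: syntax OK"
```

On the cluster, such a file would then be handed to ''sbatch'' like any other job script.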
  
Line 105: Line 107:
 To have the workload manager allocate the resources requested in the job script, run the command:
  
-  sbatch runParallel.sh [other parameters]
+  sbatch --time hh:mm:ss runParallel.sh [other parameters]
 + 
 +<WRAP center round info> 
 +Estimating the value to use for ''--time'' is possibly the hardest part of the request. Please **do not** always use the maximum allowed time. Using a shorter estimate usually means your job will run before others that are requesting the maximum (backfill scheduling). 
 +</WRAP> 
 +<WRAP center round tip> 
 +''--nodes'' can also be a range. 
 + 
 +While ''--nodes=2 --ntasks=56'' **always** asks for 2 nodes, even if the job would fit on a single 112-vCPU node (leading to longer queue times), ''--nodes=1-4 --ntasks=56'' would happily use the bigger node, if available, or up to 4 half-nodes from mtx[00-15].
 +</WRAP>
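
One way to arrive at a ''--time'' value is to start from the elapsed time of a comparable earlier run and add a margin. The sketch below assumes a hypothetical previous run of 5000 seconds and a 20% margin; both numbers are illustrative, not site recommendations. It only prints the resulting ''sbatch'' command, combining the time limit with the node range from the tip above.

```shell
# Convert a number of seconds to the hh:mm:ss form accepted by --time.
to_hms() {
  s=$1
  printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Hypothetical: a comparable run took 5000 s; add a 20% safety margin.
elapsed=5000
limit=$(to_hms $((elapsed * 120 / 100)))

# Print the submission command instead of running it.
echo "sbatch --time $limit --nodes=1-4 --ntasks=56 runParallel.sh"
```

The elapsed time of a finished job can be read with ''sacct -j $JOBID --format=JobID,Elapsed,Timelimit'', if job accounting is enabled on the cluster.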
  
 For the management of running jobs, please refer to section "Job Management".
 +
 +
 +===== 'Interactive' jobs =====
 +
 +Sometimes you have to run heavy tasks (unsuitable for the frontend) that require interactivity, for example compiling a complex program that asks you questions along the way, or creating a container.
 +
 +You first have to request a node allocation, either with sbatch (as above, possibly with a 'dummy' payload such as ''sleep 7200'' for a 2-hour allocation) or with:
 +  salloc -N 1 --cpus-per-task=... --time=... --mem=... --constraint=blade
 +salloc blocks until the requested resources are available, so be prepared to wait. It also prints the job ID, to be used as $JOBID in the following steps.
 +
 +Then you can connect your terminal to the running job via:
 +  srun --pty --overlap --jobid $JOBID bash
 +which gives you a new shell on the first allocated node for $JOBID (just like SSH-ing into a node with the resources you asked for).
 +
 +Once you're done, remember to call:
 +  scancel $JOBID
 +to release resources for other users.
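
The sbatch route with a 'dummy' payload can be sketched as below. This is an illustration under assumptions: the 2-hour duration is an example value, and the script falls back to a placeholder job ID when ''sbatch'' is not installed, so it can be read or dry-run off the cluster. The ''srun --pty'' step needs a terminal, so the script prints it rather than running it.

```shell
# Sketch of the interactive workflow via a 'dummy' sbatch payload.
# DURATION_S is an example value: 7200 s = 2 h, matching --time below.
DURATION_S=7200

if command -v sbatch >/dev/null 2>&1; then
  # --parsable makes sbatch print just the job ID
  # (followed by the cluster name on multi-cluster setups).
  JOBID=$(sbatch --parsable --time=02:00:00 --wrap="sleep $DURATION_S")
  echo "Submitted job $JOBID. Once it is running:"
else
  JOBID="<jobid>"   # placeholder used when Slurm is not available
  echo "sbatch not found; printing the commands only:"
fi

# Commands to type by hand in your terminal.
echo "  srun --pty --overlap --jobid $JOBID bash"
echo "  scancel $JOBID"
```

Resource options such as ''--mem'' or ''--cpus-per-task'' are left out here for brevity and would be added to the ''sbatch'' line as needed.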
  
 ===== Job Management =====
oph/cluster/jobs.1691711695.txt.gz · Last modified: 2023/08/10 23:54 by carlo.cintolesi@unibo.it