</WRAP>
  
That includes heavy IDEs((Integrated Development Environments)) (VScode, just to cite a name). If you're used to an IDE, use it on your client and just transfer the resulting files to the frontend. If it's worth using, it supports this workflow.
  
To better enforce the fair use of the frontend, the memory (RAM) usage is limited to 1GB per user.


====== Run a Job ======
  
To execute serial or parallel code, it is necessary to use the [[https://slurm.schedmd.com/documentation.html|Slurm WorkLoad Manager]], which will allocate the necessary resources and manage the priority of requests. Below are some of the basic functions and operating instructions for submitting serial and parallel jobs via Slurm; please refer to the official documentation for further information.
  
For each job, it is necessary to specify via a batch script the required resources (e.g. number of nodes, number of processors, memory, execution time) and, optionally, any other constraints (e.g. group of nodes) or parameters.
  
  
===== Submission via script =====
  
Although it is possible to provide job submission information to the WorkLoad Manager via command line parameters, it is usually preferred to create a bash script (job script) that contains the information permanently.
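
For instance, the same information could be passed directly on the command line (a hypothetical one-liner; options given to ''sbatch'' this way take precedence over the corresponding ''#SBATCH'' lines in the script):
  sbatch --ntasks=56 --mem-per-cpu=2G --job-name="jobName" runParallel.sh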
  
The job script is ideally divided into three sections:
  * The header, consisting of commented text in which information and notes useful to the user but ignored by the system are given (the syntax of the comments is ''#text-for-user...'');
  * The Slurm settings, in which instructions for launching the actual job are specified (the syntax of the instructions is ''#SBATCH --option'');
  * The module loading and code execution, the structure of which varies according to the particular software each user is using.
  
Below is an example job script for parallel computing:
<code bash runParallel.sh>
#!/bin/bash
#---------------------------------------------------------------------------- #
#   University    |   DIFA - Dept of Physics and Astrophysics
#       of        |   Open Physics Hub
#    Bologna      |   (https://site.unibo.it/openphysicshub/en)
#----------------------------------------------------------------------------
#
# License
#    This is free software: you can redistribute it and/or modify it
#    under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
# Author
#   Carlo Cintolesi
#
# Application
#   slurm workload manager
#
# Usage
#   run a job:         sbatch run.sh
#   check processes:   slurmtop
#   delete a job:      scancel <jobID>
#
# Description
#   Run job on the new cluster of OPH with SLURM
#
# --------------------------------------------------------------------------- #
# SLURM setup
# --------------------------------------------------------------------------- #

#- (1) [optional] Choose the account of your research group
##SBATCH --account=oph         ## This job must be "billed" to OPH project
##SBATCH --reservation=prj-can ## Use the node reserved for CAN project
##SBATCH --qos=normal          ## Also available 'debug' (max 15', no billing)
                               ## and 'long' (max 72h, low priority)

#- (2) Select the subcluster partition to work on (optional),
#  the number of tasks to be used (or specify the number of nodes and tasks),
#  and the RAM memory available for each node
#-
#SBATCH --constraint=matrix  ## run on matrix subcluster (parallel computing)
##SBATCH --constraint=blade  ## run on blade subcluster (pre/post-processing)
#SBATCH --ntasks=56          ## total number of tasks
##SBATCH --nodes=2           ## number of nodes to be allocated
##SBATCH --tasks-per-node=28 ## number of tasks per node (multiple of 28)
#SBATCH --mem-per-cpu=2G     ## ram per cpu (to be tuned)

#- (3) Set the name of the job, the log and error files,
#  define the email address for communications (just UniBo)
#-
#SBATCH --job-name="jobName" ## job name in the scheduler
#SBATCH --output=%N_%j.out   ## log file
#SBATCH --error=%N_%j.err    ## err file
#SBATCH --mail-type=ALL      ## send a message when the job starts and ends
#SBATCH --mail-user=nome.cognome@unibo.it  ## email address for messages

# --------------------------------------------------------------------------- #
# Modules setup and applications run
# --------------------------------------------------------------------------- #

#- (4) Modules to be loaded
#-
module load mpi/openmpi/4.1.4

#- (5) Run the job: just an example.
#  Note that the number of processors "-np 56" must be equal to --ntasks=56
#-
mpirun -np 56 ./executable <params>

# ------------------------------------------------------------------------end #
</code>
  
It is possible to use several job steps (several lines that launch executables such as ''mpirun'') in a single job script if each step requires the same resource allocation as the previous one and must start when the previous one has finished. If, on the other hand, the steps are independent, or are sequentially dependent but have different resource requests, then it is better to use separate job scripts: job steps are executed sequentially within a single resource allocation (e.g. in a single subset of nodes), while different jobs can have different allocations (thus reducing resource wastage) and can also start in parallel.
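
As a minimal sketch (the executable names are placeholders, and the directives are borrowed from the example above), a single job script with two sequential job steps could look like this:
<code bash>
#!/bin/bash
#SBATCH --constraint=matrix
#SBATCH --ntasks=56
#SBATCH --mem-per-cpu=2G

module load mpi/openmpi/4.1.4

# first step: pre-processing (placeholder executable)
mpirun -np 56 ./preprocess <params>

# second step: starts only when the previous one has finished,
# reusing the same resource allocation
mpirun -np 56 ./solver <params>
</code>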
To have the WorkLoad Manager allocate the resources requested in the job script, the following command must be executed:
  
  sbatch --time hh:mm:ss runParallel.sh [other parameters]

<WRAP center round info>
Estimating the value to use for ''--time'' is possibly the hardest part of the request. Please **do not** always use the maximum allowed time. Using a shorter estimate usually means your job will run before others that are requesting the maximum (backfill scheduling).
</WRAP>
<WRAP center round tip>
''--nodes'' can also be a range.

While ''--nodes=2 --ntasks=56'' **always** asks for 2 nodes even if the job would run on a single 112-vCPUs node (leading to longer queue times), ''--nodes=1-4 --ntasks=56'' would happily use the bigger node, if available, or up to 4 half-nodes from mtx[00-15].
</WRAP>
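
For instance, combining the two hints above (the time value is just a placeholder to adapt to your own job), a submission asking for a realistic walltime and a flexible node range could be:
  sbatch --time 02:00:00 --nodes=1-4 --ntasks=56 runParallel.sh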
  
For the management of running jobs, please refer to section "Job Management".


===== 'Interactive' jobs =====

Sometimes you have to run heavy tasks (unsuitable for the frontend) that require interactivity: for example, compiling a complex program that asks you to answer some questions, or creating a container.

You first have to request a node allocation, either via sbatch (as above, possibly with a 'dummy' payload, like a ''sleep 7200'' for a 2h duration) or via:
  salloc -N 1 --cpus-per-task=... --time=... --mem=... --constraint=blade
salloc will pause while waiting for the requested resources, so be prepared. It also tells you the value of $JOBID to be used in the following steps.

Then you can connect your terminal to the running job via:
  srun --pty --overlap --jobid $JOBID bash
which gives you a new shell on the first node allocated to $JOBID (just like SSH-ing into a node with the resources you asked for).

Once you're done, remember to call:
  scancel $JOBID
to release the resources for other users.
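
Putting the three steps together, a hypothetical interactive session could look like this (the resource values and the job ID are placeholders to adapt):
<code bash>
# 1) ask for one 'blade' node, e.g. 4 CPUs, 8 GB of RAM, for two hours
salloc -N 1 --cpus-per-task=4 --time=02:00:00 --mem=8G --constraint=blade
# note the job ID printed by salloc (e.g. "salloc: Granted job allocation 12345")

# 2) open a shell inside the allocation (replace 12345 with your job ID)
srun --pty --overlap --jobid 12345 bash
# ... do the interactive work on the node, then leave the shell with 'exit'

# 3) free the resources as soon as you are done
scancel 12345
</code>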
  
===== Job Management =====