HyperQueue
Autoallocation
The autoallocation feature available in HQ is still under development and buggy. Do not use it, as it is very likely to fill the job queue with idling workers that only waste resources.
sbatch-hq
For simple task farming workflows, where the intention is to run many similar (non-MPI parallel,
independent) commands/programs, you can use the CSC utility tool sbatch-hq
to avoid writing
a batch script for HyperQueue from scratch. See below for details and an example.
HyperQueue (HQ) is an efficient sub-node task scheduler. Instead of submitting each of your computational tasks using sbatch
or srun
, you can allocate a large resource block
and then use HyperQueue to submit your tasks. This puts much less stress on the batch queue
system and is therefore the recommended way to run your high-throughput computing use cases.
License
Free to use and open source under the MIT License
Available
- Puhti: 0.11.0, 0.13.0
- Mahti: 0.11-dev, 0.13.0
Usage
module load hyperqueue
HyperQueue works on a worker-server-client basis. Here, the server manages connections, while workers are started on compute nodes and execute the commands that the client submitted to the server. This resembles running Slurm within Slurm, but you have to start the server and workers yourself.
Starting the server
Note
The following instructions apply to Slurm job scripts where only full nodes are allocated. A full example can be found at the bottom.
Specify where on the file system the HyperQueue server should be placed. All hq commands
respect this variable, so make sure it's set before you call any hq commands. The server
location can also be specified using the command-line flag --server-dir /server/location/on/lustre.
export HQ_SERVER_DIR=/server/location/on/lustre
If the server directory is not specified, it defaults to the user's home directory. In that case, one has to be careful not to mix up separate computations. For simple cases that fit inside one Slurm job, we recommend starting one server per job in a job-specific directory.
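For example, mirroring the full job script at the bottom of this page, the server directory can be tied to the Slurm job ID:
export HQ_SERVER_DIR=${PWD}/hq-server-${SLURM_JOB_ID}
mkdir -p "${HQ_SERVER_DIR}"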
Start the server:
hq server start &
until hq job list &>/dev/null ; do sleep 1 ; done
Here, the server is placed in the background (&) so that the script can continue, and the until loop waits until the server is up and responding to hq job list.
Starting the workers
Start the workers (again in the background so our script can continue):
srun --exact --cpu-bind=none --mpi=none hq worker start --cpus=${SLURM_CPUS_PER_TASK} &
Here, we launch one worker per node, with each worker getting the full node. If you need HQ to be aware of other resources, e.g. memory, local disk or GPUs, see the Generic resource section in the official documentation.
Next, we can start submitting jobs, or alternatively wait for all the workers to connect to the server before submitting. The latter is generally good practice, as it lets us notice issues with the workers early.
Wait until all workers are online (note that this command has no timeout):
hq worker wait "${SLURM_NTASKS}"
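If you want to guard against workers never coming online, you can wrap the command with the coreutils timeout utility (a sketch; the 300-second limit is an arbitrary choice):
timeout 300 hq worker wait "${SLURM_NTASKS}" || { echo "HQ workers did not start in time" >&2; exit 1; }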
Submitting jobs
Output
By default HQ creates one folder for each job where output is redirected. You can use the --log, --stdout and --stderr flags to change this behavior. Note that it's not possible to direct output from multiple jobs into the same file, as each submission will clear the file. See hq submit --help for the full list of options.
hq submit <hq submit args> --cpus <n> <COMMAND/executable> <args to program>
This is a non-blocking command, similar to sbatch.
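For example, a single two-core task running a hypothetical serial program could be submitted with:
hq submit --cpus 2 ./my_program input.txt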
HyperQueue is not limited to running a single execution per submission. Using the --array 1-N flag, we can start a program N times, similar to how Slurm array jobs work.
hq submit --array 1-10 --cpus <n> <COMMAND>
<COMMAND> then has access to the environment variable HQ_TASK_ID, which enumerates the tasks.
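For example, if each task should process its own numbered input file (the program and file names here are only illustrative), the variable can be expanded inside the task itself:
hq submit --array 1-10 --cpus 1 bash -c './my_program input_${HQ_TASK_ID}.dat'
The single quotes ensure that HQ_TASK_ID is expanded by the task's shell on the worker, not by the shell you submit from.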
When we have submitted everything we want, we need to wait for the jobs to finish. This can be done e.g. with:
hq job wait all
Once we are done running all of our jobs, we shut down the workers and the server to avoid a false error from Slurm when the job ends:
hq worker stop all
hq server stop
sbatch-hq
For simple task farming workflows, where you only want to run many similar (non-MPI parallel,
independent) commands/programs, you can use the CSC utility tool sbatch-hq. Just specify the
list of commands to run in a file, one command per line. The tool sbatch-hq will create and
launch a batch job that starts running the commands from the file using HyperQueue. You can specify
how many nodes you want to run the commands on, and sbatch-hq will keep executing the commands
until all are done or the batch job time limit is reached.
Note
Do not include srun
in the commands you want to run. HyperQueue will take care of
launching the tasks using the allocated resources as requested.
Let's assume we have a file named commandlist
with a list of commands that we want to
run using 8 threads each. As an example, let's reserve 1 compute node for the whole job.
This means that we could run either 5 or 16 tasks simultaneously, depending on whether we
are using Puhti (40 cores per node) or Mahti (128 cores per node).
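The commandlist file could, for example, look like this (the program name and input files are placeholders; each line is one independent command):
./my_program --threads 8 input_001.dat
./my_program --threads 8 input_002.dat
./my_program --threads 8 input_003.dat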
module load sbatch-hq
sbatch-hq --cores=8 --nodes=1 --account=<project> --partition=test --time=00:15:00 commandlist
The number of commands in the file can (and usually should) be much larger than the number of tasks
that can run simultaneously on the reserved nodes. See sbatch-hq --help for more details
on usage and input options.
Full example
#!/bin/bash
#SBATCH --partition=test # Queue name
#SBATCH --account=<project> # Billing project
#SBATCH --nodes=1 # Number of nodes
#SBATCH --cpus-per-task=<n> # CPUs per task (40 on Puhti, 128 on Mahti)
#SBATCH --ntasks-per-node=1 # Tasks per node
#SBATCH --time=00:30:00 # Time limit
module load hyperqueue
# Specify a location for the HyperQueue server
export HQ_SERVER_DIR=${PWD}/hq-server-${SLURM_JOB_ID}
mkdir -p "${HQ_SERVER_DIR}"
# Start the server in the background (&) and wait until it has started
hq server start &
until hq job list &>/dev/null ; do sleep 1 ; done
# Start the workers (one per node, in the background) and wait until they have started
srun --exact --cpu-bind=none --mpi=none hq worker start --cpus=${SLURM_CPUS_PER_TASK} &
hq worker wait "${SLURM_NTASKS}"
# Simple example workflow. Compute the checksums of all files in the current
# directory. For small files you would do this very simply on a single
# interactive compute node, right ;)
#
# sha256sum $(find . -maxdepth 1 -type f) > checksums.txt
#
# Parallelized with HyperQueue (not restricted to a single node):
for f in $(find . -maxdepth 1 -type f) ; do
hq submit --cpus 1 --stdout $f.sha256 sha256sum $f &
done
# Wait until all jobs have finished, shut down the HyperQueue workers and server
hq job wait all
hq worker stop all
hq server stop
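Assuming the script above is saved as hq_example.sh (the file name is arbitrary), it is submitted like any other batch job:
sbatch hq_example.sh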
With other workflow managers
If your workflow manager uses sbatch for each process execution and you have many short
processes, it's advisable to switch to HyperQueue to improve throughput and decrease the load on
the system batch scheduler.
Snakemake
Using Snakemake's --cluster flag, we can use hq submit instead of sbatch:
snakemake --cluster "hq submit --cpus <threads> ..."
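For example, assuming each rule in the workflow declares its thread count, Snakemake fills in the {threads} placeholder itself; the --jobs value here is only an illustrative limit on how many tasks Snakemake submits at once:
snakemake --jobs 100 --cluster "hq submit --cpus {threads}"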
If you are porting a more complicated workflow from Slurm, you can do argument parsing and transformations programmatically using Snakemake's job properties.
Nextflow
See a separate tutorial for instructions on how to use HyperQueue as an executor for Nextflow workflows.
More information
MPI
Although HyperQueue does not do MPI execution out of the box, it's possible
using a combination of the HQ feature Multinode Tasks and orterun, hydra or prrte. This way, one can schedule MPI tasks at
node-level granularity.
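A minimal sketch of how this could look, assuming a HyperQueue version with multi-node task support (the --nodes flag) and that HQ exposes the assigned hosts in a node file through the HQ_NODE_FILE environment variable; the wrapper script and MPI program names are hypothetical, and the exact interface may differ between versions, so check the official Multinode Tasks documentation:
# run_mpi_task.sh: hypothetical wrapper that HQ executes on the first assigned node.
# HQ_NODE_FILE is assumed to contain the hostnames assigned to this task.
mpirun --hostfile "${HQ_NODE_FILE}" ./my_mpi_program

# Submit the wrapper as a single task spanning two nodes
hq submit --nodes 2 ./run_mpi_task.sh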