Extended instructions for using Maestro at CSC
Please first read the main CSC Maestro page and then consult the power user and special case instructions below. Further down, there are steps to help you solve or diagnose issues and to prepare data for support requests.
- Extended instructions for using Maestro at CSC
- Standalone jobs on Puhti
- Maestro schrodinger.hosts file
- How to speed up simulations?
- Additional flags for Maestro modules
- Setting number of subjobs or molecules per subjob
- Optimal disk usage
- Running the Maestro GUI on Puhti
- Availability of licenses
- Fizzled jobs
- Run a test job to help diagnose problems
- Asking for support
- Recap of Maestro usage on Puhti
Standalone jobs on Puhti
Note
All Maestro jobs must be run on compute nodes via the queuing system. Don't run any Maestro jobs, including the GUI, on the login nodes. Maestro jobs on the login node will be terminated without warning.
The recommended way to run Maestro jobs on Puhti is to create the input files on
your local computer and, instead of running the job locally, write the input files
to disk. The procedure is shown in a video on our
main Maestro page. Use e.g. scp
on your local machine to copy the inputs to
Puhti (edit your username and project accordingly):
scp -r my_job <your username>@puhti.csc.fi:/scratch/<your project>
Note that scp also works in Windows PowerShell. See
Moving data between CSC and local workstation for
other options.
Once the folder my_job
containing all input files has been copied:
- SSH to Puhti.
- Load the Maestro module.
- Go to the input directory.
ssh <your username>@puhti.csc.fi
module load maestro
cd /scratch/<your project>/my_job
The job is submitted to the compute node(s) by running the job_name.sh script
written out by Maestro. The script formulates the task(s) as Slurm batch job(s) and
requests resources according to the selected HOST in the schrodinger.hosts file in
your Puhti $HOME directory.
bash job_name.sh # note the usage of `bash`, not `sbatch`!
Once the simulation has finished, copy the outputs back to your local computer
for analysis. On your local machine, run e.g. scp
again:
scp -r <your username>@puhti.csc.fi:/scratch/<your project>/my_job .
Note that you can also use e.g. the Puhti web interface for copying files between Puhti and your local computer.
A more advanced alternative is to use e.g. the pipeline tool, which lets you
bypass some of the Schrödinger jobcontrol machinery but requires you to
write the job script yourself. This may be useful if some of your subjobs
terminate unexpectedly. In that case, please make a note of those JobIds and
contact us.
The remainder of this article explains some implementation details on Puhti and helps setting up efficient simulation workflows.
Maestro schrodinger.hosts file
This file specifies the resources your jobs can get either locally
or from the queuing system. To use the recommended procedure
you need to edit the local (on your computer) schrodinger.hosts
file to
include the same HOSTs that you want to use on Puhti. On Windows, this will
require admin privileges.
On Puhti, Maestro complains about the location of this file, but you can safely
ignore the warning. The file is created by a script (its path is echoed on your
screen when you give module load maestro) that you need to run if the file does
not exist.
As the script requests, select the computing project that will be used for
CPU/GPU usage and scratch storage. You can find the actual Slurm options
in the HOST descriptions in the schrodinger.hosts
file. If your jobs require
resources that are not satisfied by any of the predefined HOST descriptions,
feel free to edit the file.
On Puhti, you can take a look at the schrodinger.hosts
file with:
less $HOME/schrodinger.hosts
On your local computer this file will be in the Maestro installation directory,
e.g. on Windows in C:\Program Files\Schrodinger-version\schrodinger.hosts
After the longish header and the localhost
entry, you should see the
Puhti HOST entries as something like:
name: test
queue: SLURM2.1
qargs: -p test -t 00:10:00 --mem-per-cpu=2000 --account=project_2042424
host: puhti-login11
processors: 4
For example, this HOST entry, available for Schrödinger jobs as test (from name: test),
will use the Slurm partition test (from -p test), allocate a maximum of 10 minutes of time
and 2 GB of memory per core, and consume resources from project_2042424. If you need
different resources, you can edit this file e.g. by adding a new entry. The requests must
be within the partition limits.
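As an illustration, a new entry for longer serial subjobs could look like the following sketch (the entry name is arbitrary, and the time, memory and project are examples that you should adapt and check against the partition limits):

name: small_3days
queue: SLURM2.1
qargs: -p small -t 72:00:00 --mem-per-cpu=4000 --account=project_2042424
host: puhti-login11
processors: 10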
If your schrodinger.hosts file on Puhti does not have the --account=<project> option
defined, delete the file and rerun the script to create it (module load maestro will
print out the path to the script; copy/paste it to the command line). You don't need
to have the --account= option set in your local schrodinger.hosts file. In your local
file, it's enough that the different HOST entries exist (and that the GPU ones have GPUs specified).
Note that the HOST entries and Slurm partitions (or queues) are two different things. The HOST entries define resources using Slurm partitions.
How to speed up simulations?
Apart from Jaguar and Quantum ESPRESSO, which can run "real" parallel jobs, all other Maestro modules run serial jobs. Don't choose a "parallel" HOST for any other job type. Instead of MPI-parallel jobs, Maestro modules typically split the workload into multiple parts, each of which can be run independently of the others. The Maestro documentation has an excellent section on this topic. In the documentation, go to "Getting started" > "Running Schrödinger Jobs" > "Running Distributed Schrödinger Jobs".
It is typical to process a lot of molecules as part of a particular workload. If you have enough molecules, you can split the full set into smaller subsets and process each of the subsets as a separate job. The Maestro modules have easy-to-use options for defining the number of subjobs. However, you must know in advance how many subjobs to launch. In principle, this requires knowing how long one molecule takes, or testing for each different use case.
Important note
When you start working with a new system/dataset, don't test if you got the syntax right with 1 000 000 molecules and 1000 subjobs. Instead, start out with e.g. 50 molecules and 2 subjobs. Learn how long it takes per molecule, confirm that your submit syntax is correct, adjust your parameters if needed and only then scale up.
If you're using the GUI to set up your job script, specify how many (sub)jobs (processors) you want to use. You can easily edit this later in the submit script if you change your mind.
The "default" submit script will work "as is" for small jobs. Just make sure you don't ask for too many (sub)jobs. As a rule of thumb, each subjob should last at least 1 hour, and for very large jobs preferably 24 hours. Running a lot of very short jobs is inefficient in many ways and may degrade the performance of the system for all users, see our high-throughput computing guidelines. For large workflows, you'll need to edit your scripts, see below.
Quantum ESPRESSO
Running multi-node jobs using the "parallel" HOST works well with Quantum ESPRESSO when appropriate
parallelization flags are carefully specified. The default parallelization is over plane waves if
no other options are specified. To improve on this, k-points (if more than one) can be partitioned
into "pools" using the -npools
flag. Also, when running on several hundred cores, the scalability
can be further extended by dividing each pool into "task groups" which distributes the workload
associated with Fast Fourier Transforms (FFTs) on the Kohn-Sham states. This is done using the
-ntg
flag. In order to have good load balancing among MPI processes, the number of k-point pools
should be an integer divisor of the number of k-points and the number of processors for FFT
parallelization should be an integer divisor of the third dimension of the smooth FFT grid (this
can be checked from the output file, grep "Smooth grid" *.out
). Further parallelization levels are
presented in the QE documentation.
The QE parallelization options can be specified in the Job Settings dialog of the QE calculations
panel of the Maestro GUI. Running a job using 160 cores on Puhti (4 nodes) could be parallelized
for example with -npools 4 -ntg 4
so that each k-point pool is given 40 cores, which are further
divided into 4 task groups of 10 cores each.
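Before scaling up, you can sanity-check these divisibility conditions against a small completed run; the output file name below is just an example:

# Check the smooth FFT grid dimensions from a previous (small) run:
grep "Smooth grid" my_qe_job.out
# For the 160-core example above: -npools 4 gives 4 pools of 40 cores each, and
# -ntg 4 splits each pool into 4 task groups of 10 cores. Check that -npools
# divides the number of k-points (printed in the QE output) and that the number
# of processors doing the FFTs divides the third dimension of the smooth grid.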
Using full nodes
When running Maestro modules such as Quantum ESPRESSO on multiple nodes, remember to explicitly
request the appropriate number of nodes by editing the schrodinger.hosts
file with the
--nodes=<number of nodes>
flag. Requesting full nodes prevents fragmentation of the job and
reduces unnecessary communication between surplus nodes. For large subjobs you
may also need to tune the time and memory requested in the schrodinger.hosts
file to suit your
needs.
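For example, a multi-node HOST entry could look like the following sketch (this is not one of the predefined entries; the partition, node count, time, memory and project are examples to adapt to your case):

name: parallel_4nodes
queue: SLURM2.1
qargs: -p large --nodes=4 --ntasks-per-node=40 -t 24:00:00 --mem-per-cpu=2000 --account=project_2042424
host: puhti-login11
processors: 160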
Host selection
A single-core job, as required by the driver process, cannot be run on the large partition on
Puhti. To run multi-node subjobs, you need to modify the submission script generated by the GUI
by specifying a separate driver HOST (e.g. -DRIVERHOST interactive -SUBHOST parallel, see also
below).
The following figures show the time to solution and scaling of the PSIWAT benchmark (2552 electrons, 4 k-points, Maestro 2021.3, pure MPI).
- Scaling is almost ideal up to 4 nodes when using -npools 4 -ntg 4.
- For this system and QE binary, the performance does not scale beyond 320 cores.
- Always confirm the appropriate scaling of your system before running large multi-node jobs (minimum 1.5 times speedup when doubling the number of cores).
Additional flags for Maestro modules
Different modules have different options. You can set some of them in
the GUI, but you may find more options with the -h
flag, e.g.
glide -h
where glide would be the Maestro module you want to run, like qsite, pipeline, bmin, ligprep, etc.
The Maestro documentation has a nice summary of the different options for different modules. In the documentation, go to "Getting started" > "Running Schrödinger Jobs" > "Running Distributed Schrödinger Jobs".
Simple HOST selection
For jobs that finish within about two days and run 10 subjobs, just use:
-HOST normal_72h:10
or if they all finish within 14 days, use:
-HOST longrun:10
If you have a workflow that will last longer, read on.
Advanced HOST selection
The general aim is to have the "driver process" running on a "HOST"
that will be alive for the whole duration of the workflow. Good
options are interactive and longrun if you estimate the complete
workflow to take more than 3 days (queuing included). A "driver process"
that is not using a lot of CPU is also allowed on a login node, but
a subjob is not. Never submit jobs on Puhti login nodes with
-HOST localhost
. It's ok if you create your own batch script or
interactive session and use
localhost on a compute node, but that's for special cases only and not
discussed on this page.
Set the "driver" or "master" to run on a HOST that allows for long run times (if it's a big calculation). The driver needs to be alive for the whole duration of the workflow, otherwise your subjob likely ends up fizzled. You can use "interactive" which allows for 7 days for one core, or "longrun" which allows for 14 days. If you need to run multiple workflows at the same time, choose "longrun" for the next drivers. In both cases select some "normal" HOST (i.e. "small" Slurm partition) for the (sub)jobs. Suitable splitting will reduce your queuing time. Asking for the longrun HOST "just in case" is not forbidden, but may lead to unnecessary queuing.
You may be able to set the number of subjobs already in the GUI. Typically, the GUI sets the "number of processors", which in many drivers equals the number of subjobs. Some drivers additionally let you set the number of subjobs separately. This enables you to limit the number of simultaneous jobs with the "processor count" (so that you and others won't run out of licenses) while keeping a single subjob at a suitable size. Please have a look at the help text of your driver via the Help path described above.
In summary, for a large workflow, edit the GUI-generated script along these lines:
change -HOST "normal_72h:10" to -HOST "longrun:1 normal_72h:9", or e.g.
-HOST "normal_72h" to -HOST "interactive:1 normal_72h:9". Another
alternative is to use the explicit flags -DRIVERHOST interactive -SUBHOST normal_72h.
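As an illustration only (the exact contents of your job_name.sh, the module name and the input file will differ), the edit amounts to changing the HOST specification on the launch line of the generated script:

# Launch line written by the GUI (illustrative; other flags may be present):
"${SCHRODINGER}/glide" -HOST "normal_72h:10" my_job.in
# Edited so that the driver runs on longrun and 9 subjobs run on normal_72h:
"${SCHRODINGER}/glide" -HOST "longrun:1 normal_72h:9" my_job.in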
Note that you can only have two jobs running in the interactive HOST at the same time.
Desmond jobs can have the -HOST gpu flag as set by the GUI, but Windows users
need to change the backslash "\" to a forward slash "/" in the binary name.
Authoritative job control instructions from the manual
A more detailed discussion on advanced jobs can be found in the Maestro documentation via the GUI or the Schrödinger website:
- "Getting Started" > "Running Schrödinger Jobs" > "Running Schrödinger Applications from the Command Line" > "The HOST, DRIVERHOST, and SUBHOST Options"
and a table of driver process conventions from:
- "Getting started" > "Running Schrödinger Jobs" > "Running Distributed Schrödinger Jobs"
Setting number of subjobs or molecules per subjob
Tip
If you don't know how long your full workflow will take, don't ask
for more than 10 subjobs and/or set NJOBS higher than 10. More is not always better!
If you have very large cases, don't exceed 50 simultaneous (sub)jobs.
As an example, the "run settings dialog" of glide
offers three options:
- Recommended number of subjobs.
- Exactly (fill in here) subjobs.
- Subjobs with no more than (fill in here) ligands each.
Choose the numbers so that an average subjob takes 1-24 hours to run. This keeps the overhead per subjob small while still parallelizing efficiently, i.e. you get your results quickly and each subjob (as well as the master job) has time to finish.
Don't run subjobs that complete faster than 15 minutes. You can check the
subjob duration afterwards with seff
and use this info in your following jobs: seff <slurm jobid>
.
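For example, after a batch of subjobs has finished, you can list their durations with sacct (a standard Slurm command) and then inspect a single subjob with seff; the job ID below is an example:

# List today's Slurm jobs with their elapsed times and states:
sacct --format=JobID,JobName,Elapsed,State
# Check the resource usage of one finished subjob (replace with a real Slurm job ID):
seff 12345678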
If time runs out for a subjob, search for "restart" in the
Schrödinger Knowledge Base
for your module, and/or look again for the options of your driver with
the -h
flag. Most jobs are restartable, so you don't lose completed
work or used resources.
If you choose too many subjobs, Maestro may get confused with the Slurm messages and sorting out the issue can be difficult. Also, running too many subjobs at a time can lead to the license tokens running out, and the time spent waiting in the queue will be wasted.
Optimal disk usage
The Schrödinger HOSTs on Puhti have not been configured to use the local NVMe disk, which is available only on some of the compute nodes. Since most jobs don't gain a speed advantage from the NVMe disk, you'll likely queue less by not asking for it. If your job performs a lot of I/O operations, please contact CSC Service Desk about how to request fast disk. The only disk available to the jobs is the same one where your input files already are, so it does not make sense to copy the files to a "temporary" location at the start of the job.
Running the Maestro GUI on Puhti
The Desktop app in the Puhti web interface can be used to run the Maestro GUI on Puhti. The performance may, however, be slower than running the GUI locally (recommended). Running the GUI on Puhti using X11 forwarding over SSH is not recommended, as it performs extremely poorly.
Availability of licenses
The CSC Maestro license has a fixed number of tokens that are shared by everyone. First, Maestro uses module-specific tokens, of which there are many for each module. If these run out, more jobs of the same type can still be run with general tokens, but once the general tokens also run out, no more jobs of that type (or any new jobs that need a general token) can be run by anyone. Therefore, this situation should be avoided. Once a job ends, its tokens are released and become available for everyone.
You can check the currently available licenses (tokens) with:
$SCHRODINGER/utilities/licutil -avail
and currently used licenses (tokens) with:
$SCHRODINGER/utilities/licutil -used
Note that some Maestro tools or workflows use multiple modules and hence licenses or tokens from multiple modules. Typically, one running instance of a module (a job or a subjob) requires several tokens. For example, Desmond and Glide jobs take 8 tokens each.
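If you want to follow the tokens of a particular module, you can filter the listing, e.g. (assuming the module name appears in the licutil output, which may vary between versions):

$SCHRODINGER/utilities/licutil -avail | grep -i glide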
CPU time (billing units) is a different resource and has nothing to do with license tokens. When you run out of billing units, you or your project manager can apply for more via the My CSC portal.
Fizzled jobs
Sometimes jobs are launched but never finish. The state of such a job as
reported by jobcontrol (see below) is fizzled. This can happen for a
number of reasons, but cleaning up and restarting the jobcontrol service
might help. When you don't have any Maestro jobs running on Puhti, give:
$SCHRODINGER/utilities/jserver -cleanall
$SCHRODINGER/utilities/jserver -shutdown
Sometimes the jserver -cleanall
command will not work because the program thinks
some jobs are still running. To force purge these jobs, run:
$SCHRODINGER/jobcontrol -delete -force <jobid>
before running the above jserver
commands. <jobid>
should be replaced by the
ID of your stranded job, for example puhti-login11-0-626be035
.
Another reason for jobs ending up fizzled is too many simultaneous jobs. Please have a look at the error files for suggestions, and if this is the case, ask for fewer subjobs.
Run a test job to help diagnose problems
Run one of the test jobs that come with Maestro to narrow down potential issues. In your scratch directory on Puhti, give
installation_check -test test
to try running a test job using the test
HOST. If the test succeeds,
the problem is likely in your input. In this case, please proceed to the
postmortem step below.
Asking for support
Maestro has a tool called postmortem that can be used to create a zip file containing the details of a failed job and the Maestro environment. Please add that to your support request to help us analyze your issue. On Puhti, first run:
jobcontrol -list
to find the right JobId (something like puhti-login11-0-4d34ce08
). Then,
check the right flags for postmortem
with:
$SCHRODINGER/utilities/postmortem -h
and create the postmortem file with:
$SCHRODINGER/utilities/postmortem <your schrodinger jobid>
The file may be large, so instead of sending it as an email attachment, consider
using a-flip and sending just a link. Also, see the earlier recommendation to start
testing with small systems, as this will also enable
you to use the test HOST and avoid queueing.
Please have a look at our instructions on writing support requests. An efficient support request will help us to solve your issue faster.
Recap of Maestro usage on Puhti
- Always test your workflow first with a small sample.
- Please mind the Slurm partition limits.
- Don't run the Maestro GUI on a login node.
- If you must run the GUI on Puhti, use the Puhti web interface Desktop app.
- Don't specify too many subjobs – an optimal subjob takes 1-24 hours.
- Don't specify too many subjobs – there are many researchers using the same license.
- Don't run a heavy "driver process" on the login node.
    - If the driver process is heavy, use e.g. -HOST "longrun:1 normal_72h:9" for 10 subjobs.
- Never run anything in parallel on the login node.
- Do not use localhost in your script unless you write your own batch script or run Maestro in an interactive session.
- Submit all jobs from your /scratch area.
- If your local computer uses Windows, edit "\" to "/" in your script.
- Use the same version of Maestro locally and on Puhti.