Amber
Amber is a molecular dynamics package that also includes a number of additional tools for more sophisticated analysis, in particular NMR structure refinement.
Available
- Puhti: 20, 20-cuda, 22, 22-cuda
- Mahti: 20, 20-cuda, 22, 22-cuda
- LUMI: 22-cpu, 22-gpu
License
Amber can be used on CSC servers by all not-for-profit institute and university researchers irrespective of nationality or location. Look for the academic license text here.
Usage
See available versions and how to load Amber by running:
module spider amber
The module load command will set $AMBERHOME and put the AmberTools binaries in the path. Run Amber production jobs in the batch queues, see below. Very light system preparation (serial AmberTools jobs lasting a few seconds and using barely any memory) can also be done on the login node. Heavier analyses can be run e.g. in an interactive compute session.
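For example, on Puhti the CPU version used in the batch scripts below can be loaded and checked like this (module versions as in those scripts; run module spider amber to confirm what is currently installed):
module load gcc/9.4.0 openmpi/4.1.4
module load amber/22
echo $AMBERHOME   # prints the Amber installation directory set by the module
which sander      # AmberTools binaries such as sander are now in the path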
Python modules
Python scripts distributed with AmberTools are only available in the Amber22 modules on Puhti/Mahti. However, since AmberTools is also distributed through Conda, you can easily create a containerized environment containing these scripts yourself using Tykky or the LUMI container wrapper.
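As a sketch of how such an environment could be built on Puhti with Tykky (the environment file contents and the installation path below are assumptions; see the Tykky documentation for the exact workflow):
module load tykky
cat << EOF > ambertools-env.yml
channels:
  - conda-forge
dependencies:
  - ambertools
EOF
mkdir -p /projappl/<project>/ambertools-env
conda-containerize new --prefix /projappl/<project>/ambertools-env ambertools-env.yml
# add the wrapped binaries (including the Python scripts) to the path
export PATH="/projappl/<project>/ambertools-env/bin:$PATH"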
Molecular dynamics jobs are best run with pmemd.cuda, as they are much faster on GPUs than on CPUs. Please note that using pmemd.cuda requires a module with the -cuda extension. Similarly, on LUMI one should use pmemd.hip (or pmemd.hip.MPI for multi-GPU simulations), which requires loading a module with the -gpu extension.
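For example, after loading a -cuda module on Puhti you can quickly check which pmemd binaries it provides (the listed names are indicative):
module load gcc/9.4.0 openmpi/4.1.4 amber/22-cuda
ls $AMBERHOME/bin | grep pmemd   # e.g. pmemd.cuda and pmemd.cuda.MPI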
Note
Run only GPU-aware binaries in the GPU partitions. If you're unsure, check with seff <slurm_jobid> that GPUs were used and that the job was significantly faster than without GPUs.
Example GPU batch script for Puhti
Our tests show that for medium-sized systems the most efficient setup is one GPU card and one CPU core.
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --partition=gputest
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --account=<project>
#SBATCH --gres=gpu:v100:1
# 1 task, 1 thread, 1 GPU
module purge
module load gcc/9.4.0 openmpi/4.1.4
module load amber/22-cuda
srun pmemd.cuda -O -i mdin -r restrt -x mdcrd -o mdout
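Save the script e.g. as amber_gpu.sh (the filename is just an example) and submit it with:
sbatch amber_gpu.sh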
Note
If you want to use more than one GPU, perform scaling tests to verify that the jobs really become faster and use a binary with the .cuda.MPI or .hip.MPI extension. The rule of thumb is that when you double the resources, the job duration should decrease at least 1.5-fold. For overall performance information, consult the Amber benchmark scaling details. Typically, the best efficiency is achieved with 1 GPU.
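A simple way to do such a scaling test is to run the same input on one and on two GPUs and compare the ns/day values reported in the mdout files; a rough sketch (the job script and output file names below are assumptions):
sbatch run_1gpu.sh   # pmemd.cuda with --gres=gpu:v100:1 and --ntasks=1
sbatch run_2gpu.sh   # pmemd.cuda.MPI with --gres=gpu:v100:2 and --ntasks=2
# once both jobs have finished, compare the reported performance
grep "ns/day" mdout_1gpu mdout_2gpu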
You can find example inputs in the Amber20 tests directory on Puhti:
module load amber/20-cuda
ls $AMBERHOME/test
Example CPU batch script for Puhti
The non-GPU-aware binaries, e.g. the AmberTools programs, can be run as batch jobs in the following way:
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --partition=test
#SBATCH --ntasks=1
#SBATCH --account=<project>
# 1 task
module purge
module load gcc/9.4.0 openmpi/4.1.4
module load amber/22
srun paramfit -i Job_Control.in -p prmtop -c mdcrd -q QM_data.dat
Note
pmemd.cuda and pmemd.hip are much faster than pmemd.MPI, so use a CPU version only if you cannot use the GPU version. If Amber performance is not sufficient for your needs, consider using Gromacs, which is typically able to scale further (i.e. make use of more CPU and/or GPU resources). Consider also whether you really need speed or just a lot of sampling. Accelerated sampling can also be achieved through ensemble simulations, where multiple independent trajectories (e.g. the same system equilibrated from different initial velocities) are run at the same time. For more details, see the section on high-throughput computing with Amber below.
Interactive jobs
Sometimes it is more convenient to run small jobs, like system preparation, interactively. Interactive batch jobs prevent excessive load on the login node and should be used in such cases. You can request a shell on a compute node from the Puhti/Mahti web interface, from the command line with sinteractive, or request a single core manually with:
srun -n 1 -p test -t 00:05:00 --account=<project> --pty /bin/bash
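Alternatively, a minimal sinteractive request could look like this (the resource options here are just an example, adjust them to your needs):
sinteractive --account <project> --time 00:30:00 --cores 1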
Once you have been allocated resources (you might need to wait), you can run e.g. the paramfit task directly with:
paramfit -i Job_Control.in -p prmtop -c mdcrd -q QM_data.dat
Example GPU batch scripts for LUMI
Amber can be loaded into use on LUMI with:
module use /appl/local/csc/modulefiles
module load amber/22-gpu
# or
module load amber/22-cpu
Note
You need to run the module use command to modify your $MODULEPATH, otherwise the modules pre-installed by CSC cannot be accessed.
Example batch job script for LUMI-G using a single GCD (half a GPU):
#!/bin/bash -l
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1
#SBATCH --time=01:00:00
#SBATCH --account=<project>
module use /appl/local/csc/modulefiles
module load amber/22-gpu
srun pmemd.hip -O -i mdin.GPU -o mdout.GPU -p Cellulose.prmtop -c Cellulose.inpcrd
Example batch job script for LUMI-G using a full GPU node (8 GCDs / 4 GPUs):
#!/bin/bash -l
#SBATCH --partition=standard-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
#SBATCH --account=<project>
module use /appl/local/csc/modulefiles
module load amber/22-gpu
export MPICH_GPU_SUPPORT_ENABLED=1
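# Create a wrapper script that binds each MPI rank to a single GCD based on its local rank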
cat << EOF > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./select_gpu
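# CPU core masks pairing each rank with the cores closest to its GCD (see GPU binding on LUMI below)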
CPU_BIND="mask_cpu:fe000000000000,fe00000000000000"
CPU_BIND="${CPU_BIND},fe0000,fe000000"
CPU_BIND="${CPU_BIND},fe,fe00"
CPU_BIND="${CPU_BIND},fe00000000,fe0000000000"
srun --cpu-bind=$CPU_BIND ./select_gpu pmemd.hip.MPI -O -i mdin.GPU -o mdout.GPU -p Cellulose.prmtop -c Cellulose.inpcrd
A performance comparison of Amber on CPUs (Mahti) and GPUs (Puhti, Mahti, LUMI) is shown in the bar plot below. Notice how the performance of a single GPU on all systems is an order of magnitude better than a full Mahti CPU node (128 cores).
GPU binding on LUMI
Running on multiple GPUs on LUMI will benefit from GPU binding. In the example above, a bitmask is used to bind CPU cores to the optimal (linked) GPUs, as well as to exclude the first CPU core in each group of 8 cores (these are reserved for the operating system and thus not available for computing). For background and further instructions, see the LUMI documentation. Note that CPU/GPU binding is only possible when reserving full nodes (standard-g or --exclusive).
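As an illustration of how the masks work (this snippet is just for demonstration, not part of the job script): each hex mask selects the cores given to one rank, and the repeating fe byte corresponds to cores 1-7 of an 8-core group, leaving the first core of the group free.
echo 'obase=2; ibase=16; FE' | bc   # prints 11111110, i.e. cores 1-7 of an 8-core group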
Generic batch script examples for LUMI-G and LUMI-C are available in the LUMI documentation.
High-throughput computing with Amber
Similar to Gromacs multidir, Amber has a built-in "multi-pmemd" functionality, which allows you to run multiple MD simulations within a single Slurm allocation. This is an efficient option in cases where you want to run many similar, but independent, simulations. Typical use cases are enhanced sampling methods such as replica exchange MD. Also, since Amber simulations do not typically scale that well to multiple GPUs, multi-simulations can be used as a straightforward method to accelerate sampling by launching several differently initialized copies of your system, all running simultaneously on a single GCD each. If your system is very small and hence unable to utilize the full capacity of a GCD, it might even make sense to run multiple replicas on the same GCD to maximize efficiency.
Note
GPU resources on Puhti and Mahti are scarce, so we recommend running large-scale multi-pmemd simulations only on LUMI. LUMI-G has a massive GPU capacity available, which is also more affordable in terms of billing units (BUs) compared to Puhti and Mahti.
An example multi-pmemd batch script for LUMI-G is provided below.
#!/bin/bash -l
#SBATCH --partition=standard-g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
#SBATCH --account=<project>
module use /appl/local/csc/modulefiles
module load amber/22-gpu
export MPICH_GPU_SUPPORT_ENABLED=1
cat << EOF > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./select_gpu
CPU_BIND="mask_cpu:fe000000000000,fe00000000000000"
CPU_BIND="${CPU_BIND},fe0000,fe000000"
CPU_BIND="${CPU_BIND},fe,fe00"
CPU_BIND="${CPU_BIND},fe00000000,fe0000000000"
srun --cpu-bind=$CPU_BIND ./select_gpu pmemd.hip.MPI -ng 16 -groupfile groupfile
In this example, 16 copies of a system are run concurrently within a single Amber job, each using 1 GCD. Two nodes are requested in total, as each node on LUMI-G contains 8 GCDs (4 GPUs). The input, output, topology and coordinate files for the respective simulations are defined in a so-called groupfile:
-O -i mdin.GPU -o mdout000.GPU -p system000.prmtop -c system000.inpcrd
-O -i mdin.GPU -o mdout001.GPU -p system001.prmtop -c system001.inpcrd
-O -i mdin.GPU -o mdout002.GPU -p system002.prmtop -c system002.inpcrd
-O -i mdin.GPU -o mdout003.GPU -p system003.prmtop -c system003.inpcrd
-O -i mdin.GPU -o mdout004.GPU -p system004.prmtop -c system004.inpcrd
-O -i mdin.GPU -o mdout005.GPU -p system005.prmtop -c system005.inpcrd
-O -i mdin.GPU -o mdout006.GPU -p system006.prmtop -c system006.inpcrd
-O -i mdin.GPU -o mdout007.GPU -p system007.prmtop -c system007.inpcrd
-O -i mdin.GPU -o mdout008.GPU -p system008.prmtop -c system008.inpcrd
-O -i mdin.GPU -o mdout009.GPU -p system009.prmtop -c system009.inpcrd
-O -i mdin.GPU -o mdout010.GPU -p system010.prmtop -c system010.inpcrd
-O -i mdin.GPU -o mdout011.GPU -p system011.prmtop -c system011.inpcrd
-O -i mdin.GPU -o mdout012.GPU -p system012.prmtop -c system012.inpcrd
-O -i mdin.GPU -o mdout013.GPU -p system013.prmtop -c system013.inpcrd
-O -i mdin.GPU -o mdout014.GPU -p system014.prmtop -c system014.inpcrd
-O -i mdin.GPU -o mdout015.GPU -p system015.prmtop -c system015.inpcrd
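A groupfile like this does not need to be written by hand; for example, a small shell loop can generate it (the file naming follows the example above):
rm -f groupfile
for i in $(seq -f "%03g" 0 15); do
  echo "-O -i mdin.GPU -o mdout${i}.GPU -p system${i}.prmtop -c system${i}.inpcrd" >> groupfile
done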
See the Amber manual for further details on multi-pmemd.
References
When citing Amber22 or AmberTools22, please use the following:
D.A. Case, H.M. Aktulga, K. Belfon, I.Y. Ben-Shalom, J.T. Berryman, S.R. Brozell, D.S. Cerutti, T.E. Cheatham, III, G.A. Cisneros, V.W.D. Cruzeiro, T.A. Darden, R.E. Duke, G. Giambasu, M.K. Gilson, H. Gohlke, A.W. Goetz, R. Harris, S. Izadi, S.A. Izmailov, K. Kasavajhala, M.C. Kaymak, E. King, A. Kovalenko, T. Kurtzman, T.S. Lee, S. LeGrand, P. Li, C. Lin, J. Liu, T. Luchko, R. Luo, M. Machado, V. Man, M. Manathunga, K.M. Merz, Y. Miao, O. Mikhailovskii, G. Monard, H. Nguyen, K.A. O'Hearn, A. Onufriev, F. Pan, S. Pantano, R. Qi, A. Rahnamoun, D.R. Roe, A. Roitberg, C. Sagui, S. Schott-Verdugo, A. Shajan, J. Shen, C.L. Simmerling, N.R. Skrynnikov, J. Smith, J. Swails, R.C. Walker, J. Wang, J. Wang, H. Wei, R.M. Wolf, X. Wu, Y. Xiong, Y. Xue, D.M. York, S. Zhao, and P.A. Kollman (2022), Amber 2022, University of California, San Francisco.
More Information
The Amber home page has an extensive manual and useful tutorials.