Skip to content

Amber

Amber is a molecular dynamics package including a number of additional tools for more sophisticated analysis and in particular NMR structure refinement.

Available

  • Puhti: 20, 20-cuda, 22, 22-cuda
  • Mahti: 20, 20-cuda, 22, 22-cuda
  • LUMI: 24-cpu, 24-gpu

License

Amber can be used on CSC servers by all not-for-profit institute and university researchers irrespective of nationality or location. Look for the academic license text here.

Usage

See available versions and how to load Amber by running:

module spider amber

On LUMI, you need to add the CSC modules to your module path before running the above command:

module use /appl/local/csc/modulefiles

The module load command will set $AMBERHOME and put the AmberTools binaries in the path. Run Amber production jobs in the batch queues, see below. Very light system preparation (serial AmberTools jobs lasting a few seconds and using barely any memory) can be done on the login node as well. Heavier analyses can be run e.g. in an interactive compute session.

Molecular dynamics jobs are best run with pmemd.cuda as they are much faster on GPUs than on CPUs. Please note that using pmemd.cuda requires a module with the -cuda extension. Similarly on LUMI one should use pmemd.hip (or pmemd.hip.MPI for multi-GPU simulations), which requires loading a module with the -gpu extension.

Run only GPU aware binaries in the GPU partitions. If you're unsure, check with seff <slurm_jobid> that GPUs were used and that the job was significantly faster than without GPUs.

Python modules

Python scripts distributed with AmberTools are only available in the Amber22 modules on Puhti/Mahti. However, since AmberTools is also distributed through Conda, you can easily create a containerized environment containing these scripts yourself using Tykky or the LUMI container wrapper.

Example batch scripts for Puhti and Mahti

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --partition=gputest
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --account=<project>
#SBATCH --gres=gpu:v100:1

# Our tests show that for medium-sized systems the most efficient setup is
# one GPU card and one CPU core.

module purge
module load gcc/9.4.0 openmpi/4.1.4
module load amber/22-cuda

srun pmemd.cuda -O -i mdin -r restrt -x mdcrd -o mdout
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --partition=test
#SBATCH --ntasks=1
#SBATCH --account=<project>

# The non-GPU aware binaries, e.g. AmberTools, can be run as batch jobs in
# the following way:

module purge
module load gcc/9.4.0 openmpi/4.1.4
module load amber/22

srun paramfit -i Job_Control.in -p prmtop -c mdcrd -q QM_data.dat
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --partition=gputest
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --account=<project>
#SBATCH --gres=gpu:a100:1

module purge
module load gcc/9.4.0 openmpi/4.1.2
module load amber/22-cuda

srun pmemd.cuda -O -i mdin.GPU -o mdout.GPU -p Cellulose.prmtop -c Cellulose.inpcrd

Note

pmemd.cuda (and pmemd.hip on LUMI) are way faster than pmemd.MPI, so use a CPU-version only in case you cannot use the GPU-version. If Amber performance is not fast enough, consider using GROMACS, which is typically able to scale further (i.e. make use of more CPU and/or GPU resources). Consider also whether you really need speed or just a lot of sampling. Accelerated sampling can also be achieved through ensemble simulations, where multiple independent trajectories (e.g. the same system equilibrated from different initial velocities) are run at the same time. For more details, see the section on high-throughput simulations with Amber.

If you want to use more than one GPU, perform scaling tests to verify that the jobs really become faster and use a binary with .cuda.MPI or .hip.MPI extension. The rule of thumb is that when you double the resources, the job duration should decrease at least 1.5-fold. For overall performance info, consult the Amber benchmark scaling details. Typically, the best efficiency is achieved with 1 GPU.

Example GPU batch scripts for LUMI

Amber can be loaded into use on LUMI with:

module use /appl/local/csc/modulefiles
module load amber/24-gpu
# or
module load amber/24-cpu
#!/bin/bash
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1
#SBATCH --time=01:00:00
#SBATCH --account=<project>

module use /appl/local/csc/modulefiles
module load amber/24-gpu

srun pmemd.hip -O -i mdin.GPU -o mdout.GPU -p Cellulose.prmtop -c Cellulose.inpcrd
#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
#SBATCH --account=<project>

module use /appl/local/csc/modulefiles
module load amber/24-gpu

export MPICH_GPU_SUPPORT_ENABLED=1

cat << EOF > select_gpu
#!/bin/bash

export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF

chmod +x ./select_gpu

CPU_BIND="mask_cpu:fe000000000000,fe00000000000000"
CPU_BIND="${CPU_BIND},fe0000,fe000000"
CPU_BIND="${CPU_BIND},fe,fe00"
CPU_BIND="${CPU_BIND},fe00000000,fe0000000000"

srun --cpu-bind=$CPU_BIND ./select_gpu pmemd.hip.MPI -O -i mdin.GPU -o mdout.GPU -p Cellulose.prmtop -c Cellulose.inpcrd

A performance comparison of Amber on CPUs (Mahti) and GPUs (Puhti, Mahti, LUMI) is shown in the bar plot below. Notice how the performance of a single GPU on all systems is an order of magnitude better than a full Mahti CPU node (128 cores). Note also that Amber is typically not able to scale efficiently to multiple GPUs unless you have a very large system (>1 million particles).

Amber scaling on GPUs and CPUs on Puhti, Mahti and LUMI

GPU binding on LUMI

Running on multiple GPUs on LUMI will benefit from GPU binding. In the example above, a bitmask is used to bind CPU cores to optimal (linked) GPUs as well as exclude the first CPU core in each group of 8 cores (these are reserved for the operating system and thus not available for computing). For background and further instructions, see the LUMI documentation. Note that CPU/GPU binding is only possible when reserving full nodes (standard-g or --exclusive).

Generic batch script examples for LUMI-G and LUMI-C are available in the LUMI documentation.

Interactive jobs

Sometimes it is more convenient to run small jobs, like system preparations, interactively. Interactive batch jobs prevent excessive load on the login node and should be used in these kinds of cases. You can request a shell on a compute node from the Puhti/Mahti web interface, from the command line with sinteractive, or manually with:

srun -n 1 -p test -t 00:05:00 --account=<project> --pty /bin/bash

Once you have been allocated resources (you might need to wait), you can run e.g. the paramfit task directly with:

paramfit -i Job_Control.in -p prmtop -c mdcrd -q QM_data.dat

High-throughput computing with Amber

Similar to GROMACS multidir, Amber has a built-in "multi-pmemd" functionality, which allows you to run multiple MD simulations within a single Slurm allocation. This is an efficient option in cases where you want to run many similar, but independent, simulations. Typical use cases are enhanced sampling methods such as replica exchange MD. Also, since Amber simulations do not typically scale that well to multiple GPUs, multi-simulations can be used as a straightforward method to accelerate sampling by launching several differently initialized copies of your system, all running simultaneously on a single GCD each. If your system is very small and hence unable to utilize the full capacity of a GCD, it might even make sense to run multiple replicas on the same GCD to maximize efficiency.

Note

GPU resources on Puhti and Mahti are scarce, so we recommend running large-scale multi-pmemd simulations only on LUMI. LUMI-G has a massive GPU capacity available, which is also more affordable in terms of BUs compared to Puhti and Mahti.

An example multi-pmemd batch script for LUMI-G is provided below.

#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
#SBATCH --account=<project>

module use /appl/local/csc/modulefiles
module load amber/24-gpu

export MPICH_GPU_SUPPORT_ENABLED=1

cat << EOF > select_gpu
#!/bin/bash

export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF

chmod +x ./select_gpu

CPU_BIND="mask_cpu:fe000000000000,fe00000000000000"
CPU_BIND="${CPU_BIND},fe0000,fe000000"
CPU_BIND="${CPU_BIND},fe,fe00"
CPU_BIND="${CPU_BIND},fe00000000,fe0000000000"

srun --cpu-bind=$CPU_BIND ./select_gpu pmemd.hip.MPI -ng 16 -groupfile groupfile

In this example, 16 copies of a system are run concurrently within a single Amber job, each using 1 GCD. 2 nodes are requested in total as each node on LUMI-G contains 8 GCDs (4 GPUs). The input, output, topology and coordinate files for the respective simulations are defined in a so-called groupfile:

groupfile
-O -i mdin.GPU -o mdout000.GPU -p system000.prmtop -c system000.inpcrd
-O -i mdin.GPU -o mdout001.GPU -p system001.prmtop -c system001.inpcrd
-O -i mdin.GPU -o mdout002.GPU -p system002.prmtop -c system002.inpcrd
-O -i mdin.GPU -o mdout003.GPU -p system003.prmtop -c system003.inpcrd
-O -i mdin.GPU -o mdout004.GPU -p system004.prmtop -c system004.inpcrd
-O -i mdin.GPU -o mdout005.GPU -p system005.prmtop -c system005.inpcrd
-O -i mdin.GPU -o mdout006.GPU -p system006.prmtop -c system006.inpcrd
-O -i mdin.GPU -o mdout007.GPU -p system007.prmtop -c system007.inpcrd
-O -i mdin.GPU -o mdout008.GPU -p system008.prmtop -c system008.inpcrd
-O -i mdin.GPU -o mdout009.GPU -p system009.prmtop -c system009.inpcrd
-O -i mdin.GPU -o mdout010.GPU -p system010.prmtop -c system010.inpcrd
-O -i mdin.GPU -o mdout011.GPU -p system011.prmtop -c system011.inpcrd
-O -i mdin.GPU -o mdout012.GPU -p system012.prmtop -c system012.inpcrd
-O -i mdin.GPU -o mdout013.GPU -p system013.prmtop -c system013.inpcrd
-O -i mdin.GPU -o mdout014.GPU -p system014.prmtop -c system014.inpcrd
-O -i mdin.GPU -o mdout015.GPU -p system015.prmtop -c system015.inpcrd

See the Amber manual for further details on multi-pmemd.

References

When citing Amber or AmberTools, please use the following:

D.A. Case, H.M. Aktulga, K. Belfon, I.Y. Ben-Shalom, J.T. Berryman, S.R. Brozell, D.S. Cerutti, T.E. Cheatham, III, G.A. Cisneros, V.W.D. Cruzeiro, T.A. Darden, N. Forouzesh, M. Ghazimirsaeed, G. Giambaşu, T. Giese, M.K. Gilson, H. Gohlke, A.W. Goetz, J. Harris, Z. Huang, S. Izadi, S.A. Izmailov, K. Kasavajhala, M.C. Kaymak, A. Kovalenko, T. Kurtzman, T.S. Lee, P. Li, Z. Li, C. Lin, J. Liu, T. Luchko, R. Luo, M. Machado, M. Manathunga, K.M. Merz, Y. Miao, O. Mikhailovskii, G. Monard, H. Nguyen, K.A. O'Hearn, A. Onufriev, F. Pan, S. Pantano, A. Rahnamoun, D.R. Roe, A. Roitberg, C. Sagui, S. Schott-Verdugo, A. Shajan, J. Shen, C.L. Simmerling, N.R. Skrynnikov, J. Smith, J. Swails, R.C. Walker, J. Wang, J. Wang, X. Wu, Y. Wu, Y. Xiong, Y. Xue, D.M. York, C. Zhao, Q. Zhu, and P.A. Kollman (2024), Amber 2024, University of California, San Francisco.

More Information

The Amber home page has an extensive manual and useful tutorials.