Running QM/MM simulations
Running a CP2K standalone job
A standard CP2K build creates multiple executables. It is advised to use the cp2k.psmp executable, as this enables the hybrid MPI+OpenMP parallelisation scheme, which can give a performance benefit over pure MPI parallelisation on some machines.
CP2K should be run through a job submission script so that it executes on the compute nodes. The following commands launch CP2K (MPI only), specifying the input and output files and the number of MPI processes.
export OMP_NUM_THREADS=1
joblauncher -n <procs> cp2k.psmp -i inputfile.inp -o outputfile.out
The joblauncher command will depend on the job launcher used on your system; common examples are mpiexec, srun and aprun.
To run with multiple threads (MPI+OpenMP) the number of threads should be set to a value greater than 1. Typical values where performance may be improved over a pure MPI job are 2, 4, 6, and 8 threads, although this will depend on many factors such as your machine architecture, the type of calculation and your system size. As a general rule the number of threads per MPI process should be chosen so that it evenly divides the number of cores on a node, whilst ensuring that threads sharing memory sit in the same NUMA region. The total number of MPI processes should then be set so that the number of threads per process multiplied by the number of MPI processes equals the total number of cores requested.
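As an illustration, a minimal Slurm job script for a hybrid MPI+OpenMP CP2K run on an ARCHER2-like machine (128-core nodes) might look like the sketch below. The partition and account names are placeholders, and the exact srun binding options will vary between systems.

```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32    # 32 MPI processes per node
#SBATCH --cpus-per-task=4       # 4 OpenMP threads per MPI process
#SBATCH --partition=standard    # placeholder - use your machine's partition
#SBATCH --account=my-account    # placeholder - use your project account

# 64 MPI processes x 4 threads = 256 cores in total
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores

srun --hint=nomultithread --distribution=block:block \
     cp2k.psmp -i inputfile.inp -o outputfile.out
```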
Running QM/MM simulations with the GROMACS/CP2K interface
Once all the required input files for the GROMACS/CP2K interface have been created, as described in the GROMACS/CP2K interface QM/MM parameterisation section, you can create the GROMACS tpr file and then launch the MD simulation.
gmx grompp -f sys.mdp -p sys.top -c sys.gro -n sys.ndx -o sys.tpr
joblauncher -n <procs> gmx_mpi mdrun -s sys.tpr
When generating the tpr file you may get a warning about your system having a non-zero charge. For QM/MM calculations this can safely be ignored by using the -maxwarn option.
The joblauncher command will depend on the job launcher used on your system; common examples are mpiexec, srun and aprun.
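For example, on a Slurm-based machine the two steps above might look like the following, with -maxwarn 1 allowing the expected non-zero-charge warning. The file names match the earlier example and the core count is only illustrative.

```
# Generate the tpr file, allowing the expected non-zero charge warning
gmx grompp -f sys.mdp -p sys.top -c sys.gro -n sys.ndx -o sys.tpr -maxwarn 1

# Launch the MD simulation with the MPI-enabled GROMACS build
export OMP_NUM_THREADS=1
srun -n 128 gmx_mpi mdrun -s sys.tpr
```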
Performance considerations
When running CP2K standalone or together with GROMACS through the GROMACS/CP2K interface, the run time of a simulation is dominated by the QM and QM/MM calculations within CP2K. The performance of a QM/MM simulation run with GROMACS and CP2K together is therefore roughly equivalent to that of the same simulation run with CP2K standalone.
For reference, we include here performance results for a number of CP2K QM/MM benchmarks that are part of the BioExcel QM/MM benchmark suite.
| Name | Type | Total atoms | QM atoms | Functional | QM cell size (Å) | Basis set | Time step | MD type | Periodic |
|---|---|---|---|---|---|---|---|---|---|
| MQAE-BLYP | solute-solvent | ~16,000 | 34 | BLYP | 14 x 17 x 11 | DZVP-MOLOPT | 1 fs | NVE | Y |
| MQAE-B3LYP | solute-solvent | ~16,000 | 34 | B3LYP | 14 x 17 x 11 | 6-31G** | 1 fs | NVE | Y |
| MQAE-B3LYP-large | solute-solvent | ~16,000 | 34 | B3LYP | 28 x 34 x 22 | 6-31G** | 1 fs | NVE | Y |
| CBD_PHY-PBE | phytochrome | ~168,000 | 68 | PBE | 25 x 25 x 25 | DZVP-MOLOPT | 1 fs | NVE | Y |
| CBD_PHY-PBE0 | phytochrome | ~168,000 | 68 | PBE0 | 25 x 25 x 25 | HFX_BASIS TZV2P | 1 fs | NVE | Y |
| ClC-19-BLYP | ion channel | ~150,000 | 19 | BLYP | 18 x 18 x 18 | DZVP-MOLOPT | 1 fs | NVE | Y |
| ClC-253-BLYP | ion channel | ~150,000 | 253 | BLYP | 27 x 25 x 25 | DZVP-MOLOPT | 1 fs | NVE | Y |
| GFP_ScaleQM | fluorescent protein | ~28,000 | 20 / 32 / 53 / 77 | BLYP | 40 x 40 x 40 | DZVP-GTH-BLYP | 1 fs | NVT | N |
The raw data for the performance of these benchmarks is available in the QM/MM Benchmarking Data repository. Their performance is summarised below.
Running on CPUs
The time per MD step (s) and performance (ps/day) on ARCHER2 are reported in the tables below for the benchmark systems. ARCHER2 has 128 cores per node, comprising two 64-core AMD EPYC processors; more details are given on the ARCHER2 website. The results below use MPI+OpenMP with 4 OpenMP threads per MPI process, which was found, in general, to give the best performance.
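The performance figures follow directly from the time per MD step and the 1 fs time step used by these benchmarks: ps/day = 0.001 ps x 86400 s / (time per step in s). For example, the 128-core MQAE (BLYP) entry can be checked with a short shell calculation:

```
# ps/day = timestep (ps) * MD steps per day
awk 'BEGIN { printf "%.2f\n", 0.001 * 86400 / 4.91 }'   # gives 17.60 ps/day
```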
Time per MD step (s):

| Cores | MQAE (BLYP) | MQAE (B3LYP) | ClC (QM 19) | ClC (QM 253) | CBD_PHY (PBE) | CBD_PHY (PBE0) |
|---|---|---|---|---|---|---|
| 32 | 12.86 | 20.94 | 46.83 | 168.16 | 112.83 | 266.25 |
| 64 | 7.38 | 12.38 | 25.82 | 111.86 | 66.95 | 127.51 |
| 128 | 4.91 | 7.85 | 15.07 | 75.03 | 38.13 | 69.48 |
| 256 | 3.55 | 5.13 | | 57.79 | 24.89 | 40.21 |
| 512 | | | | | 24.93 | |
Performance (ps/day):

| Cores | MQAE (BLYP) | MQAE (B3LYP) | ClC (QM 19) | ClC (QM 253) | CBD_PHY (PBE) | CBD_PHY (PBE0) |
|---|---|---|---|---|---|---|
| 32 | 6.71 | 4.13 | 1.84 | 0.51 | 0.76 | 0.32 |
| 64 | 11.71 | 6.98 | 3.35 | 0.77 | 1.29 | 0.68 |
| 128 | 17.60 | 11.01 | 5.73 | 1.15 | 2.27 | 1.24 |
| 256 | 24.34 | 16.84 | | 1.50 | 3.47 | 2.15 |
| 512 | | | | | 3.47 | |
Running on GPUs
The time per MD step (s) and performance (ps/day) on Cirrus GPU nodes are reported in the tables below for the benchmark systems. The Cirrus GPU nodes contain 4 GPUs and 40 CPU cores per node; the GPUs are Nvidia Volta V100s. Here we assign one MPI process per GPU and 10 OpenMP threads per process to make use of all the CPU cores. More details are given in the Cirrus documentation.
Using the GPU-enabled COSMA library was not found to significantly improve performance.
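As a sketch, a Slurm submission for one Cirrus-style GPU node with this layout (one MPI process per GPU, 10 threads per process) might look like the following; the partition, QoS and account names are placeholders, and the GPU-enabled CP2K executable name may differ on your system.

```
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:4            # 4 GPUs per node
#SBATCH --ntasks-per-node=4     # one MPI process per GPU
#SBATCH --cpus-per-task=10      # 10 OpenMP threads per process (40 cores in total)
#SBATCH --partition=gpu         # placeholder
#SBATCH --qos=gpu               # placeholder
#SBATCH --account=my-account    # placeholder

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun cp2k.psmp -i inputfile.inp -o outputfile.out
```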
Time per MD step (s):

| Cores | GPUs | MQAE (BLYP) | MQAE (B3LYP) | ClC (QM 19) | ClC (QM 253) | CBD_PHY (PBE) | CBD_PHY (PBE0) |
|---|---|---|---|---|---|---|---|
| 40 | 4 | 15.43 | 25.55 | 55.73 | 136.09 | | |
| 80 | 8 | 9.47 | 14.32 | 32.80 | 89.75 | 155.98 | |
| 160 | 16 | 6.44 | 8.91 | 20.31 | 63.08 | 38.66 | 70.43 |
| 320 | 32 | 6.36 | | | 46.62 | 24.79 | 41.19 |
Performance (ps/day):

| Cores | GPUs | MQAE (BLYP) | MQAE (B3LYP) | ClC (QM 19) | ClC (QM 253) | CBD_PHY (PBE) | CBD_PHY (PBE0) |
|---|---|---|---|---|---|---|---|
| 40 | 4 | 5.60 | 3.38 | 1.55 | 0.63 | | |
| 80 | 8 | 9.12 | 6.03 | 2.63 | 0.96 | 0.55 | |
| 160 | 16 | 13.42 | 9.70 | 4.25 | 1.37 | 2.23 | 1.23 |
| 320 | 32 | 13.58 | | | 1.85 | 3.49 | 2.10 |
CPU benchmark results
All results are reported for ARCHER2. MPI+OpenMP is used with 4 threads per process.
MQAE-BLYP
MQAE-B3LYP
MQAE-B3LYP-large
CBD_PHY-PBE
CBD_PHY-PBE0
ClC-19-BLYP
ClC-253-BLYP
GPU benchmark results
All results are reported for Cirrus. The Cirrus GPU nodes contain 4 GPUs and 40 CPU cores per node.
MQAE-BLYP
MQAE-B3LYP
MQAE-B3LYP-large
CBD_PHY-PBE
CBD_PHY-PBE0
ClC-19-BLYP
ClC-253-BLYP