Detailed advice on individual topics

This part of the best practice guide lists individual topics and frequently asked questions (or questions that should be asked!) on important issues in QM/MM simulation of biomolecular systems, as covered in the BioExcel Virtual Workshop on Best Practices in QM/MM Simulation of Biomolecular Systems.

For each topic, a time-indexed link is provided to the YouTube recording of the workshop presentation in which the speaker shares and discusses best practice on that topic. In addition to the individual speaker presentations linked below, many topics were also covered in the workshop’s Panel Discussion.

How can QM/MM best be used in biomolecular research?

When to use QM/MM, and when not to

Key steps

What are key steps when performing QM/MM simulation?

  • Typical steps in a QM/MM simulation protocol (listen to Carme Rovira on YouTube):

    • Obtain PDB structure (e.g. of an enzyme complex)

    • System preparation

    • Thermal equilibration (classical MD)

    • Choose QM and QM-MM coupling treatments (region size, level of theory, etc.)

    • Thermal re-equilibration (QM/MM MD)

    • Choose collective variables

    • Reaction simulation (static or dynamic, e.g. metadynamics)

    • Analyze the free energy landscape

    • Determine reaction mechanism

Static modelling vs dynamical simulation

When to use static modelling (single point energies, nudged elastic band, etc.) vs dynamical simulation (umbrella sampling, metadynamics, etc.)

  • Sampling is required; however, the sampling time needed to achieve meaningful results can be greatly reduced (listen to Adrian Mulholland on YouTube)

  • Free energy simulations are more reliable for enzymes (listen to Janez Mavri on YouTube)

  • The subset of catalytically competent conformations can be very small compared with the full conformational landscape, and an X-ray structure is a good starting point for a reactivity study. For example, if computational resources are limited, single-conformation QM/MM calculations may be a better route to understanding structure-activity relationships (listen to Maria João Ramos on YouTube)

  • Static techniques that allow efficient exploration of the conformational space of the enzyme during catalysis can provide insights into the dynamical picture through the potential energy surfaces (PESs) associated with the reaction.

Structure / model preparation and validation

Structure preparation and validation

How to set up and validate your model structure (X-ray, NMR, cryo-EM, homology modelling, protein structure prediction), including how to handle missing residues, atoms and protons (!), rotameric and tautomeric states (again protons!), (missing) waters, and the substrate.

  • It is crucial to consider the conformational behaviour of the substrate inside the enzyme pocket. Sample it with classical or QM/MM MD. (listen to Adrian Mulholland on YouTube)

  • Critically evaluate your initial PDB structure. If needed, re-optimise the crystal structure. (listen to Ulf Ryde on YouTube)

  • Substrate conformations in classical MD can be wrong because of the force field parameterization. QM/MM sampling can be used to clarify conformational stability. (listen to Carme Rovira on YouTube)

  • It is good to check whether your simulations can reproduce features of the crystal structures, such as distortions of the substrate or of amino acids. Simulations can also be used for further crystal structure refinement. (listen to Carme Rovira on YouTube)

  • Suggestions to overcome the local minima problem in the case of proteins (listen to Ulf Ryde on YouTube):

    • Run back and forth from starting state to final state until convergence

    • Optimise only a small region beyond the QM system

    • Base your calculations on many MD simulation snapshots

    • Don’t just minimise the energy, calculate the free energy (include dynamics)
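Basing calculations on many MD snapshots raises the question of how to combine the per-snapshot barriers. The sketch below shows one commonly used option, an exponential (Boltzmann-weighted) average, next to the plain arithmetic mean. It is an illustration under stated assumptions, not necessarily the procedure recommended in the talk: the function names and the default temperature of 300 K are choices made here, and the barriers are assumed to be in kcal/mol.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def exponential_average(barriers_kcal, temperature=300.0):
    """Boltzmann (exponential) average of per-snapshot barriers, in kcal/mol.

    Computes -RT * ln( mean( exp(-dE_i / RT) ) ), which is dominated by the
    snapshots with the lowest barriers.
    """
    if not barriers_kcal:
        raise ValueError("need at least one barrier")
    rt = R_KCAL * temperature
    lowest = min(barriers_kcal)
    # Shift by the lowest barrier so the exponentials stay numerically stable.
    weights = [math.exp(-(b - lowest) / rt) for b in barriers_kcal]
    return lowest - rt * math.log(sum(weights) / len(weights))


def arithmetic_average(barriers_kcal):
    """Plain mean of the per-snapshot barriers, in kcal/mol."""
    return sum(barriers_kcal) / len(barriers_kcal)
```

The exponential average always lies between the lowest barrier and the arithmetic mean, so a large gap between the two averages signals that a few snapshots dominate and that more snapshots may be needed.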

Choosing protonation states

How should one choose the protonation states of amino acids inside the protein? In the “methods” section of a publication one sometimes finds a statement such as “Protonation states were chosen based on pKa values, except Asp10, Glu43 and His35, which were protonated”, without an adequate explanation.
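As a starting point for documenting such choices, the Henderson-Hasselbalch relation gives the protonated fraction of an isolated titratable site from its pKa and the pH. The sketch below applies it and flags residues whose pKa lies close to the working pH, which are the ones that most need an explicit justification. The function names, the pH of 7.0 and the one-unit window are assumptions made for illustration; the pKa values themselves would have to come from a prediction tool or experiment, and the relation ignores coupling between sites.

```python
def protonated_fraction(pka, ph=7.0):
    """Fraction of a titratable site in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def flag_ambiguous_sites(pka_by_residue, ph=7.0, window=1.0):
    """Return, sorted by name, the residues whose pKa is within `window` units of the pH."""
    return sorted(
        name for name, pka in pka_by_residue.items() if abs(pka - ph) <= window
    )
```

For example, calling flag_ambiguous_sites on a dictionary of predicted pKa values returns the residues for which both protonation states are significantly populated at the chosen pH, and which may therefore deserve test calculations in both states.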

Model preparation and validation

Should one perform QM/MM calculations on a fully solvated protein in a periodic box of water, a droplet, implicit solvent, or a combination? If so, should one equilibrate at the MM level first?

  • It is good to compare reaction energetics in the gas phase, in water and in the enzyme. One should see a clear catalytic effect in the protein environment. (listen to Janez Mavri on YouTube)

  • When comparing with experimental data, it is better to use explicit solvent. (listen to Janez Mavri on YouTube)

  • QM cluster and QM/MM energies depend on the size of the QM system. (listen to Ulf Ryde on YouTube)

  • Recommendations and the “Big QM” cluster approach to get stable energies by increasing the QM system to 800-1000 atoms (listen to Ulf Ryde on YouTube):

    • Include in the QM region neutral groups up to 4-5 Å away from the minimal QM system consisting of the active site

    • Include ALL the charged groups that are not on the surface of the protein (i.e. buried in the protein)

    • Move the “junction” atoms 2 residues (plus caps) away from the active site / minimal QM system

  • QM/MM structures (and energies) are much more stable than QM-cluster structures (and energies) as the QM size increases, so smaller QM parts can be used with QM/MM models (listen to Ulf Ryde on YouTube)
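The first two “Big QM” selection rules above can be sketched as a simple distance and charge filter. This is a minimal illustration under stated assumptions, not the speaker's implementation: the residue dictionaries, the `buried` flag (which would have to come from a separate solvent accessibility analysis) and the 4.5 Å default cutoff are all hypothetical choices made here, and the rule about moving the junction atoms is not implemented.

```python
import numpy as np


def select_big_qm(residues, core_coords, cutoff=4.5):
    """Pick residues to add to the minimal QM system.

    residues: list of dicts with keys
        "name"   - residue label
        "coords" - atom coordinates in Å, shape (n_atoms, 3)
        "charge" - net integer charge of the residue
        "buried" - True if the residue is not on the protein surface
    core_coords: coordinates in Å of the minimal QM system, shape (n_core, 3)
    """
    core = np.asarray(core_coords, dtype=float)
    selected = []
    for res in residues:
        coords = np.asarray(res["coords"], dtype=float)
        # Minimum atom-atom distance between the residue and the minimal QM system.
        diff = coords[:, None, :] - core[None, :, :]
        min_dist = np.sqrt((diff ** 2).sum(axis=-1)).min()
        near_neutral = res["charge"] == 0 and min_dist <= cutoff
        buried_charged = res["charge"] != 0 and res["buried"]
        if near_neutral or buried_charged:
            selected.append(res["name"])
    return selected
```

Counting the atoms of the selected residues afterwards gives a quick check of whether the resulting QM system falls in the 800-1000 atom range mentioned above.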


QM/MM modelling / simulation protocol and validation

How best to choose a level of theory?

  • Projector-based embedding schemes are beneficial in getting consistent results for different DFT functionals (listen to Adrian Mulholland on YouTube)

  • Coupled Cluster (CC) methods are accurate but slow; in practice, MP2-based methods are more reliable (listen to Adrian Mulholland on YouTube)

  • First, always refer to the relevant literature for your specific problem (listen to Maria Khrenova on YouTube)

  • If nobody has studied your biological system yet, focus on the specific chemistry involved in the phenomenon you want to study, and look in the literature for the levels of theory that have been used to study that chemistry (listen to Maria Khrenova on YouTube)

  • Which Hamiltonian to choose (listen to Maria João Ramos on YouTube):

    • From the literature, infer which approach describes all the energy contributions involved in the phenomenon you are investigating

    • Consider the availability of software that implements it

    • Consider the availability of computational resources necessary to run it

  • Recommendations for the method to choose (listen to Ulf Ryde on YouTube):

    • If structural properties are the aim: pure DFT + dispersion corrections + small basis sets

    • If energies are the aim: single points with larger basis sets

    • Test pure and hybrid functionals: if the results are not comparable, calibrate your energies against a higher level of theory (e.g. CCSD(T) for the closed-shell case or DMRG-PT2 for an open-shell system)

  • Polarization functions in the basis set used to describe the QM region are essential. Use at least a DZP basis set. (listen to Maria Khrenova on YouTube)

  • Limit the use of diffuse functions in a QM/MM setup (if they are required, use them only on atoms in the middle of the QM region) (listen to Maria Khrenova on YouTube)

  • Mixing levels of theory to draw a conclusion can be dangerous (listen to Maria Khrenova on YouTube)

How best to choose a suitable DFT functional?

  • Benchmark your DFT functionals before embarking on expensive QM/MM calculations, modelling at least the relevant part of your system (e.g. active site) and taking a high-level theoretical method (e.g. CCSD(T)/CBS) as a reference for the energies. (listen to Maria João Ramos on YouTube)

  • Do not use QM/MM (or any other method or computational tool) as a black box: if there are disagreements, you need to explain them. Different functionals, inclusion of dispersion corrections, etc. can yield results that differ not only quantitatively but also qualitatively, and this can be due to how well the approach captures (or fails to capture) the key underlying chemistry. (listen to Maria Khrenova present 5 examples on YouTube)

  • Test the functional against experimental results (listen to Maria Khrenova on YouTube)

  • DFT is not a systematically improvable method, so sometimes you need to go beyond it. DFT often gives barriers that are too low. (listen to Adrian Mulholland on YouTube)

  • Dispersion corrections for DFT can often improve results (listen to Adrian Mulholland on YouTube)

  • Check whether your DFT functional preserves the reactant configuration well: conformation of the substrate, H-bonds, etc. (listen to Carme Rovira on YouTube)
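The benchmarking advice above amounts to comparing each candidate functional with a trusted reference (for example CCSD(T)/CBS energies or experimental values) on a model of the active site. A minimal sketch of the bookkeeping follows. The function names, the dictionary layout and the use of the mean absolute error as the score are assumptions made here, and the numbers are placeholders, not real benchmark results.

```python
def mean_absolute_error(values, reference):
    """Mean absolute deviation of `values` from `reference` over the reference keys."""
    keys = sorted(reference)
    return sum(abs(values[k] - reference[k]) for k in keys) / len(keys)


def rank_functionals(results_by_functional, reference):
    """Return (functional, MAE) pairs sorted from lowest to highest error."""
    scores = {
        name: mean_absolute_error(values, reference)
        for name, values in results_by_functional.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder numbers for illustration only (kcal/mol); not real benchmark data.
reference = {"barrier": 15.0, "reaction_energy": -5.0}
results = {
    "functional_A": {"barrier": 12.0, "reaction_energy": -6.0},
    "functional_B": {"barrier": 15.5, "reaction_energy": -4.5},
}
ranking = rank_functionals(results, reference)
```

A single error score can hide qualitative disagreements, so it is worth inspecting the individual deviations (and the optimized reactant geometries) as well as the ranking.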

How best to choose a suitable QM region size?

  • QM/MM works very well for enzymes even with relatively small QM region sizes. (listen to Adrian Mulholland on YouTube)

  • One should focus more on the quality of the QM treatment than on the QM region size. QM/MM results often do not converge with respect to the size of the QM part. (listen to Adrian Mulholland on YouTube)

  • One can check the effect of individual residues on catalysis by calculating their individual contributions to transition state stabilization. (listen to Janez Mavri on YouTube)

  • To choose the initial QM region size, it is good to check interactions at the MM level. (listen to Janez Mavri on YouTube)

  • The number of QM atoms is typically limited by the available computational resources; in practice, ~150 QM atoms should be reachable. (listen to Carme Rovira on YouTube)

  • One protocol to find a suitable QM region, aiming to include all important effects in the QM region (listen to Ulf Ryde on YouTube):

    • Start with a rather small QM region and perform a QM/MM optimization with fixed surroundings

    • Repeat it with free surroundings

    • If there is a large difference between the results in the two previous points, then increase the QM size and repeat the cycle from step one
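The three-step protocol above is an iterative loop, which can be sketched as follows. This is only a skeleton under stated assumptions: `optimize` is a hypothetical callable that wraps whatever QM/MM program is used and returns a single number to compare (for example a reaction energy), the candidate regions are assumed to be prepared in advance from smallest to largest, and the meaning of a “large difference” is left to a user-supplied tolerance.

```python
def converge_qm_region(candidate_regions, optimize, tolerance):
    """Return the first QM region whose result is insensitive to relaxing the surroundings.

    candidate_regions: QM region definitions ordered from smallest to largest
    optimize: hypothetical callable, optimize(region, frozen_surroundings) -> float
    tolerance: largest acceptable |fixed - relaxed| difference, in the units of the result
    """
    for region in candidate_regions:
        fixed = optimize(region, True)     # step 1: surroundings kept fixed
        relaxed = optimize(region, False)  # step 2: surroundings free to move
        if abs(fixed - relaxed) <= tolerance:
            return region, fixed, relaxed
        # step 3: large difference, so move on to the next (larger) QM region
    raise RuntimeError("no candidate region met the tolerance")
```

Raising an error when no candidate passes, rather than returning the largest region, makes it explicit that the protocol did not converge.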

How best to choose the valence saturation scheme at the QM-MM boundary?

Is the electrostatic coupling always the best compromise for the QM/MM coupling?

What kind of systematic basis set benchmarking should one perform?

How to choose a good collective variable (reaction coordinate) and perform effective sampling?

How to choose a good collective variable (reaction coordinate) and perform effective sampling: chemical intuition versus unbiased and automated approaches.
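As an example of a collective variable chosen by chemical intuition, a widely used choice for a proton or group transfer is the difference between the breaking and forming bond distances. The sketch below computes it from Cartesian coordinates; the function names and the atom index arguments are hypothetical, and periodic boundary conditions are ignored for simplicity.

```python
import numpy as np


def distance(coords, i, j):
    """Euclidean distance between atoms i and j (no periodic boundary handling)."""
    return float(np.linalg.norm(coords[i] - coords[j]))


def transfer_cv(coords, donor, hydrogen, acceptor):
    """d(donor-H) - d(acceptor-H).

    Negative while H is closer to the donor (reactant side) and positive
    once H is closer to the acceptor (product side).
    """
    coords = np.asarray(coords, dtype=float)
    return distance(coords, donor, hydrogen) - distance(coords, acceptor, hydrogen)
```

A single distance difference like this works only if the transfer is the sole slow motion; if other degrees of freedom (for example a conformational change) are coupled to the reaction, additional collective variables or an automated approach may be needed.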

Long-range electrostatics: truncate or not?


Validation, analysis and interpretation of simulation results

High-level QM with limited or no sampling, or low-level QM with sampling?

How can we combine low-level QM for sampling with high-level QM for energetics?

  • Static methods can give a good estimate of enthalpies. To get activation free energies, one needs to consider dynamics. Entropic factors can be estimated with lower-level QM/MM dynamics. (listen to Adrian Mulholland on YouTube)

  • To account for temperature effects, one should consider free energy methods based on QM/MM MD, such as metadynamics. (listen to Carme Rovira on YouTube)
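One simple way to combine the two levels is an additive correction: take the activation free energy from low-level QM/MM sampling and add the difference between high-level and low-level potential energy barriers computed on the same structures. The sketch below shows that arithmetic, together with the Eyring equation for converting a barrier into a rate constant for comparison with experiment. The additive scheme is one common approximation, not necessarily the one used by the speakers; the function names, the 300 K default and the transmission coefficient of 1 are assumptions made here.

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R_KCAL = 0.0019872041  # gas constant, kcal/(mol*K)


def corrected_barrier(dg_low, de_low, de_high):
    """Low-level free energy barrier plus the high-level energy correction (kcal/mol).

    dg_low:  activation free energy from low-level QM/MM sampling
    de_low:  potential energy barrier at the low level
    de_high: potential energy barrier at the high level, same structures
    """
    return dg_low + (de_high - de_low)


def eyring_rate(dg_kcal, temperature=300.0):
    """Transition state theory rate constant in 1/s (transmission coefficient of 1)."""
    return (KB * temperature / H) * math.exp(-dg_kcal / (R_KCAL * temperature))
```

Because the rate depends exponentially on the barrier, an error of about 1.4 kcal/mol at 300 K changes the predicted rate constant by roughly a factor of ten, which is a useful scale when judging whether a correction matters.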

How to check validity and convergence of the reaction pathways?

How to validate the final result(s)?