## The design of a molecular dynamics simulation

The design of a molecular dynamics simulation should account for the available computational power. Simulation size (n = number of particles), timestep, and total time duration must be selected so that the calculation can finish within a reasonable time period. However, the simulations should be long enough to be relevant to the time scales of the natural processes being studied. To make statistically valid conclusions from the simulations, the time span simulated should match the kinetics of the natural process. Otherwise, it is analogous to making conclusions about how a human walks when only looking at less than one footstep. Most scientific publications about the dynamics of proteins and DNA, use data from simulations spanning nanoseconds (10−9 s) to microseconds (10−6 s). To obtain these simulations, several CPU-days to CPU-years are needed. Parallel algorithms allow the load to be distributed among CPUs; an example is the spatial or force decomposition algorithm.

During a classical MD simulation, the most CPU intensive task is the evaluation of the potential as a function of the particles’ internal coordinates. Within that energy evaluation, the most expensive one is the non-bonded or non-covalent part. In Big O notation, common molecular dynamics simulations scale by

${\displaystyle O(n^{2})}$

if all pair-wise electrostatic and van der Waals interactions must be accounted for explicitly. This computational cost can be reduced by employing electrostatics methods such as particle mesh Ewald summation (

${\displaystyle O(n\log(n))}$

), particle–particle-particle–mesh (P3M), or good spherical cutoff methods (

${\displaystyle O(n)}$

).[citation needed]

Another factor that impacts total CPU time needed by a simulation is the size of the integration timestep. This is the time length between evaluations of the potential. The timestep must be chosen small enough to avoid discretization errors (i.e., smaller than the period related to fastest vibrational frequency in the system). Typical timesteps for classical MD are in the order of 1 femtosecond (10−15 s). This value may be extended by using algorithms such as the SHAKE constraint algorithm, which fix the vibrations of the fastest atoms (e.g., hydrogens) into place. Multiple time scale methods have also been developed, which allow extended times between updates of slower long-range forces.

For simulating molecules in a solvent, a choice should be made between explicit and implicit solvent. Explicit solvent particles (such as the TIP3P, SPC/E and SPC-f water models) must be calculated expensively by the force field, while implicit solvents use a mean-field approach. Using an explicit solvent is computationally expensive, requiring inclusion of roughly ten times more particles in the simulation. But the granularity and viscosity of explicit solvent is essential to reproduce certain properties of the solute molecules. This is especially important to reproduce chemical kinetics.

In all kinds of molecular dynamics simulations, the simulation box size must be large enough to avoid boundary condition artifacts. Boundary conditions are often treated by choosing fixed values at the edges (which may cause artifacts), or by employing periodic boundary conditions in which one side of the simulation loops back to the opposite side, mimicking a bulk phase (which may cause artifacts too).

Schematic representation of the sampling of the system’s potential energy surface with molecular dynamics (in red) compared to Monte Carlo methods (in blue).

### Microcanonical ensemble (NVE)

In the microcanonical ensemble, the system is isolated from changes in moles (N), volume (V), and energy (E). It corresponds to an adiabatic process with no heat exchange. A microcanonical molecular dynamics trajectory may be seen as an exchange of potential and kinetic energy, with total energy being conserved. For a system of N particles with coordinates

${\displaystyle X}$

and velocities

${\displaystyle V}$

, the following pair of first order differential equations may be written in Newton’s notation as

${\displaystyle F(X)=-\nabla U(X)=M{\dot {V}}(t)}$

${\displaystyle V(t)={\dot {X}}(t).}$

The potential energy function

${\displaystyle U(X)}$

of the system is a function of the particle coordinates

${\displaystyle X}$

. It is referred to simply as the potential in physics, or the force field in chemistry. The first equation comes from Newton’s laws of motion; the force

${\displaystyle F}$

acting on each particle in the system can be calculated as the negative gradient of

${\displaystyle U(X)}$

.

For every time step, each particle’s position

${\displaystyle X}$

and velocity

${\displaystyle V}$

may be integrated with a symplectic integrator method such as Verlet integration. The time evolution of

${\displaystyle X}$

and

${\displaystyle V}$

is called a trajectory. Given the initial positions (e.g., from theoretical knowledge) and velocities (e.g., randomized Gaussian), we can calculate all future (or past) positions and velocities.

One frequent source of confusion is the meaning of temperature in MD. Commonly we have experience with macroscopic temperatures, which involve a huge number of particles. But temperature is a statistical quantity. If there is a large enough number of atoms, statistical temperature can be estimated from the instantaneous temperature, which is found by equating the kinetic energy of the system to nkBT/2 where n is the number of degrees of freedom of the system.

A temperature-related phenomenon arises due to the small number of atoms that are used in MD simulations. For example, consider simulating the growth of a copper film starting with a substrate containing 500 atoms and a deposition energy of 100 eV. In the real world, the 100 eV from the deposited atom would rapidly be transported through and shared among a large number of atoms (

${\displaystyle 10^{10}}$

or more) with no big change in temperature. When there are only 500 atoms, however, the substrate is almost immediately vaporized by the deposition. Something similar happens in biophysical simulations. The temperature of the system in NVE is naturally raised when macromolecules such as proteins undergo exothermic conformational changes and binding.

### Canonical ensemble (NVT)

In the canonical ensemble, amount of substance (N), volume (V) and temperature (T) are conserved. It is also sometimes called constant temperature molecular dynamics (CTMD). In NVT, the energy of endothermic and exothermic processes is exchanged with a thermostat.

A variety of thermostat algorithms are available to add and remove energy from the boundaries of an MD simulation in a more or less realistic way, approximating the canonical ensemble. Popular methods to control temperature include velocity rescaling, the Nosé–Hoover thermostat, Nosé–Hoover chains, the Berendsen thermostat, the Andersen thermostat and Langevin dynamics. The Berendsen thermostat might introduce the flying ice cube effect, which leads to unphysical translations and rotations of the simulated system.

It is not trivial to obtain a canonical ensemble distribution of conformations and velocities using these algorithms. How this depends on system size, thermostat choice, thermostat parameters, time step and integrator is the subject of many articles in the field.

### Isothermal–isobaric (NPT) ensemble

In the isothermal–isobaric ensemble, amount of substance (N), pressure (P) and temperature (T) are conserved. In addition to a thermostat, a barostat is needed. It corresponds most closely to laboratory conditions with a flask open to ambient temperature and pressure.

In the simulation of biological membranes, isotropic pressure control is not appropriate. For lipid bilayers, pressure control occurs under constant membrane area (NPAT) or constant surface tension “gamma” (NPγT).

### Generalized ensembles

The replica exchange method is a generalized ensemble. It was originally created to deal with the slow dynamics of disordered spin systems. It is also called parallel tempering. The replica exchange MD (REMD) formulation[23] tries to overcome the multiple-minima problem by exchanging the temperature of non-interacting replicas of the system running at several temperatures.

The design of a molecular dynamics simulation
Scroll to top