Dynamics¶
Sander¶
This implementation works by reusing the existing interface between
sander and
ORCA, meaning
that no modifications to sander
are needed.
To run, just start an emle-server
background process running:
emle-server > server_log.txt 2>&1 &
Note
Here we also redirect all server output to a log file, server_log.txt
.
Then, launch sander
as normal, e.g.:
sander -O -i md.in -o md.out -p prmtop -c inpcrd -r md.rst -ref inpcrd
This assumes that you are using the external ORCA interface in your md.in
file, e.g. something like:
&qmmm
qmmask=':1-3',
qmcharge=0,
qm_theory='EXTERN',
qmcut=12,
qm_ewald=0
/
&orc
method='BLYP',
basis='6-31G*',
num_threads=4
/
Note
The ORCA settings used here are irrelevant, as the ORCA calculation is not
actually performed. We provide a fake orca
executable that intercepts
the call from sander
and sends it to the emle-server
for calculation.
When the simulation is complete, you can then stop the emle-server
process
by running emle-stop
in the same working directory.
A demo showing how to use emle-engine
to perform ML/MM simulations of
alanine-dipeptide in water using sander
can be found
here.
Using and configuring the calculation server¶
To start an EMLE calculation server:
emle-server
For usage information, run:
emle-server --help
Note
By default, an emle_settings.yaml
file will be written to the working
directory. This contains the settings used to configure the server and can
be used to re-run an existing simulation using the --config
option, or
the EMLE_CONFIG
envirionment variable. Additional emle_pid.txt
and
emle_port.txt
files contain the process ID and port of the server.
To launch a client to send a job to the server:
orca orca_input
where orca_input
is the path to a fully specified ORCA
input file. When using sander
, the orca
executable will be called when
performing QM/MM, i.e. we are using a fake ORCA executable as the QM backend.
The server and client communicate via a TCP/IP connection, so shuld both connect
to the same host and port. These can be specified in a script using the
environment variables EMLE_HOST
and EMLE_PORT
. If not specified, then the
same default values will be used for both the client and server.
To stop a running server, use:
emle-stop
If run in the same working directory as the server was launched from, then this
will use the emle_pid.txt
file to find the process ID of the server to stop.
If no emle_pid.txt
file is found, then all emle-server
processes will be
terminated.
NNPOps¶
The EMLE
Torch model uses Atomic Environment Vectors (AEVs) for the
calculation of the electrostatic embeddign energy. For performannce, it’s
desirable to use the optimised symmetry functions provided by the
NNPOps package. This requires a static
compute graph, so needs to know the atomic numbers for the atoms in the QM
region in advance. These can be specified using the EMLE_ATOMIC_NUMBERS
environment variable, or the --atomic-numbers
command-line argument when
launching the server. This option shuld only be used if the QM region is fixed,
i.e. the atoms in the QM region do not change each time a calculation is sent
to the server.
Backends¶
The embedding method relies on in vacuo energies and gradients, to which
corrections are added based on the predictions of the embedding model. We
provide support for many backends and it should be
easy for users to add their own. The backend used can be specified using the
EMLE_BACKEND
environment variable, or the --backend
command-line
argument when launching the server, e.g:
emle-server --backend mace
Note
The default backend is torchani
.
When using the orca
backend, you will also need to specify the path to the
real orca
executable using the EMLE_ORCA_PATH
environment variable, or
the --orca-path
command-line argument when launching the server. The input
for orca
will be taken from the &orc
block in the sander
input file,
so use this to specify the method, etc.
When using deepmd
as the backend you will also need to specify a model file
to use. This can be passed with the --deepmd-model
command-line argument, or
using the EMLE_DEEPMD_MODEL
environment variable. This can be a single file,
or a set of model files specified using wildcards, or as a comma-separated list.
When multiple files are specified, energies and gradients will be averaged over
the models. The model files need to be visible to the emle-server
, so we
recommend the use of absolute paths.
When using sander
or sqm
as the backend you will also need to specify
the path to an AMBER parm7 topology file for the QM region. The can be specified
using the --parm7
command-line argument, or via the EMLE_PARM7
environment
variable.
We also provide a flexible way of supporting external backends via a callback function that can be specified via:
emle-server --external-backend module.function
The function
should take a single arugment,
an ase.Atoms
object for the QM region, and return the energy in Hartree as a float
along with the gradients in Hartree/Bohr as a numpy.ndarray
.
The external backend can also be supplied using the EMLE_EXTERNAL_BACKEND
environment variable. When set, the backend will take precedence over any
other backend. If the callback is a function within a local module, then
make sure that the directory containing the module is in sys.path
, or
is visible to ehe emle-server
, e.g. the server is launched from the
same directory as the module. Alternatively, use the --plugin-path
to specify the path to a directory containing the module. This can also be
specified using the EMLE_PLUGIN_PATH
environment variable. Make sure
that this is an absolute path so that it is visible to the server regardless
of where it is launched.
Delta-learning corrections¶
We also support the use ot delta-learning corrections to the in vacuo energies and gradients. This can be enabled by passing two backends when launching the server, e.g.:
emle-server --backend torchani,deepmd
Here, the first backend is used to calculate the in vacuo energies and gradients, and the second is used to calculate and apply the corrections.
Torch device¶
We currently support CPU
and CUDA
as the device for PyTorch.
This can be configured using the EMLE_DEVICE
environment variable, or the
--device
command-line argument when launching the server, e.g.:
emle-server --device cuda
When no device is specified, the server will preferentially try to use CUDA
if available. By default, the first CUDA
device index will be used. If you
want to use a different device, e.g. when running on a multi-GPU system, then
you can use the following syntax:
emle-server --device cuda:1
This would tell PyTorch
that we want to use device index 1
. The same
formatting works for the environemtn variable, e.g.: EMLE_DEVICE=cuda:1
.
Embedding method¶
We support elecstrostatic”, “mechanical”, non-polarisable, and MM embedding.
Here non-polarisable embedding using the EMLE model to predict charges for the
QM region, but ignores the induced component of the potential. MM embedding
allows the user to specify fixed MM charges for the QM atoms, with induction once
again disabled. Obviously we are advocating our electrostatic embedding scheme,
but the use of different embedding schemes provides a useful reference for
determining the benefit of using electrostatic embedding for a given system.
The embedding method can be specified using the EMLE_METHOD
environment
variable, or when launching the server, e.g.:
emle-server --method mechanical
The default option is (unsurprisingly) electrostatic
. When using MM
embedding, you will also need to specify MM charges for the atoms within
the QM region. This can be done using the --mm-charges
option, or via
the EMLE_MM_CHARGES
environment variable. The charges should be specified
as a list of floats (space separated from the command-line, or comma separated
in the environment variable) or a path to a file. When using a file, this
should be formatted as a single column, with one line per QM atom. The units
are electron charge.
Alpha mode¶
We support two methods for the calculation of atomic polarisabilities. The default,
species
, uses a single volume scaling factor for each species. Alternatively,
reference
, calculates the scaling factors using Gaussian Process Regression
(GPR) using the values learned for each reference environment. The alpha mode can
be specified using the --alpha-mode
command-line argument, or via the
EMLE_ALPHA_MODE
environment variable.
Logging¶
Energies can be written to a file using the --energy-file
command-line argument
or the EMLE_ENERGY_FILE
environment variable. The frequency of logging can be
specified using --energy-frequency
or EMLE_ENERGY_FREQUENCY
. This should be
an integer specifying the frequency, in integration steps, at which energies are
written. (The default is 0, which means that energies aren’t logged.) The output
will look something like the following, where the columns specify the current step,
the in vacuo energy and the total energy.
# Step E_vac (Eh) E_tot (Eh)
0 -495.724193647246 -495.720214843750
1 -495.724193662147 -495.720214843750
2 -495.722049429755 -495.718475341797
3 -495.717705026011 -495.714660644531
4 -495.714381769041 -495.711761474609
5 -495.712389051656 -495.710021972656
6 -495.710483833889 -495.707977294922
7 -495.708991110067 -495.706909179688
8 -495.708890005688 -495.707183837891
9 -495.711066677908 -495.709045410156
10 -495.714580371718 -495.712799072266
The xyz coordinates of the QM (ML) and MM regions can be logged by providing the
--qm-xyz-frequency
command-line argument or by setting the
EMLE_QM_XYZ_FREQUENCY
environment variable (default is 0, indicating no
logging). This generates a qm.xyz
file (can be changed by --qm-xyz-file
argument or the EMLE_QM_XYZ_FILE
environment variable) as an XYZ trajectory for
the QM region, and a pc.xyz
file (controlled by --pc-xyz-file
argument or
the EMLE_PC_XYZ_FILE
environment variable) with the following format:
<number of point charges in frame1>
charge_1 x y z
charge_2 x y z
...
charge_n x y z
<number of point charges in frame2>
charge_1 x y z
charge_2 x y z
...
The qm.xyz
and pc.xyz
files can be used for error analysis.
End-state correction¶
It is possible to use emle-engine
to perform end-state correction (ESC)
for alchemical free-energy calculations. Here a λ value is used to interpolate
between the full MM (λ = 0) and EMLE (λ = 1) modified potential. To use this
feature specify the λ value from the command-line, e.g.:
emle-server --lambda-interpolate 0.5
or via the EMLE_LAMBDA_INTERPOLATE
environment variable. When performing
interpolation it is also necessary to specifiy the path to a topology file
for the QM region. This can be specified using the --parm7
command-line
argument, or via the EMLE_PARM7
environment variables You will also need
to specify the (zero-based) indices of the atoms within the QM region. To do
so, use the --qm-indices
command-line argument, or the EMLE_QM_INDICES
environment variable. Finally, you will need specify MM charges for the QM
atoms using the --mm-charges
command-line argument or the EMLE_MM_CHARGES
environment variable. These are used to calculate the electrostatic
interactions between point charges on the QM and MM regions.
It is possible to pass one or two values for λ. If a single value is used,
then the calculator will always use that value for interpolation, unless it
is updated externally using the --set-lambda-interpolate
command line
option, e.g.:
emle-server --set-lambda-interpolate 1
Alternatively, if two values are passed then these will be used as initial
and final values of λ, with the additional --interpolate-steps
option
specifying the number of steps (calls to the server) over which λ will be
linearly interpolated. (This can also be specified using the
EMLE_INTERPOLATE_STEPS
environment variable.) In this case the energy
file (if written) will contain output similar to that shown below. The columns
specify the current step, the current λ value, the energy at the current
λ value, and the pure MM and EMLE energies.
# Step λ E(λ) (Eh) E(λ=0) (Eh) E(λ=1) (Eh)
0 0.000000000000 -0.031915396452 -0.031915396452 -495.735900878906
5 0.100000000000 -49.588279724121 -0.017992891371 -495.720855712891
10 0.200000000000 -99.163040161133 -0.023267691955 -495.722106933594
15 0.300000000000 -148.726318359375 -0.015972195193 -495.717071533203
20 0.400000000000 -198.299896240234 -0.020024012774 -495.719726562500
25 0.500000000000 -247.870407104492 -0.019878614694 -495.720947265625
30 0.600000000000 -297.434417724609 -0.013046705164 -495.715332031250
35 0.700000000000 -347.003417968750 -0.008571878076 -495.715515136719
40 0.800000000000 -396.570098876953 -0.006970465649 -495.710876464844
45 0.900000000000 -446.150207519531 -0.019694851711 -495.720275878906
50 1.000000000000 -495.725952148438 -0.020683377981 -495.725952148438
OpenMM¶
We provide an interface between emle-engine
and OpenMM
via the Sire molecular simulation framework.
This allows QM/MM simulations to be run with OpenMM
using EMLE
for the
embedding model. This provides greatly improved performance and flexibility in
comparison to the sander
interface.
To use, first create an emle-sire
conda environment:
conda env create -f environment_sire.yaml
conda activate emle-sire
Next install emle-engine
into the environment:
pip install .
For full instructions on how to use the emle-sire
interface, see the tutorial
documentation here.
When performing end-state correction simulations using the emle-sire
interface
there is no need to specify the lambda_interpolate
keyword when creating an
EMLECalculator
instance. Instead, interpolation can be enabled when creating a
Sire
dynamics object via the same keyword. (See the
tutorial for details.)