Sunday, November 12, 2017

Atom_mapper: determining atom correspondence between reactants and products

Many TS search algorithms interpolate between reactants and products and require that the atom order is identical. This is time consuming to enforce by hand and hard to automate, but this paper by the folks at Schrödinger describe a solution that seems to work pretty well. The general idea is also fairly easy to implement if you are familiar with RDKit or similar cheminformatics tools and here I describe my implementation so far.

The basic idea
If two molecules are identical but have different atom numbering then RDKit can map the atoms using "GetSubstructMatch". So (pseudocode) "OCC.GetSubstructMatch(CCO)" gives "(2,1,0)" meaning that atom 2 in CCO corresponds to atom 0 in OCC and so forth.  And "RenumberAtoms(CCO, (2,1,0))" gives OCC.

Let's say our reaction is CH3-CH=O -> CH2=CH-OH.  "product.GetSubstructMatch(reactant)" will not give a match because the molecules are different. To focus on connectivity we remove bond orders: CH3-CH-O -> CH2-CH-OH, but the molecules are still different. The next step is to break each bond in turn and compare. For the reactant you get [H + CH2-CH-O, CH3 + CH-OH, H + CH3-C-O, O + CH3-CH2] and the first entry will match the last entry in the corresponding list for the product [H + CH-CH-OH, CH2 + CH-OH, H + CH2-C-OH, H + CH2-CH-O].

It is very easy to check to for equivalence by converting the list of fragmented molecules to a set of corresponding SMILES strings and looking for identical elements in the two sets: "matches = list(set(react_frag_smiles) & set(prod_frag_smiles))". If there is a match then the atoms can be matched as before because they are identical (note that the bond breaking does not alter the atom numbering of the molecule). If no match is found then the procedure is repeated for the fragments.

One problem is that equivalent atoms are not matched in 3D. For example, if you label (i.e. number) the equivalent H atoms in a methyl group then the methyl group is chiral and RDKit will create a random chirality.  So, the Cartesian coordinated of methyl hydrogens in the CCO example above are not necessarily superimposable, which creates problems in TS search algorithms that use interpolation.

Following the ideas presented in the Schrödinger paper, I try to solve this issue by switching pairs of equivalent atoms and comparing the RMSDs between reactant and product before and after.  However, this will only work if the reactants and products have the same conformation, so I use a modified version of ConstrainedEmbed (ConstrainedEmbedRP) which uses either Cartesian or distance constraints to match two structures using a UFF force field minimization of one of the structures.

Using ConstrainedEmbed has the added advantage of producing reactant and product structures that are in the same orientation and, therefore, often good starting points for interpolation.

Atom_mapper
* I have implemented these ideas in Atom_mapper and made it available on GitHub under the MIT license.
Usage  
* The input is SMILES strings of reactant and products but the code can be adapted to work with user-supplied coordinates via sdf files.
* The output is two sdf files labelled "template" and "mol" with (hopefully) identical atom ordering. Template is the most rigid of the two and the coordinates come from an unconstrained UFF minimization. Mol is superimposed on template using ConstrainedEmbedRP.
* Both geometries should probably be minimized using the energy function you plan to use for the TS search and, depending on the optimization algorithm, the structures should probably be rigidly superimposed before interpolation
Current limitations
* I haven't tested the code extensively. If you run into cases where it doesn't work, file a bug report on GitHub
* No conformation search is done so the conformations chosen will be random. Solution: start with coordinates rather than SMILES.
* If reactants or products consists of two molecules their relative orientation resulting from the UFF minimization will be somewhat arbitrary. This is not a problem if only the reactants or products consists of two molecules, since the coordinates will be superimposed on a more rigid molecule. However, it is a problem if the reactants and products both consist of two molecules. Solution: start with coordinates rather than SMILES, but not sure how to determine best intermolecular orientation automatically.
* The code only permutes pairs of equivalent atoms so it won't work for equivalent groups, e.g. methyl groups in CC(C)(C)CO because we need to switch several pairs (corresponding to C and 3 H atoms) simultaneously.  Not sure how to fix that easily. Suggestions welcome.

There are also some general limitations to this "active bonds" approach as described in the article.


This work is licensed under a Creative Commons Attribution 4.0

No comments: