## Monday, March 30, 2015

### Definition of Gly HA2 and HA3 nomenclature in PDB and BMRB files

Note to self:

HA2 is pro-R and HA3 is Pro-S. HA2 corresponds to HA in the other amino acids

Thanks to +Lars Bratholm for figuring this out.

## Saturday, March 28, 2015

### Selecting full residues within a certain distance of another residue or atom in PyMOL

Note to self:

To select all atoms in a residue (plus any HETATMs) that is within 3 Å of any atom in residue 63 type:

select br. all within 3 of resi 63

To select all atoms in a residue (plus any HETATMs) that is within 3 Å of the CA atom in residue 63 type:

select br. all within 3 of 63/CA

If you want to exclude the HETAMS type "pol." (short for polymer) instead of "all".

if you want to name the selection type

select Ala63, br. all within 3 of resi 63

## Thursday, March 26, 2015

### Second reviews of the PCCP paper - almost there

PCCP writes: "I am pleased to inform you that your revised manuscript has been recommended for publication in Physical Chemistry Chemical Physics subject to further revision in line with the attached reports."

Referee: 1

However, I am still concerned about the recommendations for treating low frequency vibrations.  The harmonic oscillator approximation is not appropriate for low frequencies because the entropy term goes to infinity.  As recommended by Grimme (2012), a hindered / free rotor may be a better treatment (for related methods and discussion see TRUHLAR, DG, J COMP CHEM 1991, 12, 266-270 DOI: 10.1002/jcc.540120217 and McClurg, RB, Flagan, RC, Goddard, WA JCP 1997, 106, 6675-6680 DOI: 10.1063/1.473664)

My comment about explicit water molecules also concerned entropy.  While including explicit water molecules may increase the CPU time, this can be overcome.  What extra CPU time cannot overcome is the fact that the explicit waters in reality can exchange with the bulk water increasing their entropy compared to a RRHO calculation of their entropy.  It is best to include explicitly only tightly bound

Immediate reactions
Point 1:
* The paper lists options for how to treat low frequency vibrations but there is not a specific recommendation because the issue as it applies to binding free energies has not been thoroughly compared.
* Grimme's approach is not derived from first principles and is therefore not a priori better than other approaches.  It will in some cases treat modes that are clearly stretch vibration as a free rotor
* The free rotor entropy also goes to infinity as the frequency approaches zero.

Point 2:
* Any such effect is included implicitly in the parameterization of the solvation free energy of H2O.
* Any error in this parameterization is largely cancelled by using the water cluster approach recommended in the paper.
* Bryantsev et al. have shown that using the water cluster approach leads to smooth convergence of solvation free energies of H+ and Cu2+ that are in good agreement with experiment.

## Tuesday, March 24, 2015

### Should I become an editorial board member at Scientific Reports?

This is one of those "trying to get my head around something" posts.
Pros
* It's open access
* Uses CC-BY
* no perceived importance criteria

Cons
* It's too expensive. The APC is of $1495 and no option for waivers +PeerJ has shown it can be done for less. If you need to charge more you are either doing it wrong or you are too greedy. * Because of this, I wouldn't publish there. Anything bio-related would go to +PeerJ and the very few papers that aren't would to to PLoS ONE, usually with a full or partial APC waiver (unless it is directly related to an active grant). * Why should I work for Macmillan? On the face of it, I should decline the invitation. Comments, as always, welcome. This work is licensed under a Creative Commons Attribution 4.0 ## Sunday, March 15, 2015 ### Predicting amide nitrogen (15N) chemical shifts in proteins: where do we go from here? source CC As I wrote over on CCH: "The H-N HSQC spectrum of protein amide groups is one of the most frequently recorded experiments in protein NMR. 15N labeling is comparatively inexpensive and the spectrum can be acquired in a relatively short period of time." In fact I have sometimes seen HSQC spectra used "simply" as a proof that the protein is folded and then "thrown away". But can we extract more information? For example can we deduce the structural changes due to mutations or ligand binding? Can we discriminate between two similar predicted structures as part of protein structure determination? Unfortunately predicting amide N chemical shifts in proteins appears to be very difficult Zhu, He and Zhang predicted amide N chemical shifts at the B3LYP/6-31G(d,p) level of theory (using an implicit solvation model) for Protein G and ubiquitin with mean unsigned errors (MUEs) of 4.8 and 8.0 ppm, respectively. And this after a linear fit to experiment, with$R^2$values of 0.71 and 0.69. For comparison the MUEs for CA are 2.1 and 1.4 ppm, with$R^2$values of 0.79 and 0.88. This was done using a single AMBER optimized structure. Exner et al. used 500 structures from a 5 ns explicit solvent MD of the HA2 domain to predict amide N chemical shifts at the B3LYP/6-31G(d). The MUE was 14.6 ppm and the$R^2$value was 0.81. The corresponding value for C atoms (note: not just CA) is 5.2 and 0.99 and the$R^2$value for aliphatic carbons is 0.99 (couldn't find separate MUE). Has an amide N chemical shift ever been predicted accurately from first principles? Short answer: No - as far as I can tell. A few N chemical shifts have been measured in the gas phase and these can be reproduced to within 1.5 - 2.0 ppm using CCSD(T)/CBS//CCSD(T)/cc-pVTZ. Here vibrational contributions and basis set extrapolation were key with contributions as large as 5 ppm. Unfortunately, none of the molecules contained an amide group. While amide N chemical shifts have not been measured in the gas phase, there is plenty of data in solution - including data for small prototypical amides such as formamide, acetamide, and N-methyl acetamide that are within reach of CCSD(T)/CBS. However, one conclusion from this data (and similar data for ammonia) is that solvent effects can be very large (5-15 ppm) and that a low dielectric solvent is not representative of the gas phase when it comes to N chemical shifts. So (1) one cannot simply use a continuum representation of the solvent and (2) one might as well go for aqueous solution since that is most relevant for proteins and not necessarily harder than other solvents. Exner, Möller, and co-workers attempted a N chemical shift-prediction for N-methyl acetamide in aqueous solution based on 5000 explicit solvent CPMD snapshots, but the results for N is ambiguous since predicting the chemical shift requires an equal treatment of the reference, which they didn't do (se more on this below). However, one very encouraging result was that, for example, the H(N) chemical was predicted to be 5.2 ppm higher than the neighboring H(C), which is in excellent agreement with the experimental value of 5.1 ppm. For comparison the corresponding predicted value in the gas phase is 2.7 ppm. Where do we go from here? It's important to figure out what it takes to predict accurate amide N chemical shifts in aqueous solution. One option is small model systems and here we either have to deal with the reference problem or look at molecules with more than one N atom. Another is to look more closely at select protein residues. Either way, we need to get some basics straight so we are not fumbling in the dark. 1. The Basics a. CCSD(T)/CBS + vibrational correction (a la Teale et al.) benchmark N chemical shift values for formamide, acetamide, and N-methyl acetamide. Moon and Case have done this for the latter, but using a MP2/6-31G(d) structure, which I am not sure is good enough. b. Corresponding benchmark values for the effect of hydrogen bonding (e.g. to water molecules) and dihedral angle changes. Can we assume that the vibrational effects are unchanged? 2. Small models a. Internal reference. Look for experimental data for small amide containing compounds with only one conformation and no significant pH or tautomer effects. I had a very quick look at some tables and found one (not ideal) candidate: hydantoin. Kricheldorf has measured a difference in N chemical shift of 63.9 ppm in 20% w/w aqueous solution. It not ideal because neither N is strictly speaking in an amide group. Ideal model systems would be substituted diglycolyldiamides - like this study but in aqueous solution. b. External reference. Exner, Möller and co-worker predicted a N chemical shift for N-methyl acetamide (NMA) of 159.99 ppm, which they compared to an experimental value of 113.8 ppm. However, the computed shift was relative to computed gas phase ammonia, while the experimental value was referenced to liquid ammonia. One solution is to try to make a similar prediction for liquid ammonia. Another is to compare two aqueous phase prediction: e.g. repeat the study for acetamide and compare the two N chemical shifts to experimental measurements. Yet another approach is this one: the experimental chemical shielding difference between gas phase and liquid ammonia is 19.1, so the predicted chemical shift relative to liquid ammonia is 140.9, which is already a little better. The prediction was done with B3LYP/cc-pVTZ and that certainly contains some error. To find out how much we need CCSD(T)/CBS values which we don't have (Moon and Case don't appear to give the numerical values for NMA). The best I would find so far was MP2/cc-pVQZ which gives a NMA gas phase N chemical shift relative to gas phase ammonia of 114.97 ppm, which is 11.45 ppm lower than the corresponding B3LYP/cc-pVTZ values. So 140.0 - 11.4 = 129.6 ppm is a better estimate. Finally, because of the CPMD the predicted amide chemical shielding included some vibrational averaging, but ammonia does not. Teale et al. predict that vibrational effects lowers the chemical shield of ammonia by 8.7 ppm, so 129.6 - 8.7 = 120.8 ppm, which is starting to approach the experimental value of 113.8 ppm. If we had the CCSD(T)/CBS value and vibration corrections for NMA it is not inconceivable that we could get closer to experiment. The corresponding result for NMA based on the approach applied typically applied to proteins (classical MD using fixed bond lengths and B3LYP/6-31G(d) chemical shifts) is 77.7 ppm, so the CPMD study already tells us something about why this usual approach taken for proteins might fail. 3. Select protein residues I also think that the approach we took for amide protons could yield some insights for amide N atom: i.e. build some detailed structural models (including high level geometry optimizations) of protein regions with well defined secondary structure and (at least initially) as far away from the solvent and charged groups as possible. De Dios, Pearson and Oldfield took a similar approach and got promising results for relative amide N chemical shifts of select Val residues in SNase. If we consistently can get accurate (e.g. within 2 ppm) predictions for amide N chemical shifts, we can use the approach to try to understand the outliers. This work is licensed under a Creative Commons Attribution 4.0 ## Sunday, March 8, 2015 ### Experimental chemical shifts of nitrogen (15N) atoms in formamide, acetamide, and N-methyl acetamide I recently lamented the lack of experimental nitrogen chemical shifts for small prototypical amides such as formamide, acetamide, and N-methylacetamide. I did some digging and here is what I have came up with so far. Before we get to the amides this very interesting paper show the effects of various solvents on the N chemical shift of ammonia relative to the gas phase value. Lower dielectric solvents do not necessarily result in a more gas-phase like chemical shifts. So I only discuss chemical shifts measured in water for the amides since that is most relevant for protein NMR. I use anhydrous liquid ammonia at 25$^o$C as the reference for the same reason. Formamide. Martin et al. measured a N chemical (downfield) shift of 260.0 ppm relative to an external sodium nitrate reference solution. This translates to a chemical shift of (376.5 - 260.0 =) 116.5 ppm. This value is for 0.2 molar fraction formamide in water. Martin et al. show a plot of the chemical shift wrt to mole fraction and the "nitrate" chemical shift appears to drop to about 259 ppm due further dilution, so 117.5 ppm is perhaps a better value. As I've mentioned in this post other values have been measured. Also, the proton chemical shifts have been measured in the gas phase (thanks to +Anders Steen Christensen for the tip. Acetamide. Kricheldorf and Haupt measured a (upfield) N chemical shift of -263.3 ppm relative to nitrate in a 25 wt% sodium nitrate reference solution. This translates to a chemical shift of (376.2 - 263.3 =) 112.9 ppm. The value is for 0.001 M and is within 0.01 ppm of the value measured for 0.01 M and within 0.4 ppm of the value measured for 4.0 M. N-methyl acetamide (NMA). Kricheldorf measured a (upfield) N chemical shift of -263.4 ppm relative to nitrate in a 30 wt% external sodium nitrate reference solution. This translates to a chemical shift of (376.2 - 263.4 =) 112.8 ppm. The value is for 1.5 M at pH 7. Marchal and Carnet have measured a (upfield) N chemical shift of -266.4 ppm relative to neat nitromethane. This translates to a chemical shift of (380.2 - 266.4 =) 113.8 ppm. The value is for 2.0 M. Finally, Exner et al. have recently measured a value of 114.2 ppm for NMA dissolved in phosphate buffer (H2O/D2O = 95:5, 50 mM phosphate, 150 mM NaCl, pH 8) to a final concentration of 50 mM. The same paper also describes a prediction of the N chemical shift, which was followed by this study a year later. NMA has the additional complication of having two conformations E and Z where the methyl group is opposite and next to the O atom, respectively. Fritz et al. have shown that 98.5% of NMA is in the biologically relevant Z conformation in DMSO. Referencing for QM calculations Can we reproduce these values computationally? One of the many problems to address is the referencing. The experimentally measured chemical shielding of liquid ammonia is known (244.4 ppm) so one option is$\delta_x = \sigma_{\mathrm{NH_3(liq)}} - \sigma_x^{\mathrm{comp}} $However, to increase error cancellation one could use another molecule for which gas phase N chemical shielding has been measured, e.g. ammonia (263.5 ppm):$\delta_x = \sigma_{\mathrm{NH_3(gas)}}^{\mathrm{comp}} - \sigma_x^{\mathrm{comp}} + \left( \sigma_{\mathrm{NH_3(liq)}} - \sigma_{\mathrm{NH_3(gas)}} \right)\$

One could also use other molecules and take an average.

Finally, I can recommend this interesting paper entitled Nitrogen NMR and Molecular Interactions

## Wednesday, March 4, 2015

### Reviews of PCCP paper

The reviews of my PCCP paper arrived on Feb 25th and I forgot to post them. I have started working on the rebuttal.  Comments welcome.

Referee: 1

This perspective article discusses the many factors that can make small but important contributions to absolute host-guest binding energies computed with electronic structure methods.  This is a valuable contribution to the literature in that it gathers in one place various aspects of solvation and thermodynamics for binding energies.  There are a number of items that I would like to see addressed before this paper is accepted.
(a) State whether ZPE is included in E_gas or in G_gas,RRHO.  I presume in the former, the latter accounting for only the thermal corrections using RRHO. (however, when I see RRHO, I automatically think it includes ZPE)
(b) Recently, some have advocated calculating the gas-phase thermochemistry at an elevated pressure to simulate the decreased translational freedom encountered in solution. Does this affect the thermal and entropy corrections beyond a simple change in volume?  Is this something that should be encouraged?
(c) For delta G_solv(H+), I find it very risky to compute this directly using explicit solvent molecules.  Better to put the H+ on another molecule of known pKa and use continuum solvation to compute the energy difference.
(d) Other ions:  If the ion concentrations are high (experimentally), is it necessary to consider the effect of ionic strength on the activity in calculating the binding energies? Is this a consideration in the computational simulations as well?
(e) Including more than a few explicit waters in a binding energy calculation can also mess up the entropy term, since the positions of these extra water molecules are not sampled adequately.  However, this should not be a problem if only a few tightly bound waters are included.  It would be good to add a comment.
(f) Minor matters:
Pg 3: The sentence before “Molecular Thermodynamics” seems out of place.  Should it be part of the previous paragraph?
Pg 4: “volume of and ideal gas”
Pg 6: “double-differencing” central difference of double numerical difference?
Pg 6: “better to pretend that the imaginary frequency is real” – a very bad idea if the frequency is small, since the entropy blows up. Maybe better to “pretend” that it is a free rotor, which has a well defined entropy.
Pg 9: “van der Waals interactions with the solvent” -> “van der Waals and dispersion interactions with the solvent” (just so there is no misunderstanding)
Pg 10 - Eq 21 and the sentence after it: V_solv or Delta V_solv? (as in Eq 22 and 23)
Pg 11: “numerical instability”? (is this more a matter of numerical noise due to the discretization of the surface elements of the cavity leading to discontinuities in the PES that are problematic for the optimizer – a number of codes have overcome this problem)
Pg 12: For an interesting paper on thermodynamic cycles and solution phase optimization, see DOI: 10.1039/c4cp04538f)
Pg 15: “if protonations states”
Pg 15 – 19, abstract:  The first person is normally not used in scientific writing

Referee: 2

This is a perspective about using QM methods to estimate ligand-binding free energies, using approaches originating from QM-cluster studies of enzyme reactions. The perspective is concentrated on the treatment of multiple conformations and pKa effects, although other effects are also mentioned. It is somewhat surprising that the author has not published a single paper on the subject of the perspective; consequently numerical results are very few and discussion is much concentrated on a few publications of the Grimme group. Still, the subject is of general interest. However, the scope needs to be better defined and all methods and formulae need to much better defined before the paper can be accepted.
1. In general, the author should go through all equations and ensure that all terms are defined.
2. The scope of the perspective must be better defined. QM methods have been used for over 10 years for ligand binding to proteins, typically using MM/PBSA-like approaches (cf. publications and reviews by Merz, Hobza and Ryde, for example). Likewise, the author ignores attempts to using QM post-processing FEP calculations.
3. The introduction should start with a more general discussion of available methods to calculate ligand-binding energies and why QM is needed.
4. Different types of QM methods should be described and it should be explained why the author concentrate on DFT and SQM methods.
5. What is TPSS27 (p.3)
6. HF-3c should be explained
7. “by fitting against ∆∆H_f,gas to ∆E_gas values” does not make sense to me.
8. Regarding the low-frequency vibrations, Grimme uses a scaling function so that there are smooth transition between vibrations and free rotation (making the actual value of the frequency unimportant below ~100 cm-1). Truhlar et al. have used a similar approach (but not for ligand binding).
9. The prime problem with conformations is not to use Eqn. 7 but to find all low-energy conformations, including the global minimum.
10. What is the accuracy of computationally estimated pKa values (i.e. what does “fairly accurately” mean quantitatively)? Is it enough for ligand binding?
11. The meaning of dG_solv(H+) should be explained and in general the difference between upper- and lower-case delta should be clarified.
12. What do the over-bar X and L in Eqns 14 and 16, etc. signify?
13. It should be “van der Waals”.
14. I think the selection of the reference state is primarily determined by what experimental results you want to reproduce.
15. A short description of available CM approaches would be appropriate, referring to Table 1.
I suppose you need to specify the variant of PCM also (IEF or C or what?).
16. COSMO-RS is parametrized for many more levels of theory than BP/TZVP.
17. References to the accuracy of solvation energies should be given. In the SAMPL competitions, appreciably worse results are typically seen.
18. I think problem with converging solution-phase optimizations is a problem special to the implementation in Gaussian. With COSMO in Turbomole, no such problems are ever seen.
19. A recent update to Ho et al. 2010 is PCCP 2015, 17, 2859.
20. Since the author only considers water solvation, he should consider changing “solvation” to “hydration”.
21. What is meant by “(dispersion and free energy contributions to the binding free energy” on p.19.