Friday, April 26, 2013

Determining the distribution of multiple samples in python.

I am currently working on a probabilistic model for assigning peaks from protein NMR spectra to the corresponding spin-systems of the protein. So far I have assumed that the error for different measurements of the same chemical shift follows a normal distribution.
Analyzing a set of already assigned peak-lists for the protein S6, it is clear that there are more outliers than a normal distribution predicts, which justifies a deeper analysis of the data. However, even if I assume that the error for e.g. all amide protons in the HNCA spectrum follows the same distribution, the sample variance might be different for the HNcaCO experiment.
A full analysis leaves me in this case with 45 different samples that most likely follow the same kind of distribution with differing parameters. So the question was how to continue from here. Luckily I found this blog post.

What the author did was to brute-force all 80 distributions that scipy.stats offers and list the p and D values from a Kolmogorov-Smirnov test for a single sample. I ended up improving the code to support multiple samples, as well as displaying the log likelihood of the fit. (Any distribution that fits extremely poorly to a single sample is removed from the output.)

The code can be found here: distribution-check.py

If all your samples are in the folder samples/ with the extension .txt, then simply run

python distribution-check.py samples/*.txt

As an example, here's the output for running the script on my 45 samples:


Apparently a normal distribution is not a very good 2-parameter representation of my samples, even though its simplicity might still be beneficial, so I will probably end up testing a few others. The -inf values probably occur due to asymptotic behaviour at the mean, and I would suggest just ignoring them and using the KS test instead to determine whether a distribution is a good fit.
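
The idea behind the script can be sketched in a few lines. This is only a minimal sketch and not the actual distribution-check.py: the function name check_distributions, the short list of candidate distributions, and the choice of summarizing the KS results by their mean over the samples are all assumptions made for illustration. Each sample is fitted separately, so every sample gets its own parameters while the kind of distribution is shared.

```python
import numpy as np
from scipy import stats


def check_distributions(samples, names=("norm", "laplace", "logistic")):
    """Fit each candidate distribution to every sample separately.

    Returns {name: (mean KS D, mean KS p, summed log likelihood)}.
    The names must be distributions available in scipy.stats.
    """
    results = {}
    for name in names:
        dist = getattr(stats, name)
        d_values, p_values, loglik = [], [], 0.0
        for x in samples:
            params = dist.fit(x)  # maximum likelihood fit for this sample
            d, p = stats.kstest(x, dist.cdf, args=params)
            d_values.append(d)
            p_values.append(p)
            loglik += np.sum(dist.logpdf(x, *params))
        results[name] = (np.mean(d_values), np.mean(p_values), loglik)
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # three synthetic samples: same kind of distribution, different widths
    samples = [rng.laplace(0.0, w, size=500) for w in (0.1, 0.2, 0.4)]
    ranked = sorted(check_distributions(samples).items(),
                    key=lambda item: -item[1][2])
    for name, (d, p, ll) in ranked:
        print("%-10s D=%.3f  p=%.3f  logL=%.1f" % (name, d, p, ll))
```

Note that the KS p-values are optimistic when the parameters are fitted to the same data that is tested, so they are better used for ranking distributions than as strict significance levels.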

Thursday, April 25, 2013

The entropy increases when things fall apart

The translational entropy dominates when bonds are broken
An entropy change has four contributions$$\Delta S^\circ=\Delta S^{Molecular}+\Delta S^{\circ,Translation}+\Delta S^{Rotation}+\Delta S^{Vibration}$$For reactions where bonds are broken $\Delta S^{\circ,Translation}$ usually dominates.

For example, for the reaction $H_2 \rightarrow 2H$ the entropy changes at 25 $^\circ$C are:$$\Delta S^\circ = 11.6+100.1-12.8-0.0=98.9 \text{ J/molK}$$For breaking the hydrogen bond between two water molecules, $H_2O\cdot \cdot \cdot HOH\rightarrow 2H_2O$, the entropy contributions are$$\Delta S^\circ =0.0+136.2+9.3-66.0=79.4  \text{ J/molK}$$In both cases $\Delta S^\circ$ is positive because two particles have more entropy than one.

In many cases $\Delta S^\circ \approx \Delta S^{\circ,Translation}$ is a reasonable approximation.
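
The translational term for $H_2 \rightarrow 2H$ can be estimated with the Sackur-Tetrode equation for an ideal gas. The sketch below is an illustration under stated assumptions: the function name s_translation is hypothetical, the standard pressure is taken to be 1 atm, and the hydrogen mass is taken as 1.00794 amu. None of these choices is given in the text above.

```python
import math

# CODATA 2018 constants (SI units)
K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J s
N_A = 6.02214076e23         # Avogadro constant, 1/mol
AMU = 1.66053906660e-27     # atomic mass unit, kg


def s_translation(mass_amu, temperature=298.15, pressure=101325.0):
    """Molar translational entropy of an ideal gas (Sackur-Tetrode), J/(mol K).

    The default pressure of 1 atm is an assumption about the standard state.
    """
    m = mass_amu * AMU
    kt = K_B * temperature
    # (2 pi m k T / h^2)^(3/2) is the inverse cube of the thermal wavelength
    inv_lambda_cubed = (2.0 * math.pi * m * kt / H_PLANCK ** 2) ** 1.5
    volume_per_particle = kt / pressure
    return K_B * N_A * (math.log(inv_lambda_cubed * volume_per_particle) + 2.5)


M_H = 1.00794  # assumed atomic mass of hydrogen, amu
delta_s = 2.0 * s_translation(M_H) - s_translation(2.0 * M_H)
print("Delta S(translation) for H2 -> 2H: %.1f J/(mol K)" % delta_s)
```

Under these assumptions the result should land close to the 100.1 J/molK quoted above; the small remaining difference depends on the constants and standard state used.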

Test: What happens to the standard entropy for this process

     

Monday, April 22, 2013

Presenting data in histograms

I've recently been working on a relatively large number of data sets of the error between different measurements of the chemical shifts from a specific atom in a protein. A problem I encountered was how to determine the bin width of the histograms without manually choosing a unique value for each set. And if you have to fit a function to the distribution, then the chosen bin width might greatly affect the resulting fit.
The python module astroML contains an improved version of pylab's hist, where the form of the histogram can be automatically chosen based on different statistical models. Two noteworthy models are the Freedman-Diaconis Rule and the Bayesian block method discussed here, with examples of usage shown here.
The following python code calculates the optimal bin number based on the Freedman-Diaconis Rule without the use of the astroML module:
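def bins(t):
    """Number of histogram bins from the Freedman-Diaconis Rule."""
    t = sorted(t)  # sort a copy so the caller's data is left untouched
    n = len(t)
    # bin width = 2 * IQR * n^(-1/3); // keeps the indices integers in Python 3
    width = 2 * (t[3 * n // 4] - t[n // 4]) * n ** (-1. / 3)
    return int((t[-1] - t[0]) / width)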
An interesting alternative to histograms for estimating a distribution is Kernel Density Estimation, where a kernel (a symmetric function that integrates to one, e.g. a normal distribution) is placed at each data point and the normalized sum of the kernels gives the kernel density estimate.
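
To make the definition concrete, a Gaussian kernel density estimate can be written out by hand in a few lines of numpy. This is a minimal sketch: the function name kde_by_hand is hypothetical and the bandwidth (the width of each kernel) is simply chosen by hand here, whereas scipy's gaussian_kde picks one automatically (Scott's rule by default).

```python
import numpy as np


def kde_by_hand(data, grid, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated on `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # distance of every grid point to every data point, in units of the bandwidth
    z = (grid[:, None] - data[None, :]) / bandwidth
    # one normalized Gaussian per data point
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # averaging the kernels keeps the total area equal to one
    return kernels.mean(axis=1)


rng = np.random.default_rng(0)
x = np.concatenate((rng.normal(0.0, 0.5, size=1000),
                    rng.normal(1.5, 0.3, size=1000)))
grid = np.linspace(x.min() - 1.0, x.max() + 1.0, 500)
density = kde_by_hand(x, grid, bandwidth=0.1)
print("area under the estimate: %.3f" % (density.sum() * (grid[1] - grid[0])))
```

A smaller bandwidth follows the data more closely but gives a noisier curve, which is the KDE analogue of choosing a smaller bin width.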

As an example, the following code generates a data set x, and plots the data by the three mentioned methods:

import numpy as np
from astroML.plotting import hist
from scipy.stats import gaussian_kde
import pylab

# generate data from two gaussians
x1 = np.random.normal(0, 0.5, size=1000)
x2 = np.random.normal(1.5, 0.3, size=1000)
x = np.concatenate((x1, x2))

# plot histogram from the Freedman-Diaconis Rule (filled translucent blue bins)
# density=True normalizes the histogram to unit area (called normed in old matplotlib)
hist(x, density=True, alpha=0.2, bins='freedman', color='blue')

# plot histogram using bayesian blocks (green step line)
hist(x, density=True, bins='blocks', histtype='step', color='green')

# plot KDE using gaussian kernels (red line)
my_pdf = gaussian_kde(x)
data_range = np.linspace(x.min(), x.max(), 100)
pylab.plot(data_range, my_pdf(data_range), color='red')

pylab.savefig('test')

Which produces the following plot, where the filled bins are from the Freedman-Diaconis Rule, the green step line is the bayesian block method and the red line is the KDE.

Thursday, April 18, 2013

PLoS ONE rejects; we appeal

Background:
On Apr 8, 2013, at 10:38 AM, Jan Jensen wrote:
Dear Dr xxx

I would like a little more detailed justification of the rejection based on the PLoS ONE publication criteria (http://www.plosone.org/static/publication;jsessionid=DB5C8BFAC98BED749D9E351BB0D3B846#data support).  You mention "overall significance" as your justification. This is not a review criterion of PLoS ONE.  

You also mention "strong concerns about the methodology". Since all of Reviewer 2's comments are aimed at "significance" (and miss the point of the paper), I assume this refers to Reviewer 1's points. Points 1, 2, and 4 reflect a complete ignorance of the current field of computational enzymology, which I am happy to elaborate on. Point 3 is ridiculous as we present predictions for close to 400 mutants, so the method is demonstrably high-throughput.

The strongest objections of both reviewers is further proof, e.g. more experimental data. We describe a theoretical method that offers experimentally testable predictions.  Since purely theoretical papers are also appropriate for PLoS ONE (such as our previous PLoS ONE paper on this method http://dx.doi.org/10.1371/journal.pone.0049849), the only goal of additional experiments must be to establish the significance or impact of the method. 

In conclusion, I firmly believe our paper meets all stated criteria for publication in PLoS ONE and your stated reasons (echoing those of the reviewers) for rejection include a criterion (impact) that is not a review criterion for PLoS ONE. I would therefore like you to reconsider your decision and perhaps consult other editors, keeping in mind the extremely positive comment of reviewer 3.

Best regards, 


Jan Jensen

----

On Apr 11, 2013, at 3:24 PM, plosone wrote:
Dear Dr Jensen

Thank  you for your email.  

I am writing to inquire whether you would be interested in formally appealing the original decision rendered through PLOS ONE regarding the manuscript PONE-D-13-07851R1. While I cannot guarantee that your appeal will be approved by our in house editors, they will consider appeals via the formal appeals process when you submit a detailed rebuttal letter.

Appeal requests should be made in writing, not by telephone, and should be addressed to plosone@plos.org  with the word "appeal" in the subject line. Authors should provide detailed reasons for the appeal and point-by-point responses to the reviewers' and/or Academic Editor's comments. Decisions on appeals are final without exception.

If you have any further questions or concerns, please do not hesitate to contact us.

With kind regards
xxx
Staff EO
PLOS ONE

---

On Apr 17, 2013, at 2:35 PM, Jan Jensen wrote:

This is an appeal request for the decision to reject manuscript PONE-D-13-07851R1. The reason for the appeal is that the primary reason for rejection is the perceived impact of the study, which is not a publication criterion for PLoS ONE (http://www.plosone.org/static/publication). What follows is a point-by-point response to the points raised by the editor and reviewers. I note that there was also a third, very positive, review of the manuscript in the previous round of reviews.

The editor:
** “The Reviewers have considered your responses and revisions not convincing. They raised again strong concerns on the methodology and on the overall significance of the conclusions.”

Our response: “significance” is not a publication criterion for PLoS ONE. Concerns regarding methodology are addressed in the response to Reviewer 1 below.

Reviewer #2
** “As the authors claim in their answer, they compare to experiment, the gold standard in science. This is missing for all presented mutants. Unortunately no further experimental or computational characterization of the selected mutants were carried out, therefore the study remains inconclusive and incomplete. As also in experimental screening methods applied, a rescreening of interesting hits is mandatory in any way.”

Our response: This paper offers a computational method for generating hundreds of experimentally testable predictions. PLoS ONE accepts papers in all areas of science, including purely computational studies such as our previous PLoS ONE paper: DOI: 10.1371/journal.pone.0049849.  Thus, the absence of any experimental data should not in itself preclude publication in PLoS ONE.  However, we do offer some experimental verification which is in reasonable agreement with our computational results.  

We hope that future experimental studies test our predictions.  However, even if we are proven wrong this would not alter the fact that our current conclusions are supported by the current data: (1) Barriers of hundreds of mutants are estimated. (2) There is general qualitative agreement with available experimental data - the best one can expect given the many approximations we make and duly note. (3) We offer experimentally testable predictions for other mutants.  

Whether future experimental studies verify these predictions or not will determine the impact of our method, but this is not a criterion for publication in PLoS ONE.

** “According to my specific questions, none of them was sufficiently answered and no changes were applied to the manuscript. 
E.g. my simple question was, why certain active site residues were not considered in the chosen set. The answer, that the criteria are already given in the text is complete nonsense, because all residues questioned by me fulfill exactly the authors diffuse criteria, albeit were not selected. This is highly disappointing and not scientific sound, because from the given criteria one is not able to reproduce the expert choice of residues performed by the authors. Especially for the protonation of P38H I expected a more competent answer from the group of Prof. Jensen instead of no answer at all.”

Our response:  We test the qualitative agreement between our computed data with experiment for some mutants. These mutants are not selected based on our computational method and could just as well have come from already published results or from randomly chosen mutants.  We clearly state that these mutants were picked using heuristic criteria as is common in rational enzyme design.  We never claim that the selection of single mutants is automated, only that double, triple, and quadruple mutants made from this initial selection can be efficiently screened to offer suggestions for promising mutants. 

When applied to a new system, single mutants must again be selected heuristically. However, as this is currently how most experimental rational design of enzymatic activity is done, this is hardly a major limitation. Clearly, a reliable automated selection of mutants would increase the impact of the study, but impact is not a criterion for publication in PLoS ONE.

** “According to the quantitative interpretation of the computed results, the authors claim, that the intent of the method is not a quantitative ranking. Nevertheless they still give a discrete energy-cutoff in the paper, suggesting a quantitative meaningful barrier to the reader. 
If the goal is just to identify N interesting mutants from a larger subset, this should be clarified in the manuscript and not only in the answer to the editor.”

Our response: As we clearly state in the paper (emphasis added): “We note that defining the cutoff is done purely for a post hoc comparison of experimental and computed data. When using the computed barriers to identify promising experimental mutants, one simply chooses the N mutants with the lowest barriers, where N is the number of mutants affordable to do experimentally (e.g. 20 in the discussion of set L).”

** “I can see no attempt for a scientific discussion about the accuracy and the aim of the method compared with current state of the art methods to predict enzyme activity and conformational space of protein mutants. Therefore the scientific perspective and evaluation of the scientific contribution with regard to existing methods is completely missing. This is in my view not acceptable for a scientific publication, even from an industrial perspective.”

Our response:  As we clearly state in the manuscript: “The computational method used to estimate the reaction barriers of the CalB mutants has been described in detail earlier [1] and is only summarized here. As described previously [1], in order to make the method computationally feasible, relatively approximate treatments of the wave function, structural model, dynamics and reaction path are used. Given this and the automated setup of calculations, some inaccurate results will be unavoidable. However, the intent of the method is similar to experimental high throughput screens of enzyme activity where, for example, negative results may result from issues unrelated to intrinsic activity of the enzyme such as imperfections in the activity assay, low expression yield, protein aggregation, etc. Just like its experimental counterpart our technique is intended to identify potentially interesting mutants for further study.”

The claim that there is “no attempt for a scientific discussion about the accuracy and the aim of the method” is clearly false.  

Furthermore “scientific perspective and evaluation of the scientific contribution” is not a publication criterion in PLoS ONE.

Reviewer #1: This paper is based on unsound methodology.

** “1. as any textbook will show, the transition state is a saddle point and should therefore have only 1 negative eigenvalue. Calculation of the eigenvalues is therefore a standard and common practice to proof that the transition state was indeed obtained. The authors' claim in the rebuttal that a vibrational analysis would not be valid does not make any sense; without eigenvalues it cannot be proven that a transition state was obtained. Such a proof is absolutely necessary to show that the method works (especially given my other concerns, see below).”

Our response: Adiabatic mapping, which we employ in our study, is the most common way to estimate barriers in QM/MM studies of enzymatic reaction mechanisms.  The resulting barriers tend to be in good agreement with experiment, which indicates that this is a reasonable approximation (see for example DOI: 10.1146/annurev.physchem.53.091301.150114 and DOI: 10.1146/annurev.physchem.55.091602.094410).  This is common knowledge in the QM/MM community, but we are happy to add text that explains this.

** “2. The authors cherry-picked data by deciding that certain shapes of the transition barrier should be thrown away, that certain atoms moved too much in the minimization and should be held fixed, etc. etc. How can anyone believe this is a proper procedure with so much arbitrary and manual input, especially without further proof that the transition states were indeed identified.”

Our response: “Cherry pick” implies that we selectively discard data that does not fit with experiment, which we have not done. 

As explicitly stated in the manuscript the aim is “to identify promising mutants for, and to eliminate non-promising mutants from, experimental consideration.“  Occasionally the shape of the reaction profile is inconclusive, i.e. it is not clear whether a particular mutation is promising or not.  The most conservative choice is to classify the mutation as non-promising, but this was only an issue for mutants where we do not know the experimental answer.  Similarly, the same constraints are applied to all mutants, so there can be no question of “cherry picking”.

We are happy to clarify this point in the manuscript.

** “3. All this manual and arbitrary input indicates that the procedure is not robust; therefore, it cannot be used for high-throughput screening.”

Our response: Since we use our method to screen nearly 400 mutants this statement is demonstrably wrong. Yes, there is some manual intervention, but the method is automated to such a degree that hundreds of mutants can be screened.

** “4. The authors' claim in the rebuttal that an error analysis is not needed since a comparison is made to experiments would be correct if exactly the same property was compared in the experiments as in the computation, but here this is not the case. Experimental activities are characterized by kcat/KM while the computationally obtained number is a barrier height. Since the entropic contribution is missing and since the authors do not know the value of the transmission coefficient, not even kcat can be correctly calculated. Given the arbitrariness of the procedure, and the inherent limitations of a semiempirical method like PM6, an error analysis would be highly appropriate.”

Our response: As we clearly state (emphasis added): “Given the approximations introduced to make the method sufficiently efficient, it is noted that the intent of the method is not a quantitative ranking of the reaction barriers, but to identify promising mutants for, and to eliminate non-promising mutants from, experimental consideration. Therefore only qualitative changes in overall activity are considered.“ 

The reviewer points out that we cannot compute exactly what is measured, yet insists on a quantitative comparison of computed and experimental data.  This does not make any sense.

** “In conclusion, the desire to have a high-throughput algorithm has led to way too many concessions on accuracy and robustness; without further proofs, the accuracy of data and conclusion is in question.”

Our response (repeated from above): This paper offers a computational method for generating hundreds of experimentally testable predictions. PLoS ONE accepts papers in all areas of science, including purely computational studies such as our previous PLoS ONE paper: DOI: 10.1371/journal.pone.0049849.  Thus, the absence of any experimental data should not in itself preclude publication in PLoS ONE.  We, however, do offer some experimental verification which is in reasonable agreement with our computational results.  

We hope that future experimental studies test our predictions.  However, even if we are proven wrong this would not alter the fact that our current conclusions are supported by the current data: (1) Barriers of hundreds of mutants are estimated. (2) There is general qualitative agreement with available experimental data - the best one can expect given the many approximations we make and duly note. (3) We offer experimentally testable predictions for other mutants.  

PLoS ONE rejects our paper

Background:

Then this:

PONE-D-13-07851R1
In Silico Screening of 393 Mutants Facilitates Enzyme Engineering of Amidase Activity in CalB
PLOS ONE

Dear Dr. Jensen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we have decided that your manuscript does not meet our criteria for publication and must therefore be rejected. 

Specifically:

The Reviewers have considered your responses and revisions not convincing. They raised again strong concerns on the methodology and on the overall significance of the conclusions.

I am sorry that we cannot be more positive on this occasion, but hope that you appreciate the reasons for this decision.

Yours sincerely,

xxx
Academic Editor
PLOS ONE


Reviewers' comments:



Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass this form and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)



Please explain (optional).

Reviewer #1: (No Response)

Reviewer #2: In the revised Version of the manuscript entitled In silico screening of 393 mutants facilitates enzyme engineering of amidase activity in CalB by Martin R. Hediger, Luca De Vico, Allan Svendsen, Werner Besenmatter and Jan H. Jensen the authors have include minor changes, stating that the presented essay is intended to deliver potentially interesting mutants for further study.
As the authors claim in their answer, they compare to experiment, the gold standard in science. This is missing for all presented mutants. Unortunately no further experimental or computational characterization of the selected mutants were carried out, therefore the study remains inconclusive and incomplete. As also in experimental screening methods applied, a rescreening of interesting hits is mandatory in any way.
According to my specific questions, none of them was sufficiently answered and no changes were applied to the manuscript. 
E.g. my simple question was, why certain active site residues were not considered in the chosen set. The answer, that the criteria are already given in the text is complete nonsense, because all residues questioned by me fulfill exactly the authors diffuse criteria, albeit were not selected. This is highly disappointing and not scientific sound, because from the given criteria one is not able to reproduce the expert choice of residues performed by the authors. Especially for the protonation of P38H I expected a more competent answer from the group of Prof. Jensen instead of no answer at all.
According to the quantitative interpretation of the computed results, the authors claim, that the intent of the method is not a quantitative ranking. Nevertheless they still give a discrete energy-cutoff in the paper, suggesting a quantitative meaningful barrier to the reader. 
If the goal is just to identify N interesting mutants from a larger subset, this should be clarified in the manuscript and not only in the answer to the editor. 
I can see no attempt for a scientific discussion about the accuracy and the aim of the method compared with current state of the art methods to predict enzyme activity and conformational space of protein mutants. Therefore the scientific perspective and evaluation of the scientific contribution with regard to existing methods is completely missing. This is in my view not acceptable for a scientific publication, even from an industrial perspective.



2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly



Please explain (optional).

Reviewer #1: This paper is based on unsound methodology.

1. as any textbook will show, the transition state is a saddle point and should therefore have only 1 negative eigenvalue. Calculation of the eigenvalues is therefore a standard and common practice to proof that the transition state was indeed obtained. The authors' claim in the rebuttal that a vibrational analysis would not be valid does not make any sense; without eigenvalues it cannot be proven that a transition state was obtained. Such a proof is absolutely necessary to show that the method works (especially given my other concerns, see below).

2. The authors cherry-picked data by deciding that certain shapes of the transition barrier should be thrown away, that certain atoms moved too much in the minimization and should be held fixed, etc. etc. How can anyone believe this is a proper procedure with so much arbitrary and manual input, especially without further proof that the transition states were indeed identified.

3. All this manual and arbitrary input indicates that the procedure is not robust; therefore, it cannot be used for high-throughput screening.

4. The authors' claim in the rebuttal that an error analysis is not needed since a comparison is made to experiments would be correct if exactly the same property was compared in the experiments as in the computation, but here this is not the case. Experimental activities are characterized by kcat/KM while the computationally obtained number is a barrier height. Since the entropic contribution is missing and since the authors do not know the value of the transmission coefficient, not even kcat can be correctly calculated. Given the arbitrariness of the procedure, and the inherent limitations of a semiempirical method like PM6, an error analysis would be highly appropriate.

In conclusion, the desire to have a high-throughput algorithm has led to way too many concessions on accuracy and robustness; without further proofs, the accuracy of data and conclusion is in question.


Reviewer #2: (No Response)

Wednesday, April 17, 2013

Obtaining a partial PLoS ONE fee waiver

From: PLOS invoices <authorbilling@plos.org>
Date: 11. apr. 2013 23.56.35 CEST
To: xxx
Subject: Invoice : PAB70069

Dear Author,

Thank you for choosing to publish with PLOS - by now you will have received an email confirmation that your article "Mapping Enzymatic Catalysis using the Effective Fragment Molecular Orbital Method: towards all ab in" PONE-D-12-39084 has been accepted. Your invoice PAB70069 for this article is attached. Please let us know of any change in billing information including address changes

Thank you for publishing with PLOS and congratulations on your acceptance!

Regards,

Author Billing Team

Public Library of Science

----

NB: as I write this, the paper has already appeared on the PLoS ONE site.

On Apr 16, 2013, at 10:15 AM, Jan Halborg Jensen wrote:
Dear Author Billing Team

The grant that supported the study has expired so I don't have the funds to pay the entire fee, and I would therefore like to request a partial fee waiver.  From a variety of other sources I can scrape $500 together. 

Best regards, Jan Jensen

----

On Apr 16, 2013, at 9:42 PM, Author Billing wrote:
Greetings Dr. Jan Jensen,

Thank you for your message.  We will honor your request for a partial fee waiver.  We will issue you revised invoice for $500 USD reflecting our adjustment.  Please assist us by paying promptly upon receipt.

We appreciate you publishing with PLOS and choosing to make science open.

Regards,

xxx
Author Billing Team

Wednesday, April 10, 2013

2nd Danish Protein Molecular Modellers Meeting

DPM3.2: May 23, 2013 at 13:00-20:00

Protein modelling is becoming increasingly important in both academic and industrial research, and covers a large variety of methods, techniques etc. In April 2011 the first get-together for protein modellers in academia and industry, DPM3.1, was arranged at Novozymes. We would like to continue this good initiative by arranging DPM3.2 at the University of Copenhagen on May 23, 2013. The purpose of the meeting is to meet and network, and to exchange ideas and experiences with other protein modellers.
The program is planned to reflect the variety of protein modelling in Denmark, and accordingly both participants from academia and industry, from students to experienced protein modellers are very welcome. A more detailed program is available on the program page.
The program starts at 13:00 and includes scientific presentations, time for posters and discussion, and finishes with a social dinner at 18:00-20:00.
Please, go to the registration page to sign up for DPM3.2, indicate if you would like to bring a poster and/or participate in the dinner. Participation is free (sponsored by Novo Nordisk and Novozymes). Registration closes on May 10th.


Saturday, April 6, 2013

Fragment size and computational efficiency in fragmentation methods

I recently reviewed this interesting paper in which the FMO method is extended to four-body interactions (FMO4) and the fragmentation-size is decreased to roughly half a residue per monomer.  The latter was mainly done for analysis purposes, but some interesting results were also obtained for timings and accuracy.

For example, at the MP2/6-31G level of theory and using an FMO2 calculation using one residue per monomer as a reference, the accuracy and CPU cost is comparable to an FMO3 calculation if the fragment size is roughly halved.  The CPU time increases only by a factor of 3 on going to FMO4, while the accuracy is increased by an order of magnitude.

In this post I explore this issue further using an idealized EFMO model. I write EFMO instead of FMO so I can ignore the macro-iterations associated with the monomer energy, but I don't think this will affect the conclusions much.

The total time has contributions from monomer, dimer, trimer, and tetramer ab initio calculations with all other terms being negligible.

$t_{EFMO4}=t_1+t_2+t_3+t_4$

The time for an ab initio calculation scales non-linearly with system size ($\alpha >1$; $\alpha$ and all other Greek symbols are constants). Assuming a uniform monomer size, so that $s_i=is$ ($s_1=s$):

$t_i=N_i s_i^\alpha=N_i i^{\alpha}s^\alpha$

If only dimers, trimers and tetramers constructed from nearest neighbor monomers need to be evaluated ab initio then the number of multimers of order $i$ is a linear function of the number of monomers $N$:

$N_i=\beta_i N-\gamma_i$

For a given system the monomer size and the number of monomers are related by a constant

$sN=\delta$

i.e. a 10 residue protein can be constructed from five two-residue monomers or ten one-residue monomers.  Under these assumptions the CPU time is seen to increase non-linearly with monomer size. $$N_i=\frac{\beta_i\delta}{s} -\gamma_i $$ $$t_i\approx \beta_i\delta i^{\alpha}s^{(\alpha-1)}$$ So decreasing the monomer size will decrease the required CPU time.

Assuming a linear model $(N_i=N-i+1)$, cubic scaling $(\alpha=3)$ and $sN=200$, e.g. a 200 residue protein described by 100 two-residue monomers ($N=100$), the CPU time increases by a factor of four on going from EFMO2 to EFMO3$$\frac{t_{EFMO3,N=100}}{t_{EFMO2,N=100}}=3.9$$However, if the EFMO3 calculation is done with 200 monomers of half the size then the cost is not increased $$\frac{t_{EFMO3,N=200}}{t_{EFMO2,N=100}}=1.0$$while the cost increases roughly three-fold on going to EFMO4: $$\frac{t_{EFMO4,N=200}}{t_{EFMO2,N=100}}=2.7$$Decreasing the monomer size by a further factor of two $(N=400)$ one could expect a modest speedup on going to EFMO4$$\frac{t_{EFMO4,N=400}}{t_{EFMO2,N=100}}=0.7$$

Going to a 3D example, for example a water cluster, where each monomer has a maximum of four nearest neighbors:$$N_2=2N$$ $$N_3=6N$$ $$N_4=10N$$ (I am not really sure about the last factor), then $$\frac{t_{EFMO3,N=100}}{t_{EFMO2,N=100}}=10.5$$ $$\frac{t_{EFMO3,N=200}}{t_{EFMO2,N=100}}=2.6$$ $$\frac{t_{EFMO4,N=200}}{t_{EFMO2,N=100}}=24.1$$ $$\frac{t_{EFMO4,N=400}}{t_{EFMO2,N=100}}=3.0$$

Increasing the scaling to quartic $(\alpha=4)$ has relatively little effect on the relative timings: $$\frac{t_{EFMO3,N=100}}{t_{EFMO2,N=100}}=15.7$$ $$\frac{t_{EFMO3,N=200}}{t_{EFMO2,N=100}}=2.0$$ $$\frac{t_{EFMO4,N=200}}{t_{EFMO2,N=100}}=11.7$$ $$\frac{t_{EFMO4,N=400}}{t_{EFMO2,N=100}}=1.5$$

The relative timings predicted by the 1D model agree quite well with the findings in the paper.  The 3D model is for an infinite system (i.e. no edge effects) and will overestimate the number of trimers and tetramers for a finite system.  The relative timings quoted from the paper were for 50 monomers.
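
The model is simple enough to evaluate in a few lines of Python. The sketch below is not the MAPLE worksheet: the function names efmo_time and ratio and the counts switch are hypothetical, and it simply evaluates $t_i=N_i(is)^\alpha$ with $s=200/N$ and the 1D or 3D multimer counts given above (with $N_1=N$ assumed in the 3D case).

```python
def efmo_time(order, n_monomers, alpha=3.0, total_size=200.0, counts="1d"):
    """Idealized EFMO cost: sum over i = 1..order of N_i * (i*s)**alpha."""
    s = total_size / n_monomers  # monomer size from s*N = delta
    if counts == "1d":
        # linear chain: N_i = N - i + 1
        n_i = {i: n_monomers - i + 1 for i in range(1, 5)}
    else:
        # "3d": water-cluster counts N_2 = 2N, N_3 = 6N, N_4 = 10N
        n_i = {1: n_monomers, 2: 2 * n_monomers,
               3: 6 * n_monomers, 4: 10 * n_monomers}
    return sum(n_i[i] * (i * s) ** alpha for i in range(1, order + 1))


def ratio(order, n_monomers, **kwargs):
    """Cost relative to an EFMO2 calculation with 100 monomers."""
    return efmo_time(order, n_monomers, **kwargs) / efmo_time(2, 100, **kwargs)


for counts, alpha in (("1d", 3.0), ("3d", 3.0), ("3d", 4.0)):
    for order, n in ((3, 100), (3, 200), (4, 200), (4, 400)):
        value = ratio(order, n, alpha=alpha, counts=counts)
        print("%s alpha=%d EFMO%d N=%d: %.2f" % (counts, alpha, order, n, value))
```

Most of the ratios quoted above come out the same to within the last digit. The exception is the cubic 3D entry for EFMO4 with $N=200$, for which this sketch gives roughly 12, so either that number or this sketch should be treated with some caution.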

The Maple worksheet I used for the calculations can be found here, and a PDF version here.
This work is licensed under a Creative Commons Attribution 3.0 Unported License

Wednesday, April 3, 2013

Professorship in Quantum Chemistry of Electronically Excited States at the University of Leuven


At the Faculty of Science, Department of Chemistry. We are looking for a dynamic and highly motivated individual (m/f) with an excellent research record and teaching skills.

Duties
Research
The successful candidate should develop a research program and aspire to excellence in the domain of quantum chemistry/computational chemistry, with particular emphasis on the description of the (electronic, geometrical, vibrational) structure of electronically excited states by means of ab initio wave function methods. She/he will be expected to take the lead in intense collaborations with experimental research groups at the KU Leuven (as well as in an international context), by offering quantum chemical support and additional insight into the spectroscopic characterization of molecular materials. The candidate is expected to remain at the frontline during the development and improvement of quantum chemical techniques for the simulation of spectroscopic data. She/he will also keep close contact with developments in methodologies (continuum models, embedding models, QM/MM) that may complement the description of the quantum mechanical region of the molecular system with a lower-level treatment of its surroundings, aiming at improving the correspondence with experimental measurements.

Teaching
The selected candidate will assume teaching responsibilities in Chemistry, including courses in quantum chemistry and computational chemistry. She/he is expected to play a leading role in the development of initiatives promoting the visibility/attractiveness of theory as part of the educational programme in chemistry. The candidate is expected to have a strong background in group theory, or should be able to acquire it in a short time, for teaching purposes at the master level. The candidate will strive to achieve the objectives of the KU Leuven in academic level and orientation, and will subscribe to the teaching project of the KU Leuven. Dedication to general education and quality is naturally expected.

Requirements
Interested candidates should hold a Ph.D. or doctoral degree in Chemistry or equivalent.
Qualified candidates are expected to have an excellent research record and very good teaching and training skills, in order to contribute to the research output of the department and to the quality of its educational program. The high quality of the candidate's research should be evidenced by publications in international peer-reviewed journals and by international research experience.
If you do not speak Dutch, you will be expected to learn the language within three years of your appointment. The required proficiency level will depend on the duties assigned to you. Dutch language courses are offered at KU Leuven. Proficiency in the English language is also required.

Offer
The full-time position can be offered at one of the academic levels (full professor, professor, associate professor, assistant professor), depending on the qualifications of the candidate.

Interested?
For more information please contact the Department Chair, Prof. Dr. Steven De Feyter, tel.: +3216327921, mail: steven.defeyter@chem.kuleuven.be. For problems with the online application, please contact Mrs. Katoe Buyle, tel.: +3216328324, mail: katoe.buyle@kuleuven.be.
You can apply for this job no later than September 30, 2013 via the online application tool.