Re: [AMBER] QM method used for force constants derivation in GAFF2 ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Fri, 10 Feb 2017 03:20:19 +0100

Hi Karl,

thank you for your comment !

To be frank, the most interesting question for me is, why
this brand new force field ( GAFF2 ) was introduced.
Since 2004 we have here GAFF force field which was derived from
Amber biomolecular force fields and therefore is perfectly consistent
with them which is important for "bio-drug" systems simulation.

--------------------
"This strategy of using the force constants in the AMBER
protein force field as a references has the obvious limitation that
the results are dependent upon the quality of the previous
parameterization;
on the other hand, it helps to ensure that GAFF is
consistent with the AMBER parameters. Further studies are
planned, aimed at improving both small molecule and protein
values."

J Comput Chem 25: 1157–1174, 2004
--------------------

On one side we have here continuously improved GAFF force field (actually
version 1.8), on the other side brand new GAFF2 force field was
introduced. Force field which was derived independently (using different
approach) on Amber biomolecular ffs, which also brings some decrease of
consistency with those bio force fields.

Nowhere is explained why was this huge work done/necessary. Is it GAFF
(which is used more than 10 years), even in its latest version, so bad
that was necessary to build brand new generalized force filed ? If yes I
would like to know some of those examples where GAFF significantly failed
to reproduce some experimental or high level QM data and of course I would
like to know if GAFF2 solved satisfactorily those eventual situations.

At the moment we just know that GAFF2, which was developed using QM theory
( B3LYP/6-31G* level) is better in repruducing vibrational frequencies,
derived using this theoretical approach ( RMS errors in GAFF2 ca 60 cm-1,
in GAFF 120 cm-1), which is natural.

But I am not sure if this could be conclusively interpreted as that GAFF2
is better than GAFF in general i.e. in all possible simulation
cases/systems. If it is not possible to simply say that GAFF2 is in
general superior successor of GAFF, I would appreciate some
information/recommendations, e.g. some concrete examples of molecular
systems and/or target information
we want to obtain from simulation, where more likely GAFF2 will be more
suitable than GAFF (small molecules vs polymers or even branched polymers,
mixed X/bio systems, dG calculations, requirement to perfectly reproduce
density at the given conditions, requirement to perfectly reproduce
statistic of appearance of the given conformers, hydration energy calc.,
heat of formation calc. etc. ).

It is clear to me, that lot of useful information will be available in the
actually prepared GAFF2 article but, in my opinion, at least some general
usage recommendations should be available, if the new force field is
released, especially if it is a new alternative to existing ff.


Because of this lack of information I did first small tests to compare
those 2 generalized ffs to see differences.
1) in calculation of entropy
2) in calculation of rotational barriers

It seems, based on this experiments, that GAFF2 is slightly better in
estimation of entropy while GAFF is better in rotational barriers
estimation.
But it was really small test so it is not possible to generalize.

Here are the concrete results:

#1
ENTROPY (298.15 K, gas phase, [ cal * K^-1 * mol^-1 ])


                   GAFF GAFF2 EXPERIMENT
Methane 49.42 49.42 44.52
Hexane 86.89 88.65 93.00
tetramethylbut. 93.42 94.19 93.06
(2,2,3,3)
Ethylcyclohexane 87.79 89.31 91.44
Hexadecane 157.85 163.93 185.93

EXPERIMENT sources:
a) Journal of Computational Chemistry, Vol. 17, Nos. 5 &6, 642-668 (1996)

b) J. Phys. Chem. Ref. Data Vol 17, No. 4, 1988 (just for hexadecane)
https://www.nist.gov/sites/default/files/documents/srd/jpcrd351.pdf

In each case the last 1 ns (100 MD frames) from 2.5 ns long MD run
(T=298.15, in vacuum) was analyzed.

Entropy calc. setting

----
&nmode
    nmode_igb=0, nmode_istrng=0.0, nminterval=1,
    maxcyc=10000, drms=0.00001,
---
#2
ROTATIONAL BARRIERS:
a) ETHYLBENZENE
ethyl rotation barrier [kcal/mol]
   GAFF      GAFF2      EXPERIMENT
   1.87      0.90       1.70
Here the difference in GAFF and GAFF2 result is 99.9% caused by difference  
in 1-4 vdw terms.
exp. source: Journal of Computational Chemistry, Vol. 17, No. 4, 429-449  
(1996)
b) BUTANE
conform. labels according to c3-c1-c2-c4 torsional angle [deg]
syn      0
gauche   64.8
ts      119.6
anti    180.0
as defined in J. Chem. Phys. 106 (12), 22 March 1997
                    GAFF    GAFF2   QM
E(gauche)-E(anti)  0.64    1.08	0.62
E(ts)-E(anti)      3.71    2.95	3.31
E(syn)-E(anti)     4.79    4.39	5.51
units: kcal/mol
QM source : J. Chem. Phys. 106 (12), 22 March 1997
Methodology: For each conformation defined by the given torsional angle  
the minimization in vacuum conditions was performed with NMR restraint  
applied to stabilize the desired torsion angle. The final energy (EAMBER)  
of the optimized structure was used. Sander was used here.
mdin setup
--------
   imin=1,maxcyc=5000,ncyc=4900,
   ntb=0,
   igb=6,cut=9999.0,
   ntpr=100,
    nmropt=1,
   /
  &wt type='REST',value1=1.0,value2=1.0, /
  &wt type='END' /
LISTOUT=POUT
DISANG=RST.txt
--------
RST.txt example
-----------------
    &rst
    ixpk= 0, nxpk= 0, iat= 1, 2, 3, 4, r1= 150.0 , r2= 180.0, r3= 180.0,  
r4= 210.0,
        rk2=100.0, rk3=100.0,
   &end
----------------
So from the point of view of these small experiments GAFF and GAFF2 seems  
to be rather quite
comparable.  Nevertheless in these small experiments interestingly GAFF  
seems to be significantly more reliable in some cases (ETHYLBENZENE  
rotation bar., E(gauche)-E(anti) in BUTANE). But as I wrote above, this  
was really small experiment from which is not possible to obtain general  
conclusions.
For sure developers of the GAFF2 force field did already much more  
comparative (GAFF vs GAFF2) studies which will be perhaps in detail  
described in the prepared GAFF2 article. I would appreciate (and probably  
I am not alone) if at least some recommendations [general inforamtion],  
based on those studies, could be available to help us choose  between the  
GAFF and GAFF2 for the given project without necessity to do own extensive  
comparative studies.
   Best wishes,
       Marek
Dne Mon, 06 Feb 2017 10:48:03 +0100 Karl Kirschner  
<k.n.kirschner.gmail.com> napsal/-a:
> Hi Marek,
>
>   I have two thoughts on this that may be helpful. The first one is that  
> in
> Glycam06, we computed vibrational frequencies at HF/6-31G(d) and
> B3LYP/6-31++G(2d,2p) theory levels, and used these as benchmarks. These  
> are
> harmonic, without any type of corrections to account for anharmonicity.
> Since the force field was optimized using these theory levels, we assumed
> that the resulting frequencies should also match these theory levels.
> However, no detailed study was done to determine if these theory levels
> match experimental vibrations for the stretching between to sp3 carbon
> atoms (with the exception for Fig. 17 of the Glycam06 paper). It could be
> that CHARMM found that a smaller C(sp3)-C(sp3) force constant reproduces
> the subsequent vibration (experiment or different QM theory level) better
> in molecules other than carbohydrates (e.g. hydrocarbons).
>
>   Second, I believe (but haven't tested and thus purely hypothetical)  
> that
> the Lennard-Jones force constants might contribute to what values are
> ultimately decided upon for bond and angle parameters due to their
> contribution in larger molecules. Here I am thinking about the difference
> in parameter values one might derive for using two extreme cases of  
> ethane
> (where the longest interaction is a 1-4) versus the central C-C bond of
> 2,2,3,3-tetramethylbutane as your targets. So if there is a big  
> difference
> in the LJ parameter, maybe this is contributing some to the difference in
> the C-C bond parameter.
>
> Bests,
> Karl
>
> On Sun, Feb 5, 2017 at 4:24 AM, Marek Maly <marek.maly.ujep.cz> wrote:
>
>> Hello I am  just resending my last post.
>> Thank you in advance for commenting at least some of my points.
>>
>> Best wishes,
>>
>>   Marek
>>
>> Dne Fri, 27 Jan 2017 01:02:29 +0100 Marek Maly <marek.maly.ujep.cz>
>> napsal/-a:
>>
>> > Hello Junmei,
>> >
>> > first of all thanks a lot for complex answer !
>> >
>> > Nevertheless I have some more comments/questions and I would be  
>> grateful
>> > for your additional comments.
>> >
>> > #1
>> > As we agreed GAFF2 is a "bit less" compatible with the actual AMBER
>> > protein, lipid, nucleid acid
>> > force fields, than GAFF, which is clearly connected with the quite
>> > different way of parameters (at least some) generation. So as this
>> > situation is not ideal for simulations of mixed systems   
>> (drug/protein,
>> > drug/lipid, drug/nucleid acid), would be probably  good idea to
>> > reparameterize in the future also the AMBER bio-force-fields using the
>> > same strategy which was used for GAFF2, especially if extensive
>> > evaluation of GAFF2 ff will in many concrete systems show better  
>> results
>> > (comparing with experiment) than actual version of GAFF. Do you agree  
>> ?
>> >
>> > #2
>> > What is your opinion regarding mixing of GAFF2 with the actual AMBER
>> > bio-force fields ? I think that here  the most important is, how much
>> > are given force fields compatible in nonbond terms parameterisation
>> > (vdw, electrostatic) and this point is probably OK here.
>> > I did not checked this in detail but for example vdw terms for c3 are
>> > pretty similar in both ff.
>> >
>> > GAFF
>> >     c3          1.9080  0.1094             OPLS
>> > GAFF2
>> >     c3          1.9069  0.1078
>> >
>> > You wrote that you are doing a set of evaluation for GAFF2 now, so I
>> > assume that part of this evaluation was for sure devoted to simulation
>> > of mixed systems drug/(biomolecule) where differences in using GAFF2  
>> or
>> > GAFF (for drug) was carefully analyzed (conformations of the drug on  
>> the
>> > binding site, conformation of molecular complex, binding energy) and
>> > eventually compared with experiment.
>> >
>> > It is clear to me, that you cannot say here any details about those
>> > evaluation results before they are published, but can you at least  
>> tell
>> > us, just as far as you know now, if using of GAFF2 in combination with
>> > the actual AMBER bio-force fields  is really safe ? I mean if you,  
>> based
>> > on your actual knowledge, can recommend it or if you rather recommend  
>> to
>> > use the latest version
>> > (1.8) of GAFF ?
>> >
>> > #3
>> > This change of parameterization strategy unfortunatelly creates some
>> > incompatibility problem (I hope not so big) also when one would like  
>> to
>> > add new parameters for new atoms/atom types to GAFF2 using paramfit
>> > routine as it only allows to fit using energies or forces. I think  
>> that
>> > more compatible will be now (to add new parameters to GAFF2) to use  
>> here
>> > forces fitting which is unfortunately still available only for
>> > commercial gaussian outputs.
>> > What is your opinion here ? Which are plans here for the near future ?
>> > Some extension to use QM vibration frequencies for fitting or at least
>> > to make accessible force fitting for those who uses free GAMESS   
>> instead
>> > GAUSSIAN ? BTW you wrote "The ab initio frequencies were obtained at
>> > B3LYP/6-31G* level." To be frank I would expect a little more accurate
>> > approach, but I am definitely not specialist here and you probably  
>> found
>> > that this is quite enough comparing to experimental frequencies where
>> > available. But my question is, if one uses paramfit to add new
>> > parameters it is automatically the best choice also B3LYP/6-31G* or if
>> > I  use a bit more precise theory like MP2/6-31G** it is also OK or  
>> even
>> > better ?
>> >
>> > #4
>> > Regarding that c3-c3 bond I did some ff searching. Mostly I found K
>> > value close to GAFF value (i.e. arround 300). Here I would like to
>> > emphasize GLYCAM case:
>> >
>> >   GLYCAM6j
>> > Cg-Cg  310.0    1.520       Butane (gauche, and trans)
>> >
>> > where the K is 310 kcal/mol/A*A (so similar to GAFF and different from
>> > GAFF2 value) in spite
>> > the fact that in AMBER16 manual page 41 we can read:
>> >
>> >
>> > "The GLYCAM-06 force field has been validated against quantum  
>> mechanical
>> > and
>> > experimental properties, including: gas-phase conformational energies,
>> > hydrogen bond energies, and vibrational frequencies; solution-phase
>> > rotamer populations (from NMR data); and solid-phase vibrational
>> > frequencies and crystallographic unit cell dimensions."
>> >
>> > So this force field was clearly validated also using "vibrational
>> > frequencies" and still we have here c3-c3 K = 310 which is so much
>> > different from GAFF2 value (232.52) which was derived using  
>> vibrational
>> > frequencies fitting. Do you have any explanation here ?
>> >
>> > On the other hand I found that in CHARMM they are very close to GAFF2  
>> K
>> > value:
>> >
>> > CHARMM36
>> > -----------------------
>> > CARBOHYDRATES
>> > CC311   CC311    222.50  1.500   ! par22 CT1 CT1  ( CC311 generic
>> > acyclic CH carbon)
>> > CC312   CC312    222.50  1.485   ! adm 11/08, glycerol ( CC312 CH  
>> carbon
>> > in linear polyols)
>> > CC321   CC321    222.50  1.530   ! par22 CT2 CT2 ( generic acyclic CH2
>> > carbon (hexopyranose C6) )
>> >
>> >
>> > PROTEINS
>> > CT2  CT2   222.500     1.5300 ! ALLOW   ALI (CH2)
>> > CT3  CT2   222.500     1.5280 ! ALLOW   ALI (CH3)
>> >
>> > NUCLEIC ACIDS
>> > CN8  CN8    222.50      1.528   !Alkanes, sacred  (Nucleic acid carbon
>> > (equivalent to protein CT2))
>> > CN8  CN9    222.50      1.528   !Alkanes, sacred  (Nucleic acid carbon
>> > (equivalent to protein CT3))
>> >
>> > GENERAL FF
>> > CG321  CG321   222.50     1.5300 ! PROT alkane update, adm jr., 3/2/92
>> >
>> >
>> > --------------------------
>> >
>> > And for example  this sentence from the basic CHARMM article:
>> >
>> > "Bond, valence angle, Urey-Bradley, improper and torsion
>> > force constants are based on MP2/6-31G(d) vibrational
>> > spectra."
>> >
>> > clearly explain this similarity. To be frank after I found this I
>> > started to trust a little more in GAFF2 than before as CHARMM is for
>> > sure not a bad force field :))
>> >
>> > But anyway here is nicely seen compatibility of the equivalent  
>> atomtypes
>> > across the all CHARMM specialised ff (carbohydrates, proteins, nucleid
>> > acids, general ff) which was so nicely seen also in AMBER before GAFF2
>> > appeared :))
>> >
>> >
>> > #5
>> > It is "very nice" that in Amber manual (page 33) is recommendation to  
>> use
>> >   GAFF2 for organic molecules/ligands, but to be frank I did not find
>> > any information about this new
>> >   force field in manual.
>> >
>> >   In chapter "15.6.2. New Development of GAFF" there are clearly
>> > described just some last corrections to GAFF.
>> >
>> >   So in my opinion would be appropriate to add here at least some  
>> short
>> > info similar to that which is
>> >   written at the end of the gaff2.dat file, which I additionally found
>> > just by accident.
>> >
>> > So thank you in advance very much for your eventual additional  
>> comments.
>> >
>> >    Best wishes,
>> >
>> >        Marek
>> >
>> >
>> >
>> >
>> >
>> > Dne Thu, 26 Jan 2017 19:04:30 +0100 Junmei Wang <junmwang.gmail.com>
>> > napsal/-a:
>> >
>> >> In GAFF, the bond stretching and bond angle bending force constants  
>> come
>> >> from protein/nucleic acid force fields (FF94, FF99) whenever  
>> possible.
>> >> For
>> >> those not having any counterparts in FF94/FF99, we used a set of
>> >> empirical
>> >> formulae  (please refer to the GAFF paper) to calculate the force
>> >> constants. The parameters of those formulae were derived to reproduce
>> >> the
>> >> existing force constants in FF94/FF99.
>> >>
>> >> However, in GAFF2, we adopted a different philosophy to derive those
>> >> parameters (26 in total).  We optimized those parameters to minimize  
>> the
>> >> differences between the calculated MM frequencies and the ab initio  
>> ones
>> >> (the training set has 572 small molecules and 22407 frequencies in
>> >> total).
>> >> We also tested the best parameter set with a large test set (711
>> >> molecules,
>> >> 29000 frequencies). The ab initio frequencies were obtained at
>> >> B3LYP/6-31G*
>> >> level. We could reduce the RMS errors from 120 cm-1 for GAFF to 60  
>> cm-1
>> >> for
>> >> both the training and test sets.
>> >>
>> >> We are now doing a set of evaluation for GAFF2, which will be a part  
>> of
>> >> the
>> >> GAFF2 paper. I agree with you that GAFF is more consistent with the
>> >> current
>> >> protein/nucleic acid force fields.  And the difference between GAFF  
>> and
>> >> GAFF2 for c3-c3 is very big.
>> >>
>> >> Hopefully this can help you to choose a proper version of GAFF force
>> >> field
>> >> in your study.
>> >>
>> >> Junmei
>> >>
>> >> On Tue, Jan 24, 2017 at 7:24 PM, Marek Maly <marek.maly.ujep.cz>  
>> wrote:
>> >>
>> >>> Thank you !
>> >>>
>> >>> BTW that difference in c3-c3 force constant GAFF vs GAFF2 is really  
>> big
>> >>> 300.9 vs 232.52 I would say. To be frank I am a bit confused with  
>> such
>> >>> a
>> >>> big change moreover in such basic/common atomtype bonding term.
>> >>> This indicates that methodology and/or set of molecular fragments  
>> which
>> >>> were used for fitting in GAFF2 case had to be quite different from
>> >>> methodology or mol. fragments used in GAFF case.
>> >>>
>> >>> I am also a bit confused with the fact that the same term in  
>> actually
>> >>> recommended protein force field ff14SB has still force constant  
>> close
>> >>> to
>> >>> old GAFF value:
>> >>>
>> >>> CT-CT  310.0    1.526       JCC,7,(1986),230; AA, SUGARS
>> >>>
>> >>> So old GAFF seems to me to be a more compatible with ff14SB from  
>> this
>> >>> point of view than with the new GAFF2, which is a bit strange or  
>> not ?
>> >>>
>> >>> BTW the same trend holds if we compare here GAFF/GAFF2 with LIPID14  
>> ff.
>> >>>
>> >>> cA-cA  303.1    1.5350       Lipid11  v1.0 (GAFF c3-c3)
>> >>> cD-cD  303.1    1.5350       Lipid14  v2.0 (GAFF c3-c3)
>> >>>
>> >>> again we are much closer here to GAFF values.
>> >>>
>> >>> The same trend holds for vdW terms:
>> >>>
>> >>> GAFF
>> >>>    c3          1.9080  0.1094             OPLS
>> >>> GAFF2
>> >>>    c3          1.9069  0.1078
>> >>> FF14SB (parm10)
>> >>>    CT          1.9080  0.1094             Spellmeyer
>> >>> LIPID14
>> >>>    cA          1.9080  0.1094             OPLS
>> >>>    cD          1.9080  0.1094             OPLS
>> >>>
>> >>>
>> >>> So it seems that if one is going to simulate some heterogenous  
>> system
>> >>> (e.g. protein + drug) it should be more suitable to use still old  
>> GAFF
>> >>> (not the new GAFF2) as GAFF is clearly more compatible
>> >>> with bio-force-fields than GAFF2 or am I wrong here ?
>> >>>
>> >>>
>> >>>    Best wishes,
>> >>>
>> >>>        Marek
>> >>>
>> >>>
>> >>>
>> >>> Dne Tue, 24 Jan 2017 19:44:34 +0100 David Case  
>> <david.case.rutgers.edu
>> >
>> >>> napsal/-a:
>> >>>
>> >>> > On Tue, Jan 24, 2017, Marek Maly wrote:
>> >>> >>
>> >>> >> I have technical question regarding QM methods used in GAFF2
>> >>> >> development.
>> >>> >
>> >>> > cc-ing to Junmei Wang, who doesn't always follow the mailing list.
>> >>> Note
>> >>> > that
>> >>> > you may need to wait for the paper on GAFF2 to be published...
>> >>> >
>> >>> > ...dac
>> >>> >
>> >>> >>
>> >>> >> In Amber16 manual, page 286, there is written:
>> >>> >>
>> >>> >>   "We have performed B3LYP/6-31G* optimization for 15 thousands
>> >>> >> marketed or
>> >>> >> experimental ..."
>> >>> >>
>> >>> >>
>> >>> >> So B3LYP/6-31G* is here QM method used to obtain optimized  
>> molecular
>> >>> >> structures (low energy
>> >>> >> conformations) from which equilibrium geometrical terms (eq. bond
>> >>> >> lengths,
>> >>> >> eq. angles) were obtained
>> >>> >> and so in some cases old GAFF values were updated.
>> >>> >>
>> >>> >>
>> >>> >> But in GAFF2, comparing to GAFF, changed clearly also some force
>> >>> >> constants.
>> >>> >> As an example I can mention c3-c3 bond:
>> >>> >>
>> >>> >>
>> >>> >> In GAFF
>> >>> >> c3-c3  300.9    1.5375          SOURCE1_SOURCE5   88072        
>> 0.0058
>> >>> >>
>> >>> >> In GAFF2
>> >>> >> c3-c3  232.52   1.538                SOURCE1_SOURCE5   88072   
>> 0.0058
>> >>> >>
>> >>> >>
>> >>> >> My question is, which QM methodology was used to derive (perhaps
>> >>> using
>> >>> >> paramfit ?) new GAFF2 bond, angle or dihedral force constants ?
>> >>> >>
>> >>> >> I assume, that since here we need just to calculate for each
>> >>> >> conformation
>> >>> >> only it's
>> >>> >>   energy (no optimization) and since we also need to obtain  
>> reliable
>> >>> >> force
>> >>> >> constants (hence we need accurate QM energies) some more accurate
>> >>> method
>> >>> >> was probably used like MP2/6-31G** , MP3/6-31G** etc. to derive  
>> QM
>> >>> >> energies. Am I right ?
>> >>> >>
>> >>> >> So thank you in advance for QM methodology specification used in
>> >>> GAFF2
>> >>> >> for
>> >>> >> force constants
>> >>> >> derivation.
>> >>> >>
>> >>> >>     Best wishes,
>> >>> >>
>> >>> >>         Marek
>> >>> >>
>> >>> >
>> >>> > _______________________________________________
>> >>> > AMBER mailing list
>> >>> > AMBER.ambermd.org
>> >>> > http://lists.ambermd.org/mailman/listinfo/amber
>> >>> >
>> >>> > __________ Informace od ESET NOD32 Antivirus, verze virove  
>> databaze
>> >>> > 14822 (20170124) __________
>> >>> >
>> >>> > Tuto zpravu proveril ESET NOD32 Antivirus.
>> >>> >
>> >>> > http://www.eset.cz
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>> --
>> >>> Tato zpráva byla vytvořena převratným poštovním klientem Opery:
>> >>> http://www.opera.com/mail/
>> >>>
>> >>> _______________________________________________
>> >>> AMBER mailing list
>> >>> AMBER.ambermd.org
>> >>> http://lists.ambermd.org/mailman/listinfo/amber
>> >>>
>> >> _______________________________________________
>> >> AMBER mailing list
>> >> AMBER.ambermd.org
>> >> http://lists.ambermd.org/mailman/listinfo/amber
>> >
>> >
>>
>>
>> --
>> Tato zpráva byla vytvořena převratným poštovním klientem Opery:
>> http://www.opera.com/mail/
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
-- 
Tato zpráva byla vytvořena převratným poštovním klientem Opery:  
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Feb 09 2017 - 18:30:02 PST
Custom Search