Re: [AMBER] MMPBSA.py discrepancies in results from Bill Miller III on 2011-09-29 (Amber Archive Sep 2011)

From: Bill Miller III <brmilleriii.gmail.com>
Date: Thu, 29 Sep 2011 07:16:04 -0400

The energy calculation should certainly be more representative of your
structures sampled during your MD simulation if you include all snapshots
through the end frame of your trajectory for the MMPB(GB)SA calculation.

For the interval, remember that unless you were writing out every frame
during your MD simulation (setting ntwx=1), then you already do not have
every step included your MMPBSA.py input trajectory, which is why interval=1
for MMPBSA.py calculations is not unheard of. Personally, I think that a
minimum of a few hundred MD steps between each snapshot is probably the
closest together I would ever go. Peronally, I typically only write to my MD
netcdf every 1000 steps and then I still use inteval>1 for my MMPB(GB)SA
analysis.

I hope that helps.

-Bill

On Thu, Sep 29, 2011 at 5:31 AM, George Tzotzos <gtzotzos.me.com> wrote:

> Jason,
>
> Many thanks once again. Allow me to bother you one last time with this.
>
> Clearly taking only the 50 first frames of a trajectory consisting of many
> more snapshots does not make sense. The energy calculation would be more
> precise if endframe were set to be the last snapshot of the trajectory. Am I
> correct in this?
>
> I'm also reading in the manual that MMPB(GB)SA "is most effective when the
> structures are not correlated, which means that the simulated time between
> extracted snapshots should be sufficiently large to avoid such correlation".
>
> On the basis of this shouldn't one set the interval to a value > 1? If so,
> is there a rule of thumb as to what this value should be?
>
> All the best
>
> George
>
> On Sep 29, 2011, at 12:22 AM, Jason Swails wrote:
>
> > On Wed, Sep 28, 2011 at 5:43 PM, George Tzotzos <gtzotzos.me.com> wrote:
> >
> >> Hi Jason,
> >>
> >> Many thanks for this. Indeed the unstripped method reads 500 frames
> while
> >> the stripped one only 50. I'll increase the number of readings of the
> >> stripped trajectory to 500 and see if the values converge. I'm sure they
> >> will.
> >>
> >
> > You have to be careful of this as well, though, since that approach will
> > extract different frames. Let's assume that each small trajectory has
> 1000
> > frames. By setting endframe=500 for the stripped coordinates, what
> you'll
> > be doing is extracting the first 500 frames of the first trajectory
> (since
> > that will be the first 500 frames of the concatenated trajectory). For
> the
> > unstripped version, on the other hand, you'll still be taking the first
> 50
> > frames of the first trajectory, the first 50 frames of the second
> > trajectory, etc. In the concatenated trajectory, this is equivalent to
> > taking frames 1-50, 1001-1050, 2001-2050, etc.
> >
> > The only easy way of doing this comparison is combining all of the
> > trajectories into a single unstripped trajectory and then using identical
> > inputs (but using that combined trajectory instead of all the smaller
> > ones). Note that MMPBSA.py version 3 (upcoming) will treat all
> trajectories
> > like a single, continuous trajectory.
> >
> >
> >> However, this brings me to the next question. One of the reasons that I
> >> stripped the 20ns trajectory is because I wanted to generate clusters
> >> through prtaj/cluster. Furthermore, my intention is to work out the
> Delta Gs
> >> of each individual cluster. I did so by using the same MMPBSA.py input
> file
> >> (the one that I mailed earlier). The problem I see here is that each
> cluster
> >> has different number of frames. Here's an example:
> >>
> >> #Cluster 0: has 645 points, occurence 0.323
> >> #Cluster 1: has 826 points, occurence 0.413
> >> #Cluster 2: has 50 points, occurence 0.025
> >> #Cluster 3: has 270 points, occurence 0.135
> >> #Cluster 4: has 209 points, occurence 0.104
> >>
> >> Clearly endframe=50 is not a good idea in this case, particularly for
> those
> >> clusters that are the least populated. I'd be grateful for any ideas how
> to
> >> handle this.
> >>
> >
> > Why not just dump each cluster into a separate trajectory and analyze
> every
> > frame in a separate MMPBSA calculation?
> >
> > HTH,
> > Jason
> >
> >
> >> Have a good day
> >>
> >> George
> >>
> >> On Sep 28, 2011, at 8:43 PM, Jason Swails wrote:
> >>
> >>> Hey,
> >>>
> >>> I know what's happening here. I would definitely suggest upgrading to
> >> the
> >>> AmberTools 1.5 version. However, the behavior that you're *expecting*
> >> isn't
> >>> quite what you're getting. By setting endframe=50, you may think
> you're
> >>> getting only 50 frames (and indeed, for the stripped system you are
> since
> >>> you have only a single trajectory). However, when you load all of the
> >> mdcrd
> >>> files in for the unstripped case, what MMPBSA is doing (both in the
> >>> AmberTools 1.4 and 1.5 versions) is loading the first 50 frames of
> *each*
> >>> trajectory. That is, MMPBSA.py is running the commands
> >>>
> >>> trajin prod_2ns.mdcrd 1 50 1
> >>> trajin prod_4ns.mdcrd 1 50 1
> >>> trajin prod_6ns.mdcrd 1 50 1
> >>> ...
> >>> trajin prod_20ns.mdcrd 1 50 1
> >>>
> >>> For the stripped case, you have only a single trajectory, so the ptraj
> >>> commands run by MMPBSA.py is
> >>>
> >>> trajin prod_2_20ns.mdcrd 1 50 1
> >>>
> >>> As you can see, the frames analyzed by each method are completely
> >>> different. In fact, the unstripped case is running far more frames
> than
> >> the
> >>> stripped case (as evidenced by the fact that the standard error of the
> >> mean,
> >>> which is the std. deviation / sqrt(number of frames) is significantly
> >>> smaller for unstripped case). If you want to get something that's
> truly
> >>> comparative, you have to make sure that the same frames are used in
> each
> >>> calculation. Also, the final output file should tell you how many
> frames
> >>> were used in each calculation, and you should see a difference there.
> >>>
> >>> As an aside, MMPBSA.py took the easy way out at first by just applying
> >> the
> >>> same startframe, endframe, and interval to each trajectory, rather than
> >>> figuring out how many frames are in each trajectory and adjusting the
> >>> individual startframe, endframe, and interval such that you pull *only*
> >> the
> >>> specified frames from the fully concatenated trajectory. (I believe
> this
> >> is
> >>> described in the manual, but I'm not sure). The upcoming version of
> >>> MMPBSA.py has been reworked in many ways, and will behave as you were
> >>> expecting this version to behave (and the manual will be updated to
> >> reflect
> >>> as much).
> >>>
> >>> HTH,
> >>> Jason
> >>>
> >>> On Wed, Sep 28, 2011 at 12:04 PM, George Tzotzos <gtzotzos.me.com>
> >> wrote:
> >>>
> >>>> I'm doing so right now. Will report accordingly
> >>>>
> >>>> Thanks
> >>>>
> >>>> George
> >>>>
> >>>> On Sep 28, 2011, at 6:03 PM, Bill Miller III wrote:
> >>>>
> >>>>> Try upgrading to AmberTools 1.5 and see if you get the same issues.
> >>>>>
> >>>>> -Bill
> >>>>>
> >>>>> On Wed, Sep 28, 2011 at 11:44 AM, George Tzotzos <gtzotzos.me.com>
> >>>> wrote:
> >>>>>
> >>>>>> No, I'm using the AmberTools 1.4 version
> >>>>>>
> >>>>>>
> >>>>>> On Sep 28, 2011, at 5:23 PM, Jason Swails wrote:
> >>>>>>
> >>>>>>> What version of MMPBSA.py are you using? Are you using the version
> >>>> from
> >>>>>>> AmberTools 1.5?
> >>>>>>>
> >>>>>>> On Wed, Sep 28, 2011 at 10:43 AM, George Tzotzos <gtzotzos.me.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Jason
> >>>>>>>>
> >>>>>>>> stripping water/ions and merging the trajectory files was done
> with
> >>>> the
> >>>>>>>> following input file
> >>>>>>>>
> >>>>>>>> trajin prod_2ns.mdcrd
> >>>>>>>> .................
> >>>>>>>> trajin prod_20ns.mdcrd
> >>>>>>>> strip :WAT
> >>>>>>>> strip :Na+
> >>>>>>>> trajout prod_0-20ns.mdcrd netcdf
> >>>>>>>>
> >>>>>>>> As you can see, netcdf is included
> >>>>>>>>
> >>>>>>>> I don't see any great discrepancy in the GB results (see below)
> >>>>>>>>
> >>>>>>>> Trajectories stripped of water/ions via the above method submitted
> >> to
> >>>>>>>> MMPBSA.py give
> >>>>>>>> Generalized Born
> >>>>>>>> DELTA G binding = -33.4702 +/- 1.8177
> >>>>>>>> 0.2571
> >>>>>>>> Poisson Boltzmann
> >>>>>>>> DELTA G binding = -26.5181 +/- 2.6507
> >>>>>>>> 0.3749
> >>>>>>>>
> >>>>>>>> Trajectories NOT stripped of water/ions PRIOR to being submitted
> to
> >>>>>>>> MMPBSA.py
> >>>>>>>> Generalized Born
> >>>>>>>> DELTA G binding = -34.3190 +/- 2.0987
> >>>>>>>> 0.0939
> >>>>>>>> Poisson Boltzmann
> >>>>>>>> DELTA G binding = -28.8956 +/- 2.8508
> >>>>>>>> 0.1275
> >>>>>>>>
> >>>>>>>> I've tried to re-run MMPBSA.py including netcdf=1 as per your
> >>>> suggestion
> >>>>>>>>
> >>>>>>>> Input file for running PB and GB
> >>>>>>>> &general
> >>>>>>>> strip_mdcrd=0
> >>>>>>>> endframe=50, verbose=1,
> >>>>>>>> netcdf=1,
> >>>>>>>> # entropy=1,
> >>>>>>>> /
> >>>>>>>> &gb
> >>>>>>>> igb=2, saltcon=0.100
> >>>>>>>> /
> >>>>>>>> &pb
> >>>>>>>> istrng=0.100,
> >>>>>>>> /
> >>>>>>>>
> >>>>>>>> I get "Warning: Input error! "netcdf=1" is an invalid option."
> >>>>>>>>
> >>>>>>>> I also tried it at &gb and &pb. MMPBSA.py issues the same error
> >>>> message
> >>>>>> as
> >>>>>>>> above.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sep 28, 2011, at 4:16 PM, Jason Swails wrote:
> >>>>>>>>
> >>>>>>>>> When you stripped all waters and everything else, did you do
> >> anything
> >>>>>>>> else?
> >>>>>>>>> (For instance, did you align via RMSD, etc.)?
> >>>>>>>>>
> >>>>>>>>> Here is my suggestion. Work *completely* in NetCDF trajectories.
> >> I
> >>>>>>>> don't
> >>>>>>>>> know if your initial trajectories are NetCDF (I would strongly
> >>>>>> encourage
> >>>>>>>> you
> >>>>>>>>> to use NetCDF if you're not). Make sure the stripped trajectory
> >> that
> >>>>>> you
> >>>>>>>>> make is created as a NetCDF trajectory. Then, use netcdf=1 in
> your
> >>>>>>>>> MMPBSA.py input file. This will force MMPBSA.py to use NetCDF
> for
> >>>> the
> >>>>>>>>> temporary internal files.
> >>>>>>>>>
> >>>>>>>>> I'd be curious to see if the (vastly) increased precision of a
> >> NetCDF
> >>>>>>>>> trajectory can account for some of this difference...
> >>>>>>>>>
> >>>>>>>>> Also, you never say whether the discrepancy arises in the PB
> >> results
> >>>> or
> >>>>>>>> GB
> >>>>>>>>> results (I would expect the PB results, yes?).
> >>>>>>>>>
> >>>>>>>>> All the best,
> >>>>>>>>> Jason
> >>>>>>>>>
> >>>>>>>>> On Wed, Sep 28, 2011 at 9:33 AM, George Tzotzos <gtzotzos.me.com
> >
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Bill
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the prompt reply
> >>>>>>>>>>
> >>>>>>>>>> Run1: mmpbsa.in
> >>>>>>>>>>
> >>>>>>>>>> Input file for running PB and GB
> >>>>>>>>>> &general
> >>>>>>>>>> endframe=50, verbose=1,
> >>>>>>>>>> # entropy=1,
> >>>>>>>>>> /
> >>>>>>>>>> &gb
> >>>>>>>>>> igb=2, saltcon=0.100
> >>>>>>>>>> /
> >>>>>>>>>> &pb
> >>>>>>>>>> istrng=0.100,
> >>>>>>>>>> /
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Run 2: mmpbsa.in
> >>>>>>>>>>
> >>>>>>>>>> Input file for running PB and GB
> >>>>>>>>>> &general
> >>>>>>>>>> strip_mdcrd=0
> >>>>>>>>>> endframe=50, verbose=1,
> >>>>>>>>>> # entropy=1,
> >>>>>>>>>> /
> >>>>>>>>>> &gb
> >>>>>>>>>> igb=2, saltcon=0.100
> >>>>>>>>>> /
> >>>>>>>>>> &pb
> >>>>>>>>>> istrng=0.100,
> >>>>>>>>>> /
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Sep 28, 2011, at 3:27 PM, Bill Miller III wrote:
> >>>>>>>>>>
> >>>>>>>>>>> What does your mmpbsa.in file look like? Are you calculating
> the
> >>>>>>>> binding
> >>>>>>>>>>> energy for all frames or only a select number of frames? If you
> >> are
> >>>>>>>> only
> >>>>>>>>>>> using a select number of frames I would bet that the two
> methods
> >>>>>>>> selected
> >>>>>>>>>>> different frames to evaluate and that resulted in the
> discrepancy
> >>>>>> that
> >>>>>>>>>> you
> >>>>>>>>>>> are reporting here.
> >>>>>>>>>>>
> >>>>>>>>>>> -Bill
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Sep 28, 2011 at 9:20 AM, George Tzotzos <
> gtzotzos.me.com
> >>>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi everybody,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd appreciate any comments / help on the discrepancies shown
> >>>> below.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I run MMPBSA.py on the same 20ns trajectory twice.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Run 1
> >>>>>>>>>>>>
> >>>>>>>>>>>> mpirun -np 12 MMPBSA.py.MPI -O -i mmpbsa.in -o RESULTS.dat
> -sp
> >>>>>>>>>>>> complex_solv.prmtop -cp complex.prmtop -rp receptor.prmtop -lp
> >>>>>>>>>> ligand.prmtop
> >>>>>>>>>>>> -y *.mdcrd
> >>>>>>>>>>>>
> >>>>>>>>>>>> *.mdcrd represents 10x2ns trajectories
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Run 2
> >>>>>>>>>>>>
> >>>>>>>>>>>> Step 1. merge the 2ns trajectories into a 20ns.mdcrd while
> >>>> stripping
> >>>>>>>>>> waters
> >>>>>>>>>>>> and ions
> >>>>>>>>>>>>
> >>>>>>>>>>>> Step 2. generated new topology files for complex, receptor and
> >>>>>> ligand
> >>>>>>>>>>>>
> >>>>>>>>>>>> run MMPBSA.py
> >>>>>>>>>>>>
> >>>>>>>>>>>> mpirun -np 12 MMPBSA.py.MPI -O -i mmpbsa.in -o RESULTS.dat
> -cp
> >>>>>>>>>>>> comp_new.prmtop -rp rec_new.prmtop -lp lig_new.prmtop -y
> *.mdcrd
> >>>>>>>>>>>>
> >>>>>>>>>>>> Result of Run1.
> >>>>>>>>>>>>
> >>>>>>>>>>>> DELTA G binding = -28.8956 +/- 2.8508
> >>>>>>>>>>>> 0.1275
> >>>>>>>>>>>>
> >>>>>>>>>>>> Result of Run2
> >>>>>>>>>>>>
> >>>>>>>>>>>> DELTA G binding = -26.5181 +/- 2.6507
> >>>>>>>>>>>> 0.3749
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> AMBER mailing list
> >>>>>>>>>>>> AMBER.ambermd.org
> >>>>>>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Bill Miller III
> >>>>>>>>>>> Quantum Theory Project,
> >>>>>>>>>>> University of Florida
> >>>>>>>>>>> Ph.D. Graduate Student
> >>>>>>>>>>> 352-392-6715
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> AMBER mailing list
> >>>>>>>>>>> AMBER.ambermd.org
> >>>>>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> AMBER mailing list
> >>>>>>>>>> AMBER.ambermd.org
> >>>>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Jason M. Swails
> >>>>>>>>> Quantum Theory Project,
> >>>>>>>>> University of Florida
> >>>>>>>>> Ph.D. Candidate
> >>>>>>>>> 352-392-4032
> >>>>>>>>> _______________________________________________
> >>>>>>>>> AMBER mailing list
> >>>>>>>>> AMBER.ambermd.org
> >>>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> AMBER mailing list
> >>>>>>>> AMBER.ambermd.org
> >>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jason M. Swails
> >>>>>>> Quantum Theory Project,
> >>>>>>> University of Florida
> >>>>>>> Ph.D. Candidate
> >>>>>>> 352-392-4032
> >>>>>>> _______________________________________________
> >>>>>>> AMBER mailing list
> >>>>>>> AMBER.ambermd.org
> >>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> AMBER mailing list
> >>>>>> AMBER.ambermd.org
> >>>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Bill Miller III
> >>>>> Quantum Theory Project,
> >>>>> University of Florida
> >>>>> Ph.D. Graduate Student
> >>>>> 352-392-6715
> >>>>> _______________________________________________
> >>>>> AMBER mailing list
> >>>>> AMBER.ambermd.org
> >>>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> AMBER mailing list
> >>>> AMBER.ambermd.org
> >>>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Jason M. Swails
> >>> Quantum Theory Project,
> >>> University of Florida
> >>> Ph.D. Candidate
> >>> 352-392-4032
> >>> _______________________________________________
> >>> AMBER mailing list
> >>> AMBER.ambermd.org
> >>> http://lists.ambermd.org/mailman/listinfo/amber
> >>
> >>
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >>
> >
> >
> >
> > --
> > Jason M. Swails
> > Quantum Theory Project,
> > University of Florida
> > Ph.D. Candidate
> > 352-392-4032
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>

-- 
Bill Miller III
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-6715
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Sep 29 2011 - 04:30:03 PDT