Re: [AMBER] MMPBSA.py: error during parsing of _MMPBSA_complex_gb.mdout

From: Jan-Philip Gehrcke <jgehrcke.googlemail.com>
Date: Wed, 26 Oct 2011 11:00:58 +0200

Thanks. I have a few comments left:

1) In the official AmberTools manual
(http://ambermd.org/doc11/AmberTools.pdf), I believe it is not stated
anywhere that sander from Amber 11 is required for decomposition.

2) However, I once saved a manual "MMPBSA_Python_Manual.pdf" (I cannot
recall where I downloaded it). There it is stated: "&decomp namelist
variables (available only for Amber 11)". This would have made me
suspicious, but a wording like "must not be used with an Amber version
older than 11" would be unambiguous. Also, "not available" suggests that
an error should be printed if I try to use it anyway.

3) During our discussion in this thread, you, Jason, mentioned that "the
gbsa=2 code has known issues (at least 1 bug)". Is this worrying for
me/us, even if I use sander from Amber 11?

4) Doesn't sander have something comparable to a "--version" switch, or
anything else that could be used to reliably identify the code version? A
check based on this, together with a meaningful error message, would be a
clean solution.
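As a workaround, the version could perhaps be identified from the banner
sander writes at the top of every mdout file. The banner text used below
("Amber 11 SANDER") is an assumption and should be compared against one of
your own output files; a stand-in file keeps the snippet self-contained:

```shell
# Sketch: grep the sander version banner out of an existing mdout file.
# The exact banner format is an assumption -- check your own files.
printf '          Amber 11 SANDER                              2010\n' > sample.mdout
grep -o 'Amber [0-9]* SANDER' sample.mdout    # prints: Amber 11 SANDER
rm sample.mdout
```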

5) We "already" ordered Amber 11 at the beginning of October and got a
mail on Monday saying that we will receive download instructions within a
week. Now I suddenly cannot wait to run MMPBSA.py with sander 11 :-) What
is happening during that week?

6) Now that we (you) have found the problem, I think there will be one
more standard question on the list from people complaining about
ValueErrors / asterisks in mdout files / high energies in the context of
MMPBSA.py decomposition :-)


Big thanks again!

Jan-Philip



On 10/25/2011 06:04 PM, Jason Swails wrote:
> Ohhhhh... I see what's happening here. The difference between the python
> version and the perl version is that Python uses imin=5 and the trajectory
> approach, whereas the Perl version creates a restart file for every single
> snapshot and runs them one at a time (rather less efficient). However, the
> decomposition code using imin=5 doesn't work for Amber 10, since the
> variables are not actually reset after every frame, so they just accumulate.
> You should actually see the values converge properly for the Python version
> if you use as many processors as you have frames (so that each thread only
> does one frame), but that's not really a desirable approach.
>
> I recall looking through the code to fix these issues awhile back, but I
> found the code quite difficult to parse through at the time (I think it's
> convoluted code to begin with, and I was far less familiar with sander code
> at the time). It was fixed shortly thereafter, but never ported back to
> Amber 10. For that reason, the first couple MMPBSA.py releases checked to
> make sure that any sander binary being used was *not* located in a path
> containing "amber10" to avoid these issues. This was removed in favor of
> looking *only* in AMBERHOME/bin for binaries (which really shouldn't be
> changed, since there's nothing sander10 can do that mmpbsa_py_energy can't).
>
> Long story short -- you need Amber 11 to do decomp with MMPBSA.py.
>
> HTH,
> Jason
>
> On Tue, Oct 25, 2011 at 11:30 AM, Jan-Philip Gehrcke<
> jgehrcke.googlemail.com> wrote:
>
>> Hey,
>>
>> please see below for further comments on this issue. In particular, we
>> have compared the Python and Perl versions of MMPBSA on the same
>> molecular system. Although both use sander from Amber 10, the
>> decomposition energy values printed by sander10 differ greatly between
>> the Python and Perl versions.
>>
>> On 10/18/2011 08:08 PM, Jason Swails wrote:
>>> On Tue, Oct 18, 2011 at 1:03 PM, Jan-Philip Gehrcke<
>> jgehrcke.googlemail.com
>>>> wrote:
>>>
>>>> Sorry for the late reply. I have some more details regarding this issue:
>>>>
>>>> 1) weird: parsing error depends on MPI process count
>>>> ====================================================
>>>> An MMPBSA.py.MPI run using `mpirun -np 8` on 101 frames succeeds
>>>> (executed locally on my 8 core machine). An MMPBSA.py.MPI run using
>>>> `mpirun -np 2` using the *same files* with the *same input
>>>> configuration* on the same machine fails with the parsing error
>>>> indicated in the opening mail (due to "*********" tokens in the
>>>> _MMPBSA_ligand_gb.mdout).
>>>>
>>>> Let me provide more detail on the differences between the two
>>>> _MMPBSA_ligand_gb.mdout files created by these two independent runs:
>>>>
>>>>
>>>> - The "crashtest1/_MMPBSA_ligand_gb.mdout" corresponds to the -np 2 run.
>>>> - The file "crashtest3/_MMPBSA_ligand_gb.mdout" corresponds to the -np 8
>>>> run.
>>>>
>>>> First of all, I checked the number of lines containing "***" in both
>> files:
>>>>
>>>> $ cat crashtest1/_MMPBSA_ligand_gb.mdout | grep "\*\*\*" | wc -l
>>>> 297
>>>>
>>>
>>> Which lines were these on (specifically, which terms were blowing up)?
>> My
>>> suggestion here is to visualize the trajectories _MMPBSA_ligand.mdcrd(.x)
>>> with your ligand topology file to make sure that these are created
>> properly.
>>>
>> For the file mentioned above, these are the "blowup counts" for all the
>> data columns:
>>
>> internal: 297
>> vdw: 0
>> eel: 18
>> pol: 0
>> sas: 0
>>
>> I visualized the _MMPBSA_ligand.mdcrd using the ligand topology file and
>> everything looked fine.
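The per-column counts above can be obtained with a small awk pass over the
decomposition (TDC) lines. The column positions used below are an
assumption and must be checked against the real mdout layout; stand-in data
replaces the real _MMPBSA_ligand_gb.mdout:

```shell
# Sketch: count overflowed ("****") fields per TDC data column.
# Column indices 3..7 (internal, vdw, eel, pol, sas) are an assumption.
printf 'TDC 1 **** 1.0 **** 2.0 3.0\nTDC 2 **** 1.0 2.0 2.0 3.0\n' > sample.txt
awk '/^TDC/ { for (i = 3; i <= 7; i++) if ($i ~ /\*/) c[i]++ }
     END { print "internal:", c[3]+0; print "eel:", c[5]+0 }' sample.txt
# prints: internal: 2
#         eel: 1
rm sample.txt
```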
>>
>>
>>>
>>>>
>>>> $ cat crashtest3/_MMPBSA_ligand_gb.mdout | grep "\*\*\*" | wc -l
>>>> 0
>>>>
>>>>
>>>> Then, I took "crashtest1/_MMPBSA_ligand_gb.mdout" (from the -np 2 run)
>>>> as reference. In this file, I grepped all lines starting with either
>>>> VDWAALS, 1-4, BOND, or ESURF. For each grepped line, I checked in the
>>>> "crashtest3/_MMPBSA_ligand_gb.mdout" (the -np 8 run) if an equivalent
>>>> line is existing. If it exists, it is a match and counted:
>>>>
>>>> $ cat crashtest1/_MMPBSA_ligand_gb.mdout | grep VDWAALS | while read
>>>> LINE; do grep "${LINE}" crashtest3/_MMPBSA_ligand_gb.mdout; done | wc -l
>>>> 101
>>>>
>>>> $ cat crashtest1/_MMPBSA_ligand_gb.mdout | grep 1-4 | while read LINE;
>>>> do grep "${LINE}" crashtest3/_MMPBSA_ligand_gb.mdout; done | wc -l
>>>> 101
>>>>
>>>> $ cat crashtest1/_MMPBSA_ligand_gb.mdout | grep BOND | while read LINE;
>>>> do grep "${LINE}" crashtest3/_MMPBSA_ligand_gb.mdout; done | wc -l
>>>> 101
>>>>
>>>> $ cat crashtest1/_MMPBSA_ligand_gb.mdout | grep ESURF | while read
>>>> LINE; do grep "${LINE}" crashtest3/_MMPBSA_ligand_gb.mdout; done | wc -l
>>>> 80
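A side note on the commands above: grep interprets each extracted line as a
regular expression, which can misbehave for lines containing "*" or "1-4".
A fixed-string, whole-line variant of the same comparison, shown here with
tiny stand-in files in place of the real mdout files, could look like:

```shell
# Sketch: -F treats the extracted lines as fixed strings (no regex
# interpretation), -x requires exact full-line matches, -c counts them.
printf ' VDWAALS = 10.5\n VDWAALS = 11.2\n' > ref.mdout
printf ' VDWAALS = 10.5\n VDWAALS = 99.9\n' > cmp.mdout
grep VDWAALS ref.mdout > ref_lines.txt
grep -Fxc -f ref_lines.txt cmp.mdout    # prints: 1
rm ref.mdout cmp.mdout ref_lines.txt
```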
>>>>
>>>> I believe the runs should not differ in their results, so each grepped
>>>> line from the reference file should find a matching line in the
>>>> comparison file. Since MMPBSA analyzed 101 frames, 101 should be the
>>>> expected count, right?
>>>>
>>>> However, the matching above shows that the ESURF results differ
>>>> significantly in 21 of 101 cases, solely due to changing from -np 2 to
>>>> -np 8. How can this be explained?
>>>>
>>>
>>> I'll preface this by saying that decomposition uses gbsa=2 and all other
>> GB
>>> calculations use gbsa=1 (LCPO).
>>>
>>> Are you using decomposition?
>>
>> Yes, sorry that I did not provide this information in the first place.
>>
>>> If so, the gbsa=2 code has known issues (at
>>> least 1 bug). If you run a 1-step minimization using imin=1 (so it's
>> just
>>> an energy extraction) using gbsa==2, the ESURF energy printed out in the
>>> first step differs from the ESURF energy printed out in the summary,
>> which
>>> is indicative of an issue. All other energies are identical, as you
>> would
>>> expect. This was raised in a thread within the last year or so, but I'm
>> too
>>> lazy to go after it now... I'm not exactly sure what causes this issue,
>> but
>>> I'm not keen on hunting through the (sander) code looking for it.
>>
>> So we do not know exactly where the problem is. But let me ask: is this
>> just a guess, or is it clear to you that the bug you describe leads to
>> the dependence of the resulting numerical values on the number of MPI
>> processes?
>>
>>>
>>>
>>>> I performed both tests three times and reproduced the result.
>>>>
>>>>
>>>> 2) The initial trajectory
>>>> =========================
>>>> The initial trajectory (from which the frames are extracted) looks fine
>>>> in VMD. Also, the RMSD of the whole complex compared to the initial
>>>> structure over time is quite stable.
>>>>
>>>> Jason, you wrote "Try visualizing the created trajectories with VMD or
>>>> something to make sure that they look OK". So you mean I should also
>>>> visualize the extracted frames, yes? Can't we rely on ptraj to have
>>>> performed the extraction correctly? How would I check whether "they
>>>> look OK"?
>>>>
>>>
>>> Extracted frames for the complex, receptor, and ligand, yes. By "look
>> OK" I
>>> mean they don't look completely warped and ridiculous (caused by the fact
>>> that the atoms in the trajectory don't correspond to the atoms in the
>>> associated topology file).
>>>
>>>
>>>>
>>>>
>>>> 3) Various other frame count / -np X combinations
>>>> =================================================
>>>> I've performed some more tests under constant conditions. All MMPBSA
>>>> runs are based on the same trajectory file. Let me list the test results
>>>> as a function of the X in `-np X` and of the `interval` variable in the
>>>> `&general` block of the MMPBSA input file:
>>>>
>>>> - 11 Frames / np 2: success
>>>> - 101 Frames / np 2: parsing error
>>>>
>>>> - 11 Frames / np 8: success
>>>> - 101 Frames / np 8: success
>>>> - 10001 Frames / np 8: parsing error
>>>>
>>>>
>>>> Hence, just using "-np 8" is not a solution. I/we have to understand
>>>> what is going on here. What can I do now? Thanks for your help!
>>>>
>>>
>>> Try using NetCDF trajectories as intermediates (this won't work for
>> Alanine
>>> scanning, but it will for everything else). You can do this via the
>>> "netcdf=1" variable in the MM/PBSA input file. Are your original
>>> trajectories NetCDF?
>>
>> My original trajectory is saved in NetCDF format. I did not use
>> intermediate NetCDF trajectories so far, but I will test that.
>>
>>
>> Comparison Perl/Python version
>> ==============================
>> A colleague of mine who has some experience with the Perl version of
>> MMPBSA ran the analysis with equivalent input files and the same
>> (corresponding) settings, i.e.
>>
>> 1) From my original NetCDF trajectory, I extracted 101 ASCII frames.
>> 2) Using this trajectory, I ran MMPBSA (Python, non-MPI) with the input
>>
>> $ cat mmpbsa_decomp.in
>> &general
>> receptor_mask = :1-205,
>> ligand_mask = :206-355,
>> /
>> &gb
>> igb = 5,
>> /
>> &decomp
>> idecomp = 1,
>> dec_verbose = 3,
>> /
>>
>> resulting in the following per-column blowup counts:
>>
>> internal: 1030
>> vdw: 0
>> eel: 615
>> pol: 34
>> sas: 0
>>
>> The sander version used by this MMPBSA run was from Amber 10.
>>
>> 3) My colleague ran the Perl version of MMPBSA, also with sander from
>> Amber 10 and input settings corresponding to the input file above. All
>> data input files were the *same* as in the Python MMPBSA run.
>>
>>
>>
>> During the Perl MMPBSA run, an error was printed for only one of the 101
>> frames. Overall, the analysis apparently completed properly.
>>
>> All energy values written by sander10 within the Python MMPBSA run are
>> very different from the energy values written by sander10 within the
>> Perl MMPBSA run. The values produced "by Python MMPBSA" are much too
>> large. I've analyzed the output files:
>>
>> - snapshots_lig.all.out (Perl version)
>> - _MMPBSA_ligand_gb.mdout (Python version)
>>
>> All lines starting with "TDC" have been extracted; their columns carry
>> the different energy contributions. For each column, all values that
>> have not "blown up" were averaged (i.e., averaged over residues and
>> frames). The result is presented in the following form:
>>
>> NAME (VALUE COUNT): MEAN OF ABSVALUE +- STDDEV OF ABSVALUE
>>
>> Perl version:
>> internal (15150): 22.9 +- 6.0
>> vdw (15150): 4.4 +- 2.4
>> eel (15150): 36.8 +- 39.4
>> pol (15150): 16.1 +- 23.3
>> sas (15150): 59.1 +- 54.6
>>
>> Python version:
>> internal (14714): 4226 +- 2890
>> vdw (15150): 417 +- 292
>> eel (14642): 3962 +- 2439
>> pol (15135): 746 +- 1312
>> sas (15150): 3007 +- 3563
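The extraction and averaging described above can be sketched as a single
awk pass. The TDC layout assumed here (energy value in the third field) and
the stand-in data are illustrative only and must be checked against the
real snapshots_lig.all.out / _MMPBSA_ligand_gb.mdout files:

```shell
# Sketch: mean +- stddev of the absolute values of one TDC column,
# skipping overflowed ("****") fields. Column position is an assumption.
printf 'TDC 1 -3.0 x x x x\nTDC 2 5.0 x x x x\nTDC 3 **** x x x x\n' > tdc_sample.txt
grep '^TDC' tdc_sample.txt | awk '
  $3 !~ /\*/ { v = ($3 < 0 ? -$3 : $3); s += v; sq += v*v; n++ }
  END { m = s / n
        printf "internal (%d): %.1f +- %.1f\n", n, m, sqrt(sq/n - m*m) }'
# prints: internal (2): 4.0 +- 1.0
rm tdc_sample.txt
```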
>>
>>
>> If we have not made a mistake in our comparison, this tells us that the
>> Python and Perl versions use sander in different ways, and the "Perl
>> way" seems to be the more correct one... here you have to weigh in!
>>
>>
>> Thanks for your help,
>>
>> Jan-Philip
>>
>>>
>>> All the best,
>>> Jason
>>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
>


Received on Wed Oct 26 2011 - 02:30:02 PDT