Re: [AMBER] MMPBSA.py 3DRISM error

From: Jason Swails <jason.swails.gmail.com>
Date: Thu, 6 Oct 2011 13:08:44 -0400

On Wed, Oct 5, 2011 at 8:18 PM, Jesper Soerensen <lists.jsx.dk> wrote:

> Hi Jason,
>
> To follow up, I have run the job in three ways on 4 frames:
> 1) In serial (using 1 CPU)
> 2) In parallel using 1 CPU ("parallel" meaning using MPI)
> 3) In parallel using 4 CPUs
>
> Only the serial job (1) finishes the calculation; the other two output
> the error:
>
> >Error: Could not combine output files from different threads. Check
> >output files for errors.
>
> >NOTE: All files have been retained for debugging purposes. Type
> >MMPBSA.py --clean to erase these files.
>
> Similarly, the serial job is the only one to produce the "Timing:" steps
> in progress.log.
>
> The serial job produces _MMPBSA_receptor_rism.out, which contains 109
> lines.
>
> The 1 CPU "MPI" job produces the file _MMPBSA_receptor_rism.out.0, which
> also contains 109 lines, but when it is then converted (combined) into
> _MMPBSA_receptor_rism.out, the process halts for some reason after 20
> lines. No errors are found in the files.
>
> 20 _MMPBSA_receptor_rism.out
> 109 _MMPBSA_receptor_rism.out.0
>
> When I compare/diff the "serial" and the 1 CPU MPI files I get the
> following... Interestingly, the results are not identical, although I
> would have expected them to be.
>
> < Processing ASCII trajectory: _MMPBSA_receptor.mdcrd.0
> ---
> > Processing ASCII trajectory: _MMPBSA_receptor.mdcrd
> 89c89
> < | Initialize 0.052
> ---
> > | Initialize 0.051
> 95c95
> < | Total 0.052
> ---
> > | Total 0.051
> 102,103c102,103
> < | pairlist 2.629
> < | nonbond 2.124
> ---
> > | pairlist 2.635
> > | nonbond 2.121
> 106c106
> < | 3D-RISM 13216.674
> ---
> > | 3D-RISM 14058.368
> 109c109
> < | Total 13221.447
> ---
> > | Total 14063.144
>
> Aside from this, I don't see any errors in the files from the initial
> attempt, in either _MMPBSA_complex_rism.out.0 or any other file.
>
> Any idea what is going on? It seems like a problem that occurs when
> using MPI (OpenMPI), but I don't know what causes it.
>

I will look into it. This problem will no longer exist in the upcoming
version since it doesn't try to "combine" the output files (it just parses
through each one separately). The problem is that MMPBSA.py.MPI can't
figure out how to combine the output files correctly.
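
To illustrate the "parse each one separately" idea, here is a minimal
sketch. It is not the actual MMPBSA.py code: the helper names
thread_output_files and count_lines_per_file are hypothetical, and the
only thing taken from this thread is the <name>.out.<rank> file naming.
It reads every per-thread file on its own and reports its line count, so
a truncated file can be spotted without any "combine" step.

```python
import glob
import os
import re


def thread_output_files(prefix):
    """Return per-thread files (prefix.0, prefix.1, ...) ordered by rank.

    Hypothetical helper: assumes each MPI thread writes "<prefix>.<rank>".
    """
    ranked = []
    for path in glob.glob(glob.escape(prefix) + ".*"):
        # Keep only files whose suffix is a pure integer rank.
        match = re.fullmatch(r"\.(\d+)", path[len(prefix):])
        if match:
            ranked.append((int(match.group(1)), path))
    # Sort numerically so that rank 10 comes after rank 2.
    return [path for _, path in sorted(ranked)]


def count_lines_per_file(prefix):
    """Read each per-thread file separately and return its line count."""
    counts = {}
    for path in thread_output_files(prefix):
        with open(path) as handle:
            counts[os.path.basename(path)] = sum(1 for _ in handle)
    return counts


if __name__ == "__main__":
    # File name taken from the thread; run in the MMPBSA working directory.
    for name, lines in count_lines_per_file("_MMPBSA_receptor_rism.out").items():
        print(lines, name)
```

Comparing these counts against the serial output (109 lines in the case
above) is a quick way to tell whether the per-thread files are complete
and only the combining step went wrong.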


>
> >No. It copies over the normal ligand files (trajectories, topologies,
> >etc.) and re-runs the calculations. This wasn't considered a huge deal
> >since the mutation was almost always in the receptor and the ligand was
> >so cheap to calculate.
>
> This is true for small ligands, but with protein-protein complexes it
> adds unnecessary computation time for the exact same calculation, even
> more so when using 3DRISM calculations. Maybe a fix for future
> releases/updates?
>

This is a good idea. I'll add that in when I get a chance.

Thanks!
Jason


>
> Best,
> Jesper
>
> On Sep 24, 2011 16:29 "Jason Swails" <jason.swails.gmail.com> wrote:
>
> > On Fri, Sep 23, 2011 at 12:58 PM, Jesper Soerensen <lists.jsx.dk> wrote:
> >
> > > Hi Jason,
> > >
> > > Sorry it has taken me a while to get the bugfix patched and the
> > > calculation tested. To follow up, I was able to patch it and it now
> > > moves past this "bugged" point. However, a new error surfaced:
> > >
> > > > Error: Could not combine output files from different threads.
> > > > Check output files for errors.
> >
> > Can you run in serial on a couple frames and see if it works? I'll try
> > to see why it's not working... Can you look at
> > _MMPBSA_complex_rism.out.0 and see if there are any errors in it?
> >
> >
> > > > NOTE: All files have been retained for debugging purposes. Type
> > > > MMPBSA.py --clean to erase these files.
> > >
> > > Now I checked the size of the combined output files:
> > >
> > > > 184K _MMPBSA_complex_rism.out
> > > > 184K _MMPBSA_ligand_rism.out
> > > > 184K _MMPBSA_receptor_rism.out
> > > > 4.0K _MMPBSA_mutant_complex_rism.out
> > > > 0 _MMPBSA_mutant_ligand_rism.out
> > > > 0 _MMPBSA_mutant_receptor_rism.out
> > >
> > > As you can see, there is a problem: some of them do not have the
> > > expected size. I have checked, and all the per-node output files are
> > > present and all have the same file size. Where do you think I should
> > > look for the error?
> > >
> > > Another thing I wonder about is why it calculated the mutant_ligand
> > > "structure" when the mutation was performed on the enzyme, so the
> > > mutant ligand should be identical to the "wild type" ligand. Is that
> > > a bug?
> > >
> >
> > No. It copies over the normal ligand files (trajectories, topologies,
> > etc.) and re-runs the calculations. This wasn't considered a huge deal
> > since the mutation was almost always in the receptor and the ligand
> > was so cheap to calculate.
> >
> > HTH,
> > Jason
> >
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
Received on Thu Oct 06 2011 - 10:30:03 PDT