Great. Thanks for the reply. I'll be waiting for a new release.
One more thing that would be nice in an upcoming release: have the
output (e.g. progress.log) include a timestamp before/after each step
of the calculation. That would make it possible to follow the progress
of a calculation more closely and to gauge how much time is needed
when submitting multiple jobs.
For example:
calculating complex contribution... Date/time
calculating receptor contribution... Date/time
calculating ligand contribution... Date/time
calculating complex contribution... Date/time
calculating receptor contribution... Date/time
calculating ligand contribution... Date/time
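To make the idea concrete, here is a minimal sketch in Python. It is not MMPBSA.py code: the helper name timed_step and the exact log format are my own assumptions, and the in-memory log only stands in for progress.log.

```python
import io
import time
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def timed_step(label, log):
    """Write a timestamp before and after a step, plus the elapsed time.

    `timed_step` is a hypothetical helper, not part of MMPBSA.py.
    """
    start = time.monotonic()
    log.write(f"{label}... started {datetime.now():%Y-%m-%d %H:%M:%S}\n")
    log.flush()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        log.write(
            f"{label}... finished {datetime.now():%Y-%m-%d %H:%M:%S} "
            f"({elapsed:.1f} s)\n"
        )
        log.flush()


log = io.StringIO()  # stand-in for progress.log
for species in ("complex", "receptor", "ligand"):
    with timed_step(f"calculating {species} contribution", log):
        pass  # the real calculation would run here

print(log.getvalue(), end="")
```

Writing the "finished" line in a finally block means the timestamp still lands in the log if a step fails partway through.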
Just a small suggestion. Keep up the good work :-)
Best,
Jesper
On Oct 6, 2011 19:08 "Jason Swails" <jason.swails.gmail.com> wrote:
> On Wed, Oct 5, 2011 at 8:18 PM, Jesper Soerensen <lists.jsx.dk> wrote:
>
> > Hi Jason,
> >
> > To follow up.
> > I have run the job in three ways on 4 frames
> > 1) In serial (using 1 CPU)
> >
> > 2) In parallel (using 1 CPU - parallel meaning using MPI)
> >
> > 3) In parallel using 4 CPUs
> >
> > Only the serial job (1) finishes the calculation; the other two
> > output the error:
> >
> > > Error: Could not combine output files from different threads. Check
> > > output files for errors.
> >
> > > NOTE: All files have been retained for debugging purposes. Type
> > > MMPBSA.py --clean to erase these files.
> >
> > Similarly, the serial job is the only one to produce the "Timing:"
> > steps in progress.log.
> >
> > The serial job produces _MMPBSA_receptor_rism.out, which contains
> > 109 lines.
> >
> > The 1 CPU "MPI" job produces the file _MMPBSA_receptor_rism.out.0,
> > which also contains 109 lines, but when it is then converted (combined)
> > into _MMPBSA_receptor_rism.out it halts for some reason after 20
> > lines. No errors are found in the files.
> >
> > 20 _MMPBSA_receptor_rism.out
> > 109 _MMPBSA_receptor_rism.out.0
> >
> > When I compare (diff) the serial and the 1 CPU MPI files I get the
> > following. Interestingly, the results are not identical, although I
> > would have expected them to be.
> >
> > < Processing ASCII trajectory: _MMPBSA_receptor.mdcrd.0
> > ---
> > > Processing ASCII trajectory: _MMPBSA_receptor.mdcrd
> > 89c89
> > < | Initialize 0.052
> > ---
> > > | Initialize 0.051
> > 95c95
> > < | Total 0.052
> > ---
> > > | Total 0.051
> > 102,103c102,103
> > < | pairlist 2.629
> > < | nonbond 2.124
> > ---
> > > | pairlist 2.635
> > > | nonbond 2.121
> > 106c106
> > < | 3D-RISM 13216.674
> > ---
> > > | 3D-RISM 14058.368
> > 109c109
> > < | Total 13221.447
> > ---
> > > | Total 14063.144
> >
> > Aside from this I don't see any errors in the files from the initial
> > attempt, in either _MMPBSA_complex_rism.out.0 or any other file.
> >
> > Any idea what is going on? It seems to be a problem that occurs when
> > using MPI (OpenMPI), but I don't know what causes it.
> >
>
> I will look into it. This problem will no longer exist in the upcoming
> version since it doesn't try to "combine" the output files (it just parses
> through each one separately). The problem is that MMPBSA.py.MPI can't
> figure out how to combine the output files correctly.
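For what it's worth, the parse-each-file-separately approach could look roughly like the sketch below. It assumes only that each rank writes <prefix>.<rank> (as with _MMPBSA_receptor_rism.out.0). The function names rank_files and iter_lines are hypothetical, and this is not the actual MMPBSA.py implementation.

```python
import glob
import re


def rank_files(prefix):
    """Return the per-rank files prefix.0, prefix.1, ... sorted by rank number.

    Hypothetical helper; assumes the <prefix>.<rank> naming seen in this thread.
    """
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)$")
    found = []
    for path in glob.glob(glob.escape(prefix) + ".*"):
        match = pattern.match(path)
        if match:  # skips names such as prefix.bak
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def iter_lines(prefix):
    """Yield (path, line) from each per-rank file in turn.

    No combined file is written, so a truncated merge cannot occur.
    """
    for path in rank_files(prefix):
        with open(path) as handle:
            for line in handle:
                yield path, line.rstrip("\n")
```

Sorting on the integer rank rather than on the file name keeps .10 after .2 when more than ten processes are used.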
>
>
> >
> > > No. It copies over the normal ligand files (trajectories, topologies,
> > > etc.) and re-runs the calculations. This wasn't considered a huge deal
> > > since the mutation was almost always in the receptor and the ligand was
> > > so cheap to calculate.
> >
> > This is true for small ligands, but with protein-protein complexes this
> > adds unnecessary computation time for the exact same calculation,
> > even more so when using 3D-RISM calculations. Maybe a fix for future
> > releases/updates?
> >
>
> This is a good idea. I'll add that in when I get a chance.
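A rough sketch of how such a check might work, purely as an illustration: hash the normal and mutant ligand input files and reuse the normal ligand results when they match. The function names are hypothetical, and how MMPBSA.py would actually detect this case is an assumption on my part.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_inputs(normal_files, mutant_files):
    """True if both lists pair up file by file with identical content.

    Hypothetical check: if it returns True for the ligand topology and
    trajectory files, the mutant ligand calculation could be skipped and
    the normal ligand results reused.
    """
    if len(normal_files) != len(mutant_files):
        return False
    return all(
        file_digest(normal) == file_digest(mutant)
        for normal, mutant in zip(normal_files, mutant_files)
    )
```

Reading in chunks keeps memory use flat even for large trajectory files.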
>
> Thanks!
> Jason
>
>
> >
> > Best,
> > Jesper
> >
> > On Sep 24, 2011 16:29 "Jason Swails" <jason.swails.gmail.com> wrote:
> >
> > > On Fri, Sep 23, 2011 at 12:58 PM, Jesper Soerensen <lists.jsx.dk> wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > Sorry it has taken me a while to get the bugfix patched and the
> > > > calculation tested. To follow up: I was able to patch it, and it now
> > > > moves past this "bugged" point. However, a new error surfaced:
> > > >
> > > > > Error: Could not combine output files from different threads. Check
> > > > > output files for errors.
> > >
> > > Can you run in serial on a couple frames and see if it works? I'll try
> > > to see why it's not working... Can you look at _MMPBSA_complex_rism.out.0
> > > and see if there are any errors in it?
> > >
> > >
> > > > > NOTE: All files have been retained for debugging purposes. Type
> > > > > MMPBSA.py --clean to erase these files.
> > > >
> > > > Now I checked the size of the combined output files:
> > > >
> > > > > 184K _MMPBSA_complex_rism.out
> > > > > 184K _MMPBSA_ligand_rism.out
> > > > > 184K _MMPBSA_receptor_rism.out
> > > > > 4.0K _MMPBSA_mutant_complex_rism.out
> > > > > 0 _MMPBSA_mutant_ligand_rism.out
> > > > > 0 _MMPBSA_mutant_receptor_rism.out
> > > >
> > > > As you can see there is a problem: some of them do not have the
> > > > expected size. I have checked, and all the per-node output files are
> > > > present and all have the same file size. Where do you think I should
> > > > look for the error?
> > > >
> > > > Another thing I wonder about is why it calculated the mutant_ligand
> > > > "structure" when the mutation was performed on the enzyme, so the
> > > > mutant ligand should be identical to the "wild type" ligand. Is that
> > > > a bug?
> > > >
> > >
> > > No. It copies over the normal ligand files (trajectories, topologies,
> > > etc.) and re-runs the calculations. This wasn't considered a huge deal
> > > since the mutation was almost always in the receptor and the ligand was
> > > so cheap to calculate.
> > >
> > > HTH,
> > > Jason
> > >
> > _______________________________________________
> > AMBER mailing list
> > <AMBER.ambermd.org>
> > <http://lists.ambermd.org/mailman/listinfo/amber>
> >
>
>
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Oct 06 2011 - 11:00:05 PDT