Hi Jason,
To follow up, I have run the job on 4 frames in three ways:
1) In serial (using 1 CPU)
2) In parallel (using 1 CPU - parallel meaning using MPI)
3) In parallel using 4 CPUs
Only the serial job (1) finishes the calculation; the other two output
the error:
>Error: Could not combine output files from different threads. Check
>output files for errors.
>NOTE: All files have been retained for debugging purposes. Type
>MMPBSA.py --clean to erase these files.
Similarly, the serial job is the only one that writes the "Timing:" steps
to progress.log.
The serial job produces _MMPBSA_receptor_rism.out, which contains 109
lines.
The 1 CPU "MPI" job produces the file _MMPBSA_receptor_rism.out.0, which
also contains 109 lines, but when it is then combined into
_MMPBSA_receptor_rism.out the combined file stops after 20 lines for some
reason. No errors are found in the files.
20 _MMPBSA_receptor_rism.out
109 _MMPBSA_receptor_rism.out.0
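
The line-count check above can be automated for all of the RISM output files at once. The following is only a minimal diagnostic sketch and not part of MMPBSA.py. It assumes that per-thread files are named like the combined file plus a numeric suffix (as with _MMPBSA_receptor_rism.out.0 here), and the function names are hypothetical. It flags any combined file that is shorter than its longest per-thread file.

```python
import glob
import os
import re


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, "r", errors="replace") as handle:
        return sum(1 for _ in handle)


def check_combined(directory=".", pattern="_MMPBSA_*_rism.out"):
    """Find combined output files that look truncated.

    Returns a list of (file name, combined line count, longest part line
    count) for every combined file that has fewer lines than the longest
    of its per-thread files.
    """
    suspicious = []
    for combined in sorted(glob.glob(os.path.join(directory, pattern))):
        # Assumption: per-thread files are named <combined>.<number>
        candidates = glob.glob(glob.escape(combined) + ".*")
        parts = [p for p in candidates
                 if re.fullmatch(r"\d+", p.rsplit(".", 1)[1])]
        if not parts:
            continue  # serial run, nothing to compare against
        longest_part = max(count_lines(p) for p in parts)
        combined_lines = count_lines(combined)
        if combined_lines < longest_part:
            suspicious.append(
                (os.path.basename(combined), combined_lines, longest_part))
    return suspicious


if __name__ == "__main__":
    for name, combined_lines, longest_part in check_combined():
        print(f"{name}: {combined_lines} lines, longest part has {longest_part}")
```

Run from the directory that holds the retained _MMPBSA_ files, it lists every combined file that is shorter than one of its parts.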
When I diff the "serial" and the 1 CPU MPI files I get the output below.
Interestingly, the files are not identical, although I would have
expected them to be:
< Processing ASCII trajectory: _MMPBSA_receptor.mdcrd.0
---
> Processing ASCII trajectory: _MMPBSA_receptor.mdcrd
89c89
< | Initialize 0.052
---
> | Initialize 0.051
95c95
< | Total 0.052
---
> | Total 0.051
102,103c102,103
< | pairlist 2.629
< | nonbond 2.124
---
> | pairlist 2.635
> | nonbond 2.121
106c106
< | 3D-RISM 13216.674
---
> | 3D-RISM 14058.368
109c109
< | Total 13221.447
---
> | Total 14063.144
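
Every difference in the diff above is either the trajectory file name or a timing line. To check whether anything else differs between two runs, those lines can be filtered out before comparing. This is a rough sketch under two assumptions: timing lines start with "|", as they do in the output shown here, and the trajectory line starts with "Processing ASCII trajectory:". The function names are hypothetical.

```python
import difflib


def significant_lines(path):
    """Yield the lines that should be reproducible between runs."""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            # Assumption: timing information is on lines starting with "|"
            if stripped.startswith("|"):
                continue
            # The trajectory name differs by design (.mdcrd vs .mdcrd.0)
            if stripped.startswith("Processing ASCII trajectory:"):
                continue
            yield line.rstrip("\n")


def compare_outputs(path_a, path_b):
    """Return unified-diff lines for the non-timing content of two files."""
    lines_a = list(significant_lines(path_a))
    lines_b = list(significant_lines(path_b))
    return list(difflib.unified_diff(
        lines_a, lines_b, fromfile=path_a, tofile=path_b, lineterm=""))
```

An empty list from compare_outputs means the two files agree on everything except timings and the trajectory name.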
Aside from this I don't see any errors in the files from the initial
attempt, in either _MMPBSA_complex_rism.out.0 or any other file.
Any idea what is going on? It looks like a problem that occurs when
using MPI (OpenMPI), but I don't know what causes it.
>No. It copies over the normal ligand files (trajectories, topologies, etc.)
>and re-runs the calculations. This wasn't considered a huge deal since the
>mutation was almost always in the receptor and the ligand was so cheap to
>calculate.
This is true for small ligands, but with protein-protein complexes it
adds unnecessary computation time for the exact same calculation, and
even more so with 3D-RISM. Maybe this could be fixed in a future
release/update?
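
One way such a shortcut could work is to detect that the mutant ligand inputs are byte-for-byte copies of the normal ligand inputs before re-running anything. This is purely an illustration of the idea and not how MMPBSA.py works. The function names are hypothetical, and the caller supplies the file pairs to compare.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inputs_identical(file_pairs):
    """Return True if every (normal, mutant) pair has identical content.

    file_pairs is an iterable of (path_a, path_b) tuples, for example the
    topology and trajectory of the normal and the mutant ligand.
    """
    return all(file_digest(a) == file_digest(b) for a, b in file_pairs)
```

If inputs_identical returns True for the ligand topology and trajectory, the normal ligand results could in principle be reused instead of being recalculated.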
Best,
Jesper
On Sep 24, 2011 16:29 "Jason Swails" <jason.swails.gmail.com> wrote:
> On Fri, Sep 23, 2011 at 12:58 PM, Jesper Soerensen <lists.jsx.dk> wrote:
>
> > Hi Jason,
> >
> > Sorry it has taken me a while to get the bugfix patched and the
> > calculation tested. So to follow up I was able to patch it and it moves
> > past this "bugged" point now. However, a new error surfaced:
> >
> > > Error: Could not combine output files from different threads. Check
> > > output files for errors.
>
> Can you run in serial on a couple frames and see if it works? I'll try to
> see why it's not working... Can you look at _MMPBSA_complex_rism.out.0 and
> see if there are any errors in it?
>
>
> > > NOTE: All files have been retained for debugging purposes. Type
> > > MMPBSA.py --clean to erase these files.
> >
> > Now I checked the size of the combined output files:
> >
> > > 184K _MMPBSA_complex_rism.out
> > > 184K _MMPBSA_ligand_rism.out
> > > 184K _MMPBSA_receptor_rism.out
> > > 4.0K _MMPBSA_mutant_complex_rism.out
> > > 0 _MMPBSA_mutant_ligand_rism.out
> > > 0 _MMPBSA_mutant_receptor_rism.out
> >
> > And as you can see there is a problem, with some of them not having the
> > expected size. I have checked and all the output files per node are
> > present and all have the same file size. Where should I look for the
> > error would you think?
> >
> > Another thing that I wonder about is why it calculated the mutant_ligand
> > "structure", when the mutation was performed on the enzyme, thus the
> > "wild type" ligand should be the same as the ligand one. Is that a bug?
> >
>
> No. It copies over the normal ligand files (trajectories, topologies, etc.)
> and re-runs the calculations. This wasn't considered a huge deal since the
> mutation was almost always in the receptor and the ligand was so cheap to
> calculate.
>
> HTH,
> Jason
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Oct 05 2011 - 17:30:04 PDT