Re: [AMBER] MMPBSA.py.MPI problems: L-J-parameters, bad file descriptors, MMPBSA hangup due to sander failure

From: Jason Swails <jason.swails.gmail.com>
Date: Wed, 5 Oct 2011 10:40:45 -0400

On Wed, Oct 5, 2011 at 6:54 AM, Jan-Philip Gehrcke
<jgehrcke.googlemail.com>wrote:

> Hey,
>
> for the first time I have been trying to run MMPBSA.py.MPI (from AT 1.5,
> together with sander 10). For this test case, I used the settings
>
> &general
> startframe = 1000,
> endframe = 1020,
> interval = 5,
> receptor_mask = :107-311,
> ligand_mask = :1-106,312-355
> /
> &gb
> igb = 5,
> /
> &decomp
> idecomp = 1,
> dec_verbose = 3
> /
>
>
> and ran into some problems indicated below (comments in between):
>
> 1) first run
> ============
>
> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
> ligand.prmtop -y ../md_equilibrate_00*
>
> Running MMPBSA.MPI on 4 processors...
> Reading command-line arguments and input files...
> Loading and checking parameter files for compatibility...
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
> Warning: Decomposition is only automated if I guess the ligand and receptor
> masks! I will write skeleton mdin files which you must edit. Re-run
> MMPBSA.py with -use-mdins, or allow me to guess the masks.
> Warning: Problem parsing L-J 6-12 parameters.
>
>
>
> * Comment:
> What does the warning regarding the Lennard-Jones parameters mean?
>

It means that there was some problem figuring out what the Lennard-Jones
parameters were (it had trouble parsing the LENNARD_JONES_ACOEF/BCOEF
arrays). That except clause is catching a generic Exception, which may be
masking something else. Go to
$AMBERHOME/AmberTools/src/etc/chemistry/amber/readparm.py and drop the
except clause to see the traceback. This worries me, though, and makes me
think something may be wrong with your topology files. Can you send them
to me?

Note that the new version will know how to map receptor and ligand residues
based on the receptor_mask and ligand_mask that you provide, so this step
will become unnecessary. (If your system is a protein-protein complex, try
swapping the definitions of your receptor and ligand, and MMPBSA.py will be
able to figure out the masks for you.)


>
> 2) second run (with -use-mdins)
> ===============================
>
> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
> ligand.prmtop -use-mdins -y ../md_equilibrate_00*
>
> Running MMPBSA.MPI on 4 processors...
> Reading command-line arguments and input files...
> Loading and checking parameter files for compatibility...
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
> Preparing trajectories for simulation...
> 20 frames were read in and processed by ptraj for use in calculation.
>
> Beginning GB calculations with sander...
> calculating complex contribution...
> close failed in file object destructor:
> IOError: [Errno 9] Bad file descriptor
> close failed in file object destructor:
> IOError: [Errno 9] Bad file descriptor
> close failed in file object destructor:
> IOError: [Errno 9] Bad file descriptor
>
>
> * Comment:
> Three Python IOErrors, but unfortunately without the full Python
> traceback. Why is it not printed here? Maybe due to an overly broad
> try/except block? The close attempts fail three times because the file
> descriptors are already invalid. It looks like one of the four MPI
> processes wins, closes the file(s), and the other three fail...
>

No, this version of MMPBSA actually doesn't make much use of try-except
blocks (it works with return codes instead). The problem is that all of
this occurs inside an os.system call (as you point out below), so any
problem that crops up inside the spawned process simply returns a non-zero
value to the os.system function call. It's os.system that's masking issues
here, not over-generalized except clauses.
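
As an illustration of why the traceback disappears (the command string
below is a placeholder, not the actual line from utils.sandercalc):
os.system only hands back an exit status, so whatever the child process
wrote to stderr never reaches MMPBSA.py itself.

    import os

    # placeholder command; the real call assembles the full sander command line
    status = os.system('sander -O -i _MMPBSA_gb_decomp_com.mdin '
                       '-o _MMPBSA_complex_gb.mdout.0')
    if status != 0:
        # all we learn here is "something went wrong"; the child's error
        # output went to the terminal (or nowhere), not to this process
        print('sander exited with non-zero status: %d' % status)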


>
> 3) third run (directly after the second, with the same arguments)
> =================================================================
>
> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
> ligand.prmtop -use-mdins -y ../md_equilibrate_00*
>
> Running MMPBSA.MPI on 4 processors...
> Reading command-line arguments and input files...
> Loading and checking parameter files for compatibility...
> Warning: Problem parsing L-J 6-12 parameters.
> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> Preparing trajectories for simulation...
> 20 frames were read in and processed by ptraj for use in calculation.
>
> Beginning GB calculations with sander...
> calculating complex contribution...
> close failed in file object destructor:
> IOError: [Errno 9] Bad file descriptor
>
> * Comment:
> Only one IOError left. Hmm... this time dependence is strange; could it
> be related to our NFS setup? The situation may have improved because of
> files that had already been created during the second run. I don't know.
>

I've never seen anything like this before. If you can't get it working,
perhaps you can send me a couple sample frames and I can try it on a machine
here?


>
> 4) fourth run (directly after the third)
> ========================================
> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
> ligand.prmtop -use-mdins -y ../md_equilibrate_00*
>
> Running MMPBSA.MPI on 4 processors...
> Reading command-line arguments and input files...
> Loading and checking parameter files for compatibility...
> Warning: Problem parsing L-J 6-12 parameters.
> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
> Warning: Problem parsing L-J 6-12 parameters.
> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
> Warning: Problem parsing L-J 6-12 parameters.
> Warning: Problem parsing L-J 6-12 parameters.
> Preparing trajectories for simulation...
> 20 frames were read in and processed by ptraj for use in calculation.
>
> Beginning GB calculations with sander...
> calculating complex contribution...
>
> * Comment:
> Now, things looked fine and I left the office.
>
>
> 5) next morning
> ===============
>
> Still at
>
> calculating complex contribution...
>
> with python having 100 % CPU usage and no other heavy-cpu-using processes.
>
> _MMPBSA_complex_gb.mdout.0 (attached) was last changed at approximately
> the same time as MMPBSA run 4 started. So I killed the mpirun. The last
> lines of _MMPBSA_complex_gb.mdout.0:
>
>
> > rfree: Error decoding variable 2 2 from:
> >RES EDIT
>

Ahh... this is beginning to make more sense. After the first run, it
couldn't figure out what the RES lines were supposed to be in the
_MMPBSA_*.mdin files (since it didn't know the residue mapping from
receptor/ligand to complex), so it put in placeholder EDIT lines and
requested that you modify those lines yourself. Did you do this?

> In `utils.sandercalc`, I printed the sander command:
>
> /apps11/bioinfp/amber10+/bin/sander -O -i _MMPBSA_gb_decomp_com.mdin -o
> _MMPBSA_complex_gb.mdout.0 -p complex_unsolvated.prmtop -c
> _MMPBSA_dummycomplex.inpcrd.1 -y _MMPBSA_complex.mdcrd.0 -r
> _MMPBSA_.restrt.0
>
> I ran it independently and it almost immediately returned, creating the
> same .mdout file as attached. Hence, in case (4) from above,
> MMPBSA.py.MPI had a problem detecting this and ended up in some endless
> loop responsible for 100 % CPU usage.
>

Probably something screwy inside the os.system call. MMPBSA.py doesn't
contain any loops there. Another possibility, which I actually think is
happening, is that not all of the threads were successfully killed. The
process that quits in error *should* call utils.Abort which calls
MPI.COMM_WORLD.Abort(), but I've seen issues here in which the MPI didn't
finish up cleanly after an error (which is either your MPI implementation's
failure to clean up after itself or mpi4py failing to do it).
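
For reference, the abort path amounts to something like the following
sketch (not the actual utils.Abort code, just the mechanism it relies on):

    from mpi4py import MPI
    import sys

    def abort(msg, code=1):
        sys.stderr.write(msg + '\n')
        # ask the MPI runtime to terminate every rank, not just this one;
        # if this doesn't propagate cleanly, stray ranks can keep spinning
        MPI.COMM_WORLD.Abort(code)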


> Btw: In `utils.sandercalc`, `os.system` is used to run the sander
> process, which is considered deprecated. One could think about using
> Python's subprocess module in the future (one advantage is being able
> to receive the stdout/stderr of the subprocess).
>

Agreed. This was one of the first changes to make it into the upcoming
version. All external system calls are done via Popen objects. It also
gets rid of several input files and just pipes that input into the process
via stdin.
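
Roughly speaking (this is a sketch of the approach, not the code from the
new version), each external call then looks something like:

    from subprocess import Popen, PIPE

    # illustrative argument list only; the real command line is built elsewhere
    proc = Popen(['sander', '-O', '-i', '_MMPBSA_gb_decomp_com.mdin',
                  '-o', '_MMPBSA_complex_gb.mdout.0'],
                 stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # unlike os.system, the child's stderr is captured and can be reported
        print('sander failed (%d):\n%s' % (proc.returncode, err))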


>
>
> Summary
> =======
> - Something irritated MMPBSA.py.MPI so it opened/accessed/closed files
> in the wrong order. Difficult to debug due to missing Python tracebacks
> and potential dependency on NFS.
> - Some input makes sander fail, which was not properly handled by
> MMPBSA.py.MPI (endless loop). Also difficult to debug, but you may know
> what's wrong with my input.
> - There is another problem with my input leading to the warning about
> parsing L-J 6-12 parameters.
>
> Btw:
> Why is the ./AmberTools/src/mmpbsa_py/MMPBSA_mods/utils.py the one that
> is relevant during runtime, while changes in
> /lib/python2.6/site-packages/MMPBSA_mods/utils.py do not have an effect?
> Shouldn't it be the other way round?
>

Yes. This will also be changed. It has to do with the directory priority of
Python's import statement. The first directory searched is always the
current directory (for a script, the directory the script lives in). The
next directories searched are those in PYTHONPATH (constructed just like
PATH, LD_LIBRARY_PATH, and the like). Last come the standard library and
site-packages (I think some newer Python versions will also search a
per-user site-packages in ~/.python or something like that). The issue,
though, is that the MMPBSA.py and MMPBSA.py.MPI scripts themselves were
hidden away in the $AMBERHOME/AmberTools/src/mmpbsa_py directory to prevent
people from trying to make them executable and run them directly, forcing
them to use the shell scripts as they should. As a result, the "current
directory" becomes the same directory the original packages are in. Since
it's bad practice to have stuff required for execution remain in the src
directory, I plan on moving it.
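
If you want to check which copy wins on your machine, something like this
(nothing MMPBSA-specific about it) will tell you:

    import sys
    import MMPBSA_mods.utils as utils

    print(utils.__file__)   # the file that actually got imported
    print(sys.path[:3])     # script directory first, then PYTHONPATH entries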

(In case you didn't catch the solution in the above text, you have to change
the EDIT lines in the _MMPBSA_*.mdin files. But note that you will not get
DELTAs unless MMPBSA.py guesses the masks for you).
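
For example, a placeholder line in one of the skeleton mdin files such as

    RES EDIT

needs to become an actual residue range in sander's group-style RES cards,
e.g. (using the receptor mask from your input; the exact numbers must match
the residue numbering of the topology that particular mdin file belongs to,
so treat this as an illustration only)

    RES 107 311

The "rfree: Error decoding variable" message in your mdout is sander
choking on the literal word EDIT where it expected those numbers.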

HTH,
Jason


>
> Hope that someone can help me out here!
>
> Thanks,
>
> Jan-Philip
>
> --
> Jan-Philip Gehrcke
> PhD student
> Structural Bioinformatics Group
>
> Technische Universität Dresden
> Biotechnology Center
> Tatzberg 47/49
> 01307 Dresden, Germany
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>


-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Oct 05 2011 - 08:00:05 PDT