Re: [AMBER] MMPBSA.py.MPI problems: L-J-parameters, bad file descriptors, MMPBSA hangup due to sander failure

From: Jan-Philip Gehrcke <jgehrcke.googlemail.com>
Date: Wed, 05 Oct 2011 18:16:45 +0200

On 10/05/2011 04:40 PM, Jason Swails wrote:

>> What does the warning regarding the Lennard Jones parameters mean?
>>
>
> It means that there was some problem figuring out what the Lennard Jones
> parameters were (it had problems parsing the LENNARD_JONES_A/BCOEF arrays).
> That except is catching Exception, which may be masking something. Go to
> $AMBERHOME/AmberTools/src/etc/chemistry/amber/readparm.py and drop the
> except clause to see the traceback.

In this case, I had to modify the file in the lib directory :-)
($AMBERHOME/lib/python2.6/chemistry/amber/readparm.py)

Around line 212 it now looks like:

   print "fill_LJ()"
   #try:
   self.fill_LJ() # fill LJ arrays with LJ data for easy manipulations
   #except:
   # print >> stderr, 'Warning: Problem parsing L-J 6-12 parameters.'
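
(For the record: instead of deleting the except clause entirely, one could
also keep the warning but still expose the underlying error. A rough sketch
of what I mean, at the same spot in readparm.py:

   try:
       self.fill_LJ() # fill LJ arrays with LJ data for easy manipulations
   except Exception:
       # print the real traceback (here: the IndexError shown below)
       # before the generic warning, so nothing gets masked; this needs
       # "import traceback" at the top of readparm.py
       traceback.print_exc(file=stderr)
       print >> stderr, 'Warning: Problem parsing L-J 6-12 parameters.'

That would have saved me the edit-and-retry round trip.)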

MMPBSA invocation including traceback:

MMPBSA -i mmpbsa_decomp.in -cp complex_unsolvated.prmtop -sp
../topology.top -rp receptor.prmtop -lp ligand.prmtop -y
../md_equilibrate_00*
Reading command-line arguments and input files...
Loading and checking parameter files for compatibility...
fill_LJ()
fill_LJ()
fill_LJ()
fill_LJ()
Traceback (most recent call last):
  File "/home/bioinfp/jang/apps/amber11/AmberTools/src/mmpbsa_py/MMPBSA.py", line 423, in <module>
    lig_prm = amberParm(FILES['ligand_prmtop'])
  File "/home/bioinfp/jang/apps/amber11/lib/python2.6/chemistry/amber/readparm.py", line 212, in __init__
    self.fill_LJ() # fill LJ arrays with LJ data for easy manipulations
  File "/home/bioinfp/jang/apps/amber11/lib/python2.6/chemistry/amber/readparm.py", line 765, in fill_LJ
    self.LJ_types[self.parm_data["AMBER_ATOM_TYPE"][i]] = self.parm_data["ATOM_TYPE_INDEX"][i]
IndexError: list index out of range
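
A quick-and-dirty check on my side would be to compare the lengths of the
two arrays that fill_LJ() indexes, reading the prmtop directly (a rough
sketch, assuming the standard %FLAG layout of prmtop files; I have not
checked it against readparm's exact loop bounds):

   import sys

   def read_flag(prmtop, flag):
       """Return the raw fields listed under %FLAG <flag>."""
       data, collect = [], False
       for line in open(prmtop):
           if line.startswith('%FLAG'):
               collect = (line.split()[1] == flag)
           elif collect and not line.startswith('%'):
               data.extend(line.split())
       return data

   prmtop = sys.argv[1]  # e.g. ligand.prmtop
   print len(read_flag(prmtop, 'AMBER_ATOM_TYPE')), 'AMBER_ATOM_TYPE entries'
   print len(read_flag(prmtop, 'ATOM_TYPE_INDEX')), 'ATOM_TYPE_INDEX entries'

fill_LJ() indexes both arrays with the same i, so a length mismatch (or a
loop bound taken from the POINTERS section that does not match either)
would explain the IndexError above.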



> This worries me, though, and makes me think something may be wrong with
> your topology files. Can you send them to me?

Thanks. I will do this in a minute off-list.

>
>
> Note that the new version will know how to map receptor and ligand residues
> based on the receptor_mask and ligand_mask that you provide, so this step
> will become unnecessary. (If your system is a protein-protein complex, try
> switching the definition of your receptor and ligand and MMPBSA.py will be
> able to figure out the masks for you).

I don't quite get why exchanging definitions makes a difference, but I
will try.
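
In case it helps others reading along: as far as I understand, these are
ordinary Amber-style masks given in the &general section of the MMPBSA
input, selecting which residues of the complex topology belong to the
receptor and which to the ligand, e.g. (residue numbers purely
illustrative):

   &general
      receptor_mask = ':1-120',
      ligand_mask   = ':121-126',
   /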

>
>
>>
>> 2) second run (with -use-mdins)
>> ===============================
>>
>> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
>> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
>> ligand.prmtop -use-mdins -y ../md_equilibrate_00*
>>
>> Running MMPBSA.MPI on 4 processors...
>> Reading command-line arguments and input files...
>> Loading and checking parameter files for compatibility...
>> Warning: Problem parsing L-J 6-12 parameters.
>> Warning: Problem parsing L-J 6-12 parameters.
>> Warning: Problem parsing L-J 6-12 parameters.
>> Warning: Problem parsing L-J 6-12 parameters.
>> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
>> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
>> Preparing trajectories for simulation...
>> 20 frames were read in and processed by ptraj for use in calculation.
>>
>> Beginning GB calculations with sander...
>> calculating complex contribution...
>> close failed in file object destructor:
>> IOError: [Errno 9] Bad file descriptor
>> close failed in file object destructor:
>> IOError: [Errno 9] Bad file descriptor
>> close failed in file object destructor:
>> IOError: [Errno 9] Bad file descriptor
>>
>>
>> * Comment:
>> 3 Python IOErrors, but unfortunately without the full Python
>> traceback. Why is it not printed here? Maybe because of an overly
>> broad try/except block? The close attempts fail three times because
>> the file descriptors are already invalid. It looks as if one of the
>> four MPI processes wins, closes the file(s), and the other three fail...
>>
>
> No, this version of MMPBSA actually doesn't make much use of try-except
> blocks (it works with return codes instead). The problem is that all of
> this occurs inside an os.system call (as you point out below), so any
> problem that crops up inside the spawned process simply returns a non-zero
> value to the os.system function call. It's os.system that's masking issues
> here, not over-generalized except clauses.

I am not sure what has to happen to a child process for the
bad-file-descriptor IOError to be raised during os.system(); it probably
has something to do with broken stdout/stderr pipes. It looks as if, in the
error case, sander does not exit properly, or at least exits
"unexpectedly". But without knowing exactly what triggers these IOErrors,
it is difficult to track down their origin.
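
At the very least, checking the return status of os.system() would make a
crashed sander visible instead of carrying on silently. Roughly along these
lines (just a sketch of the idea, not what MMPBSA.py currently does;
run_with_check is a made-up helper name):

   import os
   import sys

   def run_with_check(cmd):
       """Run a shell command via os.system and fail loudly on error."""
       status = os.system(cmd)
       # os.system() returns the raw wait() status, so decode it first
       if os.WIFSIGNALED(status):
           sys.stderr.write('killed by signal %d: %s\n'
                            % (os.WTERMSIG(status), cmd))
           sys.exit(1)
       if os.WEXITSTATUS(status) != 0:
           sys.stderr.write('exited with code %d: %s\n'
                            % (os.WEXITSTATUS(status), cmd))
           sys.exit(1)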

>
>
>>
>> 3) third run (directly after the second, with the same arguments)
>> =================================================================
>>
>> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
>> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
>> ligand.prmtop -use-mdins -y ../md_equilibrate_00*
>>
>> Running MMPBSA.MPI on 4 processors...
>> Reading command-line arguments and input files...
>> Loading and checking parameter files for compatibility...
>> Warning: Problem parsing L-J 6-12 parameters.
>> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
>> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
>> Warning: Problem parsing L-J 6-12 parameters.
>> Warning: Problem parsing L-J 6-12 parameters.
>> Warning: Problem parsing L-J 6-12 parameters.
>> Preparing trajectories for simulation...
>> 20 frames were read in and processed by ptraj for use in calculation.
>>
>> Beginning GB calculations with sander...
>> calculating complex contribution...
>> close failed in file object destructor:
>> IOError: [Errno 9] Bad file descriptor
>>
>> * Comment:
>> only one IOError left. Hmm.. this time-dependency is strange and could
>> be related to our NFS setup? The situation may have improved due to
>> files that have already been created during the second run. I don't know.
>>
>
> I've never seen anything like this before. If you can't get it working,
> perhaps you can send me a couple sample frames and I can try it on a machine
> here?
>

Sure, thanks.

If I get it working with correct input files/parameters, that's fine, but
it would not gain us anything regarding the error handling, which in this
case is clearly not optimal. On the other hand, if a rewrite of MMPBSA.py
is already nearly finished for the next AmberTools release, it is probably
not worth investing the time...

>
>>
>> 4) fourth run (directly after the third)
>> ========================================
>> $ mpirun -np 4 MMPBSA.MPI -i mmpbsa_decomp.in -cp
>> complex_unsolvated.prmtop -sp ../topology.top -rp receptor.prmtop -lp
>> ligand.prmtop -use-mdins -y ../md_equilibrate_00*
>>
>> Running MMPBSA.MPI on 4 processors...
>> Reading command-line arguments and input files...
>> Loading and checking parameter files for compatibility...
>> Warning: Problem parsing L-J 6-12 parameters.
>> ptraj found! Using /home/bioinfp/jang/apps/amber11/bin/ptraj
>> Warning: Problem parsing L-J 6-12 parameters.
>> sander found! Using /apps11/bioinfp/amber10+/bin/sander for GB calculations
>> Warning: Problem parsing L-J 6-12 parameters.
>> Warning: Problem parsing L-J 6-12 parameters.
>> Preparing trajectories for simulation...
>> 20 frames were read in and processed by ptraj for use in calculation.
>>
>> Beginning GB calculations with sander...
>> calculating complex contribution...
>>
>> * Comment:
>> Now things looked fine, so I left the office.
>>
>>
>> 5) next morning
>> ===============
>>
>> Still at
>>
>> calculating complex contribution...
>>
>> with python having 100 % CPU usage and no other heavy-cpu-using processes.
>>
>> _MMPBSA_complex_gb.mdout.0 (attached) was last changed at approximately
>> the same time as MMPBSA run 4 started. So I killed the mpirun. The last
>> lines of _MMPBSA_complex_gb.mdout.0:
>>
>>
>>> rfree: Error decoding variable 2 2 from:
>>> RES EDIT
>>
>
> Ahh... this is beginning to make more sense. After the first run, it
> couldn't figure out what the RES lines were supposed to be in the
> _MMPBSA_*.mdin files (since it didn't know the residue mapping from
> receptor/ligand to complex), so it put in placeholder EDIT lines and
> requested that you modify those lines yourself. Did you do this?

Not initially, but I did it after a while, yes :-) I have now done it for
three .mdin files (an example of the change is given below, after the
output), and MMPBSA.MPI proceeds further than before:

[...]
  calculating receptor contribution...
  calculating ligand contribution...
 bad atom type:
utils.Abort!
 bad atom type:
close failed in file object destructor:
IOError: [Errno 9] Bad file descriptor
 bad atom type:
close failed in file object destructor:
IOError: [Errno 9] Bad file descriptor
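
For completeness: the edit Jason refers to simply means replacing each
placeholder line of the form

   RES EDIT

in the _MMPBSA_*.mdin files with the actual residue range of the
corresponding group, e.g.

   RES 1 120

where the numbers are only an example and have to match the residue
numbering of the complex topology.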

I think we can just accept that my input files are somehow screwed up and
hope that the error handling in the newer MMPBSA.py version will be better.

>
>> In `utils.sandercalc`, I printed the sander command:
>>
>> /apps11/bioinfp/amber10+/bin/sander -O -i _MMPBSA_gb_decomp_com.mdin -o
>> _MMPBSA_complex_gb.mdout.0 -p complex_unsolvated.prmtop -c
>> _MMPBSA_dummycomplex.inpcrd.1 -y _MMPBSA_complex.mdcrd.0 -r
>> _MMPBSA_.restrt.0
>>
>> I ran it independently and it almost immediately returned, creating the
>> same .mdout file as attached. Hence, in case (4) from above,
>> MMPBSA.py.MPI had a problem detecting this and ended up in some endless
>> loop responsible for 100 % CPU usage.
>>
>
> Probably something screwy inside the os.system call. MMPBSA.py doesn't
> contain any loops there. Another possibility, which I actually think is
> happening, is that not all of the threads were successfully killed. The
> process that quits in error *should* call utils.Abort which calls
> MPI.COMM_WORLD.Abort(), but I've seen issues here in which the MPI didn't
> finish up cleanly after an error (which is either your MPI implementation's
> failure to clean up after itself or mpi4py failing to do it).

This definitely could have happened, and it is difficult for me to
reproduce in a controlled fashion, although I have already seen it (100 %
CPU for python and no sander running) three times while playing around
over the last few hours. MPI and NFS introduce a lot of degrees of
freedom, and solving this mystery could get pretty involved.
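
For what it's worth, my understanding of the intended abort path is roughly
the following (a sketch only, not the actual utils.Abort code):

   import sys
   from mpi4py import MPI

   def abort(error_code=1):
       # Ask the MPI runtime to tear down *all* ranks, not just this one.
       # If the MPI implementation (or mpi4py) does not honour this
       # cleanly, the remaining ranks can be left spinning, which would
       # match the 100 % CPU python process I observed.
       sys.stderr.write('Fatal error on rank %d, aborting.\n'
                        % MPI.COMM_WORLD.Get_rank())
       MPI.COMM_WORLD.Abort(error_code)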

>
>
>> Btw: in `utils.sandercalc`, `os.system` is used to run the sander
>> process, which is considered deprecated. One could think about using
>> Python's subprocess module in the future (one advantage being the
>> ability to capture the subprocess's stdout/stderr).
>>
>
> Agreed. This was one of the first changes to make it into the upcoming
> version. All external system calls are done via Popen objects. It also
> gets rid of several input files and just pipes that input into the process
> via stdin.

Good decision :)
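
Just to illustrate what I meant (and presumably roughly what the new
version does; this is my sketch, not the actual code):

   import subprocess

   def run_sander(cmd_args, mdin_text=None):
       """Run sander and return (returncode, stdout, stderr).

       cmd_args:  argument list, e.g. ['sander', '-O', '-p', 'x.prmtop', ...]
       mdin_text: optional input piped in via stdin instead of a scratch file
       """
       proc = subprocess.Popen(cmd_args,
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
       out, err = proc.communicate(mdin_text)
       return proc.returncode, out, err

This way a non-zero return code and the stderr output would both be
available to the caller.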

>
>
>>
>>
>> Summary
>> =======
>> - Something irritated MMPBSA.py.MPI so it opened/accessed/closed files
>> in the wrong order. Difficult to debug due to missing Python tracebacks
>> and potential dependency on NFS.
>> - Some input makes sander fail, which was not properly handled by
>> MMPBSA.py.MPI (endless loop). Also difficult to debug, but you may know
>> what's wrong with my input.
>> - There is another problem with my input leading to the warning about
>> parsing L-J 6-12 parameters.
>>
>> Btw:
>> Why is ./AmberTools/src/mmpbsa_py/MMPBSA_mods/utils.py the one that is
>> actually used at runtime, while changes to
>> /lib/python2.6/site-packages/MMPBSA_mods/utils.py have no effect?
>> Shouldn't it be the other way round?
>>
>
> Yes. This will also be changed. This has to do with directory priority in
> Python's import statement. The first directory searched is always the
> current directory. The next directories searched are those in PYTHONPATH
> (constructed identically to PATH and LD_LIBRARY_PATH and the like). Finally
> come the stdlib and site-packages (I think some newer Python versions will
> actually search a local-user site-packages in ~/.python or something like
> that). The issue is, though, that the MMPBSA.py and MMPBSA.py.MPI scripts
> themselves were hidden away in $AMBERHOME/AmberTools/src/mmpbsa_py
> directories to prevent people from trying to make them executable and run
> them directly, forcing them to use the shell scripts as intended. The
> "current directory" then becomes the same directory that the original
> packages live in. Since it is bad practice to keep important pieces in the
> src directory and require them for execution, I plan on moving them.
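
Thanks, that explains it. For anyone else who trips over this, a quick way
to see which copy actually gets imported (sketch; run it as a small test
script next to MMPBSA.py, or paste it into MMPBSA.py itself):

   import sys
   print sys.path[0]   # directory of the running script is searched first

   import MMPBSA_mods.utils
   print MMPBSA_mods.utils.__file__   # which utils.py was really loaded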
>
> (In case you didn't catch the solution in the above text, you have to change
> the EDIT lines in the _MMPBSA_*.mdin files. But note that you will not get
> DELTAs unless MMPBSA.py guesses the masks for you).
>
> HTH,
> Jason
>
>
>>
>> Hope that someone can help me out here!
>>
>> Thanks,
>>
>> Jan-Philip
>>
>> --
>> Jan-Philip Gehrcke
>> PhD student
>> Structural Bioinformatics Group
>>
>> Technische Universität Dresden
>> Biotechnology Center
>> Tatzberg 47/49
>> 01307 Dresden, Germany
>>
>>
>>
>
>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber