Re: [AMBER] MMPBSA.py.MPI error

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 11 Feb 2022 14:06:38 -0500

This looks like a set of race conditions arising from every process being
assigned rank 0: each process tries to do exactly the same thing, writing
to exactly the same files and clobbering the others in the attempt.

When you launch a program via `mpirun`, it starts as many independent
processes as were requested and assigns each of them a unique rank in a
global "world" communicator. MMPBSA.py.MPI prints status messages only
from the process with global rank 0, so seeing each message repeated once
per process means every process believes it has rank 0: `mpirun` has
failed to function as it should.
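A quick way to confirm this (a sketch, assuming mpi4py is importable by
amber.python) is to print the rank from a few processes launched with the
same mpirun you use for MMPBSA.py.MPI:

    mpirun -np 4 amber.python -c \
        "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"

A healthy setup prints 0, 1, 2, and 3 in some order. If you see four 0s,
each process has initialized its own single-member MPI world, which is
exactly the failure mode described above.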

There are two ways I've seen this problem happen in the past:

* The MPI installation is broken in some way.
* The MPI compilers used to build Amber and the "mpirun" or "mpiexec"
wrapper used to launch the programs come from two different MPI
distributions (a quick check to distinguish the two follows).
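To tell which case you are in, check that the launcher and the compiler
wrappers resolve to the same installation (the flags below are the usual
ones for Open MPI and MPICH; adjust for other stacks):

    # both paths should live under the same MPI installation
    which mpirun
    which mpicc
    # reports the distribution and version, e.g. "mpirun (Open MPI) 4.1.x"
    mpirun --version
    # show the underlying compile line: --showme for Open MPI, -show for MPICH
    mpicc --showme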

The second is more common and is the likely culprit here. The simplest
way forward is to rebuild mpi4py inside AmberTools/src/mpi4py-3.0.3/ (via
amber.python setup.py install), making sure to use the same MPI compiler
wrappers that ship with the mpiexec/mpirun binary you're using.
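Untested sketch of that rebuild; the --mpicc build option is how mpi4py's
setup.py pins the compiler wrapper, so point it at the wrapper that ships
with the mpirun you intend to use:

    cd $AMBERHOME/AmberTools/src/mpi4py-3.0.3
    amber.python setup.py build --mpicc=$(which mpicc)
    amber.python setup.py install

Then rerun the rank check above and make sure the ranks come back unique
before retrying MMPBSA.py.MPI.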

HTH,
Jason

On Thu, Feb 3, 2022 at 3:35 AM Fabian Glaser <fabian.glaser.gmail.com>
wrote:

> Dear experts,
>
> I am trying to run MMPBSA.py.MPI and I get the following error, which does
> not appear with MMPBSA.py. I saw there are a few threads discussing it,
> but I did not find a solution; maybe I missed it.
>
>
> Any help will be appreciated,
>
> Best regards,
> Fabian
>
>
>
>
> >./mmpbsa.sh
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> Loading and checking parameter files for compatibility...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> cpptraj found! Using /home/amber20/amber20/bin/cpptraj
> sander found! Using /home/amber20/amber20/bin/sander
> Preparing trajectories for simulation...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> At line 174 of file
> /home/amber20/amber20_src/AmberTools/src/sander/getcor.F90 (unit = 9, file
> = '_MMPBSA_dummycomplex.inpcrd')
> Fortran runtime error: End of file
>
> Error termination. Backtrace:
> #0 0x7fd37d1e532a
> #1 0x7fd37d1e5ed5
> #2 0x7fd37d1e669d
> #3 0x7fd37d35cca3
> #4 0x7fd37d35d28a
> #5 0x7fd37d359e8f
> #6 0x7fd37d35e79c
> #7 0x7fd37d35f72c
> #8 0x55c40254b823
> #9 0x55c4024f0e78
> #10 0x55c4024ee6b2
> #11 0x55c4024ee70e
> #12 0x7fd37cbe2bf6
> #13 0x55c4023466b9
> #14 0xffffffffffffffff
> At line 116 of file
> /home/amber20/amber20_src/AmberTools/src/sander/trajene.F90 (unit = 24,
> file = '_MMPBSA_complex.mdcrd.0')
> Fortran runtime error: End of file
>
> Error termination. Backtrace:
> File "/home/amber20/amber20/bin/MMPBSA.py.MPI", line 100, in <module>
> app.run_mmpbsa()
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/main.py",
> line 218, in run_mmpbsa
> self.calc_list.run(rank, self.stdout)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 82, in run
> calc.run(rank, stdout=stdout, stderr=stderr)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 428, in run
> error_list = [s.strip() for s in out.split('\n')
> TypeError: a bytes-like object is required, not 'str'
>
> Fatal Error!
> All files have been retained for your error investigation:
> You should begin by examining the output files of the first failed
> calculation.
> Consult the "Temporary Files" subsection of the MMPBSA.py chapter in the
> manual for file naming conventions.
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> #0 0x7efdac19d32a
> #1 0x7efdac19ded5
> #2 0x7efdac19e69d
> #3 0x7efdac314ca3
> #4 0x7efdac31528a
> #5 0x7efdac311e8f
> #6 0x7efdac31679c
> #7 0x7efdac31772c
> #8 0x55bddaa632c6
> #9 0x55bddaa61d3e
> #10 0x55bddaa5a6b2
> #11 0x55bddaa5a70e
> #12 0x7efdabb9abf6
> #13 0x55bdda8b26b9
> #14 0xffffffffffffffff
> File "/home/amber20/amber20/bin/MMPBSA.py.MPI", line 100, in <module>
> app.run_mmpbsa()
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/main.py",
> line 218, in run_mmpbsa
> self.calc_list.run(rank, self.stdout)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 82, in run
> calc.run(rank, stdout=stdout, stderr=stderr)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 428, in run
> error_list = [s.strip() for s in out.split('\n')
> TypeError: a bytes-like object is required, not 'str'
>
> Fatal Error!
> All files have been retained for your error investigation:
> You should begin by examining the output files of the first failed
> calculation.
> Consult the "Temporary Files" subsection of the MMPBSA.py chapter in the
> manual for file naming conventions.
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> At line 174 of file
> /home/amber20/amber20_src/AmberTools/src/sander/getcor.F90 (unit = 9, file
> = '_MMPBSA_dummycomplex.inpcrd')
> Fortran runtime error: End of file
>
> Error termination. Backtrace:
> #0 0x7f1992dc232a
> #1 0x7f1992dc2ed5
> #2 0x7f1992dc369d
> #3 0x7f1992f39ca3
> #4 0x7f1992f3a28a
> #5 0x7f1992f36e8f
> #6 0x7f1992f3b79c
> #7 0x7f1992f3c72c
> #8 0x5608c51af823
> #9 0x5608c5154e78
> #10 0x5608c51526b2
> #11 0x5608c515270e
> #12 0x7f19927bfbf6
> #13 0x5608c4faa6b9
> #14 0xffffffffffffffff
> File "/home/amber20/amber20/bin/MMPBSA.py.MPI", line 100, in <module>
> app.run_mmpbsa()
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/main.py",
> line 218, in run_mmpbsa
> self.calc_list.run(rank, self.stdout)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 82, in run
> calc.run(rank, stdout=stdout, stderr=stderr)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 428, in run
> error_list = [s.strip() for s in out.split('\n')
> TypeError: a bytes-like object is required, not 'str'
>
> Fatal Error!
> All files have been retained for your error investigation:
> You should begin by examining the output files of the first failed
> calculation.
> Consult the "Temporary Files" subsection of the MMPBSA.py chapter in the
> manual for file naming conventions.
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> File "/home/amber20/amber20/bin/MMPBSA.py.MPI", line 100, in <module>
> app.run_mmpbsa()
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/main.py",
> line 218, in run_mmpbsa
> self.calc_list.run(rank, self.stdout)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 82, in run
> calc.run(rank, stdout=stdout, stderr=stderr)
> File
> "/home/amber20/amber20/lib/python3.9/site-packages/MMPBSA_mods/calculation.py",
> line 428, in run
> error_list = [s.strip() for s in out.split('\n')
> TypeError: a bytes-like object is required, not 'str'
>
> Fatal Error!
> All files have been retained for your error investigation:
> You should begin by examining the output files of the first failed
> calculation.
> Consult the "Temporary Files" subsection of the MMPBSA.py chapter in the
> manual for file naming conventions.
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> 11 frames were processed by cpptraj for use in calculation.
>
> Running calculations on normal system...
>
> Beginning PB calculations with /home/amber20/amber20/bin/sander
> calculating complex contribution...
> ^C[mpiexec.amber2] Sending Ctrl-C to processes as requested
> [mpiexec.amber2] Press Ctrl-C again to force abort
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
> MMPBSA.py.MPI interrupted! Program terminated. All files are kept.
>
>
>
>
> Fabian Glaser
>
> Bioinformatics Knowledge Unit - BKU
> The Lorry I. Lokey Center for Life Sciences and Engineering
> Technion - Israel Institute of Technology, Haifa, Israel
>
> Tel +972 (0) 4 8293701
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>


-- 
Jason M. Swails
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Feb 11 2022 - 11:30:03 PST