Do the tests work in parallel? Specifically, the nmode test. To try this,
set the environment variable DO_PARALLEL="mpiexec -n 2" (or however you
launch 2 MPI threads), then run the nmode test --
cd $AMBERHOME/test/mmpbsa_py
export DO_PARALLEL='mpiexec -n 2'
make NAB
Then see what happens. This test is set up for a very small system, so if it
passes but your own system fails, it is likely that your system requires too
much memory to run 4 simulations on the same node off the same shared
memory.
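A rough way to check whether memory is the culprit (just a sketch, assuming a
Linux node) is to compare the resident size of one serial nmode run against
the RAM on the node:
free -m                              # total and available RAM on the node
ps aux | grep [m]mpbsa_py_nabnmode   # RSS column = memory of one running nmode process
Multiply that RSS by the number of MPI threads you plan to run on the node.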
Other common problems: compiling NAB in parallel (i.e., a parallel AmberTools
build) before building MMPBSA.py in serial causes mmpbsa_py_nabnmode to be
compiled with MPI support, and its MPI_Init call then clashes with the MPI
already initialized by MMPBSA.py.MPI, producing errors like the ones below.
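One quick check (a sketch, assuming the shared MPICH2 build you mention) is to
see whether the nmode helper was linked against MPI at all:
ldd $AMBERHOME/exe/mmpbsa_py_nabnmode | grep -i mpi
If that prints any libmpich/libmpi lines, rebuild AmberTools in serial and then
reinstall MMPBSA.py so that mmpbsa_py_nabnmode is built without MPI.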
Another possibility is that the installations of mpi4py and MPICH2 did not
complete successfully (for instance, mpi4py may have inadvertently been built
against a different MPI implementation).
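To rule that out, a small sanity check (again just a sketch; use whichever
python you run MMPBSA.py.MPI with) is:
python -c "import mpi4py; print mpi4py.get_config()"   # MPI mpi4py was built against (if your version provides get_config)
which mpiexec                                          # MPI you actually launch with
mpiexec -n 2 python -c "from mpi4py import MPI; print MPI.COMM_WORLD.Get_rank()"
The last command should print 0 and 1; a hang or an MPI_Init error there points
to the mpi4py/MPICH2 installation rather than to MMPBSA.py itself.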
More tests are needed to narrow down the problem.
Good luck!
Jason
On Wed, Jan 19, 2011 at 5:26 PM, filip fratev <filipfratev.yahoo.com> wrote:
> Hi all,
> I installed MMPBSA.py.MPI according to the instructions and it works in
> parallel for normal MMPBSA calculations, but when I try to calculate the
> entropy contribution I always get the error below (the example shown uses
> the provided Amber test case).
> I am using the latest versions of MPICH2 (shared), mpi4py, and Python 2.6
> on SUSE 11.3.
> Thank you in advance for your help!
>
> MMPBSA.py.MPI being run on 4 processors
> ptraj found! Using /home/fratev/amber11/exe/ptraj
> sander found! Using /home/fratev/amber11/exe/sander
> nmode program found! Using /home/fratev/amber11/exe/mmpbsa_py_nabnmode
>
> Preparing trajectories with ptraj...
> 10 frames were read in and processed by ptraj for use in calculation.
> Processing 10 frames with normal mode analysis.
>
> Starting calculations
>
>
> Starting nmode calculations...
> master thread is calculating 3 frames
>
> calculating complex for frame number 0
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpYzsbx7
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating receptor for frame number 0
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpZYwgy5
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating ligand for frame number 0
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpxlXSY5
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating complex for frame number 1
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpa24cl6
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating receptor for frame number 1
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpv8T0C6
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating ligand for frame number 1
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpsZhzo6
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating complex for frame number 2
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpIORZa9
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating receptor for frame number 2
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpF8jqx9
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating ligand for frame number 2
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpUuGfV9
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> [mpiexec.linux-3ykv] HYDU_sock_read (./utils/sock/sock.c:213): read errno
> (Input/output error)
> [mpiexec.linux-3ykv] HYDU_sock_forward_stdio (./utils/sock/sock.c:377):
> read error
> [mpiexec.linux-3ykv] HYDT_bscu_stdin_cb
> (./tools/bootstrap/utils/bscu_cb.c:63): stdin forwarding error
> [mpiexec.linux-3ykv] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:76): callback returned error status
> [mpiexec.linux-3ykv] HYDT_bscu_wait_for_completion
> (./tools/bootstrap/utils/bscu_wait.c:84): error waiting for event
> [mpiexec.linux-3ykv] HYDT_bsci_wait_for_completion
> (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error
> waiting for completion
> [mpiexec.linux-3ykv] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting
> for completion
> [mpiexec.linux-3ykv] main (./ui/mpich/mpiexec.c:302): process manager
> error waiting for completion
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 19 2011 - 15:30:06 PST