Re: [AMBER] MMPBSA.py.MPI Nmode problem

From: Jesper Sørensen <lists.jsx.dk>
Date: Wed, 26 Jan 2011 14:14:42 +0100

Hi Jason,

I think I might have a related problem. When I try to calculate the entropy
with NAB, it seems like the minimization never starts (the CPU and wall time
are only 6 seconds), and I get several lines like the following in the
output/error from the cluster node:
[s11n03.grendel.cscaa.dk:02277] [[14503,1],0] routed:binomial: Connection to
lifeline [[14503,0],0] lost

Have you seen anything like this before?
Of course the calculation fails with "Warning: No snapshots for nmode
minimized within tolerable limits! Entropy not calculated."

I thought that maybe I had the same problem with mmpbsa_py_nabnmode, so I
recompiled it in serial just to make sure, but I still get the error.
I can run a normal MMPBSA calculation, and also one using the entropy from
ptraj, but every time I use NAB it fails.

Best regards,
Jesper


-----Original Message-----
From: Jason Swails [mailto:jason.swails.gmail.com]
Sent: 20 January 2011 00:21
To: AMBER Mailing List
Subject: Re: [AMBER] MMPBSA.py.MPI Nmode problem

Do the tests work in parallel? Specifically the nmode test. To try this,
set the environment variable DO_PARALLEL="mpiexec -n 2" or however you run 2
MPI threads, then run the nmode test --

cd $AMBERHOME/test/mmpbsa_py
export DO_PARALLEL='mpiexec -n 2'
make NAB

And see what happens. This is a test set up for a very small system, so if
this works but your own system fails, it's likely that your system requires
too much memory to be running 4 simulations on the same board off the same
shared memory.
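
If memory does turn out to be the limit, one workaround is to spread the MPI
threads over more nodes rather than packing them all onto one board. A minimal
sketch, assuming MPICH2's hydra launcher and hypothetical file names
(mmpbsa.in, com.prmtop, rec.prmtop, lig.prmtop, traj.mdcrd):

mpiexec -n 4 -ppn 1 MMPBSA.py.MPI -O -i mmpbsa.in -cp com.prmtop -rp rec.prmtop -lp lig.prmtop -y traj.mdcrd

Here -ppn 1 asks hydra to place one MPI rank per node; adjust the flag (and the
file names) to match your scheduler and setup.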

Other common problems: compiling nab in parallel (parallel AmberTools)
before building MMPBSA.py in serial causes mmpbsa_py_nabnmode to be compiled
with MPI support, and its MPI routines clash with the MPI calls in
MMPBSA.py.MPI, causing errors.
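
If that is what happened, rebuilding AmberTools in serial (so that
mmpbsa_py_nabnmode is not linked against MPI) should fix it. A minimal sketch,
assuming the amber11 / AmberTools 1.x layout; configure options and install
targets may differ in your version:

cd $AMBERHOME/AmberTools/src
make clean
./configure gnu          # serial build; note there is no -mpi flag here
make install

The serial build should leave a non-MPI mmpbsa_py_nabnmode in $AMBERHOME/exe.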

It is also possible that the installations of mpi4py and mpich2 were not
completed successfully (for example, mpi4py may inadvertently have been built
against a different MPI implementation).
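
A quick way to check this is to run a trivial mpi4py command with the same
mpiexec you use for MMPBSA.py.MPI. A minimal sketch (Python 2.6 syntax,
matching your setup; mpi4py's get_vendor() reports which MPI library it was
built against):

mpiexec -n 2 python -c "from mpi4py import MPI; print MPI.COMM_WORLD.Get_rank(), MPI.get_vendor()"

If you don't see ranks 0 and 1, or the vendor reported is not MPICH2, then
mpi4py was built against a different MPI than the one launching the job.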

More tests are needed to narrow down the problem.

Good luck!
Jason

On Wed, Jan 19, 2011 at 5:26 PM, filip fratev <filipfratev.yahoo.com> wrote:

> Hi all,
> I installed MMPBSA.py.MPI according to the instructions and it works
> in a parallel mode with MMPBSA, but when trying to calculate the
> entropy contribution I always obtain this error (please see below the
> example with the Amber test provided):
> I use the last versions of Mpich2 (shared) and mpi4py and python 2.6
> on Suse 11.3.
> Thank you in advance for your help!
>
> MMPBSA.py.MPI being run on 4 processors
> ptraj found! Using /home/fratev/amber11/exe/ptraj
> sander found! Using /home/fratev/amber11/exe/sander
> nmode program found! Using /home/fratev/amber11/exe/mmpbsa_py_nabnmode
>
> Preparing trajectories with ptraj...
> 10 frames were read in and processed by ptraj for use in calculation.
> Processing 10 frames with normal mode analysis.
>
> Starting calculations
>
>
> Starting nmode calculations...
> master thread is calculating 3 frames
>
> calculating complex for frame number 0
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpYzsbx7
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating receptor for frame number 0
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpZYwgy5
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating ligand for frame number 0
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpxlXSY5
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating complex for frame number 1
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpa24cl6
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating receptor for frame number 1
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpv8T0C6
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating ligand for frame number 1
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpsZhzo6
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating complex for frame number 2
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpIORZa9
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating receptor for frame number 2
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpF8jqx9
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> calculating ligand for frame number 2
> [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> value=/dev/shm/mpich_shar_tmpUuGfV9
> failed, reason='duplicate_keysharedFilename[0]'
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(388).....:
> MPID_Init(135)............: channel initialization failed
> MPIDI_CH3_Init(38)........:
> MPID_nem_init(196)........:
> MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> [mpiexec.linux-3ykv] HYDU_sock_read (./utils/sock/sock.c:213): read errno (Input/output error)
> [mpiexec.linux-3ykv] HYDU_sock_forward_stdio (./utils/sock/sock.c:377): read error
> [mpiexec.linux-3ykv] HYDT_bscu_stdin_cb (./tools/bootstrap/utils/bscu_cb.c:63): stdin forwarding error
> [mpiexec.linux-3ykv] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:76): callback returned error status
> [mpiexec.linux-3ykv] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:84): error waiting for event
> [mpiexec.linux-3ykv] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
> [mpiexec.linux-3ykv] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion
> [mpiexec.linux-3ykv] main (./ui/mpich/mpiexec.c:302): process manager error waiting for completion
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>


--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber