Re: [AMBER] MMPBSA.py.MPI Nmode problem

From: Jason Swails <jason.swails.gmail.com>
Date: Wed, 26 Jan 2011 09:10:02 -0500

Hello,

This is definitely an MPI problem... How did you recompile in serial? Did
you just "make serial" in the Amber directory to recompile
mmpbsa_py_nabnmode? Try running "make uninstall" in the
$AMBERHOME/AmberTools/src directory and then recompiling AmberTools in
serial only. Then rebuild MMPBSA.py in serial.
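
Something like the following is what I have in mind (a sketch only -- the exact
configure options and make targets may differ with your compilers and
AmberTools version, so adjust to match how you originally built things):

cd $AMBERHOME/AmberTools/src
make uninstall
./configure gnu        # serial configure -- note: no -mpi flag
make install           # rebuild AmberTools with serial nab libraries
cd $AMBERHOME/src
make serial            # rebuild Amber/MMPBSA.py so mmpbsa_py_nabnmode links the serial nab

The important part is that nab is configured without MPI before
mmpbsa_py_nabnmode is rebuilt.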

If MMPBSA.py.MPI works on this system otherwise (i.e. if not doing nmode
calculations), then I don't know why else this would be happening.

Good luck!
Jason

2011/1/26 Jesper Sørensen <lists.jsx.dk>

> Hi Jason,
>
> I think I might have a related problem. When I try to calculate the entropy
> with nab, the minimization never seems to start (the CPU and wall times are
> only 6 seconds), and I get several lines like this in the output/error from
> the cluster node:
>
> [s11n03.grendel.cscaa.dk:02277] [[14503,1],0] routed:binomial: Connection to
> lifeline [[14503,0],0] lost
>
> Have you seen anything like this before?
> Of course the calculation fails with "Warning: No snapshots for nmode
> minimized within tolerable limits! Entropy not calculated."
>
> I thought that maybe I had the same problem with mmpbsa_py_nabnmode, so I
> recompiled it in serial just to make sure, and I still get the error.
> I can run a normal MMPBSA calculation, and also one using the entropy from
> ptraj, but every time I use NAB it fails.
>
> Best regards,
> Jesper
>
>
> -----Original Message-----
> From: Jason Swails [mailto:jason.swails.gmail.com]
> Sent: 20 January 2011 00:21
> To: AMBER Mailing List
> Subject: Re: [AMBER] MMPBSA.py.MPI Nmode problem
>
> Do the tests work in parallel, specifically the nmode test? To try this, set
> the environment variable DO_PARALLEL="mpiexec -n 2" (or however you run 2 MPI
> threads), then run the nmode test:
>
> cd $AMBERHOME/test/mmpbsa_py
> export DO_PARALLEL='mpiexec -n 2'
> make NAB
>
> And see what happens. This test is set up for a very small system, so if it
> works but your own system does not, it's likely that your system requires too
> much memory to run 4 calculations on the same board off the same shared
> memory.
>
> Another common problem: compiling nab in parallel (parallel AmberTools) before
> building MMPBSA.py in serial causes mmpbsa_py_nabnmode to be compiled with MPI
> support, and its MPI routines clash with the MPI calls in MMPBSA.py.MPI,
> causing errors.
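>
> A quick way to check which way mmpbsa_py_nabnmode ended up being built
> (assuming it is dynamically linked) is to look for MPI libraries among its
> dependencies:
>
> ldd $AMBERHOME/exe/mmpbsa_py_nabnmode | grep -i mpi
>
> If that prints any MPI libraries, the binary was linked against parallel nab
> and should be rebuilt in serial.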
>
> Perhaps the installations of mpi4py and mpich2 were not completed successfully
> (perhaps mpi4py inadvertently picked up a different MPI implementation?).
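>
> One quick sanity check of the mpi4py/MPICH2 pairing (assuming "python" and
> "mpiexec" here are the same ones MMPBSA.py.MPI uses):
>
> mpiexec -n 2 python -c "from mpi4py import MPI; print MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size()"
>
> If mpi4py is built against the same MPI as mpiexec, this prints "0 2" and
> "1 2"; two copies of "0 1" instead would point to a mismatched MPI
> installation.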
>
> More tests are needed to narrow the problem down further.
>
> Good luck!
> Jason
>
> On Wed, Jan 19, 2011 at 5:26 PM, filip fratev <filipfratev.yahoo.com>
> wrote:
>
> > Hi all,
> > I installed MMPBSA.py.MPI according to the instructions and it works in
> > parallel for MMPBSA, but when I try to calculate the entropy contribution
> > I always get the error below (shown here with the provided Amber test).
> > I use the latest versions of MPICH2 (shared), mpi4py, and Python 2.6 on
> > SUSE 11.3.
> > Thank you in advance for your help!
> >
> > MMPBSA.py.MPI being run on 4 processors
> > ptraj found! Using /home/fratev/amber11/exe/ptraj
> > sander found! Using /home/fratev/amber11/exe/sander
> > nmode program found! Using /home/fratev/amber11/exe/mmpbsa_py_nabnmode
> >
> > Preparing trajectories with ptraj...
> > 10 frames were read in and processed by ptraj for use in calculation.
> > Processing 10 frames with normal mode analysis.
> >
> > Starting calculations
> >
> >
> > Starting nmode calculations...
> > master thread is calculating 3 frames
> >
> > calculating complex for frame number 0
> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
> > value=/dev/shm/mpich_shar_tmpYzsbx7
> > failed, reason='duplicate_keysharedFilename[0]'
> > Fatal error in MPI_Init: Other MPI error, error stack:
> > MPIR_Init_thread(388).....:
> > MPID_Init(135)............: channel initialization failed
> > MPIDI_CH3_Init(38)........:
> > MPID_nem_init(196)........:
> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
> >
> > [the same PMI_KVS_Put / MPI_Init error block, each time with a different
> > /dev/shm/mpich_shar_tmp* file, is repeated for the receptor and ligand of
> > frame 0 and for the complex, receptor, and ligand of frames 1 and 2]
> > [mpiexec.linux-3ykv] HYDU_sock_read (./utils/sock/sock.c:213): read errno (Input/output error)
> > [mpiexec.linux-3ykv] HYDU_sock_forward_stdio (./utils/sock/sock.c:377): read error
> > [mpiexec.linux-3ykv] HYDT_bscu_stdin_cb (./tools/bootstrap/utils/bscu_cb.c:63): stdin forwarding error
> > [mpiexec.linux-3ykv] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:76): callback returned error status
> > [mpiexec.linux-3ykv] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:84): error waiting for event
> > [mpiexec.linux-3ykv] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
> > [mpiexec.linux-3ykv] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion
> > [mpiexec.linux-3ykv] main (./ui/mpich/mpiexec.c:302): process manager error waiting for completion
> >
> >
> >
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Graduate Student
> 352-392-4032



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 26 2011 - 06:30:09 PST