Re: [AMBER] MMPBSA.py.MPI Nmode problem

From: Jesper Sørensen <lists.jsx.dk>
Date: Wed, 26 Jan 2011 19:39:17 +0100

Hi Jason,

That solved the problem. I made a fresh installation in which I did not
build the parallel part of AmberTools, and the nmode calculation now runs,
so it must have been a conflict with an MPI-compiled mmpbsa_py_nabnmode.

Thanks for the helpful suggestions...
Jesper

On 26/01/11 15.10, "Jason Swails" <jason.swails.gmail.com> wrote:

>Hello,
>
>This is definitely an MPI problem... How did you recompile in serial?
>Did you just "make serial" in the Amber directory to recompile
>mmpbsa_py_nabnmode? Try running "make uninstall" in the
>$AMBERHOME/AmberTools/src directory and then recompiling AmberTools in
>serial only. Then rebuild MMPBSA.py in serial.
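>
>As a concrete sketch of that procedure (assuming a bash shell, the
>standard Amber 11 source layout, and the GNU compilers -- adjust the
>configure arguments for your own setup):
>
>  cd $AMBERHOME/AmberTools/src
>  make uninstall     # clear out the MPI-enabled AmberTools binaries
>  ./configure gnu    # serial configure, i.e. no -mpi flag
>  make install       # serial AmberTools, including serial NAB
>  cd $AMBERHOME/src
>  make serial        # rebuilds MMPBSA.py and mmpbsa_py_nabnmode in serial
>
>The point is simply that mmpbsa_py_nabnmode ends up linked against the
>serial NAB libraries rather than the MPI ones.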
>
>If MMPBSA.py.MPI works on this system otherwise (i.e. if not doing nmode
>calculations), then I don't know why else this would be happening.
>
>Good luck!
>Jason
>
>2011/1/26 Jesper Sørensen <lists.jsx.dk>
>
>> Hi Jason,
>>
>> I think I might have a related problem. When I try to calculate the
>> entropy with NAB, it seems like the minimization never starts (the CPU
>> and wall times are only 6 seconds), and I get several lines like the
>> following in the output/error from the cluster node:
>>
>> [s11n03.grendel.cscaa.dk:02277] [[14503,1],0] routed:binomial: Connection
>> to lifeline [[14503,0],0] lost
>>
>> Have you seen anything like this before?
>> Of course the calculation fails with "Warning: No snapshots for nmode
>> minimized within tolerable limits! Entropy not calculated."
>>
>> I thought that maybe I had the same problem with mmpbsa_py_nabnmode, so I
>> recompiled it in serial just to make sure, but I still get the error.
>> I can run a normal MMPBSA calculation, and also one using the entropy
>> from ptraj mode, but every time I use NAB it fails.
>>
>> Best regards,
>> Jesper
>>
>>
>> -----Original Message-----
>> From: Jason Swails [mailto:jason.swails.gmail.com]
>> Sent: 20 January 2011 00:21
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] MMPBSA.py.MPI Nmode problem
>>
>> Do the tests work in parallel? Specifically, the nmode test. To try
>> this, set the environment variable DO_PARALLEL="mpiexec -n 2" (or however
>> you run 2 MPI threads), then run the nmode test --
>>
>> cd $AMBERHOME/test/mmpbsa_py
>> export DO_PARALLEL='mpiexec -n 2'
>> make NAB
>>
>> And see what happens. This is a test set up for a very small system, so
>> if this works but your own system fails, it's likely that your system
>> requires too much memory to be running 4 simulations on the same board
>> off the same shared memory.
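>>
>> If memory does turn out to be the limit, the simplest check is to rerun
>> your own system with fewer MPI threads. A rough example (the input and
>> topology file names here are placeholders, not files from this thread):
>>
>> mpiexec -n 2 MMPBSA.py.MPI -O -i mmpbsa.in -o FINAL_RESULTS_MMPBSA.dat \
>>   -cp complex.prmtop -rp receptor.prmtop -lp ligand.prmtop -y prod.mdcrd
>>
>> and see whether the nmode frames then minimize properly.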
>>
>> Other common problems: compiling nab in parallel (parallel AmberTools)
>> before building MMPBSA.py in serial causes mmpbsa_py_nabnmode to be
>> compiled
>> with MPI support, and its MPI routines clash with the MPI calls in
>> MMPBSA.py.MPI, causing errors.
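>>
>> A quick way to check which situation you are in (a rough check; it
>> assumes mmpbsa_py_nabnmode is dynamically linked and lives in the usual
>> $AMBERHOME/exe location) is to look for MPI libraries in the binary:
>>
>> ldd $AMBERHOME/exe/mmpbsa_py_nabnmode | grep -i mpi
>>
>> A serial build should print nothing; an MPI-enabled build will list
>> libmpich or a similar MPI library.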
>>
>> It is also possible that the installations of mpi4py and MPICH2 were not
>> completed successfully (for example, mpi4py may have inadvertently been
>> built against a different MPI implementation).
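>>
>> One quick consistency check (a rough sketch; MPI.get_vendor() is
>> available in reasonably recent mpi4py versions) is to ask mpi4py which
>> MPI it was built against and compare with the mpiexec on your PATH:
>>
>> python -c "from mpi4py import MPI; print(MPI.get_vendor())"
>> which mpiexec
>>
>> If mpi4py reports one MPI implementation and mpiexec belongs to another,
>> that mismatch alone can produce MPI_Init failures like the ones below.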
>>
>> More tests are needed to narrow the problem down further.
>>
>> Good luck!
>> Jason
>>
>> On Wed, Jan 19, 2011 at 5:26 PM, filip fratev <filipfratev.yahoo.com>
>> wrote:
>>
>> > Hi all,
>> > I installed MMPBSA.py.MPI according to the instructions and it works
>> > in parallel with MMPBSA, but when trying to calculate the entropy
>> > contribution I always obtain the error below (shown here with the
>> > provided Amber test):
>> > I use the latest versions of MPICH2 (shared) and mpi4py with Python
>> > 2.6 on Suse 11.3.
>> > Thank you in advance for your help!
>> >
>> > MMPBSA.py.MPI being run on 4 processors
>> > ptraj found! Using /home/fratev/amber11/exe/ptraj
>> > sander found! Using /home/fratev/amber11/exe/sander
>> > nmode program found! Using /home/fratev/amber11/exe/mmpbsa_py_nabnmode
>> >
>> > Preparing trajectories with ptraj...
>> > 10 frames were read in and processed by ptraj for use in calculation.
>> > Processing 10 frames with normal mode analysis.
>> >
>> > Starting calculations
>> >
>> >
>> > Starting nmode calculations...
>> > master thread is calculating 3 frames
>> >
>> > calculating complex for frame number 0
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpYzsbx7
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating receptor for frame number 0
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpZYwgy5
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating ligand for frame number 0
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpxlXSY5
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating complex for frame number 1
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpa24cl6
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating receptor for frame number 1
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpv8T0C6
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating ligand for frame number 1
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpsZhzo6
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating complex for frame number 2
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpIORZa9
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating receptor for frame number 2
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpF8jqx9
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > calculating ligand for frame number 2
>> > [cli_0]: Command cmd=put kvsname=kvs_25724_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpUuGfV9
>> > failed, reason='duplicate_keysharedFilename[0]'
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(388).....:
>> > MPID_Init(135)............: channel initialization failed
>> > MPIDI_CH3_Init(38)........:
>> > MPID_nem_init(196)........:
>> > MPIDI_CH3I_Seg_commit(337): PMI_KVS_Put returned -1
>> > [mpiexec.linux-3ykv] HYDU_sock_read (./utils/sock/sock.c:213): read
>> > errno (Input/output error)
>> > [mpiexec.linux-3ykv] HYDU_sock_forward_stdio (./utils/sock/sock.c:377):
>> > read error
>> > [mpiexec.linux-3ykv] HYDT_bscu_stdin_cb
>> > (./tools/bootstrap/utils/bscu_cb.c:63): stdin forwarding error
>> > [mpiexec.linux-3ykv] HYDT_dmxu_poll_wait_for_event
>> > (./tools/demux/demux_poll.c:76): callback returned error status
>> > [mpiexec.linux-3ykv] HYDT_bscu_wait_for_completion
>> > (./tools/bootstrap/utils/bscu_wait.c:84): error waiting for event
>> > [mpiexec.linux-3ykv] HYDT_bsci_wait_for_completion
>> > (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned
>> > error waiting for completion
>> > [mpiexec.linux-3ykv] HYD_pmci_wait_for_completion
>> > (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error
>> > waiting for completion
>> > [mpiexec.linux-3ykv] main (./ui/mpich/mpiexec.c:302): process manager
>> > error waiting for completion
>> >
>> >
>> >
>> >
>> >
>>
>>
>> --
>> Jason M. Swails
>> Quantum Theory Project,
>> University of Florida
>> Ph.D. Graduate Student
>> 352-392-4032
>>
>
>
>
>--
>Jason M. Swails
>Quantum Theory Project,
>University of Florida
>Ph.D. Graduate Student
>352-392-4032



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 26 2011 - 11:00:05 PST