RE: [AMBER] MPI process terminated unexpectedly forrtl: error (69): process interrupted (SIGINT)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 8 Jul 2009 16:01:24 +0100

Hi Andrew,

" PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
errno=104) . file pmgr_collective_mpispawn.c:121"

This most likely means a cabling problem or a failing InfiniBand card. I would
suggest running some of the MVAPICH tests, or similar, to verify the interconnect.
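
Before running the hardware tests, it can help to triage the job log quickly. What follows is only a rough sketch under assumptions: the function names (summarize, touches_interconnect) are made up for illustration, and the patterns are guessed from the traceback format quoted below. It counts which images appear in the forrtl frames and which errno values are reported. On Linux, errno 104 is ECONNRESET. A libibverbs frame only shows that the process was inside the InfiniBand verbs library when it was interrupted; it does not prove a hardware fault.

```python
import re
from collections import Counter

# A traceback frame looks like:
#   "> libibverbs.so.1 00002B0536F6D98A Unknown Unknown"
# i.e. optional quote markers, an image name, then a 16-digit hex PC.
FRAME = re.compile(r"^[>\s]*(\S+)\s+[0-9A-Fa-f]{16}\b")
# errno values as printed in the PMGR_COLLECTIVE messages.
ERRNO = re.compile(r"errno=(\d+)")


def summarize(log_text):
    """Count traceback frames per image and occurrences of each errno."""
    images = Counter()
    errnos = Counter()
    for line in log_text.splitlines():
        match = FRAME.match(line)
        if match:
            images[match.group(1)] += 1
        for code in ERRNO.findall(line):
            errnos[int(code)] += 1
    return images, errnos


def touches_interconnect(images):
    """True if any frame came from the InfiniBand verbs library."""
    return any(name.startswith("libibverbs") for name in images)
```

Calling summarize() on the text of the job's stderr file and passing the first returned counter to touches_interconnect() gives a quick hint about whether the interconnect tests are worth running first.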

All the best
Ross

> -----Original Message-----
> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
> Behalf Of Andrew Voronkov
> Sent: Wednesday, July 08, 2009 2:39 AM
> To: AMBER Mailing List
> Subject: [AMBER] MPI process terminated unexpectedly forrtl: error
> (69): process interrupted (SIGINT)
>
> Dear Amber users,
> I got an error while running sander from Amber 10 on a cluster. The day
> before, everything worked fine with the same flags and input files.
> So it seems most likely to be a problem on the cluster side, but what
> could be the reason for it?
>
> Best regards,
> Andrew
>
>
> MPI process terminated unexpectedly
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002B5E220E250E Unknown Unknown
> Unknown
> libc.so.6 00002B5E2209A913 Unknown Unknown
> Unknown
> libc.so.6 00002B5E2209AA6A Unknown Unknown
> Unknown
> libc.so.6 00002B5E2208FBD4 Unknown Unknown
> Unknown
> libnss_files.so.2 00002B5E2278623A Unknown Unknown
> Unknown
> libnss_files.so.2 00002B5E22786D4B Unknown Unknown
> Unknown
> libc.so.6 00002B5E22103581 Unknown Unknown
> Unknown
> libc.so.6 00002B5E22102EB3 Unknown Unknown
> Unknown
> sander.MPI 00000000008FE32D Unknown Unknown
> Unknown
> sander.MPI 00000000008DDE31 Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002B5E22050CF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002B05378AC50E Unknown Unknown
> Unknown
> libc.so.6 00002B0537864913 Unknown Unknown
> Unknown
> libc.so.6 00002B0537864A6A Unknown Unknown
> Unknown
> libc.so.6 00002B0537859BD4 Unknown Unknown
> Unknown
> libibverbs.so.1 00002B0536F6D98A Unknown Unknown
> Unknown
> libibverbs.so.1 00002B0536F6D06B Unknown Unknown
> Unknown
> sander.MPI 00000000008DC28C Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002B053781ACF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002B0E516EB50E Unknown Unknown
> Unknown
> libc.so.6 00002B0E516C7557 Unknown Unknown
> Unknown
> libibverbs.so.1 00002B0E50DAC800 Unknown Unknown
> Unknown
> libibverbs.so.1 00002B0E50DAC06B Unknown Unknown
> Unknown
> sander.MPI 00000000008DC28C Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002B0E51659CF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002AABBF7DD50E Unknown Unknown
> Unknown
> libc.so.6 00002AABBF795913 Unknown Unknown
> Unknown
> libc.so.6 00002AABBF795A6A Unknown Unknown
> Unknown
> libc.so.6 00002AABBF78ABD4 Unknown Unknown
> Unknown
> libnss_files.so.2 00002AABBFE8123A Unknown Unknown
> Unknown
> libnss_files.so.2 00002AABBFE81D4B Unknown Unknown
> Unknown
> libc.so.6 00002AABBF7FE581 Unknown Unknown
> Unknown
> libc.so.6 00002AABBF7FDEB3 Unknown Unknown
> Unknown
> sander.MPI 00000000008FE32D Unknown Unknown
> Unknown
> sander.MPI 00000000008DDE31 Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002AABBF74BCF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002B4C71DAE50E Unknown Unknown
> Unknown
> libc.so.6 00002B4C71D66913 Unknown Unknown
> Unknown
> libc.so.6 00002B4C71D66A6A Unknown Unknown
> Unknown
> libc.so.6 00002B4C71D5BBD4 Unknown Unknown
> Unknown
> libibverbs.so.1 00002B4C7146F98A Unknown Unknown
> Unknown
> libibverbs.so.1 00002B4C7146F06B Unknown Unknown
> Unknown
> sander.MPI 00000000008DC28C Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002B4C71D1CCF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002B0F6D9C250E Unknown Unknown
> Unknown
> libc.so.6 00002B0F6D99E557 Unknown Unknown
> Unknown
> libibverbs.so.1 00002B0F6D083800 Unknown Unknown
> Unknown
> libibverbs.so.1 00002B0F6D08306B Unknown Unknown
> Unknown
> sander.MPI 00000000008DC28C Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002B0F6D930CF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line
> Source
> libc.so.6 00002AB3B918E50E Unknown Unknown
> Unknown
> libc.so.6 00002AB3B9146913 Unknown Unknown
> Unknown
> libc.so.6 00002AB3B9146A6A Unknown Unknown
> Unknown
> libc.so.6 00002AB3B913BBD4 Unknown Unknown
> Unknown
> libibverbs.so.1 00002AB3B884F98A Unknown Unknown
> Unknown
> libibverbs.so.1 00002AB3B884F06B Unknown Unknown
> Unknown
> sander.MPI 00000000008DC28C Unknown Unknown
> Unknown
> sander.MPI 00000000008E7A06 Unknown Unknown
> Unknown
> sander.MPI 00000000008BCCE8 Unknown Unknown
> Unknown
> sander.MPI 00000000008BC936 Unknown Unknown
> Unknown
> sander.MPI 00000000008B9593 Unknown Unknown
> Unknown
> sander.MPI 00000000004BD7D2 Unknown Unknown
> Unknown
> sander.MPI 000000000041F65C Unknown Unknown
> Unknown
> libc.so.6 00002AB3B90FCCF4 Unknown Unknown
> Unknown
> sander.MPI 000000000041F569 Unknown Unknown
> Unknown
> Exit code -1 signaled from node-10-01
> Killing remote processes...PMGR_COLLECTIVE ERROR: reading from (read()
> Connection reset by peer errno=104) . file
> pmgr_collective_mpispawn.c:121
> PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
> errno=104) . file pmgr_collective_mpispawn.c:121
> PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
> errno=104) . file pmgr_collective_mpispawn.c:121
> PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
> errno=104) . file pmgr_collective_mpispawn.c:121
> PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
> errno=104) . file pmgr_collective_mpispawn.c:121
> PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
> errno=104) . file pmgr_collective_mpispawn.c:121
> PMGR_COLLECTIVE ERROR: reading from (read() Connection reset by peer
> errno=104) . file pmgr_collective_mpispawn.c:121
> DONE
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


Received on Wed Jul 08 2009 - 08:37:32 PDT