Re: sander6/Linux compiled with pgf77 + mpipro

From: Jarrod Smith <jsmith_at_structbio.vanderbilt.edu>
Date: Thu 03 May 2001 14:41:06 -0500

Thanks to everyone who sent private mail. To give a little more detail: this binary works fine with
one processor, but with two or more it appears to die when it tries to open mdinfo (unit 7).
Permissions and so on are fine (the same script succeeds with np=1), and the MPI/Pro install itself
seems OK, since a g77-compiled binary runs correctly under it. I've tried all the suggestions I
received about my MACHINE file, and even moved to a completely fresh machine with a brand-new
install of RH7.1 (originally I was on RH6.2), MPI/Pro, and the PGI compilers. Same behavior every
time.
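
If anyone wants to poke at this without the full benchmark: the pattern that seems to be failing
is an ordinary Fortran OPEN on unit 7 after MPI_Init (both ranks report the OPEN error in the log
below, so it looks like both hit it). Here is a minimal F77 sketch of that pattern. It is my own
test, not sander source; build with pgf77 against MPI/Pro, with whatever link flags your install
needs:

c     Minimal sketch (mine, not sander source): every rank does a
c     Fortran OPEN on unit 7 after MPI_Init, mimicking the failure
c     pattern described above.
      program open7
      implicit none
      include 'mpif.h'
      integer ierr, myrank
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
c     Each rank opens the same file on unit 7, then writes its rank.
      open(unit=7, file='mdinfo', status='unknown')
      write(7,*) 'rank ', myrank, ' opened unit 7'
      close(7)
      call MPI_FINALIZE(ierr)
      end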

My original mpirun command was simply "mpirun -np 2". I've since added more options while trying to
figure this out:

mpirun -np 2 -wd ~/6bench/dhfr -verbose -mpi_debug -mpi_verbose

Here is the output of that in case it generates any more suggestions:

mpirun: $np = 2
mpirun: Reading machine file: /etc/machines
mpirun: Adding machine: piranha
mpirun: user command: /sb/apps/Linux2/amber6/exe/sander -O -i mdin -c md12.x -o
dhfr.Intel-866-pgf77-mpipro_2.out -r /tmp/restrt.dhfr_2
mpirun: using port 37646
mpirun: opened a shell on piranha
+ cd /home/jsmith/6bench/dhfr
+ export MSTI_SIZE=2
+ MSTI_SIZE=2
+ export MSTI_RANK=0
+ MSTI_RANK=0
+ export MSTI_GM_CUTOFF=16344
+ MSTI_GM_CUTOFF=16344
+ export MSTI_PORT=37646
+ MSTI_PORT=37646
+ export MSTI_ROOT_NAME=piranha
+ MSTI_ROOT_NAME=piranha
+ export MSTI_GM_POLL_MODE=POLLING
+ MSTI_GM_POLL_MODE=POLLING
+ export MSTI_VIA_BUFFERS=16
+ MSTI_VIA_BUFFERS=16
+ export MSTI_ALLGATHER_CUTOFF=8000
+ MSTI_ALLGATHER_CUTOFF=8000
+ export MSTI_DEBUG=1
+ MSTI_DEBUG=1
+ export MSTI_TCP_CUTOFF=32768
+ MSTI_TCP_CUTOFF=32768
+ export MSTI_VERBOSE=1
+ MSTI_VERBOSE=1
+ export MSTI_SMP_CUTOFF=8192
+ MSTI_SMP_CUTOFF=8192
+ export MSTI_SMP_LIST=
+ MSTI_SMP_LIST=
+ export MSTI_TCP_BUFFERS=1024
+ MSTI_TCP_BUFFERS=1024
+ export MSTI_VIA_CUTOFF=8192
+ MSTI_VIA_CUTOFF=8192
+ export 'MSTI_TCP_LIST=1 '
+ MSTI_TCP_LIST=1
+ exec /sb/apps/Linux2/amber6/exe/sander -O -i mdin -c md12.x -o dhfr.Intel-866-pgf77-mpipro_2.out
-r /tmp/restrt.dhfr_2
MPI/Pro: [0] MPI_Init: NumPipeConns=0
MPI/Pro: [0] MPI_Init: NumTcpConns=1
MPI/Pro: [0] MPI_Init: TCP cutoff size = 32768
MPI/Pro: [0] MPI_Init: TCP buffers = 1024
MPI/Pro: [0] MPI_Init: Setup timeout = 60
MPI/Pro: [0] MPI_Init: Release = $Name: $ $Date: 2001/01/25 00:13:06 $
MPI/Pro: [0] MPI_Init: Rank 0 waiting for a connection.
mpirun: opened a shell on piranha
mpirun: Process 15220 is copying stdin
mpirun: waiting for pid 15202
+ cd /home/jsmith/6bench/dhfr
+ export MSTI_SIZE=2
+ MSTI_SIZE=2
+ export MSTI_RANK=1
+ MSTI_RANK=1
+ export MSTI_GM_CUTOFF=16344
+ MSTI_GM_CUTOFF=16344
+ export MSTI_PORT=37646
+ MSTI_PORT=37646
+ export MSTI_ROOT_NAME=piranha
+ MSTI_ROOT_NAME=piranha
+ export MSTI_GM_POLL_MODE=POLLING
+ MSTI_GM_POLL_MODE=POLLING
+ export MSTI_VIA_BUFFERS=16
+ MSTI_VIA_BUFFERS=16
+ export MSTI_ALLGATHER_CUTOFF=8000
+ MSTI_ALLGATHER_CUTOFF=8000
+ export MSTI_TCP_CUTOFF=32768
+ MSTI_TCP_CUTOFF=32768
+ export MSTI_DEBUG=1
+ MSTI_DEBUG=1
+ export MSTI_SMP_CUTOFF=8192
+ MSTI_SMP_CUTOFF=8192
+ export MSTI_VERBOSE=1
+ MSTI_VERBOSE=1
+ export MSTI_SMP_LIST=
+ MSTI_SMP_LIST=
+ export MSTI_TCP_BUFFERS=1024
+ MSTI_TCP_BUFFERS=1024
+ export 'MSTI_TCP_LIST=0 '
+ MSTI_TCP_LIST=0
+ export MSTI_VIA_CUTOFF=8192
+ MSTI_VIA_CUTOFF=8192
+ exec /sb/apps/Linux2/amber6/exe/sander -O -i mdin -c md12.x -o dhfr.Intel-866-pgf77-mpipro_2.out
-r /tmp/restrt.dhfr_2
MPI/Pro: [1] MPI_Init: NumPipeConns=0
MPI/Pro: [0] MPI_Init: Rank 0 connected.
MPI/Pro: [0] MPI_Init: Received info from rank 1
MPI/Pro: [0] MPI_Init: Message handles = 1024
MPI/Pro: [0] MPI_Init: Opening LOOPBACK device.
MPI/Pro: [0] MPI_Init: Opening TCP device.
MPI/Pro: [0] MPI_Init: Established all TCP connections.
MPI/Pro: [0] MPI_Init: Spawned TCP thread.
MPI/Pro: [0] MPI_Init: Spawned Long Send Thread
MPI/Pro: [0] MPI_Init: Done
MPI/Pro: [1] MPI_Init: NumTcpConns=1
MPI/Pro: [1] MPI_Init: Looking for rank 0 at piranha:37646
MPI/Pro: [1] MPI_Init: Connecting to rank 0.
MPI/Pro: [1] MPI_Init: Sending my connection info.
MPI/Pro: [1] MPI_Init: Receiving connection table.
MPI/Pro: [1] MPI_Init: Message handles = 1024
MPI/Pro: [1] MPI_Init: Opening LOOPBACK device.
MPI/Pro: [1] MPI_Init: Opening TCP device.
MPI/Pro: [1] MPI_Init: Established all TCP connections.
MPI/Pro: [1] MPI_Init: Spawned TCP thread.
MPI/Pro: [1] MPI_Init: Spawned Long Send Thread
MPI/Pro: [1] MPI_Init: Done
| Atom division among processors:
| 0 11467 22930
| Atom division among processors for gb:
| 0 11465 22930
| Running AMBER/MPI version on 2 nodes


     Sum of charges from parm topology file = -0.00000006
     Forcing neutrality...
 ---------------------------------------------------
 APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
 using 5000.0 points per unit in tabled values
 TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err = 0.2763E-14 at 2.626100
| CHECK d/dx switch(x): max rel err = 0.7617E-11 at 2.788840
 ---------------------------------------------------
     Total number of mask terms = 34070
     Total number of mask terms = 68140
| Total Ewald setup time = 0.16000000
 ------------------------------------------------------------------------------


  Unit 7 Error on OPEN:

  Unit 7 Error on OPEN:
MPI/Pro: [0] : Abort signal received from rank #1. MPI terminated.
mpirun: waiting for pid 15219
mpirun: killing stdin task: 15220
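
For what it's worth, since both ranks print the OPEN error, one thing I may try is guarding the
unit 7 open so only the master rank touches the file. Hypothetical sketch only; I have not checked
whether the sander source already does this, or whether the real problem is in the pgf77 runtime:

c     Hypothetical guard (assumes mdinfo should only be written by the
c     master rank); unverified against the actual sander source.
      if (myrank .eq. 0) then
         open(unit=7, file='mdinfo', status='unknown')
      end if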




-- 
Jarrod A. Smith
Research Asst. Professor, Biochemistry
Asst. Director, Center for Structural Biology
Computation and Molecular Graphics
Vanderbilt University
jsmith_at_structbio.vanderbilt.edu
Received on Thu May 03 2001 - 12:41:06 PDT