Re: [AMBER] Building parallel Amber11 on CRAY XD1

From: Dean Cuebas <deancuebas.missouristate.edu>
Date: Fri, 28 Oct 2011 15:13:43 -0500

Dear Ross,

Sorry I took so long to respond, but you were dead on with your
suggestion regarding the MPI environment variable.

Thanks very much for your help!

Dean

On 10/25/11 1:57 PM, "Ross Walker" <ross.rosswalker.co.uk> wrote:

>Hi Dean,
>
>This looks like something is very wrong with the way in which MPI jobs are
>being run on your machine. I would start by seeing if you can run a simple
>MPI test program, such as a ping-pong or bandwidth test. These should be
>included with your MPI implementation. You'll need to get these working
>first before attempting to run pmemd.MPI or sander.MPI. It looks like your
>mpirun command is just executing 16 copies of pmemd.MPI outside of an MPI
>environment. One guess would be that the MPI used to compile pmemd.MPI is
>not the same MPI that corresponds to the mpirun command you are using. I
>would check this. It is possible your environment variables, particularly
>your PATH, are not being inherited properly inside the qsub job.
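>
>For example, a few extra lines in test.sh just before the mpirun call will
>show whether the batch environment matches the build environment. This is
>only a sketch on my part; adjust the paths to wherever your MPI actually
>lives, and note that pmemd.MPI may be statically linked on the Cray, in
>which case ldd won't tell you much:
>
>  # What does the batch job actually see?
>  echo "PATH inside job: $PATH"
>  which mpirun
>  # If pmemd.MPI is dynamically linked, this shows which MPI library it uses
>  ldd /var/amber11/bin/pmemd.MPI | grep -i mpi
>
>If the mpirun found inside the job is not from the same MPI that pmemd.MPI
>was built with, you get exactly the behaviour below: 16 independent copies
>of pmemd.MPI that each believe they are process 0.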
>
>With regards to the serial pmemd error: PMEMD v11 and earlier does NOT
>support vacuum simulations; only PME and GB simulations are supported.
>The code should have exited with a more appropriate error, though. I'll
>need to check where this is tested for.
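>
>In the meantime, since serial sander handles that input fine, one option
>(just a sketch, assuming sander.MPI was built alongside pmemd.MPI into the
>same /var/amber11/bin directory) is to point your job script at sander.MPI
>for this vacuum run once the MPI setup itself is sorted out:
>
>  mpirun -np $NSLOTS -hostfile $TMPDIR/machines \
>    /var/amber11/bin/sander.MPI -O -i test.in -o test1.out \
>    -p ben57mp2.top -c test.crd -r test1.rst -x test1.mdcrd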
>
>All the best
>Ross
>
>> -----Original Message-----
>> From: Dean Cuebas [mailto:deancuebas.missouristate.edu]
>> Sent: Tuesday, October 25, 2011 11:31 AM
>> To: AMBER Mailing List
>> Subject: [AMBER] Building parallel Amber11 on CRAY XD1
>>
>> Hi amber people,
>>
>> My IT guy has serial sander installed OK, and he says he has just
>> installed parallel Amber11.
>>
>> Serial sander runs fine at the command line with the input files shown
>> below.
>>
>> My input script for pmemd.MPI is:
>> _______________________________
>> #!/bin/bash
>> # qsub -cwd -pe am.mpi 16 -l pn=compute test.sh
>> mpirun -np $NSLOTS -hostfile $TMPDIR/machines \
>>   /var/amber11/bin/pmemd.MPI -O \
>>   -i test.in \
>>   -o test1.out -p ben57mp2.top -c test.crd \
>>   -r test1.rst -x test1.mdcrd
>> ___________________________________
>>
>>
>> I submit the job on the command line as follows:
>>
>> > qsub -cwd -pe am.mpi 16 -l pn=compute test.sh
>>
>> The command line says the job was submitted.
>>
>> Standard error and output files are created, but the job does not
>> actually run:
>> _________________________________________________
>> -catch_rsh
>> /opt/gridengine/default/spool/hal9000-274-
>> 3/active_jobs/11.1/pe_hostfile
>> hal9000-274-3
>> hal9000-274-3
>> hal9000-274-3
>> hal9000-274-3
>> hal9000-274-4
>> hal9000-274-4
>> hal9000-274-4
>> hal9000-274-4
>> hal9000-274-5
>> hal9000-274-5
>> hal9000-274-5
>> hal9000-274-5
>> hal9000-274-6
>> hal9000-274-6
>> hal9000-274-6
>> hal9000-274-6
>> _______________________________________________
>>
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> [the line above is repeated 16 times, once per copy of pmemd.MPI; every
>> copy reports itself as process 0]
>> ___________________________________________________________
>>
>> /opt/gridengine/bin/lx26-amd64/qrsh -inherit hal9000-274-3 cd
>> /home/dcuebas/Documents/Amber; /usr/bin/env MPIRUN_HOST=hal9000-274-3
>> MPIRUN_PORT=39164 MPIRUN_RANK=0 MPIRUN_NPROCS=16 MPIRUN_ID=6107
>> RAIDEV_DEVICE=/dev/rai_hbx0 /var/amber11/bin/pmemd.MPI "-O" "-i" "test.in"
>> "-o" "test1.out" "-p" "ben57mp2.top" "-c" "test.crd" "-r" "test1.rst"
>> "-x" "test1.mdcrd"
>>
>> [the same qrsh/env/pmemd.MPI command appears 16 times, once per rank:
>> MPIRUN_RANK=0-3 on hal9000-274-3, 4-7 on hal9000-274-4, 8-11 on
>> hal9000-274-5 and 12-15 on hal9000-274-6, all with the same
>> MPIRUN_HOST, MPIRUN_PORT and MPIRUN_ID]
>> MPI version of PMEMD must be used with 2 or more processors!
>> [the line above is repeated 16 times, once per process]
>> _______________________________________________________________
>>
>>
>>
>>
>> Interestingly, running serial pmemd (NOT pmemd.MPI) on the command
>> line:
>>
>> > /var/amber11/bin/pmemd -O \
>> -i test.in \
>> -o test1.out -p ben57mp2.top -c test.crd \
>> -r test1.rst -x test1.mdcrd
>>
>> Gives the following .out file:
>> -------------------------------------------------------
>> Amber 11 SANDER 2010
>> -------------------------------------------------------
>>
>> | PMEMD implementation of SANDER, Release 11
>>
>> | Run on 10/25/2011 at 07:11:12
>>
>> [-O]verwriting output
>>
>> File Assignments:
>> | MDIN: test.in
>> | MDOUT: test1.out
>> | INPCRD: test.crd
>> | PARM: ben57mp2.top
>> | RESTRT: test1.rst
>> | REFC: refc
>> | MDVEL: mdvel
>> | MDEN: mden
>> | MDCRD: test1.mdcrd
>> | MDINFO: mdinfo
>>
>>
>>
>> Here is the input file:
>>
>> Vacuum simulation at 300°C, weak coupling
>> &cntrl
>>   imin = 0, ntb = 0, ig=-1,
>>   igb = 0, ntpr = 10, ntwx = 10,
>>   ntt = 1, tautp=0.5, gamma_ln = 0,
>>   tempi = 300.0, temp0 = 300.0
>>   nstlim = 1000000, dt = 0.0001,
>>   cut = 999
>> /
>>
>>
>>
>>
>> | ERROR: nfft1 must be in the range of 6 to 512!
>> | ERROR: nfft2 must be in the range of 6 to 512!
>> | ERROR: nfft3 must be in the range of 6 to 512!
>> | ERROR: a must be in the range of 0.10000E+01 to 0.10000E+04!
>> | ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
>> | ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
>>
>> Input errors occurred. Terminating execution.
>> _____________________________________________________________
>>
>>
>> Does anyone have any suggestions? I would greatly appreciate it!
>>
>> Thanks a million in advance.
>>
>> Dean
>>
>>
>>
>
>



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Oct 28 2011 - 13:30:03 PDT