Re: [AMBER] pmemd.MPI fails to run

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 28 Dec 2014 09:35:59 -0800

Hi Fabian,

It looks like you are not specifying a hostfile (or equivalent) to your
mpirun command, so it will automatically run everything on the node from
which you launch it.

>mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p
>../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r
>min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd

You likely need to specify the hostfile - probably this is set by PBS and
will be something like $PBS_NODEFILE, but the specifics will depend on your
installation.
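
For example, the run line in your PBS script would become something along
these lines (a sketch only - the exact flags depend on how your Open MPI
and PBS installations are set up):

  mpirun -np 24 -hostfile $PBS_NODEFILE pmemd.MPI -O -i min.in -o min.out \
      -p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd \
      -r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd

An Open MPI that had been built with PBS/Torque (tm) support would pick up
the node list from the batch system automatically; since that is clearly
not happening here, passing the hostfile explicitly is the simplest fix.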


All the best
Ross


On 12/28/14, 8:31 AM, "Fabian Glaser" <fabian.glaser.gmail.com> wrote:

>Hi Jason,
>
>We have followed your suggestion to recompile Amber 14, but our systems
>people are still having problems compiling and running pmemd.MPI in
>parallel on more than one node. I am forwarding their detailed trials and
>questions below; we would highly appreciate your help and suggestions on
>what to do next:
>
>Thanks a lot, and happy new year,
>
>Fabian Glaser
>Technion, Israel
>
>===
>
>Shalom Fabian,
>
>After several compilation attempts, Amber 14 has been compiled with
>OpenMPI 1.5.4 (the system default).
>
>The PBS script minz.q in your directory
>/u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
>contains the setup that provides access to the corresponding "mpirun"
>command. I have submitted the job on 2 nodes with
>
>> qsub -I minz.q
>
>(an interactive batch submission) to see the output after the setup:
>
>> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
>> export PATH=/usr/lib64/openmpi/bin:$PATH
>> source /usr/local/amber14/amber.sh
>
>> which mpirun
>mpirun is /usr/lib64/openmpi/bin/mpirun
>
>> mpirun -version
>mpirun (Open MPI) 1.5.4
>
>> which pmemd.MPI
>pmemd.MPI is /usr/local/amber14/bin/pmemd.MPI
>
>> ldd /usr/local/amber14/bin/pmemd.MPI
> linux-vdso.so.1 => (0x00002aaaaaaab000)
> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000039b4400000)
> libmpi_cxx.so.1 => /usr/lib64/openmpi/lib/libmpi_cxx.so.1
>(0x00002aaaaaac1000)
> libmpi_f90.so.1 => /usr/lib64/openmpi/lib/libmpi_f90.so.1
>(0x00002aaaaacdb000)
> libmpi_f77.so.1 => /usr/lib64/openmpi/lib/libmpi_f77.so.1
>(0x00002aaaaaedf000)
> libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1
>(0x00000039b4000000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00000039b2400000)
> libgfortran.so.3 => /usr/lib64/libgfortran.so.3
>(0x00000039b3400000)
> libm.so.6 => /lib64/libm.so.6 (0x00000039b1c00000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000039b3800000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00000039b2800000)
> libc.so.6 => /lib64/libc.so.6 (0x00000039b2000000)
> /lib64/ld-linux-x86-64.so.2 (0x00000039b1800000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00000039b4c00000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00000039b3c00000)
> libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00002aaaab115000)
>
>the run command
>
>> mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p
>>../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r
>>min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd
>
>starts execution and eventually produces the min.out file. However, all 24
>processes are executed on ONE NODE instead of being distributed as 12
>processes on each of the TWO nodes.
>> qstat -1nu fglaser
>                                                             Req'd  Req'd   Elap
>Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
>--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
>1076133.tamnun  fglaser  amir_q   ETA_1_minz   6320   2  24     --  168:0 R 00:10  n101/0*12+n102/0*12
>
>> hostname
>n101
>
>> top
>top - 17:42:18 up 13 days, 2:51, 1 user, load average: 9.52, 2.31, 0.77
>Tasks: 378 total, 25 running, 353 sleeping, 0 stopped, 0 zombie
>Cpu(s): 65.8%us, 34.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
>0.0%st
>Mem: 99063900k total, 3033516k used, 96030384k free, 221696k buffers
>Swap: 8191992k total, 0k used, 8191992k free, 864036k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>6452 fglaser 20 0 229m 57m 19m R 59.1 0.1 0:18.88 pmemd.MPI
>6451 fglaser 20 0 230m 58m 20m R 58.4 0.1 0:17.26 pmemd.MPI
>6445 fglaser 20 0 230m 58m 21m R 56.4 0.1 0:18.26 pmemd.MPI
>6461 fglaser 20 0 230m 56m 18m R 56.4 0.1 0:18.21 pmemd.MPI
>6449 fglaser 20 0 230m 57m 20m R 56.1 0.1 0:17.58 pmemd.MPI
>6459 fglaser 20 0 230m 56m 19m R 56.1 0.1 0:18.00 pmemd.MPI
>6460 fglaser 20 0 230m 55m 19m R 54.8 0.1 0:17.30 pmemd.MPI
>6453 fglaser 20 0 229m 57m 20m R 52.5 0.1 0:17.63 pmemd.MPI
>6444 fglaser 20 0 240m 70m 25m R 50.1 0.1 0:17.49 pmemd.MPI
>6448 fglaser 20 0 230m 58m 20m R 50.1 0.1 0:16.04 pmemd.MPI
>6462 fglaser 20 0 230m 55m 18m R 50.1 0.1 0:14.81 pmemd.MPI
>6455 fglaser 20 0 230m 56m 19m R 49.8 0.1 0:15.50 pmemd.MPI
>6457 fglaser 20 0 231m 55m 19m R 49.8 0.1 0:16.08 pmemd.MPI
>6446 fglaser 20 0 230m 58m 21m R 49.5 0.1 0:16.58 pmemd.MPI
>6464 fglaser 20 0 230m 56m 19m R 49.5 0.1 0:17.54 pmemd.MPI
>6450 fglaser 20 0 230m 57m 20m R 48.1 0.1 0:14.53 pmemd.MPI
>6447 fglaser 20 0 230m 58m 21m R 47.8 0.1 0:15.20 pmemd.MPI
>6458 fglaser 20 0 230m 56m 19m R 47.1 0.1 0:14.37 pmemd.MPI
>6454 fglaser 20 0 230m 56m 19m R 46.5 0.1 0:13.97 pmemd.MPI
>6466 fglaser 20 0 230m 56m 19m R 45.8 0.1 0:14.74 pmemd.MPI
>6456 fglaser 20 0 230m 56m 19m R 41.8 0.1 0:14.55 pmemd.MPI
>6467 fglaser 20 0 230m 57m 20m R 41.8 0.1 0:14.36 pmemd.MPI
>6463 fglaser 20 0 230m 55m 18m R 40.2 0.1 0:14.40 pmemd.MPI
>6465 fglaser 20 0 230m 56m 19m R 39.8 0.1 0:14.84 pmemd.MPI
> 55 root 20 0 0 0 0 S 0.3 0.0 0:20.28 events/4
>6570 fglaser 20 0 13396 1408 896 R 0.3 0.0 0:00.03 top
> 1 root 20 0 23588 1660 1312 S 0.0 0.0 0:02.66 init
> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
> 3 root RT 0 0 0 0 S 0.0 0.0 0:00.04 migration/0
>..........................................................................
>..
>> hostname
>n102
>> top
>top - 17:46:09 up 13 days, 2:55, 1 user, load average: 0.07, 0.03, 0.00
>Tasks: 349 total, 1 running, 348 sleeping, 0 stopped, 0 zombie
>Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si,
>0.0%st
>Mem: 99063900k total, 1962460k used, 97101440k free, 220228k buffers
>Swap: 8191992k total, 0k used, 8191992k free, 746280k cached
>PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 182 root 39 19 0 0 0 S 0.3 0.0 47:28.09 kipmi0
> 1 root 20 0 23592 1660 1312 S 0.0 0.0 0:02.64 init
> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
> 3 root RT 0 0 0 0 S 0.0 0.0 0:00.04 migration/0
> 4 root 20 0 0 0 0 S 0.0 0.0 0:00.36 ksoftirqd/0
> 5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
> 6 root RT 0 0 0 0 S 0.0 0.0 0:00.80 watchdog/0
> 7 root RT 0 0 0 0 S 0.0 0.0 0:00.75 migration/1
> 8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
>..........................................................................
>....
>
>As you can see, parallel execution in this way is not effective.
>I suggest sending this output to the AMBER support/forum and asking for
>their recommendations.
>
>To save time, I should mention that we have encountered an apparently
>similar problem with a couple of other applications. In those cases the
>problem was solved by recompiling and running with Intel MPI. Can AMBER 14
>work with Intel MPI in general? So far, however, the attempt to compile it
>with Intel MPI (Intel version 14.0.2) has failed.
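
(For what it's worth, Amber 14 can generally be built against Intel MPI:
its configure script will use whichever mpicc/mpif90 wrappers are first in
the PATH. A minimal sketch of the usual procedure, assuming the Intel 14
compilers and Intel MPI are installed - the "source" paths below are
placeholders, not the actual TAMNUN locations:

  # put the Intel compilers and Intel MPI wrappers first in the environment
  source /path/to/intel/composer_xe/bin/compilervars.sh intel64
  source /path/to/intel/impi/bin64/mpivars.sh
  # make the generic wrappers (mpicc/mpif90) drive icc/ifort
  export I_MPI_CC=icc
  export I_MPI_F90=ifort

  cd $AMBERHOME
  make clean
  ./configure -mpi intel
  make install

The resulting pmemd.MPI must then be launched with Intel MPI's own
mpirun/mpiexec, not the Open MPI one.)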
>
>Any recommendations would be deeply appreciated.
>
>Regards,
>Yulia Halupovich,
>Technion - CIS, TAMNUN Team
>phone: 972-4-8292654, fax: 972-4-8236212
>Reply-to: hpc.technion.ac.il
>
>
>
>
>
>
>_______________________________
>Fabian Glaser, PhD
>
>Technion - Israel Institute of Technology
>Haifa 32000, ISRAEL
>
>fglaser.technion.ac.il
>Tel: +972 4 8293701
>Fax: +972 4 8225153
>
>> On Dec 21, 2014, at 5:39 PM, Jason Swails <jason.swails.gmail.com>
>>wrote:
>>
>> On Sun, Dec 21, 2014 at 9:02 AM, Fabian Glaser <fabian.glaser.gmail.com>
>> wrote:
>>
>>> Hi Amber experts,
>>>
>>>
>>> We had pmemd.MPI (Amber 14) installed correctly and running, but after a
>>> disk addition to our cluster it fails to run. We try to run pmemd.MPI
>>> with the following setup:
>>>
>>>> source /usr/local/amber14/setup.csh
>>> which contains the following definitions
>>>
>>>> more /usr/local/amber14/setup.csh
>>> #!/bin/csh -f
>>> #
>>> # Setup for Amber 14
>>> #
>>> setenv AMBERHOME /usr/local/amber14
>>> setenv PATH $AMBERHOME/bin:$PATH
>>> setenv LD_LIBRARY_PATH $AMBERHOME/lib:$LD_LIBRARY_PATH
>>>
>>> and sets the MPI path as follows:
>>>
>>>> which mpirun
>>> /usr/local/amber14/bin/mpirun
>>>
>>> we get the following error message:
>>>
>>>> mpirun -np 12 pmemd.MPI
>>> -------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> MPI version of PMEMD must be used with 2 or more processors!
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> it seems that the "numtasks" parameter is not passed to the pmemd.MPI
>>> executable.
>>>
>>> We would greatly appreciate any help, and are ready to provide any
>>> additional information regarding the AMBER 14 installation and
>>> compilation on our system.
>>>
>>> All the other non-MPI programs including pmemd or sander run fine.
>>>
>>
>> This happens when you compile Amber with one MPI and try to use the
>> "mpirun" from a different MPI. This is true of every MPI program, not
>> just Amber.
>>
>> You need to make sure that Amber is compiled with the same MPI
>> installation you intend to use to run it.
>>
>> HTH,
>> Jason
>> --
>> Jason M. Swails
>> BioMaPS,
>> Rutgers University
>> Postdoctoral Researcher
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 28 2014 - 10:00:02 PST