Re: [AMBER] pmemd.MPI fails to run

From: Fabian Glaser <fabian.glaser.gmail.com>
Date: Sun, 28 Dec 2014 18:31:25 +0200

Hi Jason,

we have followed your suggestion to recompile Amber 14, but our systems people are still having problems compiling and running the parallel pmemd.MPI on more than one node. I am forwarding their detailed trials and questions below; we would highly appreciate your help and any suggestion on what to do next:

Thanks a lot, and happy new year,

Fabian Glaser
Technion, Israel

===

Shalom Fabian,

After several compilation attempts, Amber 14 has been compiled with
OpenMPI 1.5.4 (the system default).
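
For completeness, the parallel rebuild boils down to roughly the following (a sketch only; the gnu compiler choice is inferred from the libgfortran dependency in the ldd output further down, and the paths are the system defaults used throughout):

# put the OpenMPI that will also be used at run time first on PATH
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH

cd /usr/local/amber14
./configure -mpi gnu      # MPI build using the mpicc/mpif90 wrappers found on PATH
make install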

The PBS script minz.q in your directory /u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
contains the setup that provides access to the required "mpirun" command. I submitted the
job on 2 nodes with

> qsub -I minz.q

i.e. as an interactive batch job, in order to inspect the output of the setup:

> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
> export PATH=/usr/lib64/openmpi/bin:$PATH
> source /usr/local/amber14/amber.sh
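
For context, the resource request in minz.q presumably looks something like this (a reconstruction from the qstat output further down, not a verbatim copy of the script):

#PBS -q amir_q
#PBS -N ETA_1_minz
#PBS -l nodes=2:ppn=12         # 2 nodes x 12 cores = the 24 tasks requested below
#PBS -l walltime=168:00:00

cd $PBS_O_WORKDIR
# ... followed by the three setup lines above and the mpirun command shown below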

> which mpirun
mpirun is /usr/lib64/openmpi/bin/mpirun

> mpirun -version
mpirun (Open MPI) 1.5.4

> which pmemd.MPI
pmemd.MPI is /usr/local/amber14/bin/pmemd.MPI

> ldd /usr/local/amber14/bin/pmemd.MPI
       linux-vdso.so.1 => (0x00002aaaaaaab000)
       libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000039b4400000)
       libmpi_cxx.so.1 => /usr/lib64/openmpi/lib/libmpi_cxx.so.1 (0x00002aaaaaac1000)
       libmpi_f90.so.1 => /usr/lib64/openmpi/lib/libmpi_f90.so.1 (0x00002aaaaacdb000)
       libmpi_f77.so.1 => /usr/lib64/openmpi/lib/libmpi_f77.so.1 (0x00002aaaaaedf000)
       libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00000039b4000000)
       libdl.so.2 => /lib64/libdl.so.2 (0x00000039b2400000)
       libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00000039b3400000)
       libm.so.6 => /lib64/libm.so.6 (0x00000039b1c00000)
       libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000039b3800000)
       libpthread.so.0 => /lib64/libpthread.so.0 (0x00000039b2800000)
       libc.so.6 => /lib64/libc.so.6 (0x00000039b2000000)
       /lib64/ld-linux-x86-64.so.2 (0x00000039b1800000)
       libnsl.so.1 => /lib64/libnsl.so.1 (0x00000039b4c00000)
       libutil.so.1 => /lib64/libutil.so.1 (0x00000039b3c00000)
       libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00002aaaab115000)

The run command

> mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd

starts executing and eventually produces the min.out file. However, all 24 processes
run on ONE node instead of being distributed 12 per node across the TWO allocated nodes.
> qstat -1nu fglaser
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1076133.tamnun  fglaser  amir_q   ETA_1_minz   6320   2  24     -- 168:0 R 00:10   n101/0*12+n102/0*12

> hostname
n101

> top
top - 17:42:18 up 13 days, 2:51, 1 user, load average: 9.52, 2.31, 0.77
Tasks: 378 total, 25 running, 353 sleeping, 0 stopped, 0 zombie
Cpu(s): 65.8%us, 34.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 99063900k total, 3033516k used, 96030384k free, 221696k buffers
Swap: 8191992k total, 0k used, 8191992k free, 864036k cached
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6452 fglaser 20 0 229m 57m 19m R 59.1 0.1 0:18.88 pmemd.MPI
6451 fglaser 20 0 230m 58m 20m R 58.4 0.1 0:17.26 pmemd.MPI
6445 fglaser 20 0 230m 58m 21m R 56.4 0.1 0:18.26 pmemd.MPI
6461 fglaser 20 0 230m 56m 18m R 56.4 0.1 0:18.21 pmemd.MPI
6449 fglaser 20 0 230m 57m 20m R 56.1 0.1 0:17.58 pmemd.MPI
6459 fglaser 20 0 230m 56m 19m R 56.1 0.1 0:18.00 pmemd.MPI
6460 fglaser 20 0 230m 55m 19m R 54.8 0.1 0:17.30 pmemd.MPI
6453 fglaser 20 0 229m 57m 20m R 52.5 0.1 0:17.63 pmemd.MPI
6444 fglaser 20 0 240m 70m 25m R 50.1 0.1 0:17.49 pmemd.MPI
6448 fglaser 20 0 230m 58m 20m R 50.1 0.1 0:16.04 pmemd.MPI
6462 fglaser 20 0 230m 55m 18m R 50.1 0.1 0:14.81 pmemd.MPI
6455 fglaser 20 0 230m 56m 19m R 49.8 0.1 0:15.50 pmemd.MPI
6457 fglaser 20 0 231m 55m 19m R 49.8 0.1 0:16.08 pmemd.MPI
6446 fglaser 20 0 230m 58m 21m R 49.5 0.1 0:16.58 pmemd.MPI
6464 fglaser 20 0 230m 56m 19m R 49.5 0.1 0:17.54 pmemd.MPI
6450 fglaser 20 0 230m 57m 20m R 48.1 0.1 0:14.53 pmemd.MPI
6447 fglaser 20 0 230m 58m 21m R 47.8 0.1 0:15.20 pmemd.MPI
6458 fglaser 20 0 230m 56m 19m R 47.1 0.1 0:14.37 pmemd.MPI
6454 fglaser 20 0 230m 56m 19m R 46.5 0.1 0:13.97 pmemd.MPI
6466 fglaser 20 0 230m 56m 19m R 45.8 0.1 0:14.74 pmemd.MPI
6456 fglaser 20 0 230m 56m 19m R 41.8 0.1 0:14.55 pmemd.MPI
6467 fglaser 20 0 230m 57m 20m R 41.8 0.1 0:14.36 pmemd.MPI
6463 fglaser 20 0 230m 55m 18m R 40.2 0.1 0:14.40 pmemd.MPI
6465 fglaser 20 0 230m 56m 19m R 39.8 0.1 0:14.84 pmemd.MPI
  55 root 20 0 0 0 0 S 0.3 0.0 0:20.28 events/4
6570 fglaser 20 0 13396 1408 896 R 0.3 0.0 0:00.03 top
   1 root 20 0 23588 1660 1312 S 0.0 0.0 0:02.66 init
   2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
   3 root RT 0 0 0 0 S 0.0 0.0 0:00.04 migration/0
............................................................................
> hostname
n102
> top
top - 17:46:09 up 13 days, 2:55, 1 user, load average: 0.07, 0.03, 0.00
Tasks: 349 total, 1 running, 348 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 99063900k total, 1962460k used, 97101440k free, 220228k buffers
Swap: 8191992k total, 0k used, 8191992k free, 746280k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 182 root 39 19 0 0 0 S 0.3 0.0 47:28.09 kipmi0
   1 root 20 0 23592 1660 1312 S 0.0 0.0 0:02.64 init
   2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
   3 root RT 0 0 0 0 S 0.0 0.0 0:00.04 migration/0
   4 root 20 0 0 0 0 S 0.0 0.0 0:00.36 ksoftirqd/0
   5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
   6 root RT 0 0 0 0 S 0.0 0.0 0:00.80 watchdog/0
   7 root RT 0 0 0 0 S 0.0 0.0 0:00.75 migration/1
   8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
..............................................................................

As you can see, parallel execution in this form is not effective.
I suggest sending this output to the AMBER support forum and asking for their
recommendations.
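
One thing worth checking first (a suggestion only, not yet verified on TAMNUN): whether the system OpenMPI was built with Torque/PBS ("tm") support. Without it, mpirun knows only about the local node and places all ranks there unless it is handed the node list that PBS allocated, for example:

# does this OpenMPI include the PBS/Torque launcher components?
ompi_info | grep -i " tm "     # look for lines such as "MCA ras: tm" and "MCA plm: tm"

# if they are missing, pass the PBS-allocated node list to mpirun explicitly
mpirun -np 24 -hostfile $PBS_NODEFILE pmemd.MPI -O -i min.in -o min.out \
       -p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd \
       -r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd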

To save time, I should mention that we have encountered an apparently similar problem with
a couple of other applications. In those cases the problem was solved by recompiling
and running with Intel MPI. Can AMBER 14 work with Intel MPI in general?
Meanwhile, an attempt to compile with Intel MPI (Intel version 14.0.2) has failed.
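
If it helps, one way such a build is usually set up is sketched below (the Intel installation paths are assumptions for this cluster; the I_MPI_* variables force Intel MPI's mpicc/mpif90 wrappers onto the Intel compilers):

# load the Intel 14.0.2 compilers and Intel MPI (paths are assumptions)
source /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh intel64
source /opt/intel/impi/4.1.3/bin64/mpivars.sh

# make the MPI compiler wrappers call the Intel compilers
export I_MPI_CC=icc I_MPI_CXX=icpc I_MPI_F90=ifort

cd /usr/local/amber14
./configure -mpi intel       # mpicc/mpif90 found on PATH must now be Intel MPI's wrappers
make install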

Any recommendations would be deeply appreciated.

Regards,
Yulia Halupovich,
Technion - CIS, TAMNUN Team
phone: 972-4-8292654, fax: 972-4-8236212
Reply-to: hpc.technion.ac.il






_______________________________
Fabian Glaser, PhD

Technion - Israel Institute of Technology
Haifa 32000, ISRAEL

fglaser.technion.ac.il
Tel: +972 4 8293701
Fax: +972 4 8225153

> On Dec 21, 2014, at 5:39 PM, Jason Swails <jason.swails.gmail.com> wrote:
>
> On Sun, Dec 21, 2014 at 9:02 AM, Fabian Glaser <fabian.glaser.gmail.com>
> wrote:
>
>> Hi Amber experts,
>>
>>
>> We had pmemd.MPI (amber 14) installed correctly and running, but after a
>> disk addition to our cluster it fails to run, we try to run pmemd.MPI with
>> the following setup
>>
>>> source /usr/local/amber14/setup.csh
>> which contains the following definitions
>>
>>> more /usr/local/amber14/setup.csh
>> #!/bin/csh -f
>> #
>> # Setup for Amber 14
>> #
>> setenv AMBERHOME /usr/local/amber14
>> setenv PATH $AMBERHOME/bin:$PATH
>> setenv LD_LIBRARY_PATH $AMBERHOME/lib:$LD_LIBRARY_PATH
>>
>> and sets the MPI path as follows:
>>
>>> which mpirun
>> /usr/local/amber14/bin/mpirun
>>
>> we get the following error message:
>>
>>> mpirun -np 12 pmemd.MPI
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> MPI version of PMEMD must be used with 2 or more processors!
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> it seems that the "numtasks" parameter is not passed to the pmemd.MPI
>> executable.
>>
>> We would quite appreciate any help, and are ready to provide any additional
>> information regarding the AMBER 14 installation and compilation on our
>> system.
>>
>> All the other non-MPI programs including pmemd or sander run fine.
>>
>
> This happens when you compile Amber with one MPI and try to use the
> "mpirun" from a different MPI. This is true of every MPI program, not just
> Amber.
>
> You need to make sure that Amber is compiled with the same MPI installation
> you intend on using to run it.
>
> HTH,
> Jason
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 28 2014 - 09:00:03 PST