Re: [AMBER] pmemd.MPI fails to run

From: Fabian <fabian.glaser.gmail.com>
Date: Sun, 28 Dec 2014 20:42:25 +0200

Thanks Ross,

I am not sure about that: we have successfully run Amber 14 from PBS without any PBS_NODEFILE variable, but I will try using it.
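
For reference, here is a minimal sketch of how the node file could be passed to mpirun, following Ross's suggestion. It assumes that PBS exports PBS_NODEFILE with one line per allocated slot (typical, but installation dependent) and that the mpirun on the PATH is Open MPI, which accepts -hostfile. The helper names count_slots and build_mpirun_cmd are hypothetical; the script only prints the command so it can be checked before anything is run.

```shell
#!/bin/bash
# Sketch only: build an mpirun command that uses the PBS node file.
# Assumption: PBS_NODEFILE lists one host name per allocated slot.
# Assumption: mpirun is Open MPI, which accepts -hostfile.

# Count the slots in the node file (one non-empty line per slot).
count_slots() {
    grep -c . "$1"
}

# Print the mpirun command line for a given node file; nothing is executed.
build_mpirun_cmd() {
    local nodefile="$1"
    local np
    np=$(count_slots "$nodefile")
    echo "mpirun -np $np -hostfile $nodefile pmemd.MPI -O -i min.in -o min.out" \
         "-p ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd" \
         "-r min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd"
}

# Inside a PBS job, show the command that would be run.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    build_mpirun_cmd "$PBS_NODEFILE"
fi
```

If the printed command looks right, it can be pasted into the job script in place of the plain "mpirun -np 24 ..." line.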

What about Intel MPI?

>> Can AMBER 14 work with Intel MPI generally?


>> Meanwhile the attempt of compilation with Intel MPI (Intel version
>> 14.0.2) has failed.

Thanks!!

Fabian



> On 28 Dec 2014, at 19:35, Ross Walker <ross.rosswalker.co.uk> wrote:
>
> Hi Fabian,
>
> It looks like you are not specifying a hostfile (or equivalent) to your
> mpirun command, so it will automatically run everything on the node from
> which you launch it.
>
>> mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p
>> ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r
>> min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd
>
> You likely need to specify the hostfile. It is probably set by PBS and
> will be something like $PBS_NODEFILE, but the specifics will depend on your
> installation.
>
>
> All the best
> Ross
>
>
>> On 12/28/14, 8:31 AM, "Fabian Glaser" <fabian.glaser.gmail.com> wrote:
>>
>> Hi Jason,
>>
>> We have followed your suggestion to recompile Amber 14, but our system
>> people are still having problems compiling and running the parallel
>> pmemd.MPI of Amber 14 on more than one node. I am forwarding their
>> detailed trials and questions; we would highly appreciate your help and
>> suggestions on what to do next:
>>
>> Thanks a lot, and happy new year,
>>
>> Fabian Glaser
>> Technion, Israel
>>
>> ===
>>
>> Shalom Fabian,
>>
>> After several compilation attempts, Amber14 has been compiled with
>> OpenMPI 1.5.4 (system default).
>>
>> The PBS script minz.q in your directory
>> /u/fglaser/projects/IsraelVlodavsky/hep1_v2/MD/ETA_1/min
>> contains the setup that provides access to the respective "mpirun"
>> command. I've submitted the job on 2 nodes with
>>
>>> qsub -I minz.q
>>
>> (an interactive batch job) to see the output after the setup:
>>
>>> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
>>> export PATH=/usr/lib64/openmpi/bin:$PATH
>>> source /usr/local/amber14/amber.sh
>>
>>> which mpirun
>> mpirun is /usr/lib64/openmpi/bin/mpirun
>>
>>> mpirun -version
>> mpirun (Open MPI) 1.5.4
>>
>>> which pmemd.MPI
>> pmemd.MPI is /usr/local/amber14/bin/pmemd.MPI
>>
>>> ldd /usr/local/amber14/bin/pmemd.MPI
>> linux-vdso.so.1 => (0x00002aaaaaaab000)
>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000039b4400000)
>> libmpi_cxx.so.1 => /usr/lib64/openmpi/lib/libmpi_cxx.so.1 (0x00002aaaaaac1000)
>> libmpi_f90.so.1 => /usr/lib64/openmpi/lib/libmpi_f90.so.1 (0x00002aaaaacdb000)
>> libmpi_f77.so.1 => /usr/lib64/openmpi/lib/libmpi_f77.so.1 (0x00002aaaaaedf000)
>> libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00000039b4000000)
>> libdl.so.2 => /lib64/libdl.so.2 (0x00000039b2400000)
>> libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00000039b3400000)
>> libm.so.6 => /lib64/libm.so.6 (0x00000039b1c00000)
>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000039b3800000)
>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00000039b2800000)
>> libc.so.6 => /lib64/libc.so.6 (0x00000039b2000000)
>> /lib64/ld-linux-x86-64.so.2 (0x00000039b1800000)
>> libnsl.so.1 => /lib64/libnsl.so.1 (0x00000039b4c00000)
>> libutil.so.1 => /lib64/libutil.so.1 (0x00000039b3c00000)
>> libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00002aaaab115000)
>>
>> the run command
>>
>>> mpirun -np 24 pmemd.MPI -O -i min.in -o min.out -p
>>> ../hep1_system_ETA_ETA_1.prmtop -c ../hep1_system_ETA_ETA_1.prmcrd -r
>>> min.rst -ref ../hep1_system_ETA_ETA_1.prmcrd
>>
>> starts execution, and eventually produces the min.out file. However, all
>> 24 processes are executed on ONE NODE instead of 12 processes being
>> distributed to each of the TWO nodes.
>>> qstat -1nu fglaser
>>                                                             Req'd  Req'd   Elap
>> Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
>> --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
>> 1076133.tamnun  fglaser  amir_q   ETA_1_minz   6320   2  24     -- 168:0 R 00:10 n101/0*12+n102/0*12
>>
>>> hostname
>> n101
>>
>>> top
>> top - 17:42:18 up 13 days, 2:51, 1 user, load average: 9.52, 2.31, 0.77
>> Tasks: 378 total, 25 running, 353 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 65.8%us, 34.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 99063900k total, 3033516k used, 96030384k free, 221696k buffers
>> Swap: 8191992k total, 0k used, 8191992k free, 864036k cached
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 6452 fglaser 20 0 229m 57m 19m R 59.1 0.1 0:18.88 pmemd.MPI
>> 6451 fglaser 20 0 230m 58m 20m R 58.4 0.1 0:17.26 pmemd.MPI
>> 6445 fglaser 20 0 230m 58m 21m R 56.4 0.1 0:18.26 pmemd.MPI
>> 6461 fglaser 20 0 230m 56m 18m R 56.4 0.1 0:18.21 pmemd.MPI
>> 6449 fglaser 20 0 230m 57m 20m R 56.1 0.1 0:17.58 pmemd.MPI
>> 6459 fglaser 20 0 230m 56m 19m R 56.1 0.1 0:18.00 pmemd.MPI
>> 6460 fglaser 20 0 230m 55m 19m R 54.8 0.1 0:17.30 pmemd.MPI
>> 6453 fglaser 20 0 229m 57m 20m R 52.5 0.1 0:17.63 pmemd.MPI
>> 6444 fglaser 20 0 240m 70m 25m R 50.1 0.1 0:17.49 pmemd.MPI
>> 6448 fglaser 20 0 230m 58m 20m R 50.1 0.1 0:16.04 pmemd.MPI
>> 6462 fglaser 20 0 230m 55m 18m R 50.1 0.1 0:14.81 pmemd.MPI
>> 6455 fglaser 20 0 230m 56m 19m R 49.8 0.1 0:15.50 pmemd.MPI
>> 6457 fglaser 20 0 231m 55m 19m R 49.8 0.1 0:16.08 pmemd.MPI
>> 6446 fglaser 20 0 230m 58m 21m R 49.5 0.1 0:16.58 pmemd.MPI
>> 6464 fglaser 20 0 230m 56m 19m R 49.5 0.1 0:17.54 pmemd.MPI
>> 6450 fglaser 20 0 230m 57m 20m R 48.1 0.1 0:14.53 pmemd.MPI
>> 6447 fglaser 20 0 230m 58m 21m R 47.8 0.1 0:15.20 pmemd.MPI
>> 6458 fglaser 20 0 230m 56m 19m R 47.1 0.1 0:14.37 pmemd.MPI
>> 6454 fglaser 20 0 230m 56m 19m R 46.5 0.1 0:13.97 pmemd.MPI
>> 6466 fglaser 20 0 230m 56m 19m R 45.8 0.1 0:14.74 pmemd.MPI
>> 6456 fglaser 20 0 230m 56m 19m R 41.8 0.1 0:14.55 pmemd.MPI
>> 6467 fglaser 20 0 230m 57m 20m R 41.8 0.1 0:14.36 pmemd.MPI
>> 6463 fglaser 20 0 230m 55m 18m R 40.2 0.1 0:14.40 pmemd.MPI
>> 6465 fglaser 20 0 230m 56m 19m R 39.8 0.1 0:14.84 pmemd.MPI
>> 55 root 20 0 0 0 0 S 0.3 0.0 0:20.28 events/4
>> 6570 fglaser 20 0 13396 1408 896 R 0.3 0.0 0:00.03 top
>> 1 root 20 0 23588 1660 1312 S 0.0 0.0 0:02.66 init
>> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
>> 3 root RT 0 0 0 0 S 0.0 0.0 0:00.04 migration/0
>> ...
>>> hostname
>> n102
>>> top
>> top - 17:46:09 up 13 days, 2:55, 1 user, load average: 0.07, 0.03, 0.00
>> Tasks: 349 total, 1 running, 348 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 99063900k total, 1962460k used, 97101440k free, 220228k buffers
>> Swap: 8191992k total, 0k used, 8191992k free, 746280k cached
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 182 root 39 19 0 0 0 S 0.3 0.0 47:28.09 kipmi0
>> 1 root 20 0 23592 1660 1312 S 0.0 0.0 0:02.64 init
>> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
>> 3 root RT 0 0 0 0 S 0.0 0.0 0:00.04 migration/0
>> 4 root 20 0 0 0 0 S 0.0 0.0 0:00.36 ksoftirqd/0
>> 5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
>> 6 root RT 0 0 0 0 S 0.0 0.0 0:00.80 watchdog/0
>> 7 root RT 0 0 0 0 S 0.0 0.0 0:00.75 migration/1
>> 8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
>> ...
>>
>> As you can see, parallel execution in this way is not effective.
>> I suggest sending this output to the AMBER support/forum and asking for
>> their recommendations.
>>
>> To save time, I should mention that we have encountered an apparently
>> similar problem with a couple of other applications. At that time the
>> problem was solved by recompiling and running with Intel MPI. Can AMBER 14
>> work with Intel MPI generally? Meanwhile, the attempt at compilation with
>> Intel MPI (Intel version 14.0.2) has failed.
>>
>> Any recommendations would be deeply appreciated.
>>
>> Regards,
>> Yulia Halupovich,
>> Technion - CIS, TAMNUN Team
>> phone: 972-4-8292654, fax: 972-4-8236212
>> Reply-to: hpc.technion.ac.il
>>
>>
>>
>>
>>
>>
>> _______________________________
>> Fabian Glaser, PhD
>>
>> Technion - Israel Institute of Technology
>> Haifa 32000, ISRAEL
>>
>> fglaser.technion.ac.il
>> Tel: +972 4 8293701
>> Fax: +972 4 8225153
>>
>>> On Dec 21, 2014, at 5:39 PM, Jason Swails <jason.swails.gmail.com>
>>> wrote:
>>>
>>> On Sun, Dec 21, 2014 at 9:02 AM, Fabian Glaser <fabian.glaser.gmail.com>
>>> wrote:
>>>
>>>> Hi Amber experts,
>>>>
>>>>
>>>> We had pmemd.MPI (Amber 14) installed correctly and running, but after
>>>> a disk was added to our cluster it fails to run. We try to run pmemd.MPI
>>>> with the following setup:
>>>>
>>>>> source /usr/local/amber14/setup.csh
>>>> which contains the following definitions
>>>>
>>>>> more /usr/local/amber14/setup.csh
>>>> #!/bin/csh -f
>>>> #
>>>> # Setup for Amber 14
>>>> #
>>>> setenv AMBERHOME /usr/local/amber14
>>>> setenv PATH $AMBERHOME/bin:$PATH
>>>> setenv LD_LIBRARY_PATH $AMBERHOME/lib:$LD_LIBRARY_PATH
>>>>
>>>> and sets the MPI path as follows:
>>>>
>>>>> which mpirun
>>>> /usr/local/amber14/bin/mpirun
>>>>
>>>> we get the following error message:
>>>>
>>>>> mpirun -np 12 pmemd.MPI
>>>> -------------------------------------------------------
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> MPI version of PMEMD must be used with 2 or more processors!
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>>
>>>> it seems that the "numtasks" parameter is not passed to the pmemd.MPI
>>>> executable.
>>>>
>>>> We would greatly appreciate any help, and are ready to provide any
>>>> additional information regarding the AMBER 14 installation and
>>>> compilation on our system.
>>>>
>>>> All the other non-MPI programs including pmemd or sander run fine.
>>>
>>> This happens when you compile Amber with one MPI and try to use the
>>> "mpirun" from a different MPI. This is true of every MPI program, not
>>> just Amber.
>>>
>>> You need to make sure that Amber is compiled with the same MPI
>>> installation you intend to use to run it.
>>>
>>> HTH,
>>> Jason
>>>
>>> --
>>> Jason M. Swails
>>> BioMaPS,
>>> Rutgers University
>>> Postdoctoral Researcher
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>>
>
>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 28 2014 - 11:00:02 PST