Re: [AMBER] amber 15 compiled with multiple GPUs

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 19 Jun 2015 09:06:55 -0700

Hi Nam,

I suggest reading the following page in its entirety.

http://ambermd.org/gpus/

This includes details about running on multiple GPUs. One thing to note is that interconnect speeds can no longer keep up with AMBER running on GPUs, so you need peer-to-peer (P2P) access between GPUs in order to see scaling. On most hardware this limits you to 2 GPUs per node, with both GPUs on the same CPU socket. Running across multiple nodes is only supported for loosely coupled simulations such as replica exchange.
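
As a rough way to see which GPUs sit on the same socket, you can print the connection matrix that nvidia-smi reports. This is only a sketch: it assumes a driver recent enough to provide the `topo -m` subcommand, and the helper name `show_gpu_topology` is made up for illustration. The legend printed under the matrix explains the link abbreviations; GPU pairs whose link does not cross the inter-socket connection are the candidates for P2P.

```shell
#!/bin/bash
# Hypothetical helper: print the GPU connection matrix if nvidia-smi is
# available, otherwise say so and carry on.
show_gpu_topology() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Assumption: this driver version supports the "topo -m" subcommand.
        nvidia-smi topo -m || echo "nvidia-smi topo -m failed on this machine"
    else
        echo "nvidia-smi not found on this machine"
    fi
}

show_gpu_topology
```

Run this on a compute node rather than the login node, since the login node may have no GPUs at all.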

As for what you show below: you are attempting to run on 4 GPUs on a single node. This is fine for GB simulations (of > 2500 atoms or so), but PME simulations will only show a speedup if you have hardware (such as Cirrascale's P2P systems) that supports non-blocking peer-to-peer PCI-E communication between all 4 GPUs. I doubt you have such hardware, or I would be aware of it, so I suspect you have standard dual-socket nodes, in which case only pairs of GPUs can communicate via P2P, as described on the web page linked above. You will need to request 2 tasks and 2 GPUs per node and have your admins configure things so that only GPUs on the same processor socket are allocated to a job.
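
As a rough sketch of the request described above (2 tasks and 2 GPUs on one node), the batch header might look something like the following. Only the task and GPU counts differ from the script quoted below. Whether the two GPUs the scheduler hands out actually share a socket depends on how your admins have configured the cluster, so the check at the end is just a sanity check on the count. It assumes the scheduler exports CUDA_VISIBLE_DEVICES for the job, and the helper name `count_visible_gpus` is made up for illustration.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=2   # one MPI task per GPU
#SBATCH --gres=gpu:2         # a single pair of GPUs rather than all 4

# Hypothetical helper: count the comma-separated entries in a
# CUDA_VISIBLE_DEVICES-style list (e.g. "0,1" gives 2, empty gives 0).
count_visible_gpus() {
    local list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    local IFS=','
    # Split the list on commas into the function's positional parameters.
    set -- $list
    echo "$#"
}

# Assumption: the scheduler sets CUDA_VISIBLE_DEVICES for this job.
ngpu=$(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")
if [ "$ngpu" -ne 2 ]; then
    echo "Warning: expected 2 visible GPUs, found $ngpu" >&2
fi
```

The count check says nothing about which socket the GPUs are on; that part still has to come from the scheduler configuration.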

As for the error message you see, this looks to be outside of AMBER. I suspect something is wrong with the way you are running MPI jobs in general.

I don't think just doing 'srun $AMBERHOME/bin/pmemd.cuda.MPI' is correct. You likely need something like 'mpirun -np 4 $AMBERHOME/bin/pmemd.cuda.MPI'. Check with your cluster's sys admin what the correct syntax is, and make sure you can run regular MPI jobs on CPUs first before adding the complexity of GPUs in parallel.
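
To make the "CPUs first, then GPUs" order concrete, here is a small dry-run sketch that only prints the two launch lines rather than running them. It assumes mpirun is the right launcher on your cluster (it may not be, hence the advice to check with your admin), takes the file arguments from the srun line quoted below, and uses pmemd.MPI as the CPU-only MPI executable. The helper name `launch_cmd` and the task count of 2 are illustrative.

```shell
#!/bin/bash
# Assumption: AMBERHOME is set as in the quoted batch script.
AMBERHOME="${AMBERHOME:-/home/namkim/amber14}"

# Input/output arguments taken from the quoted srun line.
ARGS="-O -i PTEN_Calp_prod.in -o PTEN_Calp_prod.out -p PTEN_Calp.prmtop"
ARGS="$ARGS -c PTEN_Calp_heat.rst -r PTEN_Calp_prod.rst -x PTEN_Calp_prod.mdcrd"

# Hypothetical helper: print (not run) an mpirun line for a given
# task count ($1) and executable name ($2).
launch_cmd() {
    echo "mpirun -np $1 $AMBERHOME/bin/$2 $ARGS"
}

# Step 1: plain CPU MPI run, to confirm that MPI launching works at all.
launch_cmd 2 pmemd.MPI

# Step 2: the same launch with the GPU executable, one task per GPU.
launch_cmd 2 pmemd.cuda.MPI
```

Once the first printed command runs cleanly when pasted into a job script, any failure of the second is more likely to be GPU related than an MPI launch problem.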

All the best
Ross

> On Jun 19, 2015, at 8:47 AM, nam kim <namkkim.gmail.com> wrote:
>
> Hi, I can run Amber on multiple nodes with openmpi (1.6.5), with a single GPU.
> (It seems Amber is running with multiple GPUs on a single node, but I can't
> monitor the processes.)
>
> When I try to run Amber with multiple nodes with multiple GPUs, it crashes.
>
> -Nam
>
> Here is a part of error log
>
> [node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be
> sent to a process whose contact information is unknown in file
> rml_oob_send.c at line 145
> [node006:13891] [[1902,1],1] attempted to send to [[1902,1],0]: tag 15
> [node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be
> sent to a process whose contact information is unknown in file
> grpcomm_hier_module.c at line 355
> [node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be
> sent to a process whose contact information is unknown in file
> base/grpcomm_base_modex.c at line 469
> [node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be
> sent to a process whose contact information is unknown in file
> grpcomm_hier_module.c at line 476
> [node006:13893] [[1902,1],3] ORTE_ERROR_LOG: A message is attempting to be
> sent to a process whose contact information is unknown in file
> rml_oob_send.c at line 145
> [node006:13893] [[1902,1],3] attempted to send to [[1902,1],0]: tag 15
> [node006:13893] [[1902,1],3] ORTE_ERROR_LOG: A message is attempting to be
> sent to a process
>
> Here is batch script
>
> #!/bin/bash
> #SBATCH -D /home/dwych/testruns/PTEN_Calpain1/COM_Sim/
> #SBATCH -J PTEN_Cal
> #SBATCH --partition=all
> #SBATCH --get-user-env
> #SBATCH --nodes=1
> #SBATCH --tasks-per-node=4
> #SBATCH --gres=gpu:4
> #SBATCH --time=24:00:00
> #SBATCH --share
>
> #source /etc/profile.d/modules.sh
> export AMBERHOME=/home/namkim/amber14
> export CUDAHOME=/cm/shared/apps/cuda70/toolkit/current
> export PATH=$CUDAHOME/bin:/cm/shared/apps/openmpi/gcc/64/current/bin:$PATH
> export LD_LIBRARY_PATH=$CUDAHOME/lib64:/cm/shared/apps/openmpi/gcc/64/current/lib64:$LD_LIBRARY_PATH
> test -f /home/namkim/amber14/amber.sh && source /home/namkim/amber14/amber.sh
>
>
> srun /home/namkim/amber14/bin/pmemd.cuda.MPI -O -i PTEN_Calp_prod.in \
>     -o PTEN_Calp_prod.out -p PTEN_Calp.prmtop -c PTEN_Calp_heat.rst \
>     -r PTEN_Calp_prod.rst -x PTEN_Calp_prod.mdcrd
>
>
> On Fri, Jun 19, 2015 at 7:55 AM, Kenneth Huang <kennethneltharion.gmail.com>
> wrote:
>
>> Hi,
>>
>> Was Amber compiled correctly? What is the error message that it's giving
>> out when it crashes, does it work in serial (CPU and GPU), and does it work
>> with just multiple CPUs?
>>
>> Best,
>>
>> Kenneth
>>
>> On Friday, June 19, 2015, nam kim <namkkim.gmail.com> wrote:
>>
>>> Well, I've been told from Tech support that amber is not running with
>>> Multiple GPUs + multiple nodes.
>>> Is it right?
>>>
>>> On Fri, Jun 19, 2015 at 12:10 AM, Nhai <nhai.qn.gmail.com> wrote:
>>>
>>>> I don't think you gave enough info to debug.
>>>>
>>>> Cheers
>>>>
>>>> Hai
>>>>
>>>>> On Jun 19, 2015, at 1:43 AM, nam kim <namkkim.gmail.com> wrote:
>>>>>
>>>>> Hi, I compiled it with MPI+GPU options.
>>>>> When I run my amber script, it crashed.
>>>>> Anyone able to run amber job with multiple nodes + multiple GPUs?
>>>>>
>>>>> Thanks
>>>>> -Nam
>>>>> _______________________________________________
>>>>> AMBER mailing list
>>>>> AMBER.ambermd.org
>>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>
>>>>
>>>
>>
>>
>> --
>> Ask yourselves, all of you, what power would hell have if those imprisoned
>> here could not dream of heaven?
>>


Received on Fri Jun 19 2015 - 09:30:02 PDT