Re: [AMBER] amber 15 compiled with multiple GPUs

From: nam kim <namkkim.gmail.com>
Date: Fri, 19 Jun 2015 08:47:12 -0700

Hi, I can run Amber on multiple nodes with Open MPI (1.6.5) using a single
GPU.
(Amber also seems to run with multiple GPUs on a single node, but I can't
monitor the processes to confirm it.)

When I try to run Amber with multiple nodes with multiple GPUs, it crashes.

-Nam

Here is part of the error log:

[node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[node006:13891] [[1902,1],1] attempted to send to [[1902,1],0]: tag 15
[node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file grpcomm_hier_module.c at line 355
[node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_modex.c at line 469
[node006:13891] [[1902,1],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file grpcomm_hier_module.c at line 476
[node006:13893] [[1902,1],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[node006:13893] [[1902,1],3] attempted to send to [[1902,1],0]: tag 15
[node006:13893] [[1902,1],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process

Here is the batch script:

#!/bin/bash
#SBATCH -D /home/dwych/testruns/PTEN_Calpain1/COM_Sim/
#SBATCH -J PTEN_Cal
#SBATCH --partition=all
#SBATCH --get-user-env
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00
#SBATCH --share

#source /etc/profile.d/modules.sh
export AMBERHOME=/home/namkim/amber14
export CUDAHOME=/cm/shared/apps/cuda70/toolkit/current
export PATH=$CUDAHOME/bin:/cm/shared/apps/openmpi/gcc/64/current/bin:$PATH
export LD_LIBRARY_PATH=$CUDAHOME/lib64:/cm/shared/apps/openmpi/gcc/64/current/lib64:$LD_LIBRARY_PATH
test -f /home/namkim/amber14/amber.sh && source /home/namkim/amber14/amber.sh


srun /home/namkim/amber14/bin/pmemd.cuda.MPI -O -i PTEN_Calp_prod.in \
    -o PTEN_Calp_prod.out -p PTEN_Calp.prmtop -c PTEN_Calp_heat.rst \
    -r PTEN_Calp_prod.rst -x PTEN_Calp_prod.mdcrd
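
For anyone reproducing this, one variation that may be worth testing is to launch through Open MPI's own mpirun instead of srun. This is only a sketch under an assumption that nothing in this thread confirms: that the Open MPI 1.6.5 build here was made without Slurm PMI support, in which case ranks started directly by srun might be unable to exchange contact information. The helper name build_launch_cmd is hypothetical, and reading the rank count from SLURM_NTASKS (with a fallback of 1) is an illustrative choice.

```shell
#!/bin/bash
# Hypothetical helper (not part of Amber or Slurm): print an mpirun launch
# line for pmemd.cuda.MPI so it can be inspected before use.
# Assumption: one MPI rank per GPU, with the rank count taken from
# SLURM_NTASKS and falling back to 1 when that variable is unset.
build_launch_cmd() {
    local nranks="${SLURM_NTASKS:-1}"
    local amber="${AMBERHOME:-/home/namkim/amber14}"
    echo "mpirun -np ${nranks} ${amber}/bin/pmemd.cuda.MPI $*"
}

# Same arguments as the srun line in the batch script.
build_launch_cmd -O -i PTEN_Calp_prod.in -o PTEN_Calp_prod.out \
    -p PTEN_Calp.prmtop -c PTEN_Calp_heat.rst \
    -r PTEN_Calp_prod.rst -x PTEN_Calp_prod.mdcrd
```

Printing the command first makes it easy to check the rank count before swapping it in for the srun line.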


On Fri, Jun 19, 2015 at 7:55 AM, Kenneth Huang <kennethneltharion.gmail.com>
wrote:

> Hi,
>
> Was Amber compiled correctly? What is the error message that it's giving
> out when it crashes, does it work in serial (CPU and GPU), and does it work
> with just multiple CPUs?
>
> Best,
>
> Kenneth
>
> On Friday, June 19, 2015, nam kim <namkkim.gmail.com> wrote:
>
> > Well, I've been told by tech support that Amber does not run with
> > multiple GPUs + multiple nodes.
> > Is that right?
> >
> > On Fri, Jun 19, 2015 at 12:10 AM, Nhai <nhai.qn.gmail.com> wrote:
> >
> > > I don't think you gave enough info to debug.
> > >
> > > Cheers
> > >
> > > Hai
> > >
> > > > On Jun 19, 2015, at 1:43 AM, nam kim <namkkim.gmail.com> wrote:
> > > >
> > > > > Hi, I compiled it with MPI+GPU options.
> > > > > When I ran my Amber script, it crashed.
> > > > > Has anyone been able to run an Amber job with multiple nodes + multiple GPUs?
> > > >
> > > > Thanks
> > > > -Nam
> > > > _______________________________________________
> > > > AMBER mailing list
> > > > > AMBER.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> > >
> >
>
>
> --
> Ask yourselves, all of you, what power would hell have if those imprisoned
> here could not dream of heaven?
>
Received on Fri Jun 19 2015 - 09:00:02 PDT