Re: [AMBER] GPGPU AMBER11 question

From: Carlos Sosa <sosa0006.r.umn.edu>
Date: Fri, 26 Aug 2011 10:21:28 -0500

Hi Ross,

Since it is fabric dependent, this is the information I gathered to support
GPUDirect. The machine I am testing has a Mellanox ConnectX-2 InfiniBand
adapter. I have not tried this yet.

GPUDirect page from NVIDIA: http://developer.nvidia.com/gpudirect

What you have to do (see the sketch after these steps):
1. Download the NVIDIA GPUDirect patch for the RHEL 5.5 kernel:
   http://developer.download.nvidia.com/compute/cuda/3_2/GPUDirect/nvidia-gpudirect-3.2-1.tar.gz
   The tarball contains the RPM files to install.
2. Contact Mellanox (hpc.mellanox.com) to get the driver for your ConnectX-2
   InfiniBand adapter.
3. Install the Mellanox driver.
4. Use the mpi_pinned.c example provided in nvidia-gpudirect-3.2-1.tar.gz to
   check that everything works.
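
For reference, here is a minimal sketch of the kind of check mpi_pinned.c
performs (this is an illustration, not the file shipped in the tarball, and
the mpicc/-lcudart build line is an assumption): allocate a CUDA page-locked
host buffer with cudaMallocHost() and pass it directly to MPI_Send/MPI_Recv
between two ranks on different nodes. If the Mellanox driver and GPUDirect
packages are installed correctly, the pinned buffer can be registered by both
CUDA and the InfiniBand stack and the transfer completes cleanly.

/* Illustrative pinned-memory MPI test (not the NVIDIA mpi_pinned.c).
 * Assumed build: mpicc pinned_test.c -o pinned_test \
 *                -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart
 * Run with 2 ranks, one per node. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    const size_t n = 1 << 20;                 /* 1M floats, ~4 MB */
    float *buf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    /* Page-locked (pinned) host buffer allocated through CUDA; with
     * GPUDirect the InfiniBand driver can register these same pages. */
    if (cudaMallocHost((void **)&buf, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "rank %d: cudaMallocHost failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {
        for (size_t i = 0; i < n; i++) buf[i] = (float)i;
        MPI_Send(buf, (int)n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, (int)n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received pinned buffer, buf[42] = %f\n", buf[42]);
    }

    cudaFreeHost(buf);
    MPI_Finalize();
    return 0;
}

If a simple test like this runs cleanly across two nodes, the GPUDirect and
driver installation is probably in order.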

On Thu, Aug 25, 2011 at 10:19 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Carlos,
>
> This is interesting. Thanks for figuring this out. I believe you are the
> first person to test the GPU implementation with Intel MPI. We have only
> tested with MPICH2 and MVAPICH2 previously. I'd be interested to know how
> well it performs.
>
> Do you know if Intel MPI supports GPU Direct?
>
> All the best
> Ross
>
> > -----Original Message-----
> > From: Carlos P Sosa [mailto:cpsosa.msi.umn.edu]
> > Sent: Thursday, August 25, 2011 12:28 PM
> > To: amber.ambermd.org
> > Subject: [AMBER] GPGPU AMBER11 question
> >
> >
> > Potential solution:
> >
> > The issue appears to be an incompatibility between Intel MPI and CUDA 3.2.
> >
> > Setting the following environment variable resolves the crash:
> >
> > I_MPI_FABRICS=shm:ofa
> >
> > carlos p sosa
> >
> >
> > Hello,
> >
> > I just built PMEMD for GPGPUs according to http://ambermd.org/gpus/,
> > using the Intel MPI version (intel/impi/4.0.1.007). Then I tested it
> > with the standard jac benchmark without vlimit:
> >
> > short md, jac, power 2 FFT
> > &cntrl
> > ntx=7, irest=1,
> > ntc=2, ntf=2, tol=0.0000001,
> > nstlim=1000,
> > ntpr=5, ntwr=10,
> > dt=0.001,
> > cut=9.,
> > ntt=0, temp0=300.,
> > /
> > &ewald
> > nfft1=64,nfft2=64,nfft3=64,
> > /
> >
> > Has anybody seen this problem? The build completes successfully. I am
> > running under PBS with 2 nodes. Did I miss any patches?
> >
> > [0:node037] rtc_register failed 196608 [0] error(0x30000): unknown
> > error
> >
> > Assertion failed in file ../../dapl_module_send.c at line 4711: 0
> > internal ABORT - process 0
> > rank 0 in job 1 node037_43404 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> >
> >
> >
>
>
>



-- 
Carlos P Sosa.
*Biomedical Informatics and Computational Biology* (BICB) Consultant
University of Minnesota
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 26 2011 - 08:30:04 PDT