Re: [AMBER] amber16 on parallel GPUs

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 30 Jan 2017 19:30:33 -0500

Hi Hirdesh,

Yes, you can ignore that. TRPCage is a very small test case, so it maxes out at 30 CPU cores or a single GPU.
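
For reference, the cap follows directly from that check: pmemd requires at
least 10 atoms per MPI rank, so

    304 atoms / 10 atoms per rank = 30.4  ->  at most 30 CPU cores.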

All the best
Ross

> On Jan 30, 2017, at 11:10, Hirdesh Kumar <hirdesh.iitd.gmail.com> wrote:
>
> Thanks Ross,
> Very helpful. I ran the AMBER benchmark suite on my system and the results
> look promising; I just have one query.
> In the "TRPCAGE_PRODUCTION - 304 atoms GB" test, the CPU code on 40 cores
> gave the following error:
>
>     | ERROR: Must have 10x more atoms than processors!
>
> But I guess I can ignore this error?
>
> Thanks,
> Hirdesh
>
> On Fri, Jan 27, 2017 at 7:04 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi Hirdesh,
>>
>> You are unlikely to see good scaling across four GTX 1080s unless you have
>> custom peer-to-peer hardware. Take a read through the following pages:
>>
>> http://ambermd.org/gpus/
>>
>> http://ambermd.org/gpus/recommended_hardware.htm#hardware
>>
>> and
>>
>> http://ambermd.org/gpus/benchmarks.htm#Benchmarks
>>
>> If you use GPUs 0 and 1 together, or 2 and 3 together, you will probably
>> see a speedup if they have peer-to-peer connectivity. Your best approach
>> will likely be 4 x 1-GPU runs or 2 x 2-GPU runs (or 2 x 1 + 1 x 2). Note
>> that you can download the benchmark suite from the benchmark page and run
>> it on your system; you'll then see how each GPU performs and how well the
>> 2- and 4-GPU combinations work.
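>>
>> As a rough sketch (the topology check and the run_a/run_b output names here
>> are just illustrative, reusing the prod1.in inputs from your message below),
>> splitting the four cards into two independent 2-GPU runs could look like:
>>
>>     # Show the GPU interconnect topology; pairs that sit behind a common
>>     # PCIe switch can normally talk peer-to-peer.
>>     nvidia-smi topo -m
>>
>>     # First run on GPUs 0 and 1
>>     CUDA_VISIBLE_DEVICES=0,1 mpirun -np 2 pmemd.cuda_SPFP.MPI -O \
>>         -i prod1.in -p protein.prmtop -c eq3.rst \
>>         -r run_a.rst -o run_a.out -x run_a.nc &
>>
>>     # Second, independent run on GPUs 2 and 3
>>     CUDA_VISIBLE_DEVICES=2,3 mpirun -np 2 pmemd.cuda_SPFP.MPI -O \
>>         -i prod1.in -p protein.prmtop -c eq3.rst \
>>         -r run_b.rst -o run_b.out -x run_b.nc &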
>>
>> All the best
>> Ross
>>
>>> On Jan 27, 2017, at 16:06, Hirdesh Kumar <hirdesh.iitd.gmail.com> wrote:
>>>
>>> Hi,
>>> I am testing my recently installed Amber16, which I built as
>>> pmemd.cuda.MPI.
>>>
>>> The installation was successful (no FAILURE in make test.cuda_parallel).
>>>
>>> Next I submitted a test job as:
>>>
>>>
>>> export CUDA_VISIBLE_DEVICES=0,1,2,3
>>> mpirun -np 4 pmemd.cuda_SPFP.MPI -O -i prod1.in -p protein.prmtop \
>>>     -c eq3.rst -r prod1.rst -o prod1.out -x prod1.nc
>>>
>>>
>>> Using nvidia-smi, I checked that all 4 GPUs are being used in parallel;
>>> however, I am surprised by the GPU utilization. Why is each GPU only ~50%
>>> utilized?
>>>
>>> I previously got 44 ns/day using a single K80 GPU. However, using these 4
>>> GPUs in parallel, I only get 66 ns/day. What is wrong?
>>>
>>> Below is the output of "nvidia-smi"
>>>
>>> +-----------------------------------------------------------------------------+
>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
>>> |-------------------------------+----------------------+----------------------+
>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>> |===============================+======================+======================|
>>> |   0  GeForce GTX 1080    Off  | 0000:02:00.0      On |                  N/A |
>>> | 52%   72C    P2    80W / 180W |    744MiB /  8111MiB |     51%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>> |   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
>>> | 51%   74C    P2    72W / 180W |    525MiB /  8113MiB |     42%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>> |   2  GeForce GTX 1080    Off  | 0000:82:00.0     Off |                  N/A |
>>> | 47%   71C    P2    71W / 180W |    525MiB /  8113MiB |     44%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>> |   3  GeForce GTX 1080    Off  | 0000:83:00.0     Off |                  N/A |
>>> | 51%   73C    P2    73W / 180W |    525MiB /  8113MiB |     43%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>>
>>> +-----------------------------------------------------------------------------+
>>> | Processes:                                                       GPU Memory |
>>> |  GPU       PID  Type  Process name                               Usage      |
>>> |=============================================================================|
>>> |    0      3020    G   /usr/lib/xorg/Xorg                             168MiB |
>>> |    0      3566    G   compiz                                          73MiB |
>>> |    0     10231    C   pmemd.cuda_SPFP.MPI                            387MiB |
>>> |    0     10232    C   pmemd.cuda_SPFP.MPI                            111MiB |
>>> |    1     10231    C   pmemd.cuda_SPFP.MPI                            111MiB |
>>> |    1     10232    C   pmemd.cuda_SPFP.MPI                            411MiB |
>>> |    2     10233    C   pmemd.cuda_SPFP.MPI                            411MiB |
>>> |    2     10234    C   pmemd.cuda_SPFP.MPI                            111MiB |
>>> |    3     10233    C   pmemd.cuda_SPFP.MPI                            111MiB |
>>> |    3     10234    C   pmemd.cuda_SPFP.MPI                            411MiB |
>>> +-----------------------------------------------------------------------------+
>>>
>>>
>>>
>>> Thanks,
>>> Hirdesh
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jan 30 2017 - 17:00:03 PST