Dear Jason M. Swails:
Thanks very much!
I use Amber 12 and AmberTools 12. The second GPU does work: I submitted a 20 ns job to it, and it runs for a short time but stops (at around 2 or 3 ns) without any error message. Could this be a compiler problem?
I also wonder whether there is a way to submit two or more jobs to one GPU without an obvious decrease in speed. One job uses only about 500 MB of the 5000 MB of GPU memory, so running a single job seems like a waste of the computing resources.
Qiao Xue
xueqiaoup.gmail.com
From: Jason Swails
Date: 2014-09-29 20:15
To: amber
Subject: Re: [AMBER] some problems when using pmemd.cuda
On Mon, 2014-09-29 at 16:05 +0800, xueqiaoup.gmail.com wrote:
> Dear Amber users:
> Hi! When using pmemd.cuda on a GPU to accelerate MD, I ran into
> some problems.
> First, my node has two NVIDIA M2090 GPUs, 32 CPU cores (two CPUs),
> and 64 GB of memory, but I can only use the first GPU
> (CUDA_VISIBLE_DEVICE="0"). I can run MD simulations on the first GPU.
> When I set CUDA_VISIBLE_DEVICE="1", the same MD simulations stop
> without any error message. Both GPUs are using CUDA 4.2.
There is not enough information to help diagnose what is happening. Are
you sure that the second GPU works? Have you run the Amber CUDA tests
on both GPUs? Do other CUDA programs (e.g., from the CUDA samples or
CUDA SDK) work? What version of Amber are you using? Have you updated
to the latest version?
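A quick way to check the card independently of Amber is a bare CUDA program that just launches a kernel on the requested device. The sketch below is only illustrative (the file name devcheck.cu, the kernel, and the sizes are invented, not anything from the Amber distribution); if it fails or hangs for device 1 but succeeds for device 0, the problem is in the GPU, driver, or CUDA installation rather than in pmemd.cuda or the compiler.

// devcheck.cu -- minimal, hypothetical sanity check: can device N launch
// and finish a kernel?  Build with: nvcc devcheck.cu -o devcheck
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;                   // trivial work, just to exercise the card
}

int main(int argc, char **argv) {
    int dev = (argc > 1) ? atoi(argv[1]) : 0;  // device index from the command line
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || dev >= count) {
        fprintf(stderr, "device %d not visible (found %d)\n", dev, count);
        return 1;
    }
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("using device %d: %s\n", dev, prop.name);

    cudaSetDevice(dev);
    const int n = 1 << 20;
    float *d = NULL;
    cudaMalloc(&d, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaError_t err = cudaDeviceSynchronize(); // surfaces any launch/execution error
    printf("kernel on device %d: %s\n", dev, cudaGetErrorString(err));
    cudaFree(d);
    return (err == cudaSuccess) ? 0 : 1;
}

Running it as "./devcheck 0" and then "./devcheck 1" exercises each card in turn; the prebuilt CUDA samples (deviceQuery, bandwidthTest) are an equally good test if you already have them.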
> Secondly, a system containing 80,000 atoms runs at 8 ns/day on one
> GPU and uses 538 MB of the GPU memory (5037 MB total). Once I submit
> another job on the same GPU, the two jobs together occupy about
> 1000 MB of GPU memory, but both speeds drop to 4 ns/day. The node
> still has enough GPU memory, CPU cores, and host memory. Why does
> the speed of each job decrease so much?
Because you are splitting the processing power between two jobs. The
amount of memory has virtually no impact on performance as long as you
have enough to prevent swapping. The speed of the memory bus and the
speed of the processors have a lot more effect. Because pmemd.cuda is
well-optimized, it wastes very few clock cycles. As a result, if you
run two jobs on the same GPU, each job gets on average half of the
available clock cycles, so each one runs roughly half as fast.
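If you want to see that effect directly, a rough, made-up timing sketch like the one below (the file name, kernel, and sizes are invented and have nothing to do with how pmemd.cuda is instrumented) can demonstrate it: run one copy alone, note the elapsed time, then start two copies at the same moment on the same GPU. Each copy's elapsed time should roughly double, while the combined throughput stays about the same.

// busy.cu -- illustrative only: a compute-bound kernel timed with CUDA
// events, for comparing one instance alone vs. two instances sharing a GPU.
// Build with: nvcc busy.cu -o busy
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spin(float *x, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0000001f + 0.5f;         // arithmetic busy-loop to keep the SMs occupied
        x[i] = v;
    }
}

int main() {
    const int n = 1 << 22;
    float *d = NULL;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int rep = 0; rep < 50; ++rep)         // enough launches that two instances clearly overlap
        spin<<<(n + 255) / 256, 256>>>(d, n, 2000);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);    // wall time between the two events
    printf("elapsed: %.1f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}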
HTH,
Jason
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Sep 29 2014 - 20:00:03 PDT