Hi Ross,
Thanks for your reply.
I understand what you say, but only partially...
How is it that if I slow the CPU clock down by a factor of 2 the speed of one simulation does not change compared to normal, yet if I return the CPU to full speed and then constrain a single core to serve 2 simulations, the speed of each simulation halves? The number of CPU cycles dedicated to Amber is the same in both cases...
Also, to reply to you on another topic:
You say that the CPU is used to launch the kernels... so Amber is not using the K20's Dynamic Parallelism yet?
Intuitively, such a feature might provide a tremendous speed boost.
(None of this is meant to take away from the Amber team's excellent work.)
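For context, here is a minimal sketch of what a Dynamic Parallelism launch looks like in CUDA on a compute capability 3.5 device such as the K20. It only illustrates the feature being asked about; it says nothing about how pmemd.cuda is actually structured, and the kernel names are invented:

// Build with: nvcc -arch=sm_35 -rdc=true dp_sketch.cu -lcudadevrt
#include <cstdio>

__global__ void child_step(int parent_thread)
{
    // Launched from the device, with no CPU involvement in the launch.
    printf("child thread %d launched by parent thread %d\n",
           threadIdx.x, parent_thread);
}

__global__ void parent_step()
{
    // With Dynamic Parallelism the GPU enqueues the next kernel itself,
    // so no host-side launch (and no host-side polling) is needed here.
    child_step<<<1, 4>>>(threadIdx.x);
}

int main()
{
    parent_step<<<1, 2>>>();
    cudaDeviceSynchronize();   // host waits for the parent and its child grids
    return 0;
}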
Thank you
J
-----Original Message-----
From: Ross Walker <ross.rosswalker.co.uk>
To: AMBER Mailing List <amber.ambermd.org>
Sent: Tue, Aug 20, 2013 10:10 pm
Subject: Re: [AMBER] Cpu busy looping?
Hi Jake,
The CPU is polling the GPU for kernel completions to:
1) Launch the next kernel in the sequence, since kernel launches are
controlled by the CPU.
2) Download / upload memory as needed to perform I/O.
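A rough sketch of the kind of host-side loop this describes follows; the polling mechanism, kernel name, and parameters are illustrative assumptions, not pmemd.cuda internals:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void md_step() { /* one MD step's work on the GPU */ }

// ntwx: how often (in steps) coordinates are written to the trajectory.
void drive(int nsteps, int ntwx, float* d_crd, float* h_crd,
           size_t bytes, cudaStream_t s)
{
    for (int step = 1; step <= nsteps; ++step) {
        md_step<<<1024, 256, 0, s>>>();            // (1) CPU launches the next kernel

        // Busy-wait for completion; this spin is what pins one CPU core at
        // 100%, and it burns the same wall-clock time regardless of the
        // core's clock speed.
        while (cudaStreamQuery(s) == cudaErrorNotReady) { /* spin */ }

        if (step % ntwx == 0) {                    // (2) download data for I/O
            cudaMemcpy(h_crd, d_crd, bytes, cudaMemcpyDeviceToHost);
            // ... write h_crd to the trajectory file on the host ...
        }
    }
}

int main()
{
    const size_t bytes = 3 * 1000 * sizeof(float); // e.g. 1000 atoms * xyz
    float* d_crd = nullptr;
    float* h_crd = (float*)std::malloc(bytes);
    cudaMalloc(&d_crd, bytes);
    cudaStream_t s;
    cudaStreamCreate(&s);
    drive(100, 10, d_crd, h_crd, bytes, s);
    cudaStreamDestroy(s);
    cudaFree(d_crd);
    std::free(h_crd);
    return 0;
}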
Unfortunately there is no free lunch - until there is a full operating
system on the GPU, such that you can plug the disk directly into the GPU
and throw away the rest of the machine, this is the way it has to run. Be
grateful that you are not using NAMD or Gromacs, which would swallow ALL of
your cores and ALL of your GPUs for a single calculation in order to get
close to the speed AMBER gets using just 1 GPU and 1 CPU core - so 4 GPUs
+ 4 CPU cores in a single node give you the equivalent of 4 full nodes
running 4 Gromacs simulations.
The CPU speed itself is irrelevant in the sense that it just needs to meet
a minimum - essentially enough to monitor interrupts at a fast enough
frequency. We've not tested the bottom end; certainly sharing a single
core between multiple GPUs slows things down, but a single 1.8GHz core is
easily enough to keep up with 99.999% of GPU calculations. The only place
where it would really fall down is if you do a crazy amount of I/O, say
with ntwx=1.
Anyway, that's an aside. Note it is not the entire CPU being used at 100% -
it is merely 1 core being used at 100%. Ultimately the rule is:
You need 1 CPU core for each GPU in your system. These can be low-end,
cost-effective CPUs (the MAJOR advantage of AMBER over Gromacs and NAMD,
which need expensive CPUs for performance) - this is what we tend to
recommend with the custom-designed Exxact machines (see the AMBER
website). Thus a cheap 4-core CPU can easily handle running 4 GPUs flat
out. Ideally you want 6 cores to leave some free for the OS. For home-built
machines I tend to recommend the 8-core AMD chips since they are very cheap
(under $150 each) and can easily handle 4 GPUs plus the operating system,
I/O, interactive use etc.
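As a small illustration of the rule of thumb above, here is a tiny check one could run on a node. It is my own sketch, not an Amber utility, and the "+2" spare-core margin is just an assumption standing in for the OS headroom mentioned above:

// Build with: nvcc -std=c++11 core_check.cu
#include <cuda_runtime.h>
#include <thread>
#include <cstdio>

int main()
{
    int gpus = 0;
    cudaGetDeviceCount(&gpus);
    unsigned cores = std::thread::hardware_concurrency(); // logical cores seen by the OS

    std::printf("%d GPU(s), %u CPU core(s)\n", gpus, cores);

    // Rule of thumb from the message above: one core per GPU, plus a little
    // headroom for the OS and I/O (the "+2" margin here is an assumption).
    if (cores < static_cast<unsigned>(gpus) + 2)
        std::printf("warning: fewer than one core per GPU plus OS headroom\n");
    return 0;
}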
Attached are a couple of potential machine configs that work well.
Note the other thing you can do, with say a dual 8-core machine, is to
run 4 single-GPU jobs and then use the remaining 12 cores for a CPU MPI
run.
All the best
Ross
On 8/20/13 12:32 PM, "Jake Smith" <amberonejake.aol.com> wrote:
>
>Hello Amberers
>While doing serial simulations on a GPU the CPU speed seems indeed
>irrelevant, but then why is one CPU core always stuck at 100% busy when a
>GPU is performing a computation? This is not good in terms of how many
>GPUs can be driven by a low-end CPU. Can I ask what that core is doing,
>exactly? This seems strange especially because the CPU speed seems
>irrelevant.
>Thank you
>J
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber