Also, it would be interesting to run on other GPUs of the same or a
different model.
The things that come to mind are a compiler bug, a GPU firmware bug, or
a flaky GPU.
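
One way to set up that comparison is to pin the same job to each device in
turn through the CUDA_VISIBLE_DEVICES environment variable. A minimal Python
sketch, assuming pmemd.cuda is on the PATH; the file names (md.in,
system.prmtop, equil.rst7) and the helper names are placeholders, not anything
from the original post:

```python
import os
import subprocess


def build_run(gpu_id, mdin="md.in", prmtop="system.prmtop", inpcrd="equil.rst7"):
    """Return (command, environment) for one pmemd.cuda run pinned to one GPU.

    The input file names are hypothetical placeholders.
    """
    tag = f"gpu{gpu_id}"
    cmd = [
        "pmemd.cuda", "-O",
        "-i", mdin, "-p", prmtop, "-c", inpcrd,
        "-o", f"{tag}.out", "-r", f"{tag}.rst7", "-x", f"{tag}.nc",
    ]
    # Only the chosen device is visible to the process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return cmd, env


def run_on_gpus(gpu_ids):
    """Run the same input on each GPU, one after another; map GPU id to exit code."""
    results = {}
    for gpu_id in gpu_ids:
        cmd, env = build_run(gpu_id)
        results[gpu_id] = subprocess.run(cmd, env=env).returncode
    return results
```

Calling run_on_gpus([0, 1]) would leave gpu0.out and gpu1.out to compare. If
every card fails at the same step, a flaky GPU becomes less likely.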
Bill
On 4/29/16 1:25 AM, Bill Ross wrote:
> Does it blow up at the same place each time on the GPU? E.g. tail the
> .out files.
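
A rough way to check this across several runs is to scan each .out file for
the first step whose energies overflow to asterisks. A minimal Python sketch,
assuming the usual mdout layout where each energy block starts with a line
containing "NSTEP ="; the function name is made up for illustration:

```python
import re

NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def first_overflow_step(mdout_text):
    """Return the first NSTEP whose energy block shows '****' overflow, or None."""
    current = None
    for line in mdout_text.splitlines():
        match = NSTEP_RE.search(line)
        if match:
            current = int(match.group(1))
        # Ignore anything before the first NSTEP line (header, input echo).
        if current is not None and "****" in line:
            return current
    return None
```

Running this over the .out files from repeated GPU runs and comparing the
returned step numbers shows whether the failure is reproducible or moves
around.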
>
> Bill
>
>
> On 4/29/16 1:19 AM, Domenico Marson wrote:
>> Thank you Charles for your answer!
>> Unfortunately, as I stated in my previous message, running on the CPU for 7
>> ns the system seems to behave quite well; there are no noticeable anomalies
>> in the energies or physical variables.
>> I can't understand why the system explodes only on the GPU.
>>
>> Regards,
>> Domenico
>>
>> On Thu, Apr 28, 2016 at 6:27 PM, Charles Lin <clin92.ucsd.edu> wrote:
>>
>>> If the CPU simulations are also getting more unfavorable energies with
>>> each step, there may still be something undesirable in your simulation (box
>>> size, overlap in LJ radius, etc.) causing it to explode. My guess (which
>>> could also be wrong) would be to weaken the restraints you're using. It
>>> may be that your molecule strongly prefers distances < r1 or > r4, where
>>> the restraint curve follows a linear path, so the more it deviates the
>>> higher the force and energy contributions will be. A much higher force
>>> constant can likely cascade into a quick acceleration of your system.
>>> Preferably for restraints you want the distance to stay within r1 and r4.
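
To make the shape of that curve concrete, here is a minimal Python sketch of a
flat-bottomed distance restraint energy. It assumes the form described in the
Amber manual: zero between r2 and r3, parabolic sides k*(r - r2)^2 and
k*(r - r3)^2 with no factor of 1/2, and linear continuation outside r1 and r4
with the slope of the parabola at those points. The function name is made up
for illustration.

```python
def restraint_energy(r, r1, r2, r3, r4, rk2, rk3):
    """Flat-well restraint energy at distance r (assumed Amber-style form)."""
    if r < r1:
        # Linear region below r1: continue with the parabola's slope at r1.
        return rk2 * (r1 - r2) ** 2 + 2.0 * rk2 * (r1 - r2) * (r - r1)
    if r < r2:
        return rk2 * (r - r2) ** 2      # lower parabolic wall
    if r <= r3:
        return 0.0                      # flat bottom, no penalty
    if r <= r4:
        return rk3 * (r - r3) ** 2      # upper parabolic wall
    # Linear region above r4: continue with the parabola's slope at r4.
    return rk3 * (r4 - r3) ** 2 + 2.0 * rk3 * (r4 - r3) * (r - r4)
```

Evaluating this at the distances seen in the first few GPU steps, with the
r1-r4 and rk2/rk3 values from the restraint file, shows whether the reported
restraint energy is even plausible for those geometries.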
>>>
>>> The error itself is likely saying that something in your simulation is
>>> exploding and it's having issues downloading the data from GPU memory to
>>> CPU memory.
>>>
>>> Charlie
>>>
>>> ________________________________________
>>> From: Domenico Marson [domenico87.gmail.com]
>>> Sent: Thursday, April 28, 2016 3:31 AM
>>> To: amber.ambermd.org
>>> Subject: [AMBER] CUDA and restraint
>>>
>>> Hello everyone, I'm copying a message I sent one week ago without
>>> receiving any answer. I know it's a busy time of the year, with the
>>> upcoming release, so it probably went unnoticed!
>>>
>>> In the meantime I've been running 7 ns of simulation on the CPU and the
>>> trajectory seems fine both in the data and when visualized!
>>>
>>> Copied text:
>>>
>>> Hello everyone,
>>>
>>> I'm sorry to bother you just before the release of the "nextgen"
>>> Amber, but I have some trouble with pmemd.cuda.
>>>
>>> I'm trying to run a system with Cartesian restraints applied to all the
>>> atoms of a nanoparticle in explicit TIP3P water,
>>> while with nmropt I'm restraining the distance of each of 56 different
>>> atoms on the surface of this nanoparticle.
>>> To achieve this I restrain the distance between the COM of the
>>> nanoparticle and each atom on its surface.
>>> So, to restrain the distances I have a restraint file with, for N in
>>> [1, 56] and dist1-4 varying:
>>> &rst
>>> iat= -1, N,
>>> r1=dist1, r2=dist2, r3=dist3, r4=dist4,
>>> rk2=10.00, rk3=10.00,
>>> ialtd=0,
>>>
>>> igr1=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,
>>> /
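
A restraint file of this shape can be generated rather than written by hand. A
minimal Python sketch, assuming the same &rst layout as above; the function
names and the distance values in the usage note are placeholders, since the
real dist1-4 values are not given in the post:

```python
def rst_block(atom, dists, group, rk=10.0):
    """One &rst entry: COM of `group` (via iat=-1 and igr1) to a single atom."""
    r1, r2, r3, r4 = dists
    igr1 = ",".join(str(i) for i in group)
    return (
        " &rst\n"
        f"  iat=-1,{atom},\n"
        f"  r1={r1:.2f}, r2={r2:.2f}, r3={r3:.2f}, r4={r4:.2f},\n"
        f"  rk2={rk:.2f}, rk3={rk:.2f},\n"
        "  ialtd=0,\n"
        f"  igr1={igr1},\n"
        " /\n"
    )


def build_restraint_file(surface_dists, group):
    """Concatenate one &rst block per surface atom, in atom-number order.

    surface_dists maps atom number -> (r1, r2, r3, r4); values are supplied
    by the user.
    """
    group = list(group)
    return "".join(
        rst_block(atom, dists, group)
        for atom, dists in sorted(surface_dists.items())
    )
```

For example, build_restraint_file({n: (8.0, 9.0, 10.0, 11.0) for n in
range(1, 57)}, range(1, 148)) yields 56 blocks with made-up distances; lowering
rk in rst_block is then an easy way to try weaker restraints.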
>>>
>>> I have Amber patched (and compiled) with all the patches available as of
>>> today, and my GPU is a Tesla K20c with driver version 352.79 and CUDA
>>> 7.5.
>>>
>>> I performed minimisation, heating to 300 K (20 ps) and a first
>>> equilibration of density (50 ps) on the CPU without any problem.
>>> The energies are fine, and the trajectory/behaviour also seems fine.
>>> Then I wanted to continue on the GPU but, whether I use
>>> ntc=2 + ntf=1 or ntc=2 + ntf=2, my simulation blows up within a
>>> few steps with the error:
>>> "cudaMemcpy GpuBuffer::Download failed an illegal memory access was
>>> encountered".
>>> I tried to output every frame to the mdout and trajectory, but I
>>> can't see the reason why it's blowing up.
>>> I only see my nanoparticle "exploding" step by step, and the restraint
>>> and bond energies increasing to reach "******" in the first 5-10
>>> steps.
>>> Moreover, if I continue with the same settings on the CPU no problem
>>> arises (at least not in a reasonable time, I have only 6 cores
>>> available).
>>>
>>> I know many patches were released for COM restraints on the GPU; maybe
>>> something else is missing? Or am I just asking too much?
>>> Thank you all for your help!
>>>
>>> Regards,
>>> Domenico
>>>
>>> --
>>> Domenico Marson, Ph.D.
>>> Department of Engineering and Architecture (DEA) Postdoctoral Fellow
>>> Molecular Simulation Engineering (MOSE) Laboratory
>>>
>>> University of Trieste
>>>
>>> Skype: domenicomars
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>
Received on Fri Apr 29 2016 - 01:30:12 PDT