Re: [AMBER] CUDA and restraint

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 29 Apr 2016 08:28:42 -0700

Hi Domenico,

This sounds like a bug in the code to me. Can you put together the files needed to reproduce this, please: prmtop, inpcrd, mdin, the command line you are using, and a copy of your GPU mdout file. Send them directly to me and I'll see what we can do to figure it out.

Thanks,

All the best
Ross



> On Apr 29, 2016, at 06:37, Domenico Marson <domenico87.gmail.com> wrote:
>
> Without the restraints, the simulation runs fine on the GPU for at least
> 28 ns!
>
> I updated the Nvidia drivers to version 361.42, ran a "yum update",
> recompiled AMBER and the problem is still here!
>
>
> On Fri, Apr 29, 2016 at 1:52 PM, Charles Lin <clin92.ucsd.edu> wrote:
>
>> Can you check whether your simulation behaves properly on the GPU without
>> the restraints?
>>
>> -Charlie
>> ________________________________________
>> From: Domenico Marson [domenico87.gmail.com]
>> Sent: Friday, April 29, 2016 2:35 AM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] CUDA and restraint
>>
>> Hello Bill, thanks for your answer!
>>
>> I tried both of the K20c cards I have available; the simulation stops at a
>> different timestep on each run (ntpr=1):
>> GPU1: step 43
>> GPU2: step 42
>> GPU2 (rerun): step 47
>>
>> The drivers and cuda I have on the machine are:
>> nvidia driver version: 352.79
>> cuda version 7.5.17
>>
>> Other simulations run fine on the GPU, with both AMBER and LAMMPS!
>>
>> Regards,
>> Domenico
>>
>> On Fri, Apr 29, 2016 at 10:28 AM, Bill Ross <ross.cgl.ucsf.edu> wrote:
>>
>>> Also, it would be interesting to run on other GPUs of the same or a
>>> different model.
>>>
>>> The things that come to mind are a compiler bug, a GPU firmware bug, or
>>> a flaky GPU.
>>>
>>> Bill
>>>
>>> On 4/29/16 1:25 AM, Bill Ross wrote:
>>>> Does it blow up at the same place each time on the GPU? E.g. tail the
>>>> .out files.
>>>>
>>>> Bill
>>>>
>>>>
>>>> On 4/29/16 1:19 AM, Domenico Marson wrote:
>>>>> Thank you Charles for your answer!
>>>>> Unfortunately, as I stated in my previous message, running on the CPU
>>>>> for 7 ns the system seems to behave fine; there are no noticeable
>>>>> anomalies in the energies or physical variables.
>>>>> I can't understand why the system explodes only on the GPU.
>>>>>
>>>>> Regards,
>>>>> Domenico
>>>>>
>>>>> On Thu, Apr 28, 2016 at 6:27 PM, Charles Lin <clin92.ucsd.edu> wrote:
>>>>>
>>>>>> If the CPU simulations are also getting more unfavorable energies with
>>>>>> each step, there may still be something undesirable in your simulation
>>>>>> (box size, overlap in LJ radius, etc.) causing it to explode. My guess
>>>>>> (which could also be wrong) would be to weaken the restraints you're
>>>>>> using. It may be that your molecule strongly prefers a distance < r1
>>>>>> or > r4, where the restraint curve follows a linear path, so the more
>>>>>> it deviates, the higher the force and energy contributions will be. A
>>>>>> much higher force constant can likely cascade into a quick acceleration
>>>>>> of your system. Preferably, for restraints you want the distance to
>>>>>> stay within r1 and r4.
>>>>>>
>>>>>> The error itself is likely saying that something in your simulation is
>>>>>> exploding and it's having issues downloading the data from GPU memory
>>>>>> to CPU memory.
>>>>>>
>>>>>> Charlie
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Domenico Marson [domenico87.gmail.com]
>>>>>> Sent: Thursday, April 28, 2016 3:31 AM
>>>>>> To: amber.ambermd.org
>>>>>> Subject: [AMBER] CUDA and restraint
>>>>>>
>>>>>> Hello everyone, I'm copying a message I sent one week ago that received
>>>>>> no answer. I know it's a busy time of the year with the upcoming
>>>>>> release, so it probably went unnoticed!
>>>>>>
>>>>>> In the meantime I've run 7 ns of simulation on the CPU, and the
>>>>>> trajectory looks fine both in the data and when visualized!
>>>>>>
>>>>>> Copied text:
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> I'm sorry to bother you just before the release of the "nextgen"
>>>>>> Amber, but I have some trouble with pmemd.cuda.
>>>>>>
>>>>>> I'm trying to run a system with Cartesian restraints applied to all the
>>>>>> atoms of a nanoparticle in explicit TIP3P water,
>>>>>> while with nmropt I'm restraining the distance of each of 56 different
>>>>>> atoms on the surface of this nanoparticle.
>>>>>> To achieve this I restrain the distance between the COM of the
>>>>>> nanoparticle and each atom on its surface.
>>>>>> So, to restrain the distances I have a restraint file with, for N in
>>>>>> [1, 56] and dist1-4 varying:
>>>>>> &rst
>>>>>> iat= -1, N,
>>>>>> r1=dist1, r2=dist2, r3=dist3, r4=dist4,
>>>>>> rk2=10.00, rk3=10.00,
>>>>>> ialtd=0,
>>>>>> igr1=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,
>>>>>> /
>>>>>>
>>>>>> I have Amber compiled with all the patches available as of today, and
>>>>>> my GPU is a Tesla K20c with driver version 352.79 and CUDA 7.5.
>>>>>>
>>>>>> I performed minimisation, heating to 300 K (20 ps) and a first
>>>>>> equilibration of density (50 ps) on the CPU without any problem.
>>>>>> The energies are fine, and the trajectory/behaviour also seems fine.
>>>>>> Then I wanted to continue on the GPU but, whether I use ntc=2 + ntf=1
>>>>>> or ntc=2 + ntf=2, my simulation blows up within a few steps with the
>>>>>> error:
>>>>>> "cudaMemcpy GpuBuffer::Download failed an illegal memory access was
>>>>>> encountered".
>>>>>> I tried writing every frame to the mdout and trajectory, but I can't
>>>>>> see why it's blowing up.
>>>>>> I only see my nanoparticle "exploding" step by step, with the restraint
>>>>>> and bond energies increasing until they reach "******" within the
>>>>>> first 5-10 steps.
>>>>>> Moreover, if I continue with the same settings on the CPU, no problem
>>>>>> arises (at least not in a reasonable time; I have only 6 cores
>>>>>> available).
>>>>>>
>>>>>> I know many patches were released for COM restraints on the GPU; maybe
>>>>>> something else is missing? Or am I just trying to do too much?
>>>>>> Thank you all for your help!
>>>>>>
>>>>>> Regards,
>>>>>> Domenico
>>>>>>
>>>>>> --
>>>>>> Domenico Marson, Ph.D.
>>>>>> Department of Engineering and Architecture (DEA) Postdoctoral Fellow
>>>>>> Molecular Simulation Engineering (MOSE) Laboratory
>>>>>>
>>>>>> University of Trieste
>>>>>>
>>>>>> Skype: domenicomars
>>>>>>
>>>>>> _______________________________________________
>>>>>> AMBER mailing list
>>>>>> AMBER.ambermd.org
>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Apr 29 2016 - 08:30:08 PDT
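
For reference on the r1-r4 discussion in the thread, here is a minimal Python sketch of a flat-well distance restraint of the kind described: zero between r2 and r3, parabolic from r1 to r2 and from r3 to r4, and linear outside r1 and r4. The piecewise form is an assumption based on how the Amber manual describes these restraints and should be checked against it. The function name flat_well_energy and the example distances are hypothetical; only rk2=rk3=10.00 comes from the restraint file quoted above.

```python
def flat_well_energy(r, r1, r2, r3, r4, rk2, rk3):
    """Restraint energy at distance r for a flat-well potential.

    Assumed form (verify against the Amber manual):
      r <  r1        linear, continuing the slope of the left parabola at r1
      r1 <= r < r2   rk2 * (r - r2)**2
      r2 <= r <= r3  0
      r3 <  r <= r4  rk3 * (r - r3)**2
      r >  r4        linear, continuing the slope of the right parabola at r4
    """
    if r < r1:
        return rk2 * (r1 - r2) ** 2 + 2.0 * rk2 * (r1 - r2) * (r - r1)
    if r < r2:
        return rk2 * (r - r2) ** 2
    if r <= r3:
        return 0.0
    if r <= r4:
        return rk3 * (r - r3) ** 2
    return rk3 * (r4 - r3) ** 2 + 2.0 * rk3 * (r4 - r3) * (r - r4)


# Hypothetical distances; the force constants match the quoted restraint file.
params = dict(r1=2.0, r2=4.0, r3=6.0, r4=8.0, rk2=10.0, rk3=10.0)
for r in (1.0, 3.0, 5.0, 7.0, 9.0, 12.0):
    print(f"R = {r:5.1f}  E = {flat_well_energy(r, **params):7.1f}")
```

With these illustrative numbers the penalty is 0 inside the well and 80 at R = 9. Beyond r4 it keeps growing with constant slope 2*rk3*(r4-r3), so a distance that drifts far outside [r1, r4] accumulates a large restraint energy.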