Re: [AMBER] GPU and ntpr from Fernando Martín García on 2012-11-30 (Amber Archive Nov 2012)

From: Fernando Martín García <fmgarcia.cbm.uam.es>
Date: Fri, 30 Nov 2012 10:16:38 +0100

Hi Ross,

Thank you for all the detailed information.

Best regards,

Fernando

On Tue, 27 Nov 2012 10:48:13 -0800, Ross Walker wrote:
> Hi Fernando,
>
> I still believe that this is simply a statistical sampling issue. If you
> are looking at the averages, pmemd by default only samples the energies
> for the average every ntpr steps. So if you have a 100,000 step run with
> ntpr=1 you get the average over 100,000 values, while if you have
> ntpr=1000 you get the average over only 100 values. These two averages
> will only converge in the limit of nstlim -> infinity. This is compounded
> by the fact that when you set ntpr=1 it calculates the energy on every
> step, while when you have it set to 1000 it calculates the energy every
> 1000 steps. This involves a different number of mathematical operations,
> so you will get different rounding, leading to divergence of the
> trajectory. This will be more pronounced at low values of ntpr. I.e.
> going from 1000 to 10000 the 'extra' calculations are carried out rarely
> in both cases, while going from ntpr=1 to ntpr=10 you change how often
> the 'extra' calculations are done by a lot, so the rounding difference
> will be more pronounced.
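
A minimal sketch of the sampling argument above, in Python rather than AMBER code. The energy series is synthetic (a correlated random walk around an arbitrary value), and the names synthetic_energies and sampled_average are hypothetical. It only shows that averaging every step and averaging every ntpr-th step of the same finite series use different sample counts and generally give different means.

```python
import random


def synthetic_energies(nstlim, seed=0):
    """Return a made-up, time-correlated 'energy' series of length nstlim."""
    rng = random.Random(seed)
    fluctuation = 0.0
    series = []
    for _ in range(nstlim):
        # AR(1) noise: each value remembers most of the previous one.
        fluctuation = 0.99 * fluctuation + rng.gauss(0.0, 1.0)
        series.append(-3140.0 + fluctuation)  # arbitrary baseline
    return series


def sampled_average(energies, ntpr):
    """Average only the values at steps that are multiples of ntpr.

    Steps are counted from 1, mirroring a mod(step, ntpr) == 0 test.
    Returns (mean, number_of_samples).
    """
    picked = [e for step, e in enumerate(energies, start=1) if step % ntpr == 0]
    return sum(picked) / len(picked), len(picked)


energies = synthetic_energies(100_000)
for ntpr in (1, 20, 1000):
    mean, count = sampled_average(energies, ntpr)
    print(f"ntpr={ntpr:5d}  samples={count:6d}  mean={mean:.4f}")
```

The sample counts follow directly from nstlim / ntpr; the printed means depend on the seed and are not meant to reproduce any AMBER output.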
>
> There are a number of things you can play with. If you want, you can
> request that the averages be calculated every step irrespective of ntpr.
> This is done by using the keyword ene_avg_sampling in the cntrl namelist.
> This is related to the following code in pme_force.fpp:
>
> call update_time(pme_misc_timer)
> if ((mod(irespa, ene_avg_sampling) .eq. 0) .or. &
>     (mod(irespa, ntpr) .eq. 0)) then
>   call gpu_pme_ene(ew_coeff, uc_volume, pot_ene, virial, ekcmt)
> else
>   call gpu_pme_force(ew_coeff, uc_volume, virial, ekcmt)
> end if
> call update_time(nonbond_time)
>
> This is in addition to other code. Setting ene_avg_sampling=1 will cause
> the code to calculate the energies on every step and average them, rather
> than only every ntpr steps. This will, however, slow down the calculation,
> although arguably less than ntpr=1 since you don't have the extra I/O. It
> should also address issues with divergence of the trajectory, since it
> will force calls to the full energy routine on every step irrespective of
> ntpr.
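
The branch quoted above can be mimicked in a few lines of Python to count how often the energy path would be taken. This is a sketch only: it assumes irespa simply counts steps from 1 to nstlim, and the function name energy_call_steps is hypothetical, not part of pmemd.

```python
def energy_call_steps(nstlim, ntpr, ene_avg_sampling):
    """Count steps on which the quoted condition selects the energy routine.

    Mirrors: mod(irespa, ene_avg_sampling) == 0 .or. mod(irespa, ntpr) == 0
    Assumption: irespa runs over 1..nstlim, one value per MD step.
    """
    return sum(
        1
        for irespa in range(1, nstlim + 1)
        if irespa % ene_avg_sampling == 0 or irespa % ntpr == 0
    )


# Energies sampled only when printing, versus on every step.
print(energy_call_steps(2000, ntpr=20, ene_avg_sampling=20))
print(energy_call_steps(2000, ntpr=20, ene_avg_sampling=1))
```

With ene_avg_sampling=1 the condition holds on every step, so the count equals nstlim whatever ntpr is.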
>
>
> This is somewhat artificial though. Both trajectories are correct, since
> Newton's equations of motion for a system like this are chaotic. Hence
> what you see is, I believe, simply a statistical convergence issue.
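
To illustrate why a tiny rounding difference is enough to separate two otherwise identical runs, here is a toy sketch. It uses the logistic map as a stand-in for a chaotic system; it is not molecular dynamics, and the perturbation of 1e-12 is an arbitrary stand-in for a rounding-level difference. The function name logistic_trajectory is hypothetical.

```python
def logistic_trajectory(x0, steps, r=3.9):
    """Iterate x -> r * x * (1 - x), returning x0 followed by `steps` iterates."""
    trajectory = [x0]
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


reference = logistic_trajectory(0.4, 200)
perturbed = logistic_trajectory(0.4 + 1e-12, 200)

# Absolute separation between the two trajectories at each step.
gap = [abs(a - b) for a, b in zip(reference, perturbed)]
for step in (0, 20, 40, 60, 80):
    print(f"step {step:3d}  separation {gap[step]:.3e}")
```

Both trajectories are equally valid solutions of the same map; they differ only in which nearby starting point they follow, which is the sense in which both MD trajectories are "correct".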
>
> Hope that helps.
>
> All the best
> Ross
>
>
>
>
> On 11/27/12 3:46 AM, "Fernando Martín García" <fmgarcia.cbm.uam.es>
> wrote:
>
>> Hi Ross,
>>
>> Sorry for the delay. What you said makes sense. Only one question about
>> the suggestion. Amber is now running on Fedora 13 and, as I see, CUDA
>> 4.2 is available for Fedora 14. Could I use that version, or do I need
>> to upgrade my Fedora version? We are finding the same problem with a
>> Tesla 2090 on CentOS, with CUDA 4.2.
>>
>> Here are two outputs that differ only in the ntpr value (1 and 20,
>> respectively). These simulations were run with GB, but the same thing
>> happens with explicit water in the NTP ensemble. They were run with
>> CUDA 4.2 on CentOS 6.3. As I said, if you use ntpr 1000 and 10000, no
>> differences are observed.
>>
>> A V E R A G E S   O V E R   2000 S T E P S   (ntpr = 1)
>>
>>  NSTEP = 2000   TIME(PS) = 5.500   TEMP(K) = 300.26   PRESS = 0.0
>>  Etot   =  -1292.3285   EKtot   =   1849.9964   EPtot     =  -3142.3248
>>  BOND   =    465.9291   ANGLE   =   1462.0663   DIHED     =    815.1164
>>  1-4 NB =    525.3658   1-4 EEL =   7885.1834   VDWAALS   =  -1173.7162
>>  EELEC  =   -483.0180   EGB     = -12639.2514   RESTRAINT =      0.0000
>> ------------------------------------------------------------------------
>>
>> A V E R A G E S   O V E R   100 S T E P S   (ntpr = 20)
>>
>>  NSTEP = 2000   TIME(PS) = 5.500   TEMP(K) = 300.61   PRESS = 0.0
>>  Etot   =  -1292.5560   EKtot   =   1852.1738   EPtot     =  -3144.7298
>>  BOND   =    464.2868   ANGLE   =   1460.7239   DIHED     =    817.2981
>>  1-4 NB =    525.6464   1-4 EEL =   7880.5082   VDWAALS   =  -1174.9187
>>  EELEC  =   -486.9575   EGB     = -12631.3171   RESTRAINT =      0.0000
>> ------------------------------------------------------------------------
>>
>> Best regards.
>>
>> Fer
>>
>> On Fri, 23 Nov 2012 17:27:29 -0800, Ross Walker wrote:
>>> Hi Fernando,
>>>
>>> It would help to see select excerpts from your output files in order to
>>> be able to understand exactly what you are describing here. Firstly,
>>> while the GPU code is deterministic on identical hardware, this is not
>>> necessarily true when changing the value of ntpr. The reason for this
>>> is that the code only calculates the energy when it is needed for
>>> printing. Normally it just calculates the gradients. Hence when you
>>> change ntpr you change the amount of work being done, and this leads to
>>> different rounding errors that will lead to natural divergence of the
>>> MD trajectory. This is perfectly fine, since you should always be
>>> running sufficiently long runs to converge the properties you are
>>> interested in.
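
As a generic illustration of how doing the same arithmetic in a different way changes the last bits of a result (this is ordinary IEEE double precision in Python, not the GPU code paths themselves), floating-point addition is not associative:

```python
# The same three numbers, summed with different grouping.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)

print(grouped_left == grouped_right)   # the two results are not bit-identical
print(abs(grouped_left - grouped_right))
```

A difference of this size is harmless on its own, but in a chaotic system it is enough to send two runs down different trajectories.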
>>>
>>> Thus what you are reporting is probably reasonable, but it would need a
>>> lot more info to understand fully.
>>>
>>> Another note though. You mention CUDA 4.0. Currently AMBER 12 with the
>>> latest patches, which include a number of bug fixes for GPU, only
>>> supports CUDA 4.2. This suggests to me that you are running with
>>> unpatched code. I would suggest running configure to apply the latest
>>> patches and then recompiling from scratch with CUDA 4.2.
>>>
>>> All the best
>>> Ross
>>>
>>>
>>>
>>> On 11/23/12 3:14 AM, "Fernando Martín García"
>>> <fmgarcia.cbm.uam.es>
>>> wrote:
>>>
>>>> Dear Amber users,
>>>>
>>>> I have a question about the ntpr option on GPU (and maybe CPU). I
>>>> usually run my simulations with ntpr=1000, but we have been trying
>>>> different values (1, 500, 1000 and 10000). The system is running on
>>>> Amber 12, GPU C2070, CUDA 4.0.
>>>>
>>>> We observed no difference in energy values between ntpr=1000 and
>>>> 10000, but they change with ntpr=1 and 500 (both between those two
>>>> and relative to the other values). I would like to know whether the
>>>> frequency of writing the energies is what causes the changes in them
>>>> (in proportion to that frequency). I think I read something about the
>>>> leap-frog algorithm continuing with the calculation of the coordinates
>>>> while the energies are being printed, which would change dt.
>>>>
>>>> Thanks
>>>>
>>>> Fernando
>>>>--
>>>> ==============================================
>>>> Fernando Martín García
>>>> Molecular Modelling Group - Lab 312.1
>>>> Molecular Biology Center "Severo Ochoa"
>>>> C/ NICOLáS CABRERA, 1.
>>>> UAM University. Cantoblanco, 28049 Madrid. Spain.
>>>> TEL: (+34) 91-196-4662 FAX: (+34) 91-196-4420
>>>> Web: http://fertoledo.wordpress.com/
>>>> ==============================================
>>>>
>>>>_______________________________________________
>>>>AMBER mailing list
>>>>AMBER.ambermd.org
>>>>http://lists.ambermd.org/mailman/listinfo/amber

-- 
 ==============================================
  Fernando Martín García
  Molecular Modelling Group - Lab 312.1
  Molecular Biology Center "Severo Ochoa"
  C/ NICOLáS CABRERA, 1.
  UAM University. Cantoblanco, 28049 Madrid. Spain.
  TEL: (+34) 91-196-4662 FAX: (+34) 91-196-4420
  Web: http://fertoledo.wordpress.com/
 ==============================================
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Nov 30 2012 - 01:30:03 PST