Re: [AMBER] Problematic structure after minimization on GPU from Ross Walker on 2013-08-26 (Amber Archive Aug 2013)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 26 Aug 2013 08:15:02 -0700

Hi Jan-Philip

>My mail was too long, sorry. Important points:
>
>- the min1+min2+heat protocol runs fine with the CPU version, i.e. CPU
>minimization solves the problem. The crash during heatup, however, also
>appears in the CPU version when starting from the minimized structure as
>created by the GPU code.

Well, that explains it - it is simply a limitation of the precision model.
If you use the SPDP or DPDP version of the GPU code it will likely work
fine. Ultimately though I think the advice should simply be to carry out
minimization using the CPU code.

>
>- the error message in the GPU version can be improved. The CPU version
>informs about too large velocities. The GPU version just says 'launch
>failure launching kernel kNLSkinTest'.

This is extremely difficult to do without destroying performance.
Essentially the advice here is the same as it has always been: if you see
an unexplained crash with the GPU code, try the same run with the CPU code.
Either way both simulations are wrong - the error message is just more
informative on the CPU.

>
>- it is very interesting to understand what is wrong with the structure
>in min2.rst as created by the GPU version. MOE does not identify
>clashes, the structure looks fine. Any pointers what I can check to
>identify the problematic part in this structure?

The structure is simply wrong - end of story. There is nothing really to
identify here. Essentially the forces are too large, so they get truncated
and the resulting gradient no longer matches the energy. The minimization
algorithm then gets horribly confused, since the change in energy that
occurs when it modifies the structure as indicated by the forces is not
what it expected. If you just stuck with steepest descent it 'might' work,
but the minimizer will still struggle. I think ultimately we should just
make the truncation fatal and quit with an error message saying that the
structure is too strained for the GPU and that you should switch to CPU
minimization. I'll see if I can do this for the next update.
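
As a rough illustration of the mismatch described above, here is a minimal
sketch for a single Lennard-Jones pair. It is not the AMBER GPU code: the
parameters EPSILON and SIGMA and the clamp value FORCE_CAP are assumed
numbers chosen only to make the effect visible. It compares the energy drop
a minimizer would predict from the force with the drop actually obtained
for a small step.

```python
# Illustrative only: one Lennard-Jones pair, not AMBER's actual GPU code.
EPSILON = 0.1      # well depth in kcal/mol (assumed value)
SIGMA = 3.4        # zero-crossing distance in Angstrom (assumed value)
FORCE_CAP = 1.0e4  # hypothetical clamp standing in for the largest representable force


def lj_energy(r):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (sr6 * sr6 - sr6)


def lj_force(r):
    """Analytic force along r, F = -dE/dr (positive means repulsive)."""
    sr6 = (SIGMA / r) ** 6
    return 24.0 * EPSILON * (2.0 * sr6 * sr6 - sr6) / r


def clamp(f, cap=FORCE_CAP):
    """Truncate a force to the range [-cap, cap]."""
    return max(-cap, min(cap, f))


def predicted_vs_actual(r, step, force):
    """Energy drop predicted to first order from `force` versus the real drop
    when the pair separation grows from r to r + step."""
    predicted = force * step
    actual = lj_energy(r) - lj_energy(r + step)
    return predicted, actual


if __name__ == "__main__":
    r, step = 1.0, 1.0e-4  # a badly overlapping pair and a tiny displacement
    true_force = lj_force(r)
    print("true force   :", true_force)
    print("clamped force:", clamp(true_force))
    print("true force   -> predicted, actual:", predicted_vs_actual(r, step, true_force))
    print("clamped force-> predicted, actual:", predicted_vs_actual(r, step, clamp(true_force)))
```

With the unclamped force the predicted and actual energy changes agree
closely. With the clamped force the prediction is far too small, which is
the kind of inconsistency a line search in a minimizer cannot make sense
of.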

In terms of your initial structure, if you really are interested I would
dump the force array and look at the atoms with the largest forces. You
could also run the CPU minimization, which on step 1 should report the
atom with the highest force (GMax), and take a look in the vicinity of
that atom. The problem is probably coming from the VDW term: likely two
atoms are just inside the VDW radius of each other and the 1/r^12
repulsive term gives a massive force.
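
The inspection suggested above could be sketched as follows. This assumes
the forces and coordinates have already been read into plain Python lists
of (x, y, z) tuples; how they are dumped from AMBER is not shown, and the
function names largest_forces and neighbors_within are hypothetical
helpers, not part of any AMBER tool. Periodic images are ignored.

```python
import math


def largest_forces(forces, k=5):
    """Return the k atoms with the largest force magnitude as
    (1-based atom index, magnitude), largest first."""
    ranked = sorted(
        ((math.sqrt(fx * fx + fy * fy + fz * fz), i)
         for i, (fx, fy, fz) in enumerate(forces, start=1)),
        reverse=True,
    )
    return [(idx, mag) for mag, idx in ranked[:k]]


def neighbors_within(coords, atom, cutoff=2.0):
    """Return (1-based atom index, distance) for every atom closer than
    `cutoff` to the 1-based atom `atom`, nearest first.
    Simple O(N) scan that ignores periodic boundary conditions."""
    x0, y0, z0 = coords[atom - 1]
    hits = []
    for i, (x, y, z) in enumerate(coords, start=1):
        if i == atom:
            continue
        d = math.sqrt((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)
        if d < cutoff:
            hits.append((i, d))
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    # Toy data: atom 3 carries a huge force and sits almost on top of atom 2.
    forces = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0), (1.0e6, 0.0, 0.0)]
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
    worst_atom, _ = largest_forces(forces, k=1)[0]
    print("atom with largest force:", worst_atom)
    print("close contacts:", neighbors_within(coords, worst_atom, cutoff=2.0))
```

Any pair that shows up at a distance well below the sum of the VDW radii
is a candidate for the overlap described above.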

>> If you are using the latest version of the code it truncates the forces
>> at the largest representation that SPFP supports - in most cases this
>> works and will get you out of trouble but if your initial structure is
>> too strained it will also likely break the minimizer.
>
>When I read in the changelog that the truncation was implemented in GPU
>minimization code, I switched back to GPU minimization. For the system
>in question the mean part is that it 'works' in the sense that the
>minimization does not quit with an error and the output looks fine.

The truncation was a hack that I hoped would work - for most systems it
seems to work fine but clearly there are exceptions that I wasn't
anticipating and the minimizer is not as robust as I was hoping. Thus we
probably need a bugfix to disable this.

All the best
Ross

Received on Mon Aug 26 2013 - 08:30:03 PDT