Re: [AMBER] max pairlist cutoff error on octahedral box

From: Bill Miller III <brmilleriii.gmail.com>
Date: Tue, 15 Feb 2011 07:12:32 -0500

I have also seen this error occur randomly with regular pmemd (i.e., not the
GPU version) from Amber 11, running on 256 processors on Athena. It has
appeared on four different systems I have been running, all fairly large (up
to 350,000 atoms), and it never occurs at the same place twice. However, the
error has occurred more frequently for me than for Bongkeun: it sometimes
does not happen for several nanoseconds, but it can also happen many times
per nanosecond. The error:

| ERROR: max pairlist cutoff must be less than unit cell max sphere radius!

is written once at the very end of the mdout file and a couple of hundred
times in the STDOUT file. We also turned verbose on to see what happened to
the forces right before the error. The last step showed NaN for essentially
all energy values, followed, of course, by the error message:

NET FORCE PER ATOM: 0.1203E-05 0.6295E-06 0.2283E-05

     Evdw = -1296.057283476214
     Ehbond = 0.000000000000
     Ecoulomb = -220472.976904191100


     Iso virial = 199523.479901402000
     Eevir vs. Ecoulomb = 5.356456894359
 a,b,c,volume now equal to 168.045 168.045 168.045 3653026.946
 NET FORCE PER ATOM: NaN NaN NaN

     Evdw = -1296.057285985989
     Ehbond = 0.000000000000
     Ecoulomb = NaN


     Iso virial = NaN
     Eevir vs. Ecoulomb = 0.000000000000
 a,b,c,volume now equal to NaN NaN NaN NaN
| ERROR: max pairlist cutoff must be less than unit cell max sphere radius!

This shows that, as expected, the system is blowing up right before the job
dies. Below is the pmemd mdin file used for this simulation (with verbose
turned off here, obviously).

MD Run in pmemd.
&cntrl
 nstlim=250000, owtnm='O', hwtnm1='H1',
 dielc=1, nrespa=1, temp0=310,
 tol=1e-05, vlimit=20, iwrap=1, ntc=2,
 ig=-1, pres0=1, ntb=2, ntrx=1,
 ibelly=0, nmropt=0, hwtnm2='H2',
 imin=0, ntxo=1, watnam='WAT', igb=0,
 comp=44.6, jfastw=0, ntx=5, ipol=0,
 nscm=1000, ntp=1, tempi=0, ntr=0,
 ntt=3, ntwr=1000, cut=10, ntave=0,
 dt=0.002, ntwx=1000, ntf=2, irest=1,
 ntpr=100, taup=1, gamma_ln=5,
 ioutfm=1,
/
&ewald
 verbose=0, ew_type=0, eedtbdns=500,
 netfrc=1, dsum_tol=1e-05, skinnb=2,
 rsum_tol=5e-05, nbtell=0, nbflag=1,
 frameon=1, vdwmeth=1, order=4, eedmeth=1,
/

I am using cut=10 here, but I have also tried cut=8 with the same results.
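
For what it's worth, the geometric condition behind the error message is
easy to check by hand. As I understand it, pmemd builds the pairlist out to
cut + skinnb and requires that distance to fit inside the largest sphere
that can be inscribed in the unit cell. Below is a minimal Python sketch of
that check (my own reconstruction, not the actual test in the code), using
the box from the last sane verbose step above:

# Check whether the pairlist cutoff (cut + skinnb) fits inside the unit
# cell's largest inscribed sphere. This is a reconstruction of the condition
# behind "max pairlist cutoff must be less than unit cell max sphere
# radius!"; the exact test inside pmemd may differ.
import numpy as np

def cell_vectors(a, b, c, alpha, beta, gamma):
    # Triclinic cell vectors from edge lengths (Angstrom) and angles (deg).
    al, be, ga = np.radians([alpha, beta, gamma])
    va = np.array([a, 0.0, 0.0])
    vb = np.array([b * np.cos(ga), b * np.sin(ga), 0.0])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c * c - cx * cx - cy * cy)
    return va, vb, np.array([cx, cy, cz])

def max_sphere_radius(a, b, c, alpha, beta, gamma):
    # Half the smallest distance between opposite faces of the cell.
    va, vb, vc = cell_vectors(a, b, c, alpha, beta, gamma)
    vol = abs(np.dot(va, np.cross(vb, vc)))
    return 0.5 * min(vol / np.linalg.norm(np.cross(vb, vc)),
                     vol / np.linalg.norm(np.cross(vc, va)),
                     vol / np.linalg.norm(np.cross(va, vb)))

# Truncated-octahedron box from the verbose output above (all angles 109.47).
ang = 109.471220634
r = max_sphere_radius(168.045, 168.045, 168.045, ang, ang, ang)
cut, skinnb = 10.0, 2.0  # values from the mdin above
print("max sphere radius = %.3f A, pairlist cutoff = %.1f A" % (r, cut + skinnb))

For a healthy truncated-octahedron box with a = 168 A the radius comes out
around 68 A, far above the 12 A pairlist cutoff, so when this message fires
it is really reporting NaN (or collapsed) box dimensions after the blow-up
rather than a box that is genuinely too small.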

I hope all this helps pinpoint the source of the problem. Let me know if you
have any questions or suggestions.
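
In case it is useful to anyone hitting the same thing, below is a rough
sketch of a script to locate the first NaN step in a long verbose log (the
default filename and the amount of context are placeholders; adjust to
taste):

# Print the first line of a pmemd log containing NaN, preceded by a little
# context, to show the last sane step before the blow-up.
import re
import sys
from collections import deque

def first_nan(path, context=15):
    recent = deque(maxlen=context)  # rolling window of the preceding lines
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            if re.search(r"\bnan\b", line, re.IGNORECASE):
                print("first NaN at line %d:" % lineno)
                sys.stdout.write("".join(recent) + line)
                return lineno
            recent.append(line)
    print("no NaN found")
    return None

if __name__ == "__main__":
    first_nan(sys.argv[1] if len(sys.argv) > 1 else "mdout.verbose")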

-Bill


On Mon, Feb 14, 2011 at 4:57 PM, Bongkeun Kim <bkim.chem.ucsb.edu> wrote:

> Hello Ross,
>
> I posted my answers between the lines.
>
> Quoting Ross Walker <ross.rosswalker.co.uk>:
>
> > Hi Bongkeun,
> >
> > Unfortunately it is going to be hard to figure out what is going on here
> > without doing some more digging. The error you see is somewhat misleading,
> > since it is effectively what happens when your system blows up: some atom
> > gets a huge force on it, etc. There are a number of things that can cause
> > this, everything from a bug in the code to issues with force field
> > parameters to even flaky hardware. Can you check a few things for me?
> >
> > 1) Verify you definitely have bugfix.12 applied. Your output file should
> > say:
> >
> > |--------------------- INFORMATION ----------------------
> > | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > | Version 2.1
> > | 12/20/2010
> >
> Yes, it is from bugfix.12
>
> > 2) Verify that you can reproduce this error if you start this calculation
> > again on the same hardware. Does it always occur at the same point?
> >
> No, I got this error randomly.
>
> > 3) Confirm exactly what hardware you are using. If this is NOT a C20XX
> > series board, then the chance of it being flaky hardware is much higher.
> >
> It's from the C1070 family.
>
> > 4) Finally try setting NTPR=1 and rerunning the calculation to see if it
> > crashes at the same place. That way we will be able to see exactly what
> > happened before the error was triggered.
> >
> I cannot see any error when using NTPR=1. This error came about roughly
> once per 100 ns, at random. I suspected that GPU heating might cause this
> error, so I split the runs into 10 ns segments and allowed 5 min of idling
> to cool down the GPUs. Each run takes about 10 hours.
> Thanks.
> Bongkeun Kim
>
> > Thanks,
> >
> > All the best
> > Ross
> >
> >> -----Original Message-----
> >> From: Bongkeun Kim [mailto:bkim.chem.ucsb.edu]
> >> Sent: Monday, February 14, 2011 11:12 AM
> >> To: amber
> >> Subject: [AMBER] max pairlist cutoff error on octahedral box
> >>
> >> Hello,
> >>
> >> I got the following error message when running AMBER 11 on the GPU.
> >> -------------------------------------------------------------
> >>  NSTEP =   420000  TIME(PS) =  155540.000  TEMP(K) =  312.04  PRESS = -187.3
> >>  Etot   =   -18757.4114  EKtot   =     4575.5386  EPtot      =   -23332.9500
> >>  BOND   =       58.8446  ANGLE   =      136.7403  DIHED      =      166.0070
> >>  1-4 NB =       56.7262  1-4 EEL =      -31.1536  VDWAALS    =     3080.9761
> >>  EELEC  =   -26801.0907  EHBOND  =        0.0000  RESTRAINT  =        0.0000
> >>  EKCMT  =     2199.9567  VIRIAL  =     2501.7076  VOLUME     =    74625.8314
> >>                                                   Density    =        0.9841
> >> ------------------------------------------------------------------------------
> >>
> >> | ERROR: max pairlist cutoff must be less than unit cell max sphere radius!
> >> -----------------------------------------------------------------
> >> This error occurred randomly, and once I restarted from the last rst
> >> file I could continue running. I have already applied bugfix 12, and I
> >> used cutoff=8. Please let me know how to avoid this error.
> >> Thank you.
> >> Bongkeun Kim



-- 
Bill Miller III
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-6715
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Feb 15 2011 - 04:30:02 PST