My guess is that a system that was subtly broken from the start eventually
produces the NaN error. If all subsequent runs are restarts (ntx=5), the only
state the system carries forward is its coordinates and velocities. The energy
terms, such as the kinetic and potential energies and their decomposed
components, look fine, and visual inspection does not reveal any problems
either (no clashes, unusual bonds, and so on). That leaves the velocity
distribution, which probably needs to be checked with a dedicated script.
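Something like the rough script below is what I have in mind. It is only a
sketch and rests on my assumptions about the file: a plain ASCII restart
(title line, a line starting with the atom count, 3*natom coordinates, then
3*natom velocities, numbers in 12-character fields, with any box line left
over at the end). NetCDF restarts or other layouts would need different
handling.

#!/usr/bin/env python
# check_vel.py -- sanity-check coordinates and velocities in an ASCII AMBER
# restart file (the kind ntx=5 reads).  Assumed layout, may need adjusting:
# title line, atom-count line, 3*natom coordinates, 3*natom velocities,
# optionally a box line; values written in 12-character fields.

import math
import sys

def read_restart(path):
    with open(path) as f:
        f.readline()                                  # title line
        natom = int(f.readline().split()[0])          # atom count (time may follow)
        fields = []
        for line in f:
            line = line.rstrip('\n')
            for i in range(0, len(line), 12):
                chunk = line[i:i + 12].strip()
                if not chunk:
                    continue
                try:
                    fields.append(float(chunk))
                except ValueError:                    # overflowed fields print as ******
                    fields.append(float('nan'))
        coords = fields[:3 * natom]
        vels = fields[3 * natom:6 * natom]            # anything after this is box info
        return coords, vels

def report(name, values):
    bad = sum(1 for v in values if math.isnan(v) or math.isinf(v))
    finite = [v for v in values if not (math.isnan(v) or math.isinf(v))]
    mean = sum(finite) / len(finite)
    var = sum((v - mean) ** 2 for v in finite) / len(finite)
    print("%-12s n=%6d  NaN/Inf=%d  max|v|=%10.4f  mean=%9.5f  stddev=%9.5f"
          % (name, len(values), bad, max(abs(v) for v in finite), mean, math.sqrt(var)))

if __name__ == "__main__":
    coords, vels = read_restart(sys.argv[1])
    report("coordinates", coords)
    report("velocities", vels)

For a healthy system I would expect no NaN/Inf entries, a velocity mean close
to zero, and no single component wildly larger than the rest; a lone huge
velocity component is exactly the kind of hidden "crack" I have in mind.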
The out-of-memory error is much simpler; it is easy to locate. Fortunately, I
have not run into other errors even with heavy use of cheap cards. And again,
the CPU code is more robust against this NaN error, so my second guess is
that it could be handled on the GPU in the same way the CPU code handles it.
To be honest, there are also numerous errors on GPU clusters, but I have no
suggestions about their nature, too many factors...
2012/2/20 Aron Broom <broomsday.gmail.com>
> Do you think the error is something that tends to occur early in a
> simulation, and if you get past a certain critical point you are ALMOST
> safe? I suppose I could test this by having restart files written every
> step for instance and look at the step-dependent distribution of the
> error. My intuitive impression has been that the errors tend to creep up
> quickly, and if the simulation is stable for a few minutes it continues to
> be stable, but that is really not based on any hard evidence, just a gut
> feeling.
>
> --
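P.S. Regarding the test you describe above: a rough way to collect the
statistics would be to scan the mdout files of many short runs and note the
first step at which the total energy turns into NaN (or the asterisk overflow
marker). The sketch below is just my guess at how to automate it; it assumes
the usual mdout energy blocks with "NSTEP =" and "Etot =" fields, so the
patterns may need adjusting for your output.

#!/usr/bin/env python
# nan_step.py -- scan mdout files and report the first step at which the
# total energy is NaN or printed as asterisks, to see whether failures
# cluster near the start of the runs.

import glob
import re
import sys

nstep_re = re.compile(r'NSTEP =\s*(\d+)')
etot_re = re.compile(r'Etot\s*=\s*(\S+)')

def first_bad_step(mdout):
    step = None
    with open(mdout) as f:
        for line in f:
            m = nstep_re.search(line)
            if m:
                step = int(m.group(1))
                continue
            m = etot_re.search(line)
            if m:
                value = m.group(1).lower()
                if 'nan' in value or '*' in value:
                    return step
    return None                                   # run looked clean

if __name__ == "__main__":
    pattern = sys.argv[1] if len(sys.argv) > 1 else "mdout.*"
    for path in sorted(glob.glob(pattern)):
        bad = first_bad_step(path)
        print("%-24s %s" % (path, "clean" if bad is None
                            else "NaN near step %d" % bad))

If the failing step is spread evenly over the runs rather than concentrated
at the beginning, that would argue against the "critical early point" idea.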
Sincerely,
Dmitry Mukha
Institute of Bioorganic Chemistry, NAS, Minsk, Belarus
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Feb 21 2012 - 00:30:02 PST