Hi Fabricio,
Are you by any chance using a GTX480 or GTX580 GPU (or a related card such as
the GTX470 or GTX590)? If so, have you applied all of the bugfixes listed here?
http://ambermd.org/bugfixes11.html
Specifically bugfix.12:
5) Possible (not fully tested) workaround for NVIDIA GTX4XX
and GTX5XX series cards to avoid possible random hangs
during PME simulations.
If not, please apply these, recompile, and see if that fixes your problem. If
you have already applied all of them, then we will need a lot more
information to try to reproduce this problem.
Specifically, what is wrong with your restart file that prevents you from
using it to restart a job? Is it empty? Does it contain *'s, etc.?
For example, if it contains *'s, that means your system blew up. This often
happens because of problems with your initial structure. It can also happen
because of underlying flaws in the force field itself: hydroxyl hydrogens
have no VDW radii, so one can collapse onto something highly charged, like a
phosphate, giving a singularity and thus a NaN for the energy and an infinite
force. This is much more likely to occur over the longer timescales you can
run on a GPU.
Situations where your system runs into big problems, like blowing up, will
result in the GPU run hanging rather than quitting with a segfault as you'd
see on the CPU.
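As a quick way to answer the "is it empty, does it contain *'s" question,
something like the Python sketch below could be used. It assumes a formatted
(ASCII) restart file rather than a NetCDF one, and the function name
inspect_restart is just made up for illustration. It skips the title line,
since a title can legitimately contain any characters.

```python
import os

def inspect_restart(path):
    """Return a list of problems found in an ASCII restart file.

    Assumption: the file is a formatted (text) restart whose first
    line is a free-form title. NetCDF restarts are not handled.
    """
    if not os.path.exists(path):
        return ["missing"]
    if os.path.getsize(path) == 0:
        return ["empty"]

    problems = []
    with open(path, "r", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if lineno == 1:
                continue  # title line, not numeric data
            if "*" in line:
                # A value too large for its fixed-width field prints as *'s
                problems.append(f"line {lineno}: contains '*'")
            elif "nan" in line.lower():
                problems.append(f"line {lineno}: contains NaN")
    return problems
```

An empty list means none of these particular symptoms were found; it does not
prove that the file is otherwise usable.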
It will take more info about your simulation to determine what is going on
here.
P.S. You could also set ntwr < 0 (e.g. -20000) and then you will get a unique
restart file saved each time ntwr is triggered. This way, if your last
restart file contains an error, you can go back to the one before that. It
also means you can test whether your blow-up is reproducible.
E.g. suppose your simulation stops at 100001 steps having written a restart
containing *'s at step 100000. With ntwr = -20000 you will still have the
restart from step 80000, and you can use this to see if your simulation again
blows up somewhere in the next 20 to 30K steps or so.
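To make the bookkeeping concrete, here is a small Python sketch of which steps
get their own restart file for a given negative ntwr, and which one to fall
back to when a particular restart turns out to be bad. The function names are
hypothetical and nothing here calls Amber itself; it is only the arithmetic.

```python
def restart_steps(ntwr, nstlim):
    """Steps at which a restart is written during a run of nstlim steps."""
    interval = abs(ntwr)
    return list(range(interval, nstlim + 1, interval))

def last_restart_before(bad_step, ntwr):
    """Most recent restart step strictly before bad_step, or None."""
    interval = abs(ntwr)
    candidate = ((bad_step - 1) // interval) * interval
    return candidate if candidate > 0 else None

if __name__ == "__main__":
    ntwr = -20000
    print(restart_steps(ntwr, 100000))
    print(last_restart_before(100000, ntwr))
```

If the fallback restart reproduces the blow-up, that points at the system or
the force field rather than at a one-off hang.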
All the best
Ross
> -----Original Message-----
> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
> Sent: Monday, June 13, 2011 8:42 AM
> To: amber.ambermd.org
> Subject: [AMBER] Restart problems with pmemd.cuda
>
> I have had this problem more than once now. I am using the latest
> version of amber and ambertools to run molecular dynamics simulations
> with cuda. Every time I kill the pmemd.cuda job before the end of the
> calculation, the job won't restart from the rst file. I have solved
> the problem once by creating a new restart file using ptraj, but this
> does not work every time. Once I use the command "pmemd.cuda
> -.....inputfiles with the rst restart file...etc", no error message
> comes out. The output file .out simply stops being generated and the
> info file shows no sign of error at all.
> Has anyone had this problem before? Is there a way to solve it?
>
> Thank you
> Fabrício Bracht
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jun 13 2011 - 09:30:04 PDT