Re: [AMBER] NaN error on traj and output with AMBER CUDA - strange reproducable error

From: Marek Maly <marek.maly.ujep.cz>
Date: Sun, 23 Jan 2011 01:44:06 +0100

Hello Jason,

thanks again for all your comments!

The simulation with ig=-1 finally crashed as well, at the start of the 71st
part (the same problem as at the end of the original 60th part, this time
with the last restart file of the 70th part).
Nevertheless, all trajectories of parts 60, 61, ..., 70 are OK. I visualized
them in Chimera, and all 11 x 50 = 550 frames were displayed without
problems.

In about 20 frames I saw a very long bond (999.3638 A) between the hydrogen
and the oxygen of one water molecule, but when I checked the corresponding
records in the *.out file I did not see any serious jump in the bond energy,
so this is probably just a graphical issue. After reimaging into the original
box using "ptraj", everything is perfect.
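
One way such an apparent long bond can arise is when the two atoms of a
molecule are drawn in different periodic images. A minimal sketch of the
minimum-image distance follows, assuming an orthorhombic box; the function
name and the example coordinates are hypothetical and are not taken from the
simulation discussed here.

```python
import math

def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    a, b : (x, y, z) coordinates in Angstrom
    box  : (lx, ly, lz) box edge lengths in Angstrom
    """
    total = 0.0
    for ai, bi, li in zip(a, b, box):
        d = bi - ai
        # Shift the separation by whole box lengths to the nearest image.
        d -= li * round(d / li)
        total += d * d
    return math.sqrt(total)

# Hypothetical example: the raw separation along x is 99 A, but in a
# 100 A box the nearest periodic image is only 1 A away.
oxygen = (0.0, 0.0, 0.0)
hydrogen = (99.0, 0.0, 0.0)
print(minimum_image_distance(oxygen, hydrogen, (100.0, 100.0, 100.0)))
```

If the minimum-image distance is a normal bond length while the raw distance
is huge, the long bond is a display artifact rather than a real distortion.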

The reason why all the frames in the MDCRD files are OK, although in the RST
file some coordinates were right at the limit of the format, is that the
MDCRD format has one more place before the decimal point: there the limit is
9999.xx, whereas in the RST file it is just 999.xx. During those 11
simulation parts (60-70), many of the RST files that were overwritten during
the simulation were probably damaged, but only the last RST file of each
part matters for continuing the simulation, and by chance it was OK in all
10 cases. Of course, the probability that this last RST file would also be
broken increased with time, as the water molecules diffuse away from the
original box, fortunately only with a sqrt(t) dependence :))
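
The overflow behaviour described above can be sketched as follows. This is
only an illustration of fixed-width Fortran-style output, where a value that
does not fit its field is written as asterisks. The helper names and the
field widths in the usage lines are assumptions chosen for the example; the
real widths of the Amber files are listed at http://ambermd.org/formats.html.

```python
def fortran_f(value, width, decimals):
    """Emulate Fortran Fw.d output for one number.

    Returns the number right-justified in `width` characters, or a field
    of '*' characters if it does not fit. (Simplification: Fortran may
    also drop an optional leading zero to make a value fit.)
    """
    text = f"{value:.{decimals}f}"
    if len(text) > width:
        return "*" * width
    return text.rjust(width)

def has_overflow(line):
    """True if a formatted line contains an overflowed ('*') field."""
    return "*" in line

# Illustrative widths only (not the Amber specification):
print(fortran_f(1234.5, width=8, decimals=3))   # fits the field
print(fortran_f(12345.6, width=8, decimals=3))  # one digit too many
print(fortran_f(1234.5, width=7, decimals=3))   # narrower field overflows sooner
```

A field with one position fewer before the decimal point overflows at
coordinates ten times smaller, which is the effect described above.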

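The sqrt(t) dependence can be made concrete with the standard relation for
free diffusion in three dimensions, RMS displacement = sqrt(6 D t). The
sketch below is a rough estimate only: the diffusion coefficient is an
assumed textbook-scale value for liquid water near room temperature, not a
number measured in this simulation, and the function names are hypothetical.

```python
import math

# Assumption: about 2.3e-5 cm^2/s = 0.23 A^2/ps. Water models used in MD
# can diffuse noticeably faster, so treat this as a placeholder.
D_WATER = 0.23  # A^2/ps

def rms_displacement(d_coeff, t_ps):
    """RMS displacement (A) after t_ps picoseconds of free 3D diffusion."""
    return math.sqrt(6.0 * d_coeff * t_ps)

def time_to_reach(distance, d_coeff):
    """Time (ps) at which the RMS displacement equals `distance` (A)."""
    return distance ** 2 / (6.0 * d_coeff)

# One simulation part is 250 000 steps of 1 fs, i.e. 250 ps.
one_part = rms_displacement(D_WATER, 250.0)
four_parts = rms_displacement(D_WATER, 4 * 250.0)
print(one_part, four_parts)  # four times the time, twice the displacement
```

Because the distance grows only with the square root of time, reaching a
coordinate ten times larger takes about a hundred times longer.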

Thanks again for your help!


    Best wishes,

       Marek





On Sat, 22 Jan 2011 16:54:04 +0100, Jason Swails <jason.swails.gmail.com>
wrote:

> 2011/1/22 Marek Maly <marek.maly.ujep.cz>
>
>> Dear Jason,
>> first of all, thank you very much for your comments!
>>
>> I apologize for misinterpreting the RST format; I was assuming the
>> layout:
>>
>> xi yi zi vxi vyi vzi
>>
>
> Ah, this explains the line you specified (and is certainly a reasonable way
> of formatting the restart file, though it's not the one that's used). In
> that case, you were actually looking at the velocities of the atoms that
> were failing. Velocities should never overflow the restart file, especially
> since the max limit is typically 20 amber units.
>
> If you ever have questions regarding the format of amber files, see
> http://ambermd.org/formats.html.
>
>
>>
>> I just quickly verified that there is no NaN in the file and that the
>> number of rows equals the number of atoms (not counting the first two
>> records and the last one), and I did not check the format on the Amber
>> web page.
>>
>>
>> Your explanation seems logical and clear, although there are two
>> strange things:
>>
>> a)
>> You are right, my MDCRD files are in ASCII format, but as I wrote before,
>> the trajectory of the 60th part of the simulation,
>> prod60_G4malTRI_ANS.mdcrd, is OK including the last frame.
>> At least I was able to load and visualize all 50 frames in UCSF Chimera,
>> and VMD did not report any errors either. Moreover, I did not find any *
>> characters in this file.
>> You can download and check it here:
>> http://physics.ujep.cz/~mmaly/amber/
>> (prod60_G4malTRI_ANS.mdcrd)
>>
>
> Hmm. This is strange... Perhaps visualizing will help (that file is a bit
> too large for me to download at home).
>
>
>>
>>
>> b)
>> I agree with you that just setting ig=-1 should not solve the problem,
>> only postpone it.
>> That was one reason why I posted my question to the Amber forum.
>> Still, it is a little surprising that this change (writing ig=-1 in my
>> *.in file), which I made at the start of the 60th simulation part (of one
>> of my verification runs), solved the situation for the subsequent parts
>> 61, 62, 63, 64, 65, 66, 67 (67 is currently in progress; ig=-1 is set
>> for part 60 and all subsequent parts), where each part has 250 000
>> 1 fs time steps.
>> I would have expected a crash during part 61 or 62.
>>
>
> Hmm. Also strange. But again, visualizing would probably help here. It
> could be that (since you're using Langevin dynamics), a random kick was
> applied to the offending atom(s) in the *opposite* direction, so it's taking
> a bit longer to appear. If you added iwrap, though, this would explain it.
>
>
>>
>> Anyway, thanks for your recommendation regarding iwrap=1. As I am not
>> interested in diffusion phenomena, it is a good solution. The problem is
>> that I never experienced this type of error before, because up to "now" I
>> was using just CPUs, where the simulations were a little shorter :)) so
>> this is a brand new phenomenon for me, which appeared with the long
>> simulation times that GPUs make achievable in real time.
>>
>
> Awesome! This was the goal in its implementation in the first place.
>
>
>> Thanks also for the NetCDF MDCRD format recommendation. If the standard
>> visualization software that I use (UCSF Chimera, VMD) has no problems
>> with this format, there is no reason for me to use ASCII anymore.
>>
>
> Glad to have convinced you of NetCDF over ASCII trajectories.
>
> All the best,
> Jason
>


-- 
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Jan 22 2011 - 17:00:04 PST