Re: [AMBER] GB simulation on GPU freezes

From: E. Nihal Korkmaz <enihalkorkmaz.gmail.com>
Date: Fri, 14 Oct 2011 18:15:41 -0500

Unfortunately, it locks up on both the 2070 and the 480.

I am currently trying the same files on a different GPU cluster of 2070s;
fingers crossed!


On Fri, Oct 14, 2011 at 5:29 PM, Scott Le Grand <varelse2005.gmail.com> wrote:

> If it doesn't lock up on the 2070, but does on the 480, it is likely
> defective HW.
>
> If it locks up on the 2070, and Ross can repro it on his 20xxs, I know what
> I'll be doing this weekend :-)...
>
> But shooting from the hip, I'm guessing this is a bad GPU.
>
>
> On Fri, Oct 14, 2011 at 2:55 PM, Ross Walker <rosscwalker.gmail.com>
> wrote:
>
> > Can you please send me the input files for one of the simulations that
> > locks up, so I can try to reproduce it?
> >
> > Does it lock up on both the GTX480 and C2070?
> >
> > All the best
> > Ross
> >
> >
> >
> > On Oct 14, 2011, at 15:28, "E. Nihal Korkmaz" <enihalkorkmaz.gmail.com>
> > wrote:
> >
> > > Yes, I applied the bugfix patches when I first configured Amber on
> > > the cluster, as directed on the Amber website.
> > >
> > > Not at the exact same point, but always after 500 ns for that particular
> > > simulation.
> > > I just realized it also locked up for other (shorter) proteins, at
> > > around 200 ns. I simulate the same protein under a series of different
> > > conditions (temperature and salt concentration); some runs go smoothly and
> > > some lock up. I checked the energy logs in the *.out files: nothing looks
> > > unusual, and nothing is drastically different between the simulations
> > > that run smoothly and those that freeze.
> > >
> > > Thanks,
> > > Nihal
> > >
> > > On Fri, Oct 14, 2011 at 2:15 PM, Ross Walker <rosscwalker.gmail.com>
> > wrote:
> > >
> > >> There are a lot of unnecessary defaults in your input file, like
> > >> specifying taup for a GB run. You can probably also set ntwr much larger
> > >> to improve performance, and a gamma_ln of 20 is probably a bit high.
> > >> None of these should cause a lockup, though.
> > >>
> > >> Can you confirm that you are running with the latest bugfixes, in
> > >> particular bugfix.17 for Amber 11?
> > >>
> > >> Also, does the calculation always lock up at the exact same point?
> > >>
> > >> All the best
> > >> Ross
> > >>
> > >>
> > >>
> > >> On Oct 14, 2011, at 14:17, "E. Nihal Korkmaz" <enihalkorkmaz.gmail.com>
> > >> wrote:
> > >>
> > >>> Amber 11. I tried it on GeForce GTX 480 and Tesla C2070 GPUs, on Linux
> > >>> (CentOS release 5.6). We compiled with the CUDA 4 toolkit. I am running
> > >>> with pmemd.cuda.
> > >>>
> > >>> My input file is below (the same file works fine with the homologous
> > >>> structure):
> > >>> &cntrl
> > >>> imin=0,
> > >>>
> > >>> ntb=0,
> > >>> ntx=5,
> > >>> irest=1,
> > >>>
> > >>> ntpr=200,
> > >>> ntwr=200,
> > >>> ntwx=200,
> > >>> ntwe=200,
> > >>>
> > >>> nstlim=5000000,
> > >>> dt=0.002,
> > >>>
> > >>> ntt=3,
> > >>>
> > >>> temp0=300,
> > >>> tempi=300,
> > >>> ig=-1,
> > >>> tautp=1,
> > >>> gamma_ln=20,
> > >>>
> > >>> ntp=0,
> > >>> pres0=1,
> > >>> taup=1,
> > >>>
> > >>> ntc=2,
> > >>> tol=0.00001,
> > >>>
> > >>> ntf=2,
> > >>> ntb=0,
> > >>> dielc=1,
> > >>> cut=9999,
> > >>> rgbmax=12,
> > >>> ipol=0,
> > >>> ifqnt=0,
> > >>> igb=5,
> > >>> saltcon=0.15,
> > >>> ioutfm=1,
> > >>> nscm=100,
> > >>> &end
> > >>>
> > >>>
> > >>> On Fri, Oct 14, 2011 at 1:05 PM, Scott Le Grand <varelse2005.gmail.com>
> > >>> wrote:
> > >>>
> > >>>> What revision of AMBER? What GPU? What OS? What driver? What toolkit
> > >>>> did you compile with?
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Fri, Oct 14, 2011 at 10:55 AM, E. Nihal Korkmaz
> > >>>> <enihalkorkmaz.gmail.com> wrote:
> > >>>>
> > >>>>> Dear all,
> > >>>>>
> > >>>>> I keep having a problem where, for one particular protein only, the
> > >>>>> simulation "freezes". By freeze I mean that the job appears to be
> > >>>>> running, but the output files do not change even after waiting 2 days.
> > >>>>> I am using igb=5 on the GPU. It is a 114-amino-acid protein, and the
> > >>>>> homologous structure (112 amino acids) runs without a problem. That
> > >>>>> specific one stops without being dropped off the queue and without any
> > >>>>> error messages at all. I checked the output files: no '*' or 'NaN' is
> > >>>>> present. I also tried running on different machines, and the same
> > >>>>> thing happens. I tried starting from a different restart file; nothing
> > >>>>> changes. It always freezes, although at different time steps.
> > >>>>>
> > >>>>> Has anyone had such a problem before? What could be the cause? I'd
> > >>>>> appreciate any comments or suggestions.
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>> --
> > >>>>> Elif Nihal Korkmaz
> > >>>>>
> > >>>>> Research Assistant
> > >>>>> University of Wisconsin - Biophysics
> > >>>>> Member of Qiang Cui & Thomas Record Labs
> > >>>>> 1101 University Ave, Rm. 8359
> > >>>>> Madison, WI 53706
> > >>>>> Phone: 608-265-3644
> > >>>>> Email: korkmaz.wisc.edu
> > >>>>> _______________________________________________
> > >>>>> AMBER mailing list
> > >>>>> AMBER.ambermd.org
> > >>>>> http://lists.ambermd.org/mailman/listinfo/amber
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
>
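A rough Python sketch that applies Ross's comments to the &cntrl values quoted above. It is a hypothetical helper, not an Amber tool: the dictionary CNTRL, the functions run_length_ns, writes_per_run and notes, and the thresholds MAX_RESTART_WRITES and MAX_GAMMA_LN are all illustrative assumptions, and the values are transcribed by hand from the input file.

```python
# Hypothetical helper, not part of Amber: checks the quoted &cntrl values
# against the comments made in this thread.

# Values transcribed by hand from the quoted &cntrl namelist.
CNTRL = {
    "nstlim": 5_000_000,  # number of MD steps per run
    "dt": 0.002,          # time step (ps)
    "ntwr": 200,          # restart file written every ntwr steps
    "ntwx": 200,          # trajectory frame written every ntwx steps
    "ntp": 0,             # no pressure coupling (GB run, ntb=0)
    "taup": 1.0,          # pressure relaxation time; only used when ntp > 0
    "ntt": 3,             # Langevin dynamics
    "gamma_ln": 20.0,     # Langevin collision frequency (ps^-1)
}

# Illustrative thresholds (assumptions, not Amber rules).
MAX_RESTART_WRITES = 1000
MAX_GAMMA_LN = 5.0


def run_length_ns(c):
    """Simulated time covered by one run: nstlim * dt in ps, returned in ns."""
    return c["nstlim"] * c["dt"] / 1000.0


def writes_per_run(c, key):
    """Number of writes during one run for an output frequency such as ntwr."""
    return c["nstlim"] // c[key]


def notes(c):
    """Return the remarks from the thread that apply to this input."""
    out = []
    if c.get("ntp", 0) == 0 and "taup" in c:
        out.append("taup has no effect without pressure coupling (ntp=0)")
    if writes_per_run(c, "ntwr") > MAX_RESTART_WRITES:
        out.append("ntwr could be much larger; each restart write costs time")
    if c.get("ntt") == 3 and c.get("gamma_ln", 0.0) > MAX_GAMMA_LN:
        out.append(f"gamma_ln={c['gamma_ln']:g} is probably higher than needed")
    return out


if __name__ == "__main__":
    print(f"one run covers {run_length_ns(CNTRL):.1f} ns")
    print(f"restart writes per run: {writes_per_run(CNTRL, 'ntwr')}")
    for note in notes(CNTRL):
        print("-", note)
```

With nstlim=5000000 and dt=0.002 ps, one run covers 10 ns and writes the restart file 25000 times. So, assuming each job continues from the previous restart file, a lockup "after 500 ns" falls roughly 50 runs into the chain. As Ross says, none of these settings should cause the lockup itself.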



Received on Fri Oct 14 2011 - 16:30:05 PDT