Re: [AMBER] PMEMD CUDA MPI STOP WRITING OUTPUT IN RUNNING STATUS

From: HIMANSHU JOSHI <himanshuphy87.gmail.com>
Date: Wed, 27 Jun 2012 23:06:22 +0530

Dear Ross,
It happens only with the parallel GPU runs, and not often.
Thanks for the suggestion. The next time it hangs I will check the
temperature of the machine.
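
One way to catch the hang as it happens, and to record the GPU temperature at that moment, is sketched below. This is only a minimal sketch under stated assumptions: the function names (`seconds_since_write`, `is_stalled`, `gpu_temperatures`, `watch`) and the thresholds are hypothetical, `nvidia-smi` is assumed to be on the PATH, and the installed driver is assumed to support the `--query-gpu` option. The watched file would be the job's .out file, which should update regularly with ntpr = 100.

```python
import os
import subprocess
import time


def seconds_since_write(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds, now=None):
    """True if `path` has not been written for longer than max_idle_seconds."""
    return seconds_since_write(path, now) > max_idle_seconds


def gpu_temperatures():
    """Return one temperature (Celsius) per GPU, as reported by nvidia-smi.

    Assumption: nvidia-smi is on PATH and the driver supports --query-gpu.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"]
    )
    return [int(value) for value in out.decode().split()]


def watch(path, max_idle_seconds=600, poll_seconds=60):
    """Poll until the output file goes stale, then print GPU temperatures."""
    while not is_stalled(path, max_idle_seconds):
        time.sleep(poll_seconds)
    print("No write to %s for over %d s; GPU temperatures (C): %s"
          % (path, max_idle_seconds, gpu_temperatures()))
```

Calling `watch("md.out")` on the compute node while the job runs (the file name is a placeholder) would report the temperatures once the output has been idle for ten minutes. The idle threshold should be chosen generously, since output may be buffered between writes.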


On Wed, Jun 27, 2012 at 9:41 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Himanshu,
>
> Does anything similar happen if you run the CPU code in parallel, or does
> it ONLY occur with the GPU code? What about running on just a single GPU?
> The M2090s can often lock up if they are not properly cooled, and this
> would manifest itself as the job seeming to hang.
>
> I would start by verifying that it only ever happens when running the GPUs
> in parallel. This will give us an idea of where to look.
>
> All the best
> Ross
>
> > -----Original Message-----
> > From: HIMANSHU JOSHI [mailto:himanshuphy87.gmail.com]
> > Sent: Tuesday, June 26, 2012 11:55 PM
> > To: AMBER Mailing List
> > Subject: [AMBER] PMEMD CUDA MPI STOP WRITING OUTPUT IN RUNNING STATUS
> >
> > Dear friends,
> > I am facing an unusual problem with the MPI version of AMBER 11 pmemd_cuda,
> > compiled with MVAPICH.
> > My job stops writing output (trajectories, restart, .out) after some
> > time, but top and the PBS commands still show pmemd in running status.
> > It does not produce any error file either. If I then kill it and
> > resubmit it, it starts running again.
> >
> > Has anyone faced this? If so, please let me know what the problem is, as
> > I have not been able to figure it out.
> >
> > The machine is:
> > CUDA Device Name: Tesla M2090
> >
> > Here is my input file:
> > Initial minimization w/ position restraints on DNA, 9.0 cut
> > &cntrl
> > nmropt = 0,
> > ntx = 7, irest = 1, ntrx = 1, ntxo = 1,
> > ntpr = 100, ntwx = 500, ntwv = 0, ntwe = 0,
> > ntwprt = 0, ntwr = 500,
> >
> > ntf = 2, ntb = 1, dielc = 1.0,
> > cut = 9.0, nsnb = 10,
> >
> > ipol = 0,
> >
> > ibelly = 0, ntr = 0,
> >
> > imin = 0,
> > maxcyc = 5000,
> > ncyc = 2000,
> > ntmin = 1, dx0 = 0.1, dxm = 0.5, drms = 0.0001,
> >
> > nstlim = 500000
> > nscm = 1000,
> > t = 0.0, dt = 0.002,
> >
> > temp0 = 300.0, tempi = 10.0,
> > ig = 71277, heat = 0.0,
> > ntt = 1, dtemp = 0.0,
> > tautp = 1.0,
> > ntp = 0, pres0 = 1.0, comp = 44.6,
> > taup = 0.5,
> >
> > ntc = 2, tol = 0.0005,
> >
> > &end
> > &ewald
> > a = 62.9065214, b = 46.8818659, c = 192.6251811,
> > &end
> >
> > Thanks in advance
> >
> >
> > --
> > *With Regards,
> > HIMANSHU JOSHI
> > Graduate Scholar, Center for Condensed Matter Theory
> > Department of Physics IISc.,Bangalore India 560012*
> > ॐ सर्वे भवन्तु सुखिनः सर्वे सन्तु निरामयः।
> > सर्वे भद्रणिपश्यन्तु मा कश्चिद्दुःख भाग भवेत्॥
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>



-- 
*With Regards,
HIMANSHU JOSHI
Graduate Scholar, Center for Condensed Matter Theory
Department of Physics IISc.,Bangalore India 560012*
ॐ सर्वे भवन्तु सुखिनः सर्वे सन्तु निरामयः।
सर्वे भद्रणिपश्यन्तु मा कश्चिद्दुःख भाग भवेत्॥
Received on Wed Jun 27 2012 - 11:00:03 PDT