Hi Himanshu,
Does anything similar happen if you run the CPU code in parallel, or does it ONLY occur with the GPU code? What about running on just a single GPU? M2090s can often lock up if they are not properly cooled, and this would show up as the job appearing to hang.
I would start by verifying that it only ever happens when running the GPUs in parallel. This will give us an idea of where to look.
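The symptom here (pmemd still listed as running in top and by PBS, but no new output) can be caught automatically while the tests above are run, so the GPU state can be checked at the moment the job hangs rather than after it has been killed. A minimal Python sketch, assuming the job's output files are visible from wherever the script runs; the function names and the idle threshold are illustrative choices, not part of AMBER:

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def stalled_files(paths, max_idle_s, now=None):
    """Return the subset of `paths` not modified within `max_idle_s` seconds.

    A file that does not exist yet is also reported, since the job has
    written nothing to it.
    """
    stale = []
    for path in paths:
        try:
            if seconds_since_update(path, now) > max_idle_s:
                stale.append(path)
        except FileNotFoundError:
            stale.append(path)
    return stale
```

Energies are printed every ntpr = 100 steps in the input below, so the threshold should be several times the wall-clock time per 100 steps (output may also be buffered, so allow some slack). Calling `stalled_files(["md.out", "md.nc", "md.rst"], 600)` periodically, e.g. from cron, is one way to use it; the file names are hypothetical.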
All the best
Ross
> -----Original Message-----
> From: HIMANSHU JOSHI [mailto:himanshuphy87.gmail.com]
> Sent: Tuesday, June 26, 2012 11:55 PM
> To: AMBER Mailing List
> Subject: [AMBER] PMEMD CUDA MPI STOPS WRITING OUTPUT WHILE IN RUNNING STATUS
>
> Dear friends,
> I am facing an unusual problem with the MPI version of AMBER 11 pmemd_cuda
> compiled with MVAPICH.
> My job stops writing output (trajectories, restart, .out) after some time,
> but top and PBS commands still show pmemd in running status.
> It does not produce any error file either. If I kill it and resubmit it,
> it starts running again.
>
> Has anyone faced this? If so, please let me know what the problem is, as I
> am not able to figure it out.
>
> The machine is:
> CUDA Device Name: Tesla M2090
>
> Here is my input file:
>
> Initial minimization w/ position restraints on DNA, 9.0 cut
> &cntrl
> nmropt = 0,
> ntx = 7, irest = 1, ntrx = 1, ntxo = 1,
> ntpr = 100, ntwx = 500, ntwv = 0, ntwe = 0,
> ntwprt = 0, ntwr = 500,
>
> ntf = 2, ntb = 1, dielc = 1.0,
> cut = 9.0, nsnb = 10,
>
> ipol = 0,
>
> ibelly = 0, ntr = 0,
>
> imin = 0,
> maxcyc = 5000,
> ncyc = 2000,
> ntmin = 1, dx0 = 0.1, dxm = 0.5, drms = 0.0001,
>
> nstlim = 500000,
> nscm = 1000,
> t = 0.0, dt = 0.002,
>
> temp0 = 300.0, tempi = 10.0,
> ig = 71277, heat = 0.0,
> ntt = 1, dtemp = 0.0,
> tautp = 1.0,
> ntp = 0, pres0 = 1.0, comp = 44.6,
> taup = 0.5,
>
> ntc = 2, tol = 0.0005,
>
> &end
> &ewald
> a = 62.9065214, b = 46.8818659, c = 192.6251811,
> &end
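The &cntrl settings above fix how much output a complete run should produce, which gives a yardstick for how far a hung job got before it stopped writing. A small Python sketch, assuming the energy records in mdout contain `NSTEP = <step>` (as standard AMBER output does) and that NSTEP counts steps within the current run; the helper names are hypothetical and not part of AMBER:

```python
import re

# Values copied from the &cntrl namelist above.
NSTLIM = 500_000   # total MD steps
DT_PS = 0.002      # time step in ps
NTPR = 100         # energy printout interval (steps)
NTWX = 500         # trajectory write interval (steps)

# Assumed format of an mdout energy record: "NSTEP = <step> ..."
NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")


def expected_totals(nstlim=NSTLIM, dt=DT_PS, ntpr=NTPR, ntwx=NTWX):
    """Return (simulated time in ps, energy printouts, trajectory frames)."""
    return nstlim * dt, nstlim // ntpr, nstlim // ntwx


def last_step(mdout_text):
    """Return the last NSTEP value found in mdout text, or None if absent."""
    steps = NSTEP_RE.findall(mdout_text)
    return int(steps[-1]) if steps else None


def progress(mdout_text, nstlim=NSTLIM):
    """Fraction of the run completed, judged from the last NSTEP record."""
    step = last_step(mdout_text)
    return 0.0 if step is None else step / nstlim
```

With these settings a complete run covers 1000 ps (1 ns), with 5000 energy printouts and 1000 trajectory frames. Running `progress()` on the mdout of each hung job would show whether the stall happens at a similar step every time.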
>
> Thanks in advance
>
>
> --
> With Regards,
> HIMANSHU JOSHI
> Graduate Scholar, Center for Condensed Matter Theory
> Department of Physics, IISc., Bangalore, India 560012
> Om. May all be happy, may all be free from illness.
> May all see what is auspicious, may no one suffer.
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jun 27 2012 - 09:30:03 PDT