My current suggestion is the same as Ross's: do not run MPI with one
process.
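Concretely, something like this is what I mean (just a sketch; the -O/-i/-p/-c/-o
arguments are the ones from the report below, and CUDA_VISIBLE_DEVICES with
explicit device indices is only one example way of pinning GPUs):

  # one GPU: run the serial CUDA engine directly, no MPI launcher
  export CUDA_VISIBLE_DEVICES=0
  pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout

  # two or more GPUs: only then use the MPI build, one rank per GPU
  export CUDA_VISIBLE_DEVICES=0,1
  mpiexec -np 2 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout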
This is not a bug worth fixing IMO. And that's because I'm currently
redesigning the multi-GPU architecture to reflect the tripling in AMBER
performance since I wrote it. I'd rather put my time into the future than
address a use case that's kinda useless.
If you see this with REMD runs, that's a whole different matter. Otherwise
it's a "will not fix", because I'm building something to replace it.
Scott
On Wed, Jun 19, 2013 at 5:19 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> Dear Yoshihisa,
>
> Granted, this should not be failing in this way, but I have to question why
> you would want to run with mpirun -np 1 at all: it only adds overhead and
> slows the simulation down. It is also a configuration that is not tested,
> which is why the failure had not been noticed.
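>
> If you want to see the overhead for yourself on a case where the -np 1 run
> does complete, a quick check (a sketch only; the output file names here are
> just examples) is to run the same input both ways and compare the ns/day
> figures reported in the timing summary at the end of each mdout:
>
>   # serial CUDA engine vs. the MPI build launched with a single rank
>   pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout_serial
>   mpiexec -np 1 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout_np1
>   grep "ns/day" mdout_serial mdout_np1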
>
> We'll take a look.
>
> All the best
> Ross
>
>
>
>
> On 6/17/13 11:19 PM, "Nakashima, Yoshihisa" <nakashima_y.jp.fujitsu.com>
> wrote:
>
> >Dear Amber community
> >
> >Hello,
> >
> >I tried to run the Cellulose NVE benchmark included in the Amber12_GPU_BMT
> >suite with a GPGPU (K20X).
> >The serial version (pmemd.cuda) and the parallel version (pmemd.cuda.MPI)
> >with 2 processes + 2 GPUs were OK,
> >but for the parallel version (pmemd.cuda.MPI) with 1 process + 1 GPU,
> >the following message was displayed and the test failed.
> >
> >***********
> ># mpiexec -np 1 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout_intel_gpu1pro_0618
> >
> >gpu_download_partial_forces: download failed unspecified launch failure
> >
> >===================================================================================
> >= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >= EXIT CODE: 255
> >= CLEANING UP REMAINING PROCESSES
> >= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >===================================================================================
> >**************
> >
> >(For comparison, the cases with no problem:)
> >No problem: # mpiexec -np 2 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout_intel_gpu2pro_0618
> >No problem: # pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout_intel_gpu1pro_0618
> >
> >
> >This problem occurs only in this case (pmemd.cuda.MPI + 1 GPU).
> >With the other 8 BMTs (Cellulose NPT, TRPCage, and so on), there is no problem.
> >
> >I don't know why the problem occurs.
> >Could you give me advice on how to solve this problem?
> >
> >
> >
> >The following is the relevant information.
> >
> >- Configuration
> >OS RHEL6.1
> >CPU 2x Xeon E5-2680
> >Amber version: 12 (Patched bugfix from 1 to 18)
> >AmberTools version: 13 (Patched bugfix from 1 to 9)
> >MPI: MPICH2-1.5
> >GNU: 4.4.5
> >GPU: 2x K20X
> >GPU Device Driver: 304.64
> >CUDA: 5.0
> >
> >
> >- The input file is below; it is the same as the file described
> >on Amber's web site
> >(http://ambermd.org/gpus/benchmarks.htm).
> >
> >5) Cellulose NVE = 408,609 atoms
> >************
> >Typical Production MD NVE with
> >GOOD energy conservation.
> > &cntrl
> > ntx=5, irest=1,
> > ntc=2, ntf=2, tol=0.000001,
> > nstlim=10000,
> > ntpr=1000, ntwx=1000,
> > ntwr=10000,
> > dt=0.002, cut=8.,
> > ntt=0, ntb=1, ntp=0,
> > ioutfm=1,
> > /
> > &ewald
> > dsum_tol=0.000001,
> > /
> >**************
> >
> >
> >- The last part of the output file is:
> >
> >**************
> >--------------------------------------------------------------------------------
> > 4. RESULTS
> >--------------------------------------------------------------------------------
> >
> > ---------------------------------------------------
> > APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
> > using 5000.0 points per unit in tabled values
> > TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
> >| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
> >| CHECK d/dx switch(x): max rel err = 0.8987E-11 at 2.875760
> > ---------------------------------------------------
> >|---------------------------------------------------
> >| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
> >| with 50.0 points per unit in tabled values
> >| Relative Error Limit not exceeded for r .gt. 2.52
> >| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
> >| with 50.0 points per unit in tabled values
> >| Relative Error Limit not exceeded for r .gt. 2.92
> >|---------------------------------------------------
> >************
> >
> >
> >Thank you for your support.
> >
> >Best wishes,
> >Y. Nakashima
> >
> >
> >
> >
> >----
> >-----------------------------------------
> >Yoshihisa Nakashima
> >Tel: +81-44-754-3174
> >E-mail:(nakashima_y.jp.fujitsu.com)
> >
> >
> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber