Re: [AMBER] error in REMD on GPUs

From: Marcela Madrid <mmadrid.psc.edu>
Date: Tue, 25 Jun 2019 09:31:59 -0400

Hi Bill,

Thanks for your answer. The job is in the correct directory and it can see the file:
I added pwd and also “more remd7.in” to the script, and it prints the input file:

> ./Run_gpu
> Production REMD
> &cntrl
> irest=1,ntx=5,
> ntb=1,cut=9.0,
> ntc=2,ntf=2,
> ntt=3,temp0=290.00,gamma_ln = 1.0, ig=-1,
> numexchg=125000, nstlim=300,dt=0.002,
> ntpr=100,ntwx=2500,ntwr=500000,
> ioutfm=1,iwrap=1,
> /
> /pylon5/pscstaff/mmadrid/username/scr
>
> Running multipmemd version of pmemd Amber18
> Total processors = 2
> Number of groups = 2
>
>
> Unit 5 Error on OPEN: remd7.in
>
> Unit 5 Error on OPEN: remd7.in
> Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1

The same script and inputs work with 2 processors on the regular (non-GPU) nodes with the
sander.MPI executable instead of pmemd.cuda.MPI, which is puzzling.

Thanks again,
Marcela
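
Since pwd and "more remd7.in" both succeed in the launching shell, one further check is whether every input file named in the groupfile is readable from the directory where the replicas start. The following is only a minimal sketch, not part of Amber: the function name check_groupfile is hypothetical, and it assumes that each non-comment groupfile line names its input file after a whitespace-separated -i option, as in the usual sander/pmemd command line.

```shell
#!/bin/sh
# check_groupfile (hypothetical helper, not an Amber tool):
# for every non-empty, non-comment line of a groupfile, find the word that
# follows -i and report whether that file is readable from the current directory.
# Assumption: options and file names are separated by plain whitespace.
check_groupfile() {
    groupfile=$1
    status=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and lines starting with '#'
        case $line in
            ''|'#'*) continue ;;
        esac
        mdin=
        prev=
        # Unquoted on purpose: split the line into words
        for word in $line; do
            if [ "$prev" = "-i" ]; then
                mdin=$word
            fi
            prev=$word
        done
        if [ -z "$mdin" ]; then
            echo "no -i option: $line"
            status=1
        elif [ -r "$mdin" ]; then
            echo "ok: $mdin"
        else
            echo "MISSING: $mdin"
            status=1
        fi
    done < "$groupfile"
    return $status
}

# Example call, run from the job's working directory:
# check_groupfile group
```

Running it inside the job script just before the mpirun line, and again on each node if the job spans more than one, would show whether the name in the groupfile and the file on disk really match. It does not explain why sander.MPI accepts the same groupfile.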



> On Jun 25, 2019, at 12:05 AM, Bill Ross <ross.cgl.ucsf.edu> wrote:
>
> $ mpirun -np 2 sh -c 'pwd; $AMBERHOME/bin/pmemd.cuda.MPI -ng 2 -groupfile group ...'
>
> If running from cmd line, then I'd try the above.
>
> Also, if this doesn't solve it, best to paste your whole cmd line for people who know better to see.
>
> Bill
>
> On 6/24/19 9:01 PM, Bill Ross wrote:
>> What if you add the cmd
>>
>> pwd
>>
>> in your script before it starts the program?
>>
>> This will verify if you are in the Matrix or consensual silicon. :-)
>>
>> Bill
>>
>>
>> On 6/24/19 8:36 PM, Marcela Madrid wrote:
>>> Hello, I have compiled Amber 18 with cuda/9.2.
>>>
>>> When I try to run REMD on GPUs I am getting the following error message:
>>>
>>>> Running multipmemd version of pmemd Amber18
>>>> Total processors = 2
>>>> Number of groups = 2
>>>>
>>>>
>>>> Unit 5 Error on OPEN: remd7.in
>>>>
>>>> Unit 5 Error on OPEN: remd7.in
>>> The command that I am using on 2 P100 processors is:
>>>
>>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -ng 2 -groupfile group
>>>
>>>
>>> The same example runs on CPUs, and the file remd7.in exists in the directory.
>>> What can be the problem?
>>> Thanks,
>>> Marcela
>>>
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>

Received on Tue Jun 25 2019 - 07:00:02 PDT