Re: [AMBER] REMD error

From: Marcela Madrid <mmadrid.psc.edu>
Date: Tue, 2 Jul 2019 14:59:04 -0400

Thanks Koushik,

The user is getting this same error about the input files not being found, which is why I am running these tests for her.
I run from the directory $AMBERHOME/test/cuda/remd/rem_2rep_gb on 2 GPUs with ntasks-per-node=2, setting
 export DO_PARALLEL="mpirun -np 2"
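Put together, the sequence I run inside that allocation is (AMBERHOME points at our Amber 18 install, so the path itself is site-specific):

   export DO_PARALLEL="mpirun -np 2"
   cd $AMBERHOME/test/cuda/remd/rem_2rep_gb
   ./Run.rem.sh
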
And this is the error that I am getting:

> ./Run.rem.sh
> No precision model specified. Defaulting to DPFP.
> --------------------------------------------------------------------------------
> Two replica GB REMD test.
>
> Running multipmemd version of pmemd Amber18
> Total processors = 2
> Number of groups = 2
>
>
> Unit 5 Error on OPEN: rem.in.001
>
> Unit 5 Error on OPEN: rem.in.001
> Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
> ./Run.rem.sh: Program error
>
 I commented out the cleanup line at the end of Run.rem.sh so that it does not erase the input files, and they are indeed there:
 more rem.in.001

Ala3 GB REMD
&cntrl
   imin = 0, nstlim = 100, dt = 0.002,
   ntx = 5, irest = 1, ig = -71277,
   ntwx = 500, ntwe = 0, ntwr = 500, ntpr = 100,
   ioutfm = 0, ntxo = 1,
   ntt = 1, tautp = 5.0, tempi = 0.0, temp0 = 350.0,
   ntc = 2, tol = 0.000001, ntf = 2, ntb = 0,
   cut = 9999.0, nscm = 500,
   igb = 5, offset = 0.09,
   numexchg = 5,
&end
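
(Unit 5 is just the mdin file that each replica opens via the -i entry on its line of the groupfile, i.e. the -ng 2 -groupfile mechanism in your script below. A two-replica GB groupfile is normally along these lines, with the topology and coordinate names here being purely illustrative rather than the actual test files:

   -O -i rem.in.001 -p ala3.prmtop -c ala3.rst7.001 -o rem.out.001 -r rem.rst7.001 -inf rem.info.001
   -O -i rem.in.002 -p ala3.prmtop -c ala3.rst7.002 -o rem.out.002 -r rem.rst7.002 -inf rem.info.002

The file the error complains about is therefore present in the directory where the script runs.)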

So at least I know that you are running the same way we are and not getting the error message.
It is quite puzzling.

Marcela



> On Jul 2, 2019, at 11:56 AM, koushik kasavajhala <koushik.sbiiit.gmail.com> wrote:
>
> Hi Marcela,
>
> Our lab also uses a Slurm queuing system. We use the script below, which is
> similar to yours, to submit 2-replica REMD jobs to one node.
>
> #!/bin/bash
> #SBATCH -N 1
> #SBATCH --tasks-per-node 2
> #SBATCH --gres=gpu:2
>
> mpirun -np 2 /opt/amber/bin/pmemd.cuda.MPI -O -ng 2 -groupfile groupremd
>
> So, I do not see anything wrong with your submission script. Since your CPU
> jobs run fine, I think there might be some issue with the way the GPUs are
> configured on your cluster. Note: CPU REMD jobs require 2 CPUs per replica,
> whereas GPU REMD jobs require only 1 GPU per replica.
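> A quick way to sanity-check the GPU side from inside the job itself (these
> commands are only a suggestion, not something our script relies on) is:
>
>   nvidia-smi
>   echo $CUDA_VISIBLE_DEVICES
>
> Both GPUs requested with --gres=gpu:2 should show up there; if one is
> missing, the node or scheduler configuration is the place to look.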
>
> I just ran the test cases with 2 and 4 replicas, and they all pass for me. If
> you are having issues with the test cases, I think something might be wrong
> with the way files are being sourced. I don't think it is a compiler issue
> either. We use gnu compilers on our cluster and all tests pass for us.
>
> Can you run the test cases inside the directory that David Case pointed
> out? There is a Run.rem.sh file inside the AMBERHOME/test/cuda/remd/rem_2rep_gb
> directory. Executing this file should not give the error message that input
> files were not found. If this doesn't work, then can you post the error the
> user had? They might have had a different error instead of input files not
> being found.
>
> @David Case: I looked at the files in the test/cuda/remd folder. rem_gb_2rep,
> rem_gb_4rep, rem_wat are not used at all. Deleting those folders did not
> affect any of the test cases; they all passed.
>
> Best,
> Koushik
> Carlos Simmerling Lab
>
>
>
> On Tue, Jul 2, 2019 at 9:57 AM Marcela Madrid <mmadrid.psc.edu> wrote:
>
>> hi Dave,
>>
>> thanks for your answer. It is not just a problem with the test examples.
>> It is a problem whenever we try to run REMD on the GPUs on Bridges at the
>> PSC.
>> The reason I am looking at it is that a user wants to run it. REMD on the
>> CPUs works fine (with the corresponding executable, of course); it is only
>> a problem on the GPUs. So it occurred to me to check whether the tests pass,
>> and we get the same error messages. The user has her input files in the
>> directory where she runs.
>>
>> I think it is either a problem with the configuration of the GPU nodes on
>> Bridges or a bug.
>> Each Bridges node has 24 cores and 2 P100 GPUs. I have asked for 1 node,
>> ntasks-per-node=2 and the 2 GPUs
>> but I get the error message about not finding the input files.
>> Amber on GPUs was compiled with
>> ./configure -cuda -mpi gnu
>> Attempting to compile with the Intel compilers instead of GNU gave these
>> error messages:
>>
>> O3 -ccbin icpc -o cuda_mg_wrapper.o -c cuda_mg_wrapper.cu
>> In file included from
>> /opt/packages/cuda/9.2/bin/../targets/x86_64-linux/include/host_config.h(50),
>> from
>> /opt/packages/cuda/9.2/bin/../targets/x86_64-linux/include/cuda_runtime.h(78),
>> from cuda_mg_wrapper.cu(0):
>> /opt/packages/cuda/9.2/bin/../targets/x86_64-linux/include/crt/host_config.h(79):
>> error: #error directive: -- unsupported ICC configuration! Only
>> ICC 15.0, ICC 16.0, and ICC 17.0 on Linux x86_64 are supported!
>> #error -- unsupported ICC configuration! Only ICC 15.0, ICC 16.0, and
>> ICC 17.0 on Linux x86_64 are supported!
>>
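>> The error above says this CUDA 9.2 header only accepts ICC 15.0-17.0, so the
>> quickest check is to confirm which compiler and CUDA versions the build node
>> actually picks up, e.g. (just a suggestion):
>>
>>   icpc --version
>>   nvcc --version
>>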
>> We do not have such old versions of the compilers. Any hints on how to run
>> REMD on the GPUs will be appreciated. Thanks so much,
>>
>> Marcela
>>
>>
>>> On Jul 2, 2019, at 9:01 AM, David A Case <david.case.rutgers.edu> wrote:
>>>
>>> On Mon, Jul 01, 2019, Marcela Madrid wrote:
>>>
>>>>> Two replica GB REMD test.
>>>>>
>>>>>
>>>>> Unit 5 Error on OPEN: rem.in.001
>>>
>>> OK: query for the REMD experts: in AMBERHOME/test/cuda/remd there are
>>> two directories: rem_2rep_gb and rem_gb_2rep. The rem.in.00? files are
>>> in the former, but the tests actually get run in the latter directory.
>>>
>>> Same general problem for rem_2rep_pme: the needed rem.in.00? files are
>>> in rem_wat_2 (or maybe in rem_wat).
>>>
>>> I'm probably missing something here, but cleaning up (or at least
>>> commenting) the cuda/remd test folder seems worthwhile: there are
>>> folders that seem never to be used, and input files that seem to be in
>>> the wrong place.
>>>
>>> Marcela: I'd ignore these failures for now; something should get posted
>>> here that either fixes the problem, or figures out a problem with your
>>> inputs. (My money is on the former.)
>>>
>>> ...dac
>>>
>>>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jul 02 2019 - 12:00:02 PDT