Re: [AMBER] REMD error

From: koushik kasavajhala <koushik.sbiiit.gmail.com>
Date: Tue, 2 Jul 2019 16:23:29 -0400

Interesting. I have attached my rem_2rep_gb test directory as a reference.
I just want to make sure you do not have a corrupted version of AMBER.
Check if there are differences between your files and the attached files
(one way to do this is sketched below). After that, can you comment out
lines 52-57 of Run.rem.sh, rerun the test, and let us know what the
rem.out.000 file contains? There is usually more information in that file
when a run fails.
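
For the comparison, something along these lines should work, assuming you
unpack the attached directory as ./rem_2rep_gb (that path is just a
placeholder):

diff -ru ./rem_2rep_gb $AMBERHOME/test/cuda/remd/rem_2rep_gb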

Also, do you have any issues running other tests besides REMD tests?

On Tue, Jul 2, 2019 at 2:59 PM Marcela Madrid <mmadrid.psc.edu> wrote:

> Thanks Koushik,
>
> The user is getting this same error about not finding the input files.
> That is why I am doing these tests for her.
> I run from the directory $AMBERHOME/test/cuda/remd/rem_2rep_gb
> export DO_PARALLEL="mpirun -np 2"
> with 2 GPUs and the number of tasks set to 2.
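> In other words, the full sequence I run is essentially:
>
> cd $AMBERHOME/test/cuda/remd/rem_2rep_gb
> export DO_PARALLEL="mpirun -np 2"
> ./Run.rem.sh
>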
> And this is the error that I am getting:
>
> > ./Run.rem.sh
> > No precision model specified. Defaulting to DPFP.
> >
> > --------------------------------------------------------------------------------
> > Two replica GB REMD test.
> >
> > Running multipmemd version of pmemd Amber18
> > Total processors = 2
> > Number of groups = 2
> >
> >
> > Unit 5 Error on OPEN: rem.in.001
> >
> > Unit 5 Error on OPEN: rem.in.001
> >
> > Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
> > ./Run.rem.sh: Program error
> >
> I commented out the line at the end of Run.rem.sh so that it does not
> erase the input files, and they are indeed there:
> more rem.in.001
>
> Ala3 GB REMD
> &cntrl
> imin = 0, nstlim = 100, dt = 0.002,
> ntx = 5, irest = 1, ig = -71277,
> ntwx = 500, ntwe = 0, ntwr = 500, ntpr = 100,
> ioutfm = 0, ntxo = 1,
> ntt = 1, tautp = 5.0, tempi = 0.0, temp0 = 350.0,
> ntc = 2, tol = 0.000001, ntf = 2, ntb = 0,
> cut = 9999.0, nscm = 500,
> igb = 5, offset = 0.09,
> numexchg = 5,
> &end
>
> So at least I know that you are running the same way we are, and not
> getting the error message.
> It is quite puzzling.
>
> Marcela
>
>
>
> > On Jul 2, 2019, at 11:56 AM, koushik kasavajhala <koushik.sbiiit.gmail.com> wrote:
> >
> > Hi Marcela,
> >
> > Our lab also uses a slurm queuing system. We use the script below, which
> > is similar to yours, to submit 2-replica REMD jobs to one node.
> >
> > #!/bin/bash
> > #SBATCH -N 1
> > #SBATCH --tasks-per-node 2
> > #SBATCH --gres=gpu:2
> >
> > mpirun -np 2 /opt/amber/bin/pmemd.cuda.MPI -O -ng 2 -groupfile groupremd
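> >
> > For what it's worth, the groupremd file is just one line of pmemd
> > arguments per replica, so each replica opens its own input and writes
> > its own output. Roughly, it looks something like the sketch below; the
> > file names are only placeholders, not our actual ones:
> >
> > -O -i rem.in.000 -o rem.out.000 -p ala3.top -c ala3.crd.000 -r rem.r.000 -x rem.x.000 -inf rem.info.000
> > -O -i rem.in.001 -o rem.out.001 -p ala3.top -c ala3.crd.001 -r rem.r.001 -x rem.x.001 -inf rem.info.001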
> >
> > So, I do not see anything wrong with your submission script. Since your
> > CPU jobs run fine, I think there might be some issue with the way the
> > GPUs are configured on your cluster. Note: CPU REMD jobs require 2 CPUs
> > per replica, whereas GPU REMD jobs require only 1 GPU per replica.
> >
> > I just ran the test cases with 2 and 4 replicas; they all pass for me. If
> > you are having issues with the test cases, I think something might be wrong
> > with the way files are being sourced. I don't think it is a compiler issue
> > either. We use gnu compilers on our cluster and all tests pass for us.
> >
> > Can you run the test cases inside the directory that David Case pointed
> > out? There is a Run.rem.sh file inside the
> > AMBERHOME/test/cuda/remd/rem_2rep_gb directory. Executing this file should
> > not give the error message that the input files were not found. If this
> > doesn't work, then can you post the error the user had? They might have had
> > a different error instead of the input files not being found.
> >
> > .David Case: I looked at the files in the test/cuda/remd folder. rem_gb_2rep,
> > rem_gb_4rep, and rem_wat are not used at all. Deleting those folders did not
> > affect any of the test cases; they all passed.
> >
> > Best,
> > Koushik
> > Carlos Simmerling Lab
> >
> >
> >
> > On Tue, Jul 2, 2019 at 9:57 AM Marcela Madrid <mmadrid.psc.edu> wrote:
> >
> >> hi Dave,
> >>
> >> thanks for your answer. It is not just a problem with the test examples.
> >> It is a problem whenever we try to run REMD on the GPUs on Bridges at the
> >> PSC.
> >> The reason I am looking at it is that a user wants to run it. REMD on the
> >> CPUs works fine (with the corresponding executable, of course);
> >> it is just a problem with the GPUs. So it occurred to me to check whether
> >> it passes the tests, and we get the same error
> >> messages. The user has her input files in the directory where she runs.
> >>
> >> I think it is either a problem with the configuration of the GPU nodes on
> >> Bridges or a bug.
> >> Each Bridges node has 24 cores and 2 P100 GPUs. I have asked for 1 node,
> >> ntasks-per-node=2, and the 2 GPUs,
> >> but I get the error message about not finding the input files.
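> >> In SBATCH terms the request is roughly the following (partition and
> >> account lines omitted):
> >>
> >> #SBATCH -N 1
> >> #SBATCH --ntasks-per-node=2
> >> #SBATCH --gres=gpu:2
> >>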
> >> Amber on GPUs was compiled with
> >> ./configure -cuda -mpi gnu
> >> Attempting to compile with the Intel compilers instead of gnu gave error
> >> messages:
> >>
> >> O3 -ccbin icpc -o cuda_mg_wrapper.o -c cuda_mg_wrapper.cu
> >> In file included from
> >> /opt/packages/cuda/9.2/bin/../targets/x86_64-linux/include/host_config.h(50),
> >> from
> >> /opt/packages/cuda/9.2/bin/../targets/x86_64-linux/include/cuda_runtime.h(78),
> >> from cuda_mg_wrapper.cu(0):
> >> /opt/packages/cuda/9.2/bin/../targets/x86_64-linux/include/crt/host_config.h(79):
> >> error: #error directive: -- unsupported ICC configuration! Only
> >> ICC 15.0, ICC 16.0, and ICC 17.0 on Linux x86_64 are supported!
> >> #error -- unsupported ICC configuration! Only ICC 15.0, ICC 16.0, and
> >> ICC 17.0 on Linux x86_64 are supported!
> >>
> >> We do not have such old versions of the compilers. Any hints as to how
> >> to run REMD on the GPUs will be appreciated. Thanks so much,
> >>
> >> Marcela
> >>
> >>
> >>> On Jul 2, 2019, at 9:01 AM, David A Case <david.case.rutgers.edu> wrote:
> >>>
> >>> On Mon, Jul 01, 2019, Marcela Madrid wrote:
> >>>
> >>>>> Two replica GB REMD test.
> >>>>>
> >>>>>
> >>>>> Unit 5 Error on OPEN: rem.in.001
> >>>
> >>> OK: query for the REMD experts: in AMBERHOME/test/cuda/remd there are
> >>> two directories: rem_2rep_gb and rem_gb_2rep. The rem.in.00? files are
> >>> in the former, but the tests actually get run in the latter directory.
> >>>
> >>> Same general problem for rem_2rep_pme: the needed rem.in.00? files are
> >>> in rem_wat_2 (or maybe in rem_wat).
> >>>
> >>> I'm probably missing something here, but cleaning up (or at least
> >>> commenting) the cuda/remd test folder seems worthwhile: there are
> >>> folders that seem never to be used, and input files that seem to be in
> >>> the wrong place.
> >>>
> >>> Marcela: I'd ignore these failures for now; something should get posted
> >>> here that either fixes the problem, or figures out a problem with your
> >>> inputs. (My money is on the former.)
> >>>
> >>> ...dac
> >>>
> >>>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue Jul 02 2019 - 13:30:02 PDT