I think it has something to do with the way the GPUs are shared in a single
application. NEB also uses multiple input files. I am sure that the 4
replica REMD GPU jobs will also fail, saying that 3 input files
(rem.in.00[1-3] files) are not found.
I was looking at your website and found that the K80 nodes are better
designed for sharing GPUs in a single application, and hence suggested
using them, but that doesn’t seem to help.
Sorry, I am out of options here. Maybe others on the list can help.
Best,
Koushik
On Wed, Jul 3, 2019 at 5:38 PM Marcela Madrid <mmadrid.psc.edu> wrote:
> Yes, I have tried; same error. But what is the solution?
> I am attaching the files again. The 4 PEs test also fails in
> Run.neb_gb_full
> I will look into that.
>
> Marcela
>
> > On Jul 3, 2019, at 5:21 PM, koushik kasavajhala <
> koushik.sbiiit.gmail.com> wrote:
> >
> > Ohh!! I see what the issue is. REMD jobs use multiple input files - one
> > file for each replica. In your case, it is always the second input file
> > (rem.in.001) that isn’t found. Have you tried it on the K80 nodes on your
> > cluster?
> >
> > On Wed, Jul 3, 2019 at 4:53 PM koushik kasavajhala <
> koushik.sbiiit.gmail.com>
> > wrote:
> >
> >> Sorry, I don’t see the attachments. Can you resend? The 2 GPU test log
> >> file (makecudatestmpi2.log?) should be sufficient.
> >>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Wed Jul 03 2019 - 15:00:02 PDT