Re: [AMBER] cudaMemcpy GpuBuffer ERROR

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Mon, 19 Jul 2021 11:03:42 -0400

That's interesting; it sounds like the problem is the GPU and not the system
setup itself. Do other simulations of similar size work OK on the Titan V?
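One way to check whether the crashes follow the card is to run the same consecutive-restart workflow described in this thread on each GPU and record where it fails. Below is a minimal sketch under stated assumptions. It builds `pmemd.cuda` command lines with the standard `-O -i -p -c -r -x -o -inf` flags and pins the run to one card through `CUDA_VISIBLE_DEVICES`. When one of the cudaMemcpy error strings quoted in the thread appears, it reruns that segment from the previous restart file. The file names (`system.prmtop`, `md.in`, `mdN.rst7`, ...) and the retry count are hypothetical placeholders, not taken from the original setup.

```python
import os
import subprocess

# Error text reported in this thread; if any of it shows up in the captured
# output, the segment is treated as crashed.
CUDA_ERRORS = (
    "unspecified launch failure",
    "an illegal instruction was encountered",
)


def segment_cmd(seg, prmtop="system.prmtop", mdin="md.in"):
    """Build the pmemd.cuda command line for segment `seg` (1-based).

    Hypothetical file naming: md0.rst7 is the minimized structure, and
    segment `seg` restarts from md{seg-1}.rst7 written by the previous one.
    """
    return [
        "pmemd.cuda", "-O",
        "-i", mdin,
        "-p", prmtop,
        "-c", f"md{seg - 1}.rst7",
        "-r", f"md{seg}.rst7",
        "-x", f"md{seg}.nc",
        "-o", f"md{seg}.out",
        "-inf", f"md{seg}.info",
    ]


def crashed(log):
    """Return True if the captured output contains one of the CUDA errors."""
    return any(err in log for err in CUDA_ERRORS)


def run_segments(n_segments, gpu="0", max_retries=2):
    """Run segments 1..n_segments on one GPU, rerunning a crashed segment."""
    # CUDA_VISIBLE_DEVICES limits the process to the chosen card.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for seg in range(1, n_segments + 1):
        for attempt in range(1, max_retries + 2):
            proc = subprocess.run(segment_cmd(seg), env=env,
                                  capture_output=True, text=True)
            if proc.returncode == 0 and not crashed(proc.stdout + proc.stderr):
                break  # segment finished cleanly; move on
            print(f"segment {seg}, attempt {attempt}: failed on GPU {gpu}")
        else:
            # All attempts failed: stop rather than continue from a bad state.
            raise RuntimeError(f"segment {seg} kept failing on GPU {gpu}")
```

Assuming `md.in` sets a 5 ns segment length, `run_segments(5, gpu="0")` would cover the 25 ns after which the first crash appeared. Repeating the same call with the other card selected (for example after swapping the GPUs) compares both cards on identical inputs.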

On Mon, Jul 19, 2021 at 11:00 AM Gerardo Zerbetto De Palma <
g.zerbetto.gmail.com> wrote:

> Hi Carlos. Thanks for the help. The system we are trying to simulate is an
> NPT membrane-embedded protein tetramer. We are just running a plain MD sim,
> so no additional parameters are set. The initial coordinates were obtained
> from a previous simulation that we had run with the same parameters, so
> they come from a very well thermalized system. We just made a single point
> mutation and ran a minimization to relax any clashes. It is quite
> remarkable that the same system is being simulated on an RTX2080 (in
> another computer) without any trouble, but when we run it on the TITAN V
> it randomly crashes. Now we will try swapping the GPUs to rule out a
> problem with any other computer component.
> Thanks a lot, again.
> Regards
>
> On Mon, Jul 19, 2021 at 11:08, Carlos Simmerling (<
> carlos.simmerling.gmail.com>) wrote:
>
> > For ff19SB problems, make sure your Amber version is completely updated
> > with current patches. There was a fix a while back that corrected an
> > error that could lead to failures with some force fields, including
> > ff19SB. Information on applying patches is found here:
> > http://ambermd.org/AmberPatches.php
> >
> > For the problem with ff14SB, I did not see the original post. More
> > details would be helpful, especially about the system you are simulating
> > (is it only protein, or more? Did you use any parameters other than
> > ff14SB? Where were the initial coordinates obtained?).
> >
> > On Mon, Jul 19, 2021 at 9:46 AM Gerardo Zerbetto De Palma <
> > g.zerbetto.gmail.com> wrote:
> >
> > > Hi, we are using the ff14SB force field and the errors still appear.
> > > Thanks for the help!
> > > Regards
> > >
> > > Gerardo Zerbetto
> > >
> > > On Fri, Jul 16, 2021 at 18:29, Rafał Madaj (<rmadaj.cbmm.lodz.pl>)
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Which force field are you using? I had exactly the same problem with
> > > > ff19SB. After switching to ff14SB the problem disappeared.
> > > >
> > > > Regards,
> > > >
> > > > Rafal
> > > >
> > > > On 16.07.2021 18:01, Gerardo Zerbetto De Palma wrote:
> > > > > Hi everyone.
> > > > > We were trying to run some simulations of a membrane protein on an
> > > > > NVIDIA TITAN V and got stuck on cudaMemcpy errors that came in
> > > > > different flavors:
> > > > >
> > > > > cudaMemcpy GpuBuffer::Upload failed unspecified launch failure
> > > > > cudaMemcpy GpuBuffer::Download failed unspecified launch failure
> > > > > cudaMemcpy GpuBuffer::Download failed an illegal instruction was encountered
> > > > >
> > > > > First we ran the sim using Amber 18, restarting it every 5 ns to get
> > > > > consecutive 5 ns trajectories. After 25 ns of simulation, the
> > > > > program stopped at a random point. We then repeated the simulation
> > > > > that had failed (using the same random seed and initial coordinates)
> > > > > and it succeeded, but the same error came up in a subsequent
> > > > > simulation. The errors kept appearing at random timesteps when we
> > > > > restarted the simulations. Energies in the output seemed to be OK,
> > > > > and simulations sometimes proceeded without errors when restarted.
> > > > > Hoping that this was a bug, we compiled Amber 20, ran the same
> > > > > simulations and got the same random cudaMemcpy errors. To check that
> > > > > the simulated system itself was fine, we are also running it on an
> > > > > RTX2080 with Amber 18, without problems so far.
> > > > >
> > > > > We are running out of ideas, so we are reaching out to the community
> > > > > for some help with this. We will appreciate any idea or question
> > > > > that can help us solve this puzzle.
> > > > >
> > > > > Regards!
> > > > > Gerardo Zerbetto De Palma
> > > > >
> > > > > _______________________________________________
> > > > > AMBER mailing list
> > > > > AMBER.ambermd.org
> > > > > http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Jul 19 2021 - 08:30:02 PDT