Re: [AMBER] cudaMemcpy GpuBuffer ERROR

From: Gerardo Zerbetto De Palma <g.zerbetto.gmail.com>
Date: Mon, 19 Jul 2021 12:13:11 -0300

We tried older simulations that we had run on the GTX1080 and RTX2080, and
they all show the same problem. Another notable point is that we had
previously run similar simulations on the TITAN V without any trouble, and
this type of crash has since started to appear more often. Additionally,
when the simulation crashes, the whole computer crashes too and we have to
force a reboot. That is why we will try swapping the GPUs.
Thanks a lot, Carlos!
Regards
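
For anyone triaging a similar batch of runs, here is a minimal sketch for
tallying which cudaMemcpy error flavor each failed run hit. Only the three
error strings come from this thread; the function name, the regular
expression, and the sample text are hypothetical, and where these messages
end up on a given setup (for example, redirected stderr) is an assumption.

```python
import re
from collections import Counter

# Pattern modeled on the error lines quoted in this thread:
#   "cudaMemcpy GpuBuffer::<Upload|Download> failed <reason>"
ERROR_RE = re.compile(r"cudaMemcpy GpuBuffer::(Upload|Download) failed (.+)")


def tally_cuda_errors(text):
    """Count (operation, reason) pairs found in captured output text."""
    counts = Counter()
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical captured output; the error lines are copied from the post.
    sample = (
        "cudaMemcpy GpuBuffer::Upload failed unspecified launch failure\n"
        "cudaMemcpy GpuBuffer::Download failed unspecified launch failure\n"
        "cudaMemcpy GpuBuffer::Download failed an illegal instruction was encountered\n"
    )
    for (operation, reason), count in tally_cuda_errors(sample).most_common():
        print(f"{count}  {operation}: {reason}")
```

Comparing these tallies per GPU and per machine would show whether the
failures follow the card or the host once the GPUs are swapped.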

On Mon, Jul 19, 2021 at 12:04, Carlos Simmerling (<
carlos.simmerling.gmail.com>) wrote:

> It's interesting that it sounds like the problem is the GPU and not the
> system setup itself. Do other simulations of similar size work OK on the
> Titan V?
>
> On Mon, Jul 19, 2021 at 11:00 AM Gerardo Zerbetto De Palma <
> g.zerbetto.gmail.com> wrote:
>
> > Hi Carlos. Thanks for the help. The system we are trying to simulate is
> > an NPT membrane-embedded protein tetramer. We are just running a plain MD
> > simulation, so no additional parameters are set. The initial coordinates
> > were obtained from a previous simulation that we had run with the same
> > parameters, so those coordinates come from a very well thermalized
> > system. We just made a single point mutation and ran a minimization to
> > relax any clashes. It is quite remarkable that the same system is being
> > simulated on an RTX2080 (in another computer) without any trouble, but
> > when we run it on the TITAN V it randomly crashes. Now we will try
> > swapping the GPUs to rule out a problem with any other computer
> > component.
> > Thanks a lot, again.
> > Regards
> >
> > The original post was this one:
> > Hi everyone.
> > We were trying to run some simulations of a membrane protein on an NVIDIA
> > TITAN V and got stuck by some cudaMemcpy errors that came in different
> > flavors:
> >
> > cudaMemcpy GpuBuffer::Upload failed unspecified launch failure
> > cudaMemcpy GpuBuffer::Download failed unspecified launch failure
> > cudaMemcpy GpuBuffer::Download failed an illegal instruction was encountered
> >
> > First we ran the simulation using Amber 18, restarting it every 5
> > nanoseconds to get consecutive 5 ns trajectories. After simulating 25
> > nanoseconds, the program stopped randomly. Then we tried to repeat the
> > simulation that had failed (using the same random seed and initial
> > coordinates) and the simulation succeeded, but the same error came up in
> > a subsequent simulation. These errors kept appearing at random timesteps
> > when we restarted the simulations. Energies in the output seemed to be OK
> > and simulations sometimes proceeded without errors when restarted. Hoping
> > that this was a bug, we compiled Amber 20 and ran the same simulations,
> > and had the same random cudaMemcpy errors. Just to check whether the
> > simulated system was fine, we are also running it on an RTX2080 with
> > Amber 18, without problems so far.
> >
> > We are running out of ideas, so we are reaching out to the community for
> > help with this matter. We will appreciate every idea or question that
> > can help us solve this puzzle.
> >
> > On Mon, Jul 19, 2021 at 11:08, Carlos Simmerling (<
> > carlos.simmerling.gmail.com>) wrote:
> >
> > > For ff19SB problems, make sure your Amber version is completely updated
> > > with current patches. There was a fix a while back that corrected an
> > > error that could lead to failures with some force fields, including
> > > ff19SB. Information on applying patches is found here:
> > > http://ambermd.org/AmberPatches.php
> > >
> > > For the problem with ff14SB, I did not see the original post. More
> > > details would be helpful, especially about the system you are
> > > simulating (is it only protein, or more? Did you use any other
> > > parameters except ff14SB? Where were the initial coordinates obtained?).
> > >
> > > On Mon, Jul 19, 2021 at 9:46 AM Gerardo Zerbetto De Palma <
> > > g.zerbetto.gmail.com> wrote:
> > >
> > > > Hi, we are using the ff14SB force field and the errors still appear.
> > > > Thanks for the help!
> > > > Regards
> > > >
> > > > Gerardo Zerbetto
> > > >
> > > > On Fri, Jul 16, 2021 at 18:29, Rafał Madaj (<rmadaj.cbmm.lodz.pl>)
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Which force field are you using? I had exactly the same problem
> > > > > with ff19SB. After changing to ff14SB the problem disappeared.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rafal
> > > > >
> > > > > On 16.07.2021 18:01, Gerardo Zerbetto De Palma wrote:
> > > > > > > Hi everyone.
> > > > > > > We were trying to run some simulations of a membrane protein on
> > > > > > > an NVIDIA TITAN V and got stuck by some cudaMemcpy errors that
> > > > > > > came in different flavors:
> > > > > > >
> > > > > > > cudaMemcpy GpuBuffer::Upload failed unspecified launch failure
> > > > > > > cudaMemcpy GpuBuffer::Download failed unspecified launch failure
> > > > > > > cudaMemcpy GpuBuffer::Download failed an illegal instruction was encountered
> > > > > > >
> > > > > > > First we ran the simulation using Amber 18, restarting it every
> > > > > > > 5 nanoseconds to get consecutive 5 ns trajectories. After
> > > > > > > simulating 25 nanoseconds, the program stopped randomly. Then
> > > > > > > we tried to repeat the simulation that had failed (using the
> > > > > > > same random seed and initial coordinates) and the simulation
> > > > > > > succeeded, but the same error came up in a subsequent
> > > > > > > simulation. These errors kept appearing at random timesteps
> > > > > > > when we restarted the simulations. Energies in the output
> > > > > > > seemed to be OK and simulations sometimes proceeded without
> > > > > > > errors when restarted. Hoping that this was a bug, we compiled
> > > > > > > Amber 20 and ran the same simulations, and had the same random
> > > > > > > cudaMemcpy errors. Just to check whether the simulated system
> > > > > > > was fine, we are also running it on an RTX2080 with Amber 18,
> > > > > > > without problems so far.
> > > > > > >
> > > > > > > We are running out of ideas, so we are reaching out to the
> > > > > > > community for help with this matter. We will appreciate every
> > > > > > > idea or question that can help us solve this puzzle.
> > > > > >
> > > > > > Regards!
> > > > > > Gerardo Zerbetto De Palma
> > > > > >
> > > > > > <
> > > > >
> > > >
> > >
> >
> http://www.avg.com/email-signature?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail
> > > > > >
> > > > > > Virus-free.
> > > > > > www.avg.com
> > > > > > <
> > > > >
> > > >
> > >
> >
> http://www.avg.com/email-signature?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail
> > > > > >
> > > > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> > > > > > _______________________________________________
> > > > > > AMBER mailing list
> > > > > > AMBER.ambermd.org
> > > > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > > >
> > > > > _______________________________________________
> > > > > AMBER mailing list
> > > > > AMBER.ambermd.org
> > > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > > >
> > > > _______________________________________________
> > > > AMBER mailing list
> > > > AMBER.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > > >
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Mon Jul 19 2021 - 08:30:03 PDT