Re: [AMBER] Failure kReduceSoluteCOM with GPU

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sat, 30 Jul 2011 13:15:22 -0700

Hi Fabricio,

Here's what I used as the command line:

$AMBERHOME/bin/pmemd -O -i md4.in -o md4.out -p micela.prmtop -c md2.3.rst \
    -x md4.mdcrd -r md4.rst -ref md2.3.rst

I can confirm that this works fine on the CPU but fails on the GPU with our
latest code so we will look into it.

All the best
Ross
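
Scott's suggestion in the quoted message below (set ntpr=1 and compare the
step 1 energies from the CPU and GPU runs) could be scripted along these
lines. This is only a minimal sketch and not part of AMBER: the function
names step_energies and compare are made up for illustration, and it assumes
the usual mdout layout of "LABEL = value" pairs in a block that starts with
NSTEP and ends at a dashed line.

```python
import math
import re

# "LABEL = value" pairs as printed in an mdout energy block, e.g.
# "NSTEP =        1", "1-4 EEL =    40.0000", "RESTRAINT  =      5.0000".
# A run of asterisks is what Fortran prints when a number overflows its field.
PAIR = re.compile(r"([A-Za-z0-9][A-Za-z0-9()\- ]*?)\s*=\s*(\*+|-?\d+(?:\.\d+)?)")


def step_energies(mdout_text, step=1):
    """Return {label: value} for the first energy block with NSTEP == step."""
    energies = {}
    in_block = False
    for line in mdout_text.splitlines():
        pairs = PAIR.findall(line)
        if pairs and pairs[0][0] == "NSTEP":
            if in_block:
                break  # the next step has started
            in_block = pairs[0][1] == str(step)
        elif in_block and line.lstrip().startswith("---"):
            break  # dashed line closes the block
        if in_block:
            for label, value in pairs:
                # Overflowed fields are recorded as infinity.
                energies[label] = float("inf") if value.startswith("*") else float(value)
    return energies


def compare(cpu, gpu, tol=1e-2):
    """Return labels whose values differ by more than tol (relative), or overflowed."""
    flagged = []
    for label, a in cpu.items():
        if label not in gpu:
            continue
        b = gpu[label]
        if not (math.isfinite(a) and math.isfinite(b)):
            flagged.append(label)
        elif abs(a - b) > tol * max(1.0, abs(a), abs(b)):
            flagged.append(label)
    return flagged
```

To use it, read the text of the two output files (one from pmemd, one from
pmemd.cuda) and pass each to step_energies, then pass the two results to
compare. A RESTRAINT entry that is enormous or overflowed at step 1 in both
runs would match what Scott describes.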

> -----Original Message-----
> From: Scott Le Grand [mailto:varelse2005.gmail.com]
> Sent: Saturday, July 30, 2011 9:52 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
>
> First, try running this on CPU AMBER. It looks to me like it's broken there
> as well. This is because your restraint energy starts off somewhere in the
> proximity of Neptune on the first iteration. What happens from there depends
> to some extent on whether you're on a GPU or a CPU, but the two outcomes are
> separate but equally bad.
>
> To see what I mean, set ntpr=1 in your md.in file and compare step 1
> energies on both CPU and GPU.
>
> Also, I got a file from Ross to replicate this but your command line in this
> thread uses different file names than are in the archive. Could you send me
> your exact command line based on what you sent Ross?
>
> Scott
>
>
>
> 2011/7/29 Fabrício Bracht <bracht.iq.ufrj.br>
>
> > Hi Ross. Just checking to see if you received my last email with the
> > file attached.
> > Thank you
> > Fabrício
> >
> > 2011/7/27 Ross Walker <ross.rosswalker.co.uk>:
> > > Hi Fabricio,
> > >
> > > If they are identical this means that this may be a new bug, although we
> > > may have already inadvertently fixed it in the development version. Can
> > > you send me your input files please (direct to me is fine) so I can try it
> > > here and see if I can reproduce it?
> > >
> > > All the best
> > > Ross
> > >
> > >> -----Original Message-----
> > >> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
> > >> Sent: Wednesday, July 27, 2011 12:05 PM
> > >> To: AMBER Mailing List
> > >> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
> > >>
> > >> Hi Ross. Here is my output from md5sum *:
> > >> md5sum: B40C: Is a directory
> > >> f4ed79de194d836246009d5c29051574 cuda_info.fpp
> > >> a9e4f660fcb5347b1273a8e3f76d3e74 gpu.cpp
> > >> 307e64e078aa5f1f22bd78fd224c9f4b gpu.h
> > >> 9e6a4f93e46046cda29369feb0dd32e8 gputypes.cpp
> > >> 46f8ccf2bbee063ff35a73945b16a3a2 gputypes.h
> > >> 90ba8d068522a00074707a529469f5ea kCalculateGBBornRadii.cu
> > >> 97fbbcfb8a3833509d94072ecab05643 kCalculateGBNonbondEnergy1.cu
> > >> 79fb7a5bba2a19ba351a7dd5996d31fc kCalculateGBNonbondEnergy2.cu
> > >> 67a458e51a76162edbcc907e7135500c kCalculateLocalForces.cu
> > >> ce308f4fbe9468d5505beb0099d58e76 kCalculatePMENonbondEnergy.cu
> > >> 9b240d418e391a71b590e6dc3bc3b0ff kCCF.h
> > >> 5561a56bc236291cb87b4770453d67a4 kCLF.h
> > >> 86f220029e3a943a186ebcfd16e2dcd9 kCPNE.h
> > >> 9905ed2e705bccf1ae705279d85d0e57 kForcesUpdate.cu
> > >> edf2d74af7a4d401ccecc7bfa6d036c3 kNeighborList.cu
> > >> fd65d023597024a68565c5a0e5ffd86c kNTPKernels.h
> > >> 49f952b429618228fca8e23f44223c58 kPGGW.h
> > >> 4aea91b87cbb3cf62b9fddafe607ab48 kPGS.h
> > >> 9c5951cdf94402d2c0396b74498f72f5 kPMEInterpolation.cu
> > >> 46f01611524128ea428c069ef58bd421 kPSSE.h
> > >> ada7d510598c88ed4adb8d32a9dbf73d kRandom.h
> > >> eefe9bd32e04ba2bbe2eb5611a6464bd kShake.cu
> > >> b07e184d2840ffae27d8af5415fae04a kU.h
> > >> 6947e1fae477c0bb9c637062a0ddbfd8 Makefile
> > >> e5a6173273e6812669c21abcd1530226 Makefile.advanced
> > >> They are exactly the same. Now I really don't know what to do. What do
> > >> you suggest?
> > >> Fabrício Bracht
> > >>
> > >> 2011/7/27 Ross Walker <ross.rosswalker.co.uk>:
> > >> > Hi Fabricio,
> > >> >
> > >> > Please take a look at the following, which explains what md5sums are:
> > >> > http://en.wikipedia.org/wiki/Md5sum
> > >> >
> > >> > In summary it creates an 'almost' unique fingerprint of a file. Thus if I
> > >> > run md5sum on the files in my directory and you run md5sum on the files in
> > >> > your directory one can compare the fingerprints produced. If they are the
> > >> > same then we know the files are identical. The following is the list of
> > >> > md5sums for the files in my cuda directory, which is the current, fully
> > >> > up-to-date released copy of AMBER with all bugfixes applied. You
> > >> > should go to your machine and do the following:
> > >> >
> > >> > cd $AMBERHOME/src
> > >> > make clean
> > >> > cd pmemd/src/cuda
> > >> > md5sum *
> > >> >
> > >> > And then see if the fingerprint given (the bunch of letters and numbers
> > >> > before each file) matches those I list below for each file. If they do then
> > >> > we know your patch was all applied correctly and your system may be
> > >> > highlighting a real bug in the code. Note the GTX275 and GTX460 are VERY
> > >> > different chip architectures, which is why a subtle bug such as this may
> > >> > only manifest itself on one card and not the other.
> > >> >
> > >> > All the best
> > >> > Ross
> > >> >
> > >> > foo.linux-jh9j:~/amber11_as_of_jul_22/src/pmemd/src/cuda> md5sum *
> > >> > md5sum: B40C: Is a directory
> > >> > f4ed79de194d836246009d5c29051574 cuda_info.fpp
> > >> > a9e4f660fcb5347b1273a8e3f76d3e74 gpu.cpp
> > >> > 307e64e078aa5f1f22bd78fd224c9f4b gpu.h
> > >> > 9e6a4f93e46046cda29369feb0dd32e8 gputypes.cpp
> > >> > 46f8ccf2bbee063ff35a73945b16a3a2 gputypes.h
> > >> > 90ba8d068522a00074707a529469f5ea kCalculateGBBornRadii.cu
> > >> > 97fbbcfb8a3833509d94072ecab05643 kCalculateGBNonbondEnergy1.cu
> > >> > 79fb7a5bba2a19ba351a7dd5996d31fc kCalculateGBNonbondEnergy2.cu
> > >> > 67a458e51a76162edbcc907e7135500c kCalculateLocalForces.cu
> > >> > ce308f4fbe9468d5505beb0099d58e76 kCalculatePMENonbondEnergy.cu
> > >> > 9b240d418e391a71b590e6dc3bc3b0ff kCCF.h
> > >> > 5561a56bc236291cb87b4770453d67a4 kCLF.h
> > >> > 86f220029e3a943a186ebcfd16e2dcd9 kCPNE.h
> > >> > 9905ed2e705bccf1ae705279d85d0e57 kForcesUpdate.cu
> > >> > edf2d74af7a4d401ccecc7bfa6d036c3 kNeighborList.cu
> > >> > fd65d023597024a68565c5a0e5ffd86c kNTPKernels.h
> > >> > 49f952b429618228fca8e23f44223c58 kPGGW.h
> > >> > 4aea91b87cbb3cf62b9fddafe607ab48 kPGS.h
> > >> > 9c5951cdf94402d2c0396b74498f72f5 kPMEInterpolation.cu
> > >> > 46f01611524128ea428c069ef58bd421 kPSSE.h
> > >> > ada7d510598c88ed4adb8d32a9dbf73d kRandom.h
> > >> > eefe9bd32e04ba2bbe2eb5611a6464bd kShake.cu
> > >> > b07e184d2840ffae27d8af5415fae04a kU.h
> > >> > 6947e1fae477c0bb9c637062a0ddbfd8 Makefile
> > >> > e5a6173273e6812669c21abcd1530226 Makefile.advanced
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
> > >> >> Sent: Wednesday, July 27, 2011 8:53 AM
> > >> >> To: AMBER Mailing List; Scott Brozell
> > >> >> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
> > >> >>
> > >> >> Hi,
> > >> >> I've only found $AMBERHOME/AmberTools/src/configure.rej .
> > >> >> I've checked the files that were supposed to be patched by bugfix.11,
> > >> >> but wasn't able to confirm if they were patched or not due to my lack
> > >> >> of programming knowledge. Any tips here?
> > >> >> One other thing. Why is it that this simulation ran successfully on my
> > >> >> GTX275 computer but has problems with my GTX460?
> > >> >> Thank you
> > >> >> Fabrício
> > >> >>
> > >> >> 2011/7/27 Scott Brozell <sbrozell.rci.rutgers.edu>:
> > >> >> > Hi,
> > >> >> >
> > >> >> > If part of a patch fails to apply, the patch command creates a
> > >> >> > reject file: blabla.rej. So look for files with a rej extension.
> > >> >> > Also, since in bugfix 11 there are only a few files to be patched in
> > >> >> > src/pmemd/src/cuda, you could look at those files to see if the
> > >> >> > patch has been applied:
> > >> >> > http://ambermd.org/bugfixes/11.0/bugfix.11
> > >> >> >
> > >> >> > scott
> > >> >> >
> > >> >> > On Tue, Jul 26, 2011 at 10:07:28AM -0300, Fabrício Bracht wrote:
> > >> >> >> Hi Scott. How do I check if this specific bugfix has been applied
> > >> >> >> correctly? Would it be something like md5sum * in
> > >> >> >> $AMBERHOME/src/pmemd/src/cuda/ . And what should I look for?
> > >> >> >> Thank you
> > >> >> >> Fabrício
> > >> >> >>
> > >> >> >> 2011/7/26 Scott Brozell <sbrozell.rci.rutgers.edu>:
> > >> >> >> > Hi,
> > >> >> >> >
> > >> >> >> > This looks like a problem addressed by bugfix.11.
> > >> >> >> > I have not been following your threads closely,
> > >> >> >> > but I read that you were having problems with the bugfixes.
> > >> >> >> > You might inspect the files listed in bugfix.11 to determine
> > >> >> >> > whether the bugfixes were really applied, while you are waiting
> > >> >> >> > for someone that has been following your threads closely to reply.
> > >> >> >> >
> > >> >> >> > scott
> > >> >> >> >
> > >> >> >> > On Tue, Jul 26, 2011 at 12:44:10AM -0300, Fabrício Bracht wrote:
> > >> >> >> >> Since I finally was able to compile amber11 with cuda support for
> > >> >> >> >> my gtx460, I thought everything was fine, but it seems that now I
> > >> >> >> >> have to set a few things in order to get my system running again.
> > >> >> >> >> Let me explain more.
> > >> >> >> >> I was simulating a protein inside a micelle. I had a few tens of
> > >> >> >> >> nanoseconds simulated on a gtx275. The system consists of water,
> > >> >> >> >> organic solvent, surfactant, counterions and my protein (approx. 60000
> > >> >> >> >> atoms). When I tried to start a simulation using my restart files from
> > >> >> >> >> the GTX275 on my gtx460 machine, I got the following error:
> > >> >> >> >> Error: unspecified launch failure launching kernel kReduceSoluteCOM
> > >> >> >> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
> > >> >> >> >>
> > >> >> >> >> I thought it might have something to do with a problem in the restart
> > >> >> >> >> file or something like this, so I recreated the inpcrd and prmtop
> > >> >> >> >> files for the last configuration and tried to start a fresh run on
> > >> >> >> >> my gtx460 machine. Well, it didn't work out. I got the same error
> > >> >> >> >> lines again.
> > >> >> >> >> Here is my configuration file.
> > >> >> >> >> MD parameters
> > >> >> >> >> &cntrl
> > >> >> >> >> imin = 0,
> > >> >> >> >> irest = 1,
> > >> >> >> >> ntx = 7,
> > >> >> >> >> ntb = 2, pres0 = 1.0, ntp = 1, taup = 2.0,
> > >> >> >> >> cut = 9.0,
> > >> >> >> >> ntr = 1,
> > >> >> >> >> ntc = 2,
> > >> >> >> >> ntf = 2,
> > >> >> >> >> tempi = 300.0,
> > >> >> >> >> temp0 = 300.0,
> > >> >> >> >> ntt = 3,
> > >> >> >> >> gamma_ln = 1.0,
> > >> >> >> >> nstlim = 5000000, dt = 0.002,
> > >> >> >> >> ntpr = 10000, ntwx = 10000, ntwr = 1000
> > >> >> >> >> /
> > >> >> >> >> Restraints
> > >> >> >> >> 5.0
> > >> >> >> >> RES 1 317
> > >> >> >> >> END
> > >> >> >> >> END
> > >> >> >> >>
> > >> >> >> >> And here is the command line:
> > >> >> >> >> pmemd.cuda -O -i md.in -c micel2.3.inpcrd -p micel2.3.prmtop \
> > >> >> >> >>     -r md3.rst -o md3.out -ref micela2.3.inpcrd -inf md3.info -x md3.mdcrd
> > >> >> >> >
> > >> >> >
> > >> >> > _______________________________________________
> > >> >> > AMBER mailing list
> > >> >> > AMBER.ambermd.org
> > >> >> > http://lists.ambermd.org/mailman/listinfo/amber
> > >> >> >


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Jul 30 2011 - 13:30:03 PDT
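
The md5sum comparison Ross describes in the quoted thread above (run
md5sum * in $AMBERHOME/src/pmemd/src/cuda and compare against his list) can
be done by eye, but a short script avoids mistakes. What follows is a minimal
sketch with made-up function names (parse_md5sum, diff_listings), assuming
each listing is the plain text printed by md5sum, without mail quote
prefixes.

```python
import re

# A digest line is "<32 hex chars>  <file name>"; binary mode prints "*name".
HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def parse_md5sum(listing):
    """Map file name -> digest from the text printed by md5sum."""
    digests = {}
    for line in listing.splitlines():
        parts = line.split(None, 1)
        # Skip anything that is not a digest line, such as
        # "md5sum: B40C: Is a directory".
        if len(parts) == 2 and HEX32.fullmatch(parts[0]):
            name = parts[1].strip().lstrip("*")
            digests[name] = parts[0].lower()
    return digests


def diff_listings(mine, theirs):
    """Return (mismatched, only_in_mine, only_in_theirs) as sorted name lists."""
    a, b = parse_md5sum(mine), parse_md5sum(theirs)
    mismatched = sorted(f for f in a.keys() & b.keys() if a[f] != b[f])
    return mismatched, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())
```

Three empty lists mean every file present in both listings has the same
fingerprint and neither side has extra files, which is the situation
reported in this thread.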