Re: [AMBER] Failure kReduceSoluteCOM with GPU

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 27 Jul 2011 14:57:12 -0700

Hi Fabricio,

If they are identical, your sources are patched correctly, which means this
may be a new bug, although we may have already inadvertently fixed it in the
development version. Can you please send me your input files (direct to me is
fine) so I can try them here and see whether I can reproduce it?
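
In case it is useful to anyone repeating this check: comparing two long
md5sum listings by eye is error-prone, so here is a minimal Python sketch that
does it mechanically. It assumes both listings have been pasted in as text.
The function names parse_listing and compare_listings are illustrative only
and are not part of AMBER or md5sum.

```python
import re

# A well-formed md5sum line: 32 hex digits, whitespace, optional "*", file name.
LINE = re.compile(r"^([0-9a-f]{32})\s+\*?(.+)$")


def parse_listing(text):
    """Map file name -> digest for each well-formed line of md5sum output.

    Leading "> " quote markers from a mail reply are stripped. Lines such as
    "md5sum: B40C: Is a directory" do not match the pattern and are skipped.
    """
    digests = {}
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        match = LINE.match(line)
        if match:
            digests[match.group(2)] = match.group(1)
    return digests


def compare_listings(reference, local):
    """Return (mismatched, missing, extra) lists of file names, each sorted."""
    mismatched = sorted(
        name for name in reference
        if name in local and reference[name] != local[name]
    )
    missing = sorted(name for name in reference if name not in local)
    extra = sorted(name for name in local if name not in reference)
    return mismatched, missing, extra


if __name__ == "__main__":
    # Two lines taken from the listings quoted in this thread.
    reference = parse_listing(
        "> > f4ed79de194d836246009d5c29051574  cuda_info.fpp\n"
        "> > a9e4f660fcb5347b1273a8e3f76d3e74  gpu.cpp\n"
    )
    local = parse_listing(
        "> md5sum: B40C: Is a directory\n"
        "> f4ed79de194d836246009d5c29051574 cuda_info.fpp\n"
        "> a9e4f660fcb5347b1273a8e3f76d3e74 gpu.cpp\n"
    )
    print(compare_listings(reference, local))  # three empty lists: all match
```

Three empty lists mean every file in the reference listing is present locally
with the same digest. Any name in the first list is a file whose contents
differ and is the place to start looking for a patch that did not apply.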

All the best
Ross

> -----Original Message-----
> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
> Sent: Wednesday, July 27, 2011 12:05 PM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
>
> Hi Ross. Here is the output of md5sum *:
> md5sum: B40C: Is a directory
> f4ed79de194d836246009d5c29051574 cuda_info.fpp
> a9e4f660fcb5347b1273a8e3f76d3e74 gpu.cpp
> 307e64e078aa5f1f22bd78fd224c9f4b gpu.h
> 9e6a4f93e46046cda29369feb0dd32e8 gputypes.cpp
> 46f8ccf2bbee063ff35a73945b16a3a2 gputypes.h
> 90ba8d068522a00074707a529469f5ea kCalculateGBBornRadii.cu
> 97fbbcfb8a3833509d94072ecab05643 kCalculateGBNonbondEnergy1.cu
> 79fb7a5bba2a19ba351a7dd5996d31fc kCalculateGBNonbondEnergy2.cu
> 67a458e51a76162edbcc907e7135500c kCalculateLocalForces.cu
> ce308f4fbe9468d5505beb0099d58e76 kCalculatePMENonbondEnergy.cu
> 9b240d418e391a71b590e6dc3bc3b0ff kCCF.h
> 5561a56bc236291cb87b4770453d67a4 kCLF.h
> 86f220029e3a943a186ebcfd16e2dcd9 kCPNE.h
> 9905ed2e705bccf1ae705279d85d0e57 kForcesUpdate.cu
> edf2d74af7a4d401ccecc7bfa6d036c3 kNeighborList.cu
> fd65d023597024a68565c5a0e5ffd86c kNTPKernels.h
> 49f952b429618228fca8e23f44223c58 kPGGW.h
> 4aea91b87cbb3cf62b9fddafe607ab48 kPGS.h
> 9c5951cdf94402d2c0396b74498f72f5 kPMEInterpolation.cu
> 46f01611524128ea428c069ef58bd421 kPSSE.h
> ada7d510598c88ed4adb8d32a9dbf73d kRandom.h
> eefe9bd32e04ba2bbe2eb5611a6464bd kShake.cu
> b07e184d2840ffae27d8af5415fae04a kU.h
> 6947e1fae477c0bb9c637062a0ddbfd8 Makefile
> e5a6173273e6812669c21abcd1530226 Makefile.advanced
> They are exactly the same. Now I really don't know what to do. What do
> you suggest?
> Fabrício Bracht
>
> 2011/7/27 Ross Walker <ross.rosswalker.co.uk>:
> > Hi Fabricio,
> >
> > Please take a look at the following, which explains what md5sums are:
> > http://en.wikipedia.org/wiki/Md5sum
> >
> > In summary, md5sum creates an 'almost' unique fingerprint of a file. If I
> > run md5sum on the files in my directory and you run it on the files in
> > yours, we can compare the fingerprints produced. If they are the same,
> > then we know the files are identical. Below is the list of md5sums for
> > the files in my cuda directory, which is the current, fully up-to-date
> > released copy of AMBER with all bugfixes applied. You should go to your
> > machine and do the following:
> >
> > cd $AMBERHOME/src
> > make clean
> > cd pmemd/src/cuda
> > md5sum *
> >
> > Then see whether the fingerprint (the string of letters and numbers
> > before each file name) matches the one I list below for each file. If
> > they all match, then we know the patch was applied correctly and your
> > system may be highlighting a real bug in the code. Note that the GTX275
> > and GTX460 are VERY different chip architectures, which is why a subtle
> > bug such as this may manifest itself on only one card and not the other.
> >
> > All the best
> > Ross
> >
> > foo.linux-jh9j:~/amber11_as_of_jul_22/src/pmemd/src/cuda> md5sum *
> > md5sum: B40C: Is a directory
> > f4ed79de194d836246009d5c29051574  cuda_info.fpp
> > a9e4f660fcb5347b1273a8e3f76d3e74  gpu.cpp
> > 307e64e078aa5f1f22bd78fd224c9f4b  gpu.h
> > 9e6a4f93e46046cda29369feb0dd32e8  gputypes.cpp
> > 46f8ccf2bbee063ff35a73945b16a3a2  gputypes.h
> > 90ba8d068522a00074707a529469f5ea  kCalculateGBBornRadii.cu
> > 97fbbcfb8a3833509d94072ecab05643  kCalculateGBNonbondEnergy1.cu
> > 79fb7a5bba2a19ba351a7dd5996d31fc  kCalculateGBNonbondEnergy2.cu
> > 67a458e51a76162edbcc907e7135500c  kCalculateLocalForces.cu
> > ce308f4fbe9468d5505beb0099d58e76  kCalculatePMENonbondEnergy.cu
> > 9b240d418e391a71b590e6dc3bc3b0ff  kCCF.h
> > 5561a56bc236291cb87b4770453d67a4  kCLF.h
> > 86f220029e3a943a186ebcfd16e2dcd9  kCPNE.h
> > 9905ed2e705bccf1ae705279d85d0e57  kForcesUpdate.cu
> > edf2d74af7a4d401ccecc7bfa6d036c3  kNeighborList.cu
> > fd65d023597024a68565c5a0e5ffd86c  kNTPKernels.h
> > 49f952b429618228fca8e23f44223c58  kPGGW.h
> > 4aea91b87cbb3cf62b9fddafe607ab48  kPGS.h
> > 9c5951cdf94402d2c0396b74498f72f5  kPMEInterpolation.cu
> > 46f01611524128ea428c069ef58bd421  kPSSE.h
> > ada7d510598c88ed4adb8d32a9dbf73d  kRandom.h
> > eefe9bd32e04ba2bbe2eb5611a6464bd  kShake.cu
> > b07e184d2840ffae27d8af5415fae04a  kU.h
> > 6947e1fae477c0bb9c637062a0ddbfd8  Makefile
> > e5a6173273e6812669c21abcd1530226  Makefile.advanced
> >
> >> -----Original Message-----
> >> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
> >> Sent: Wednesday, July 27, 2011 8:53 AM
> >> To: AMBER Mailing List; Scott Brozell
> >> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
> >>
> >> Hi,
> >> I've only found $AMBERHOME/AmberTools/src/configure.rej .
> >> I've checked the files that were supposed to be patched by bugfix.11,
> >> but wasn't able to confirm if they were patched or not due to my lack
> >> of programming knowledge. Any tips here?
> >> One other thing. Why is it that this simulation ran successfully on my
> >> GTX275 computer but has problems with my GTX460?
> >> Thank you
> >> Fabrício
> >>
> >> 2011/7/27 Scott Brozell <sbrozell.rci.rutgers.edu>:
> >> > Hi,
> >> >
> >> > When a hunk fails to apply, the patch command creates a reject file:
> >> > blabla.rej. So look for files with a rej extension.
> >> > Also, since bugfix 11 patches only a few files in
> >> > src/pmemd/src/cuda, you could look at those files to see if the
> >> > patch has been applied:
> >> > http://ambermd.org/bugfixes/11.0/bugfix.11
> >> >
> >> > scott
> >> >
> >> > On Tue, Jul 26, 2011 at 10:07:28AM -0300, Fabrício Bracht wrote:
> >> >> Hi Scott. How do I check if this specific bugfix has been applied
> >> >> correctly? Would it be something like md5sum * in
> >> >> $AMBERHOME/src/pmemd/src/cuda/ . And what should I look for?
> >> >> Thank you
> >> >> Fabrício
> >> >>
> >> >> 2011/7/26 Scott Brozell <sbrozell.rci.rutgers.edu>:
> >> >> > Hi,
> >> >> >
> >> >> > This looks like a problem addressed by bugfix.11.
> >> >> > I have not been following your threads closely,
> >> >> > but I read that you were having problems with the bugfixes.
> >> >> > You might inspect the files listed in bugfix.11 to determine
> >> >> > whether the bugfixes were really applied, while you are waiting
> >> >> > for someone who has been following your threads closely to reply.
> >> >> >
> >> >> > scott
> >> >> >
> >> >> > On Tue, Jul 26, 2011 at 12:44:10AM -0300, Fabrício Bracht wrote:
> >> >> >> Since I was finally able to compile amber11 with cuda support for
> >> >> >> my gtx460, I thought everything was fine, but it seems that I now
> >> >> >> have to sort out a few things to get my system running again. Let
> >> >> >> me explain.
> >> >> >> I was simulating a protein inside a micelle. I had a few tens of
> >> >> >> nanoseconds simulated on a gtx275. The system comprises water,
> >> >> >> organic solvent, surfactant, counterions and my protein (approx.
> >> >> >> 60000 atoms). When I tried to start a simulation on my gtx460
> >> >> >> machine using my restart files from the GTX275, I got the
> >> >> >> following error:
> >> >> >> Error: unspecified launch failure launching kernel kReduceSoluteCOM
> >> >> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
> >> >> >>
> >> >> >> I thought it might have something to do with a problem in the
> >> >> >> restart file or something like that, so I recreated the inpcrd and
> >> >> >> prmtop files for the last configuration and tried to start a fresh
> >> >> >> run on my gtx460 machine. It didn't work; I got the same error
> >> >> >> lines again.
> >> >> >> Here is my configuration file:
> >> >> >> MD parameters
> >> >> >>  &cntrl
> >> >> >>   imin   = 0,
> >> >> >>   irest  = 1,
> >> >> >>   ntx    = 7,
> >> >> >>   ntb    = 2, pres0 = 1.0, ntp = 1, taup = 2.0,
> >> >> >>   cut    = 9.0,
> >> >> >>   ntr    = 1,
> >> >> >>   ntc    = 2,
> >> >> >>   ntf    = 2,
> >> >> >>   tempi  = 300.0,
> >> >> >>   temp0  = 300.0,
> >> >> >>   ntt    = 3,
> >> >> >>   gamma_ln = 1.0,
> >> >> >>   nstlim = 5000000, dt = 0.002,
> >> >> >>   ntpr = 10000, ntwx = 10000, ntwr = 1000
> >> >> >>  /
> >> >> >> Restraints
> >> >> >> 5.0
> >> >> >> RES 1 317
> >> >> >> END
> >> >> >> END
> >> >> >>
> >> >> >> And here is the command line:
> >> >> >> pmemd.cuda -O -i md.in -c micel2.3.inpcrd -p micel2.3.prmtop -r
> >> >> >> md3.rst -o md3.out -ref micela2.3.inpcrd -inf md3.info -x md3.mdcrd
> >> >> >
> >> >
> >> > _______________________________________________
> >> > AMBER mailing list
> >> > AMBER.ambermd.org
> >> > http://lists.ambermd.org/mailman/listinfo/amber
> >> >


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jul 27 2011 - 15:00:04 PDT