Re: [AMBER] Failure kReduceSoluteCOM with GPU

From: Fabrício Bracht <bracht.iq.ufrj.br>
Date: Fri, 29 Jul 2011 14:42:33 -0300

Hi Ross. Just checking to see if you received my last email with the
file attached.
Thank you
Fabrício

2011/7/27 Ross Walker <ross.rosswalker.co.uk>:
> Hi Fabricio,
>
> If they are identical this means that this may be a new bug, although we may
> have already inadvertently fixed it in the development version. Can you send
> me your input files please (direct to me is fine) so I can try it here and
> see if I can reproduce it.
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
>> Sent: Wednesday, July 27, 2011 12:05 PM
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
>>
>> Hi Ross. Here is my output from md5sum *:
>> md5sum: B40C: Is a directory
>> f4ed79de194d836246009d5c29051574  cuda_info.fpp
>> a9e4f660fcb5347b1273a8e3f76d3e74  gpu.cpp
>> 307e64e078aa5f1f22bd78fd224c9f4b  gpu.h
>> 9e6a4f93e46046cda29369feb0dd32e8  gputypes.cpp
>> 46f8ccf2bbee063ff35a73945b16a3a2  gputypes.h
>> 90ba8d068522a00074707a529469f5ea  kCalculateGBBornRadii.cu
>> 97fbbcfb8a3833509d94072ecab05643  kCalculateGBNonbondEnergy1.cu
>> 79fb7a5bba2a19ba351a7dd5996d31fc  kCalculateGBNonbondEnergy2.cu
>> 67a458e51a76162edbcc907e7135500c  kCalculateLocalForces.cu
>> ce308f4fbe9468d5505beb0099d58e76  kCalculatePMENonbondEnergy.cu
>> 9b240d418e391a71b590e6dc3bc3b0ff  kCCF.h
>> 5561a56bc236291cb87b4770453d67a4  kCLF.h
>> 86f220029e3a943a186ebcfd16e2dcd9  kCPNE.h
>> 9905ed2e705bccf1ae705279d85d0e57  kForcesUpdate.cu
>> edf2d74af7a4d401ccecc7bfa6d036c3  kNeighborList.cu
>> fd65d023597024a68565c5a0e5ffd86c  kNTPKernels.h
>> 49f952b429618228fca8e23f44223c58  kPGGW.h
>> 4aea91b87cbb3cf62b9fddafe607ab48  kPGS.h
>> 9c5951cdf94402d2c0396b74498f72f5  kPMEInterpolation.cu
>> 46f01611524128ea428c069ef58bd421  kPSSE.h
>> ada7d510598c88ed4adb8d32a9dbf73d  kRandom.h
>> eefe9bd32e04ba2bbe2eb5611a6464bd  kShake.cu
>> b07e184d2840ffae27d8af5415fae04a  kU.h
>> 6947e1fae477c0bb9c637062a0ddbfd8  Makefile
>> e5a6173273e6812669c21abcd1530226  Makefile.advanced
>> They are exactly the same. Now I really don't know what to do. What
>> do you suggest?
>> Fabrício Bracht
>>
>> 2011/7/27 Ross Walker <ross.rosswalker.co.uk>:
>> > Hi Fabricio,
>> >
>> > Please take a look at the following which explains what md5sum's are:
>> > http://en.wikipedia.org/wiki/Md5sum
>> >
>> > In summary it creates an 'almost' unique fingerprint of a file.
>> > Thus if I run md5sum on the files in my directory and you run
>> > md5sum on the files in your directory one can compare the
>> > fingerprints produced. If they are the same then we know the files
>> > are identical. The following is the list of md5sums for the files
>> > in my cuda directory, which represents the currently fully
>> > up-to-date released copy of AMBER with all bugfixes applied. You
>> > should go to your machine and do the following:
>> >
>> > cd $AMBERHOME/src
>> > make clean
>> > cd pmemd/src/cuda
>> > md5sum *
>> >
>> > And then see if the fingerprint given (the bunch of letters and
>> > numbers before each file) matches those I list below for each
>> > file. If they do then we know your patch was all applied correctly
>> > and your system may be highlighting a real bug in the code. Note
>> > the GTX275 and GTX460 are VERY different chip architectures, hence
>> > why a subtle bug such as this may only manifest itself on one card
>> > and not the other.
>> >
>> > All the best
>> > Ross
>> >
>> > foo.linux-jh9j:~/amber11_as_of_jul_22/src/pmemd/src/cuda> md5sum *
>> > md5sum: B40C: Is a directory
>> > f4ed79de194d836246009d5c29051574  cuda_info.fpp
>> > a9e4f660fcb5347b1273a8e3f76d3e74  gpu.cpp
>> > 307e64e078aa5f1f22bd78fd224c9f4b  gpu.h
>> > 9e6a4f93e46046cda29369feb0dd32e8  gputypes.cpp
>> > 46f8ccf2bbee063ff35a73945b16a3a2  gputypes.h
>> > 90ba8d068522a00074707a529469f5ea  kCalculateGBBornRadii.cu
>> > 97fbbcfb8a3833509d94072ecab05643  kCalculateGBNonbondEnergy1.cu
>> > 79fb7a5bba2a19ba351a7dd5996d31fc  kCalculateGBNonbondEnergy2.cu
>> > 67a458e51a76162edbcc907e7135500c  kCalculateLocalForces.cu
>> > ce308f4fbe9468d5505beb0099d58e76  kCalculatePMENonbondEnergy.cu
>> > 9b240d418e391a71b590e6dc3bc3b0ff  kCCF.h
>> > 5561a56bc236291cb87b4770453d67a4  kCLF.h
>> > 86f220029e3a943a186ebcfd16e2dcd9  kCPNE.h
>> > 9905ed2e705bccf1ae705279d85d0e57  kForcesUpdate.cu
>> > edf2d74af7a4d401ccecc7bfa6d036c3  kNeighborList.cu
>> > fd65d023597024a68565c5a0e5ffd86c  kNTPKernels.h
>> > 49f952b429618228fca8e23f44223c58  kPGGW.h
>> > 4aea91b87cbb3cf62b9fddafe607ab48  kPGS.h
>> > 9c5951cdf94402d2c0396b74498f72f5  kPMEInterpolation.cu
>> > 46f01611524128ea428c069ef58bd421  kPSSE.h
>> > ada7d510598c88ed4adb8d32a9dbf73d  kRandom.h
>> > eefe9bd32e04ba2bbe2eb5611a6464bd  kShake.cu
>> > b07e184d2840ffae27d8af5415fae04a  kU.h
>> > 6947e1fae477c0bb9c637062a0ddbfd8  Makefile
>> > e5a6173273e6812669c21abcd1530226  Makefile.advanced
>> >
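A quick way to compare a posted checksum list like the one above against a local tree is `md5sum -c`; a minimal sketch, assuming the list has been saved to a file (the name `ref.md5` is illustrative, not from the thread):

```shell
# Save the reference list (checksum + filename pairs, exactly as md5sum
# prints them) as ref.md5 in the cuda directory, then let md5sum verify
# every entry in one pass. The directory path follows the thread.
cd "$AMBERHOME/src/pmemd/src/cuda"
md5sum -c ref.md5    # prints "<file>: OK" or "<file>: FAILED" per entry
```

Any `FAILED` line marks a file that differs from the released copy.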
>> >> -----Original Message-----
>> >> From: Fabrício Bracht [mailto:bracht.iq.ufrj.br]
>> >> Sent: Wednesday, July 27, 2011 8:53 AM
>> >> To: AMBER Mailing List; Scott Brozell
>> >> Subject: Re: [AMBER] Failure kReduceSoluteCOM with GPU
>> >>
>> >> Hi,
>> >> I've only found $AMBERHOME/AmberTools/src/configure.rej .
>> >> I've checked the files that were supposed to be patched by
>> >> bugfix.11, but wasn't able to confirm if they were patched or not
>> >> due to my lack of programming knowledge. Any tips here?
>> >> One other thing. Why is it that this simulation ran successfully
>> >> on my GTX275 computer but has problems with my GTX460?
>> >> Thank you
>> >> Fabrício
>> >>
>> >> 2011/7/27 Scott Brozell <sbrozell.rci.rutgers.edu>:
>> >> > Hi,
>> >> >
>> >> > The patch command should create a reject file: blabla.rej.
>> >> > So look for files with a .rej extension.
>> >> > Also, since in bugfix 11 there are only a few files to be patched
>> >> > in src/pmemd/src/cuda, you could look at those files to see if
>> >> > the patch has been applied:
>> >> > http://ambermd.org/bugfixes/11.0/bugfix.11
>> >> >
>> >> > scott
>> >> >
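The search for leftover reject files can be done in one pass; a minimal sketch, assuming the tree root is `$AMBERHOME` as elsewhere in the thread:

```shell
# List every patch reject file under the AMBER tree. When a hunk from
# "patch" fails to apply, the unapplied piece is written to a
# corresponding *.rej file next to the target.
find "$AMBERHOME" -name '*.rej' -print
```

Any hit marks a file whose patch hunks did not apply cleanly; no hits under src/pmemd/src/cuda suggests the cuda sources were patched.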
>> >> > On Tue, Jul 26, 2011 at 10:07:28AM -0300, Fabrício Bracht wrote:
>> >> >> Hi Scott. How do I check if this specific bugfix has been applied
>> >> >> correctly? Would it be something like md5sum * in
>> >> >> $AMBERHOME/src/pmemd/src/cuda/ . And what should I look for?
>> >> >> Thank you
>> >> >> Fabrício
>> >> >>
>> >> >> 2011/7/26 Scott Brozell <sbrozell.rci.rutgers.edu>:
>> >> >> > Hi,
>> >> >> >
>> >> >> > This looks like a problem addressed by bugfix.11.
>> >> >> > I have not been following your threads closely,
>> >> >> > but i read that you were having problems with the bugfixes.
>> >> >> > You might inspect the files listed in bugfix.11 to determine
>> >> >> > whether the bugfixes were really applied, while you are
>> >> >> > waiting for someone that has been following your threads
>> >> >> > closely to reply.
>> >> >> >
>> >> >> > scott
>> >> >> >
>> >> >> > On Tue, Jul 26, 2011 at 12:44:10AM -0300, Fabrício Bracht wrote:
>> >> >> >> Since I finally was able to compile amber11 with cuda
>> >> >> >> support for my gtx460, I thought everything was fine, but it
>> >> >> >> seems that now I have to set a few things in order to get my
>> >> >> >> system running again. Let me explain more.
>> >> >> >> I was simulating a protein inside a micelle. I had a few
>> >> >> >> tens of nanoseconds simulated on a gtx275. The system is
>> >> >> >> comprised of water, organic solvent, surfactant, counterions
>> >> >> >> and my protein (approx. 60000 atoms). When I tried to start
>> >> >> >> a simulation using my restart files from the GTX275 on my
>> >> >> >> gtx460 machine, I got the following error:
>> >> >> >> Error: unspecified launch failure launching kernel kReduceSoluteCOM
>> >> >> >> cudaFree GpuBuffer::Deallocate failed unspecified launch failure
>> >> >> >>
>> >> >> >> I thought it might have something to do with a problem in
>> >> >> >> the restart file or something like this, so I recreated the
>> >> >> >> inpcrd and prmtop files for the last configuration and tried
>> >> >> >> to start a fresh one on my gtx460 machine. Well, it didn't
>> >> >> >> work out. I got the same error lines again.
>> >> >> >> Here is my configuration file.
>> >> >> >> MD parameters
>> >> >> >>  &cntrl
>> >> >> >>   imin   = 0,
>> >> >> >>   irest  = 1,
>> >> >> >>   ntx    = 7,
>> >> >> >>   ntb    = 2, pres0 = 1.0, ntp = 1, taup = 2.0,
>> >> >> >>   cut    = 9.0,
>> >> >> >>   ntr    = 1,
>> >> >> >>   ntc    = 2,
>> >> >> >>   ntf    = 2,
>> >> >> >>   tempi  = 300.0,
>> >> >> >>   temp0  = 300.0,
>> >> >> >>   ntt    = 3,
>> >> >> >>   gamma_ln = 1.0,
>> >> >> >>   nstlim = 5000000, dt = 0.002,
>> >> >> >>   ntpr = 10000, ntwx = 10000, ntwr = 1000
>> >> >> >>  /
>> >> >> >> Restraints
>> >> >> >> 5.0
>> >> >> >> RES 1 317
>> >> >> >> END
>> >> >> >> END
>> >> >> >>
>> >> >> >> And here is the command line:
>> >> >> >> pmemd.cuda -O -i md.in -c micel2.3.inpcrd -p micel2.3.prmtop
>> >> >> >> -r md3.rst -o md3.out -ref micela2.3.inpcrd -inf md3.info -x
>> >> >> >> md3.mdcrd
>> >> >> >
>> >> >
>> >> > _______________________________________________
>> >> > AMBER mailing list
>> >> > AMBER.ambermd.org
>> >> > http://lists.ambermd.org/mailman/listinfo/amber
>> >> >
>> >>
>> >
>> >
>> >
>>
>
>
>

Received on Fri Jul 29 2011 - 11:00:03 PDT