Re: [AMBER] RE GPU test fails

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 3 Apr 2012 11:35:57 -0700

Hi Mohammad,

You really see no other output? No error messages, no kernel launch
failures? I think there is something very wrong with your setup.

I would start by untarring a fresh copy of AMBER, patching it and then
recompiling, making sure you are using CUDA 4.0. I would also suggest
recompiling the NVIDIA SDK and running some of the tests in there to make sure
they run properly. Right now it could be any of a range of problems: perhaps
the CUDA library path in your LD_LIBRARY_PATH does not match the CUDA toolkit
version used to build the executable, or some other odd situation.
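
A minimal sketch of that LD_LIBRARY_PATH check, assuming a POSIX shell on Linux with nvcc, ldd and readlink on the PATH and the binary at $AMBERHOME/bin/pmemd.cuda. The helper names (toolkit_release, cudart_path, cudart_release, check_cuda_env) are hypothetical, not part of AMBER or the CUDA toolkit, and the release read from the library file name only works when the symlink resolves to a fully versioned file such as libcudart.so.4.1.28:

```shell
#!/bin/sh
# Hypothetical helpers (not part of AMBER or CUDA): print the toolkit release
# nvcc belongs to and the libcudart a binary actually loads, so the two can be
# compared. Plain globals instead of `local` to stay POSIX-portable.

# Print "X.Y" from `nvcc --version` text such as
# "Cuda compilation tools, release 4.0, V0.2.1221".
toolkit_release() {
  printf '%s\n' "$1" |
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the absolute path ldd reports for libcudart (empty if "not found").
cudart_path() {
  printf '%s\n' "$1" | awk '/libcudart/ && $3 ~ /^\// { print $3; exit }'
}

# Print "X.Y" from a fully versioned name like ".../libcudart.so.4.1.28";
# prints nothing for an unversioned soname like "libcudart.so.4".
cudart_release() {
  basename "$1" |
    sed -n 's/^libcudart\.so\.\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Report the environment for one binary (default path is an assumption).
check_cuda_env() {
  bin=${1:-$AMBERHOME/bin/pmemd.cuda}
  nvcc_out=$(nvcc --version) || { echo "nvcc not found on PATH" >&2; return 1; }
  ldd_out=$(ldd "$bin") || { echo "ldd failed on $bin" >&2; return 1; }
  lib=$(cudart_path "$ldd_out")
  [ -n "$lib" ] || { echo "libcudart not resolved for $bin" >&2; return 1; }
  # Follow symlinks (libcudart.so.4 -> libcudart.so.4.x.y) to the real file.
  real=$(readlink -f "$lib")
  echo "LD_LIBRARY_PATH     : ${LD_LIBRARY_PATH:-<unset>}"
  echo "nvcc toolkit release: $(toolkit_release "$nvcc_out")"
  echo "libcudart loaded    : $real (release $(cudart_release "$real"))"
}
```

Usage (file name hypothetical): `. ./check_cuda_env.sh && check_cuda_env`. If the two reported releases differ, the library found at run time is not the one from the toolkit that nvcc belongs to.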

Try starting from scratch and building it again using the current
environment you are running from.

All the best
Ross

> -----Original Message-----
> From: Mohammad Ashraf Bhuiyan [mailto:akasheee.gmail.com]
> Sent: Tuesday, April 03, 2012 10:40 AM
> To: amber.ambermd.org
> Subject: [AMBER] RE GPU test fails
>
> Hi Ross,
>
> I tried the individual run as you suggested and got the following:
>
> bash-4.1$ ./Run_md_trpcage -1 SPDP
> ./Run_md_trpcage: Program error
> bash-4.1$
>
> The trpcage_md.out file contains the following:
> | PMEMD implementation of SANDER, Release 11
>
> | Run on 04/03/2012 at 10:22:55
>
> [-O]verwriting output
>
> File Assignments:
> | MDIN: mdin
> | MDOUT: trpcage_md.out
> | INPCRD: inpcrd
> | PARM: prmtop
> | RESTRT: restrt
> | REFC: refc
> | MDVEL: mdvel
> | MDEN: mden
> | MDCRD: mdcrd
> | MDINFO: mdinfo
>
>
> Here is the input file:
>
> TRPCage MD
> &cntrl
> imin=0, irest=1, ntx=5,
> nstlim=20, dt=0.002,
> ntc=2, ntf=2,
> ntt=1, tautp=0.5,
> tempi=325.0, temp0=325.0,
> ntpr=1, ntwx=0,ntwr=100000,
> ntb=0, igb=1,
> cut=9999.,rgbmax=9999.
> /
>
>
> So no output is produced. In case you want to know, deviceQuery for the
> CUDA devices produces:
>
> bash-4.1$ ../../../../../../NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
> [../../../../../../NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery] starting...
> ../../../../../../NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> There are 2 devices supporting CUDA
>
> Device 0: "Tesla M2090"
> CUDA Driver Version / Runtime Version 4.10 / 4.10
> CUDA Capability Major/Minor version number: 2.0
> Total amount of global memory: 5375 MBytes (5636554752 bytes)
> ( 0) Multiprocessors x (32) CUDA Cores/MP: 0 CUDA Cores
> GPU Clock Speed: 1.30 GHz
> Memory Clock rate: 1848.00 Mhz
> Memory Bus Width: 384-bit
> L2 Cache Size: 786432 bytes
> Max Texture Dimension Size (x,y,z) 1D=(1), 2D=(0,65536), 3D=(134217728,65536,65535)
> Max Layered Texture Size (dim) x layers 1D=(65000) x 65000, 2D=(1048544,16384) x 16384
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 49152 bytes
> Total number of registers available per block: 32768
> Warp size: 32
> Maximum number of threads per block: 1024
> Maximum sizes of each dimension of a block: 1024 x 1024 x 64
> Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 512 bytes
> Concurrent copy and execution: Yes with 2048 copy engine(s)
> Run time limit on kernels: Yes
> Integrated GPU sharing Host Memory: Yes
> Support host page-locked memory mapping: No
> Concurrent kernel execution: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support enabled: Yes
> Device is using TCC driver mode: Yes
> Device supports Unified Addressing (UVA): Yes
> Device PCI Bus ID / PCI location ID: 2048 / 16384
> Compute Mode:
> < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 1: "GeForce 8400 GS"
> CUDA Driver Version / Runtime Version 4.10 / 4.10
> CUDA Capability Major/Minor version number: 1.1
> Total amount of global memory: 511 MBytes (536150016 bytes)
> ( 0) Multiprocessors x ( 8) CUDA Cores/MP: 0 CUDA Cores
> GPU Clock Speed: 1.40 GHz
> Memory Clock rate: 333.00 Mhz
> Memory Bus Width: 64-bit
> Max Texture Dimension Size (x,y,z) 1D=(1), 2D=(0,8192), 3D=(134217728,65536,32768)
> Max Layered Texture Size (dim) x layers 1D=(65000) x 65000, 2D=(1048544,0) x 0
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 16384 bytes
> Total number of registers available per block: 8192
> Warp size: 32
> Maximum number of threads per block: 512
> Maximum sizes of each dimension of a block: 512 x 512 x 64
> Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 256 bytes
> Concurrent copy and execution: Yes with 512 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: Yes
> Support host page-locked memory mapping: No
> Concurrent kernel execution: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support enabled: Yes
> Device is using TCC driver mode: Yes
> Device supports Unified Addressing (UVA): No
> Device PCI Bus ID / PCI location ID: 512 / 8192
> Compute Mode:
> < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.10, CUDA Runtime Version = 4.10, NumDevs = 2, Device = Tesla M2090, Device = GeForce 8400 GS
> [../../../../../../NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery] test results...
> PASSED
>
> Press ENTER to exit...
>
>
>
> Thanks
>
>
> > Hi Mohammad,
> >
> > Do you get any errors other than just 'Program error'?
> >
> > Could you try running one of the tests manually?
> >
> > cd $AMBERHOME/test/cuda/trpcage/
> > ./Run_md_trpcage -1 SPDP
> >
> > Then post the messages you get to the screen and also the contents of
> > trpcage_md.out
> >
> > All the best
> > Ross
> >
> >> -----Original Message-----
> >> From: Mohammad Ashraf Bhuiyan [mailto:akasheee.gmail.com]
> >> Sent: Monday, April 02, 2012 5:44 PM
> >> To: amber.ambermd.org
> >> Subject: [AMBER] Amber GPU test fails
> >>
> >> Hi,
> >> I installed the Amber CPU version and it works fine.
> >>
> >> Then I tried the GPU version. I applied the latest bugfixes for AmberTools 1.5
> >> and Amber11, and Amber11 also built for CUDA without any errors. But when I
> >> ran the test script, ./test_amber_cuda.sh, it throws errors such as:
> >>
> >> bash-4.1$ ./test_amber_cuda.sh
> >> Using default GPU_ID = -1
> >> Using default PREC_MODEL = SPDP
> >> cd cuda && make -k test.pmemd.cuda GPU_ID=-1 PREC_MODEL=SPDP
> >> make[1]: Entering directory `/localdisk/ashraf/Amber/amber11_cuda/amber11/test/cuda'
> >> ------------------------------------
> >> Running CUDA Implicit solvent tests.
> >>   Precision Model = SPDP
> >>            GPU_ID = -1
> >> ------------------------------------
> >> cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
> >>   ./Run_md_trpcage:  Program error
> >> make[1]: *** [test.pmemd.cuda.gb] Error 1
> >> cd gb_ala3/ && ./Run.igb1_ntc1_min -1 SPDP netcdf.mod
> >>   ./Run.igb1_ntc1_min:  Program error
> >> make[1]: *** [test.pmemd.cuda.gb.serial] Error 1
> >> ------------------------------------
> >> Running CUDA Explicit solvent tests.
> >>   Precision Model = SPDP
> >>            GPU_ID = -1
> >> ------------------------------------
> >> cd 4096wat/ && ./Run.pure_wat -1 SPDP netcdf.mod
> >>   ./Run.pure_wat:  Program error
> >> make[1]: *** [test.pmemd.cuda.pme] Error 1
> >> make[1]: Target `test.pmemd.cuda' not remade because of errors.
> >> make[1]: Leaving directory `/localdisk/ashraf/Amber/amber11_cuda/amber11/test/cuda'
> >> make: *** [test.pmemd.cuda] Error 2
> >>
> >>
> >>
> >> I also tried to run one of the benchmarks listed on the Amber website, DHFR
> >> NVE (23,558 atoms). It does not produce any output, and it does not crash,
> >> but the mdout file only has the input listed and no results.
> >>
> >>
> >> Has anyone experienced this kind of error? Any ideas would be appreciated.
> >> I am using an NVIDIA Tesla M2090, and it works fine for other CUDA code.
> >>
> >>
> >> --
> >> Best Regards
> >>
> >> Ashraf
> >>
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >
>
>
>
> --
> Best Regards
>
> Ashraf
>
> --------------------------------------------------
> M Ashraf Bhuiyan, PhD
> Hillsboro, OR, USA
>


Received on Tue Apr 03 2012 - 12:00:04 PDT