Re: [AMBER] AMBER11, pmemd.cuda: the system crashed. from Chinh Su Tran To on 2011-12-14 (Amber Archive Dec 2011)

From: Chinh Su Tran To <chinh.sutranto.gmail.com>
Date: Thu, 15 Dec 2011 11:19:46 +0800

Dear Dr. Walker,

Thank you so much.

Regards,
Chinh

On Thu, Dec 15, 2011 at 6:06 AM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Chinh,
>
> This works fine on my system with an up to date patched version of AMBER
> 11.
> Looking at the output you sent me it looks like you are running with an
> unpatched version of AMBER 11. Your output has:
>
> |--------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> |
> | Implementation by:
> | Ross C. Walker (SDSC)
> | Scott Le Grand (nVIDIA)
> | Duncan Poole (nVIDIA)
> |
> | CAUTION: The CUDA code is currently experimental.
> | You use it at your own risk. Be sure to
> | check ALL results carefully.
> |
> | Precision model in use:
> | [SPDP] - Hybrid Single/Double Precision (Default).
> |
> |--------------------------------------------------------
>
> While it should have:
>
> |--------------------- INFORMATION ----------------------
> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> | Version 2.2
> |
> | 08/16/2011
> |
> |
> | Implementation by:
> | Ross C. Walker (SDSC)
> | Scott Le Grand (nVIDIA)
> | Duncan Poole (nVIDIA)
> |
> | CAUTION: The CUDA code is currently experimental.
> | You use it at your own risk. Be sure to
> | check ALL results carefully.
> |
> | Precision model in use:
> | [SPDP] - Hybrid Single/Double Precision (Default).
> |
> |--------------------------------------------------------
>
> Note the version number. You are running an out-of-date AMBER 11, and this
> is almost certainly causing your issues. Start from a completely clean
> Amber 11 directory created by untarring the original Amber11.tar.bz2 file
> and patch it with the AMBER bugfixes (see
> http://ambermd.org/bugfixes11.html). Make sure you get AmberTools 1.5 and
> patch that as well.
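
[Editor's sketch] The check described above, comparing the INFORMATION banner of an mdout file against the patched one, could be automated roughly as follows. This is a minimal Python sketch that assumes the banner layout quoted in this message; the function name `cuda_banner_version` is hypothetical and not part of AMBER.

```python
import re

def cuda_banner_version(mdout_text):
    """Return the version string found in the pmemd.cuda INFORMATION banner.

    Returns None when the banner has no 'Version x.y' line, which in this
    thread indicated an unpatched build. The layout is assumed from the
    banners quoted in the email, not from any AMBER specification.
    """
    in_banner = False
    for line in mdout_text.splitlines():
        stripped = line.strip()
        # The banner opens with a dashed line containing the word INFORMATION.
        if stripped.startswith("|") and "INFORMATION" in stripped:
            in_banner = True
            continue
        if in_banner:
            # A plain dashed line closes the banner.
            if stripped.startswith("|----"):
                break
            # Match '| Version 2.2' but not '| GPU (CUDA) Version of PMEMD ...'.
            match = re.match(r"\|\s*Version\s+(\S+)", stripped)
            if match:
                return match.group(1)
    return None
```

Calling it on the text of an output file returns `None` for a banner like the first one quoted above, and the version string for a banner like the second.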
>
> I would also note that:
>
> #heating in 6ns at 500K without restraint on the model to unfold the protein
> &cntrl
> imin=0,
> iwrap=0,
> irest=0,
> ntx=1,
> ntb=1,
> cut=10.0,
> ntr=0,
> ntc=2,
> ntf=2,
> tempi=500.0,
> temp0=500.0,
> ntt=3,
> gamma_ln=1.0,
> nstlim=3000000, dt=0.002,
> ntpr=1500, ntwx=1500,ntwr=1000
> /
>
> You are running at 500K but with a 2fs time step. You probably need to
> reduce the time step to 1.5fs or so to run at such an elevated temperature.
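
[Editor's sketch] If the time step is reduced as suggested, the step counts in the input need rescaling to keep the same 6 ns of simulated time and roughly the same output spacing. The arithmetic can be sketched in Python as below. The helper name `rescale_steps` is hypothetical, and 0.0015 ps is simply the "1.5fs or so" value from the advice above, not a tested recommendation.

```python
def rescale_steps(total_ps, old_dt_ps, new_dt_ps, old_intervals):
    """Recompute step counts for a new time step.

    total_ps      : total simulated time to preserve, in picoseconds
    old_dt_ps     : original time step (the mdin 'dt' value), in picoseconds
    new_dt_ps     : new time step, in picoseconds
    old_intervals : dict of output frequencies in steps (e.g. ntpr, ntwx, ntwr)
    """
    # Number of steps needed to cover the same simulated time.
    nstlim = round(total_ps / new_dt_ps)
    # Keep each output interval at about the same simulated time.
    scaled = {
        name: round(steps * old_dt_ps / new_dt_ps)
        for name, steps in old_intervals.items()
    }
    return nstlim, scaled


# Values taken from the input quoted above: 3000000 steps * 0.002 ps = 6000 ps.
nstlim, intervals = rescale_steps(
    6000.0, 0.002, 0.0015, {"ntpr": 1500, "ntwx": 1500, "ntwr": 1000}
)
print(nstlim, intervals)
```

With these inputs, nstlim becomes 4000000 and ntpr/ntwx become 2000; ntwr rounds to 1333, so the restart interval is only approximately preserved.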
>
> All the best
> Ross
>
>
> > -----Original Message-----
> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > Sent: Tuesday, December 13, 2011 8:04 PM
> > To: AMBER Mailing List
> > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >
> > Dear Dr. Walker,
> >
> > I sent you the files and the info to your gmail. Thank you.
> >
> > Chinh
> >
> > On Tue, Dec 13, 2011 at 12:09 PM, Ross Walker <ross.rosswalker.co.uk>
> > wrote:
> >
> > > Hi Chinh,
> > >
> > > Can you send me (offlist) all of your input files please along with
> > > details of your computer system. OS, NVIDIA compiler and driver version,
> > > hardware spec (especially the GPU version). I need to be able to
> > > replicate this in order to investigate what is going wrong.
> > >
> > > Please also include the output from the run that gave what looked like
> > > a disk error.
> > >
> > > Thank you.
> > >
> > > All the best
> > > Ross
> > >
> > > > -----Original Message-----
> > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > > > Sent: Monday, December 12, 2011 8:04 PM
> > > > To: AMBER Mailing List
> > > > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> > > >
> > > > Dear Dr. Walker,
> > > >
> > > > As you suggested, we ran the checks on the hard disk, and both came
> > > > back clean. I ran the same job using pmemd only, and it was fine,
> > > > i.e. no crash, no error.
> > > >
> > > > But when I returned to pmemd.cuda, it crashed again.
> > > >
> > > > I also noticed two problems when using pmemd.cuda:
> > > >
> > > > 1. When I used iwrap=0 (in the input file below), it showed
> > > > "segmentation fault" immediately. I knew this was an old error I had
> > > > encountered before (but I wanted to try it to test pmemd.cuda only).
> > > > 2. Then I switched to iwrap=1 with some modifications in gpu.cpp (the
> > > > solution that I found in the AMBER forum), and it crashed. (However,
> > > > it also crashed before these modifications.)
> > > >
> > > > Please help. We do not know what is wrong.
> > > >
> > > > The input is:
> > > >
> > > > &cntrl
> > > > imin=0,
> > > > * iwrap=1 => I also tried iwrap=0*
> > > > irest=0,
> > > > ntx=1,
> > > > ntb=1,
> > > > cut=10.0,
> > > > ntr=0,
> > > > ntc=2,
> > > > ntf=2,
> > > > tempi=500.0,
> > > > temp0=500.0,
> > > > ntt=3,
> > > > gamma_ln=1.0,
> > > > nstlim=3000000, dt=0.002,
> > > > ntpr=1500, ntwx=1500,ntwr=1000
> > > > /
> > > >
> > > >
> > > > Thank you.
> > > > Chinsu
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk>
> > > > wrote:
> > > >
> > > > > Hi Chinsu,
> > > > >
> > > > > This looks like a hard drive failure to me (or a pending hard drive
> > > > > failure). Please try things with the CPU version of the code and
> > > > > see what happens. I can't see how this could be generated by the
> > > > > GPU code. You might want to try booting the machine in single-user
> > > > > (or recovery) mode and running an fsck on the file system. You
> > > > > could also try running a smartctl check on the hard drive to see
> > > > > what its diagnostics are reporting.
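
[Editor's sketch] Alongside fsck and smartctl, it can help to see which device the kernel's EXT4-fs messages point at. The Python sketch below collects such messages from saved kernel log text. It assumes the dmesg-style line format of the errors quoted in the original report in this thread; the function name `ext4_messages` is hypothetical. It gathers every EXT4-fs line, informational ones included, so the results still need reading by eye.

```python
import re

# Matches lines such as:
#   [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
#   [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: ...
EXT4_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+EXT4-fs\s+\((?:device\s+)?([^)]+)\):\s*(.*)"
)

def ext4_messages(log_text):
    """Return (timestamp_seconds, device, message) for each EXT4-fs log line."""
    hits = []
    for line in log_text.splitlines():
        match = EXT4_LINE.search(line)
        if match:
            timestamp, device, message = match.groups()
            hits.append((float(timestamp), device, message.strip()))
    return hits
```

Both error lines from the original report name sda1, which is the file system an fsck or SMART check would then target.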
> > > > >
> > > > > All the best
> > > > > Ross
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > > > > > Sent: Monday, December 05, 2011 7:33 PM
> > > > > > To: AMBER Mailing List
> > > > > > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> > > > > >
> > > > > > Dear AMBER users,
> > > > > >
> > > > > > I was running pmemd.cuda using Amber11 on a GPU installed in a
> > > > > > workstation. The job consisted of two short minimizations
> > > > > > followed by 6 ns of heating the protein (270 residues) at 500 K.
> > > > > >
> > > > > > The minimizations were fine, but when I ran the heating, my
> > > > > > system "crashed". The errors are below:
> > > > > >
> > > > > > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
> > > > > > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
> > > > > >
> > > > > > I rebooted the system and tried to run it again. The same errors
> > > > > > appeared. My input file is:
> > > > > > My input file is:
> > > > > >
> > > > > > &cntrl
> > > > > > imin=0,
> > > > > > * iwrap=1 => I also tried iwrap=0*
> > > > > > irest=0,
> > > > > > ntx=1,
> > > > > > ntb=1,
> > > > > > cut=10.0,
> > > > > > ntr=0,
> > > > > > ntc=2,
> > > > > > ntf=2,
> > > > > > tempi=500.0,
> > > > > > temp0=500.0,
> > > > > > ntt=3,
> > > > > > gamma_ln=1.0,
> > > > > > nstlim=3000000, dt=0.002,
> > > > > > ntpr=1500, ntwx=1500,ntwr=1000
> > > > > > /
> > > > > >
> > > > > > We tried to find out what was going on, but we do not know where
> > > > > > the crash came from.
> > > > > > Please help.
> > > > > >
> > > > > > Thank you.
> > > > > >
> > > > > > Regards,
> > > > > > Chinsu
> > > > > > _______________________________________________
> > > > > > AMBER mailing list
> > > > > > AMBER.ambermd.org
> > > > > > http://lists.ambermd.org/mailman/listinfo/amber
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Dec 14 2011 - 19:30:02 PST