Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.

From: Chinh Su Tran To <chinh.sutranto.gmail.com>
Date: Fri, 13 Jan 2012 13:23:57 +0800

Dear Dr. Le Grand,

Thanks for your suggestion. We actually tried it on 2 hard disks. The first
is totally "gone" because of the crash. This is the 2nd one.
We're checking the system to find at least one cause. Thank you very much
for your help.

Regards,
Chinh

On Thu, Jan 12, 2012 at 12:40 AM, Scott Le Grand <varelse2005.gmail.com> wrote:

> Your system is hosed in some way. This is not an AMBER issue. Eventually
> you will find out just how, probably at the most inopportune time.
>
> The first suspect is the hard drive. Replace it and see what happens.
> Filesystem utilities can work around existing defects for a while, but
> eventually things end badly.
>
> If this doesn't fix the problem, it's some sort of bizarro
> motherboard/BIOS/CPU monstrosity not worth the time to figure out. But a
> filesystem error has zero zip nada null to do with pmemd.cuda.
>
> Scott
>
>
>
>
>
> On Wed, Jan 11, 2012 at 12:04 AM, Chinh Su Tran To
> <chinh.sutranto.gmail.com> wrote:
>
> > Dear Dr. Walker,
> >
> > As you suggested, we updated and applied all the bugfixes for both
> > Amber11 and AmberTools 1.5. However, the crash still occurred.
> >
> > It ran fine through the first stage (6 ns) and then crashed, although
> > the results were written. I rebooted, then ran the second stage (the
> > next 6 ns) using the previous results. It also gave results, then
> > crashed. When I switched the time step to 1.5 fs, it crashed within a
> > few minutes of the job being submitted.
> >
> > We suspect that there might be a compatibility problem with the
> > system. We are using a DELL workstation running Ubuntu 11.04
> > (GNU/Linux 2.6.38-8 server x86_64) with a GeForce GTX 580 GPU (CUDA
> > version/runtime version 4.0/4.0) and the nvcc NVIDIA CUDA compiler driver.
> >
> > Could you please let us know if there are any "special" requirements
> > that the computer system must meet to be compatible with the GPU card?
> >
> > Thank you, and looking forward to your help.
> >
> > Regards,
> > Chinh
> >
> >
> > On Thu, Dec 15, 2011 at 11:19 AM, Chinh Su Tran To
> > <chinh.sutranto.gmail.com> wrote:
> >
> > > Dear Dr. Walker,
> > >
> > > Thank you so much.
> > >
> > > Regards,
> > > Chinh
> > >
> > >
> > > On Thu, Dec 15, 2011 at 6:06 AM, Ross Walker <ross.rosswalker.co.uk>
> > > wrote:
> > >
> > >> Hi Chinh,
> > >>
> > >> This works fine on my system with an up-to-date, patched version of
> > >> AMBER 11. Looking at the output you sent me, it looks like you are
> > >> running an unpatched version of AMBER 11. Your output has:
> > >>
> > >> |--------------------- INFORMATION ----------------------
> > >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > >> |
> > >> | Implementation by:
> > >> | Ross C. Walker (SDSC)
> > >> | Scott Le Grand (nVIDIA)
> > >> | Duncan Poole (nVIDIA)
> > >> |
> > >> | CAUTION: The CUDA code is currently experimental.
> > >> | You use it at your own risk. Be sure to
> > >> | check ALL results carefully.
> > >> |
> > >> | Precision model in use:
> > >> | [SPDP] - Hybrid Single/Double Precision (Default).
> > >> |
> > >> |--------------------------------------------------------
> > >>
> > >> While it should have:
> > >>
> > >> |--------------------- INFORMATION ----------------------
> > >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> > >> | Version 2.2
> > >> |
> > >> | 08/16/2011
> > >> |
> > >> |
> > >> | Implementation by:
> > >> | Ross C. Walker (SDSC)
> > >> | Scott Le Grand (nVIDIA)
> > >> | Duncan Poole (nVIDIA)
> > >> |
> > >> | CAUTION: The CUDA code is currently experimental.
> > >> | You use it at your own risk. Be sure to
> > >> | check ALL results carefully.
> > >> |
> > >> | Precision model in use:
> > >> | [SPDP] - Hybrid Single/Double Precision (Default).
> > >> |
> > >> |--------------------------------------------------------
> > >>
> > >> Note the version number. You are running an out-of-date AMBER 11,
> > >> and this is almost certainly leading to your issues. Start from a
> > >> completely clean Amber 11 directory created by untarring the original
> > >> Amber11.tar.bz2 file and patch it with the AMBER bugfixes (see
> > >> http://ambermd.org/bugfixes11.html). Make sure you get AmberTools 1.5
> > >> and patch that as well.
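
The two banners quoted above differ in the "Version" line that the patched build prints. As a minimal sketch, a few lines of Python can look for that line in the text of an output file. The function name `banner_version` and the regular expression are assumptions made for illustration; they are not part of AMBER, and the banner layout is assumed to match what is shown in this thread.

```python
import re

def banner_version(mdout_text):
    """Return the version string from the pmemd.cuda banner, or None.

    Only lines of the form '| Version <something>' are matched, so the
    banner line '| GPU (CUDA) Version of PMEMD in use' is not mistaken
    for a version number.
    """
    for line in mdout_text.splitlines():
        match = re.match(r"^\|\s*Version\s+(\S+)", line)
        if match:
            return match.group(1)
    return None

# Banner fragments copied from the thread (unpatched vs. patched build).
unpatched = """|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|
| Implementation by:
"""
patched = """|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 2.2
|
| 08/16/2011
"""

print(banner_version(unpatched))  # None
print(banner_version(patched))    # 2.2
```

A result of None would suggest the banner looks like the unpatched one above; it says nothing about which individual bugfixes have been applied.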
> > >>
> > >> I would also note that:
> > >>
> > >> #heating in 6ns at 500K without restraint on the model to unfold the protein
> > >> &cntrl
> > >> imin=0,
> > >> iwrap=0,
> > >> irest=0,
> > >> ntx=1,
> > >> ntb=1,
> > >> cut=10.0,
> > >> ntr=0,
> > >> ntc=2,
> > >> ntf=2,
> > >> tempi=500.0,
> > >> temp0=500.0,
> > >> ntt=3,
> > >> gamma_ln=1.0,
> > >> nstlim=3000000, dt=0.002,
> > >> ntpr=1500, ntwx=1500,ntwr=1000
> > >> /
> > >>
> > >> You are running at 500K but with a 2fs time step. You probably need to
> > >> reduce the time step to 1.5fs or so to run at such an elevated
> > >> temperature.
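
Changing dt also changes how many steps are needed to cover the same simulated time, since dt is given in picoseconds (dt=0.002 is 2 fs). The arithmetic can be sketched as below; the helper name `steps_for` is hypothetical, and the 6 ns target is taken from the run described in this thread.

```python
from fractions import Fraction

def steps_for(total_ns, dt_ps):
    """Number of MD steps needed to cover total_ns at a time step of dt_ps.

    dt_ps is passed as a string so that Fraction represents it exactly
    and no floating-point rounding enters the division.
    """
    total_ps = Fraction(total_ns) * 1000
    steps = total_ps / Fraction(dt_ps)
    if steps.denominator != 1:
        raise ValueError("run length is not a whole number of steps")
    return int(steps)

print(steps_for(6, "0.002"))   # 3000000, the nstlim in the input above
print(steps_for(6, "0.0015"))  # 4000000, the same 6 ns at a 1.5 fs step
```

So keeping nstlim=3000000 with dt=0.0015 would cover only 4.5 ns; covering the full 6 ns at the shorter step would take nstlim=4000000.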
> > >>
> > >> All the best
> > >> Ross
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > >> > Sent: Tuesday, December 13, 2011 8:04 PM
> > >> > To: AMBER Mailing List
> > >> > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> > >> >
> > >> > Dear Dr. Walker,
> > >> >
> > >> > I sent you the files and the info to your gmail. Thank you.
> > >> >
> > >> > Chinh
> > >> >
> > >> > On Tue, Dec 13, 2011 at 12:09 PM, Ross Walker <ross.rosswalker.co.uk>
> > >> > wrote:
> > >> >
> > >> > > Hi Chinh,
> > >> > >
> > >> > > Can you send me (offlist) all of your input files please, along with
> > >> > > details of your computer system: OS, NVIDIA compiler and driver
> > >> > > version, and hardware spec (especially the GPU version). I need to be
> > >> > > able to replicate this in order to investigate what is going wrong.
> > >> > >
> > >> > > Please also include the output from the run that gave what looked
> > >> > > like a disk error.
> > >> > >
> > >> > > Thank you.
> > >> > >
> > >> > > All the best
> > >> > > Ross
> > >> > >
> > >> > > > -----Original Message-----
> > >> > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > >> > > > Sent: Monday, December 12, 2011 8:04 PM
> > >> > > > To: AMBER Mailing List
> > >> > > > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> > >> > > >
> > >> > > > Dear Dr. Walker,
> > >> > > >
> > >> > > > As you suggested, we ran the check for the hard disk, but both were
> > >> > > > clean! I tried to run the same input using pmemd only, and it was
> > >> > > > fine, i.e. no crash, no error.
> > >> > > >
> > >> > > > But when I went back to pmemd.cuda, it happened again (crashed).
> > >> > > >
> > >> > > > There were also 2 problems that I noticed when using pmemd.cuda:
> > >> > > >
> > >> > > > 1. When I used iwrap=0 (in the input file below), it showed
> > >> > > > "segmentation fault" immediately. I knew that this was an old error
> > >> > > > I had encountered before (but I wanted to try it to test pmemd.cuda
> > >> > > > on its own).
> > >> > > > 2. Then I switched to iwrap=1 with some modifications to gpu.cpp (the
> > >> > > > solution that I found in the AMBER forum), and it crashed. (However,
> > >> > > > it also crashed before these modifications.)
> > >> > > >
> > >> > > > Please help. We do not know what is wrong.
> > >> > > >
> > >> > > > The input is:
> > >> > > >
> > >> > > > &cntrl
> > >> > > > imin=0,
> > >> > > > iwrap=1 => I also tried iwrap=0
> > >> > > > irest=0,
> > >> > > > ntx=1,
> > >> > > > ntb=1,
> > >> > > > cut=10.0,
> > >> > > > ntr=0,
> > >> > > > ntc=2,
> > >> > > > ntf=2,
> > >> > > > tempi=500.0,
> > >> > > > temp0=500.0,
> > >> > > > ntt=3,
> > >> > > > gamma_ln=1.0,
> > >> > > > nstlim=3000000, dt=0.002,
> > >> > > > ntpr=1500, ntwx=1500,ntwr=1000
> > >> > > > /
> > >> > > >
> > >> > > >
> > >> > > > Thank you.
> > >> > > > Chinsu
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Chinsu,
> > >> > > > >
> > >> > > > > This looks like a hard drive failure to me (or a pending hard drive
> > >> > > > > failure). Please try things with the CPU version of the code and see
> > >> > > > > what happens. I can't see how this could be generated by the GPU
> > >> > > > > code. You might want to try booting the machine in single-user (or
> > >> > > > > recovery) mode and running an fsck on the file system. You could
> > >> > > > > also try running a smartctl check on the hard drive to see what its
> > >> > > > > diagnostics are reporting.
> > >> > > > >
> > >> > > > > All the best
> > >> > > > > Ross
> > >> > > > >
> > >> > > > > > -----Original Message-----
> > >> > > > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > >> > > > > > Sent: Monday, December 05, 2011 7:33 PM
> > >> > > > > > To: AMBER Mailing List
> > >> > > > > > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> > >> > > > > >
> > >> > > > > > Dear AMBER users,
> > >> > > > > >
> > >> > > > > > I was running pmemd.cuda using Amber11 on a GPU installed in a
> > >> > > > > > workstation. The process consisted of 2 short minimization steps
> > >> > > > > > and 6 ns of heating the protein (270 residues) at 500K.
> > >> > > > > >
> > >> > > > > > The minimizations were fine, but when I ran the heating, my system
> > >> > > > > > "crashed". The errors are as below:
> > >> > > > > >
> > >> > > > > > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
> > >> > > > > > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
> > >> > > > > >
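
Both kernel messages above come from the ext4 driver, name the same device (sda1), and the first reports an I/O error, which is why the replies in this thread point at the disk rather than at pmemd.cuda. A rough sketch of pulling such lines out of a saved kernel log follows. The function name `ext4_messages` and the regular expression are assumptions based only on the two lines quoted here; other kernels may format these messages differently.

```python
import re

# Assumed layout, taken from the two lines quoted in this thread:
# "[timestamp] EXT4-fs (device): message"
EXT4_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+EXT4-fs\s+\(([^)]*)\):\s*(.*)")

def ext4_messages(log_text):
    """Return (timestamp, device, message) tuples for EXT4-fs log lines."""
    found = []
    for line in log_text.splitlines():
        m = EXT4_LINE.search(line)
        if m:
            # The second quoted line writes "(device sda1)"; drop the word.
            device = m.group(2).replace("device ", "")
            found.append((float(m.group(1)), device, m.group(3)))
    return found

sample = (
    "[2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected\n"
    "[2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: "
    "inode#9699494: comm init: reading directory lblock 0\n"
)

for timestamp, device, message in ext4_messages(sample):
    print(device, "|", message)
```

This only collects the messages; it does not diagnose anything, and the fsck and smartctl checks suggested in the reply remain the actual tests of the drive.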
> > >> > > > > > I rebooted the system and tried to run it again. The same errors
> > >> > > > > > came. My input file is:
> > >> > > > > >
> > >> > > > > > &cntrl
> > >> > > > > > imin=0,
> > >> > > > > > iwrap=1 => I also tried iwrap=0
> > >> > > > > > irest=0,
> > >> > > > > > ntx=1,
> > >> > > > > > ntb=1,
> > >> > > > > > cut=10.0,
> > >> > > > > > ntr=0,
> > >> > > > > > ntc=2,
> > >> > > > > > ntf=2,
> > >> > > > > > tempi=500.0,
> > >> > > > > > temp0=500.0,
> > >> > > > > > ntt=3,
> > >> > > > > > gamma_ln=1.0,
> > >> > > > > > nstlim=3000000, dt=0.002,
> > >> > > > > > ntpr=1500, ntwx=1500,ntwr=1000
> > >> > > > > > /
> > >> > > > > >
> > >> > > > > > We tried to find out what was going on, but we do not know where the
> > >> > > > > > crash came from.
> > >> > > > > > Please help.
> > >> > > > > >
> > >> > > > > > Thank you.
> > >> > > > > >
> > >> > > > > > Regards,
> > >> > > > > > Chinsu
> > >> > > > > > _______________________________________________
> > >> > > > > > AMBER mailing list
> > >> > > > > > AMBER.ambermd.org
> > >> > > > > > http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jan 12 2012 - 21:30:02 PST