Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.

From: Scott Le Grand <varelse2005.gmail.com>
Date: Wed, 11 Jan 2012 08:40:03 -0800

Your system is hosed in some way. This is not an AMBER issue. Eventually
you will find out just how, probably at the most inopportune time.

The first suspect is the hard drive. Replace it and see what happens.
Filesystem utilities can work around existing defects for a while, but
eventually things end badly.

If this doesn't fix the problem, it's some sort of bizarro
motherboard/BIOS/CPU monstrosity not worth the time to figure out. But a
filesystem error has zero zip nada null to do with pmemd.cuda.

Scott

On Wed, Jan 11, 2012 at 12:04 AM, Chinh Su Tran To <chinh.sutranto.gmail.com> wrote:

> Dear Dr. Walker,
>
> As you suggested, we applied all the bugfixes to both Amber11 and
> AmberTools 1.5. However, the crash still occurred.
>
> At first, it ran fine for the first stage (6 ns) and then crashed, although
> the output was written. I rebooted the machine and ran the second stage (the
> next 6 ns) starting from the previous results. It also produced output and
> then crashed. When I switched the time step to 1.5 fs, it crashed within a
> few minutes of the job being submitted.
>
> We suspect there might be a compatibility problem with the system. We are
> using a Dell workstation running Ubuntu 11.04 (GNU/Linux 2.6.38-8 server
> x86_64) with a GeForce GTX 580 GPU (CUDA driver/runtime version 4.0/4.0)
> and the nvcc NVIDIA CUDA compiler driver.
>
> Could you please let us know whether there are any "special" requirements
> for the computer system to be compatible with the GPU card?
>
> Thank you; we look forward to your help.
>
> Regards,
> Chinh
>
>
> On Thu, Dec 15, 2011 at 11:19 AM, Chinh Su Tran To <chinh.sutranto.gmail.com> wrote:
>
> > Dear Dr. Walker,
> >
> > Thank you so much.
> >
> > Regards,
> > Chinh
> >
> >
> > On Thu, Dec 15, 2011 at 6:06 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> >> Hi Chinh,
> >>
> >> This works fine on my system with an up-to-date, patched version of AMBER 11.
> >> Looking at the output you sent me, it looks like you are running an
> >> unpatched version of AMBER 11. Your output has:
> >>
> >> |--------------------- INFORMATION ----------------------
> >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> >> |
> >> | Implementation by:
> >> | Ross C. Walker (SDSC)
> >> | Scott Le Grand (nVIDIA)
> >> | Duncan Poole (nVIDIA)
> >> |
> >> | CAUTION: The CUDA code is currently experimental.
> >> | You use it at your own risk. Be sure to
> >> | check ALL results carefully.
> >> |
> >> | Precision model in use:
> >> | [SPDP] - Hybrid Single/Double Precision (Default).
> >> |
> >> |--------------------------------------------------------
> >>
> >> While it should have:
> >>
> >> |--------------------- INFORMATION ----------------------
> >> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
> >> | Version 2.2
> >> |
> >> | 08/16/2011
> >> |
> >> |
> >> | Implementation by:
> >> | Ross C. Walker (SDSC)
> >> | Scott Le Grand (nVIDIA)
> >> | Duncan Poole (nVIDIA)
> >> |
> >> | CAUTION: The CUDA code is currently experimental.
> >> | You use it at your own risk. Be sure to
> >> | check ALL results carefully.
> >> |
> >> | Precision model in use:
> >> | [SPDP] - Hybrid Single/Double Precision (Default).
> >> |
> >> |--------------------------------------------------------
> >>
> >> Note the version number. You are running an out-of-date AMBER 11, and
> >> this is almost certainly what is causing your issues. Start from a
> >> completely clean Amber 11 directory created by untarring the original
> >> Amber11.tar.bz2 file and patch it with the AMBER bugfixes (see
> >> http://ambermd.org/bugfixes11.html). Make sure you get AmberTools 1.5 and
> >> patch that as well.
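A minimal sketch of the version check described above, in Python. The helper name `pmemd_cuda_version` is hypothetical (it is not part of AMBER), and it assumes the mdout header layout shown in the two excerpts above, where a patched build prints a `| Version X.Y` line directly under the `GPU (CUDA) Version of PMEMD in use` banner and an unpatched build prints a bare `|` there.

```python
import re
from typing import Optional


def pmemd_cuda_version(mdout_text: str) -> Optional[str]:
    """Hypothetical helper: return the GPU code version printed in an mdout
    header (e.g. "2.2"), or None if no version line follows the banner.

    Assumes the header layout quoted in this thread.
    """
    lines = mdout_text.splitlines()
    for i, line in enumerate(lines):
        if "GPU (CUDA) Version of PMEMD in use" in line:
            # Only the line directly after the banner is checked.
            if i + 1 < len(lines):
                match = re.match(r"\|\s*Version\s+(\S+)", lines[i + 1].strip())
                if match:
                    return match.group(1)
            return None
    return None


patched = "| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.\n| Version 2.2\n|"
unpatched = "| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.\n|\n| Implementation by:"
print(pmemd_cuda_version(patched))    # 2.2
print(pmemd_cuda_version(unpatched))  # None
```

Matching only the line right after the banner keeps the check from picking up unrelated "Version" strings elsewhere in the output.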
> >>
> >> I would also note that:
> >>
> >> #heating in 6ns at 500K without restraint on the model to unfold the protein
> >>
> >> &cntrl
> >> imin=0,
> >> iwrap=0,
> >> irest=0,
> >> ntx=1,
> >> ntb=1,
> >> cut=10.0,
> >> ntr=0,
> >> ntc=2,
> >> ntf=2,
> >> tempi=500.0,
> >> temp0=500.0,
> >> ntt=3,
> >> gamma_ln=1.0,
> >> nstlim=3000000, dt=0.002,
> >> ntpr=1500, ntwx=1500, ntwr=1000
> >> /
> >>
> >> You are running at 500K but with a 2fs time step. You probably need to
> >> reduce the time step to 1.5fs or so to run at such an elevated
> >> temperature.
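Shortening the time step also shortens the simulated time unless nstlim is raised, because the run length is nstlim × dt (AMBER's dt is in picoseconds, so dt=0.002 is 2 fs and the original 3,000,000 steps cover 6 ns). A small Python sketch of that arithmetic, assuming the goal is to keep each stage at 6 ns; `steps_for` is a hypothetical helper, and `Fraction` is used so decimal time steps such as 0.0015 divide exactly.

```python
from fractions import Fraction


def steps_for(total_ns: Fraction, dt_ps: Fraction) -> int:
    """Number of MD steps (nstlim) needed to cover total_ns nanoseconds
    with a time step of dt_ps picoseconds (hypothetical helper)."""
    n_steps = (total_ns * 1000) / dt_ps  # 1 ns = 1000 ps
    if n_steps.denominator != 1:
        raise ValueError("total time is not a whole number of steps")
    return int(n_steps)


# Original stage: dt=0.002 ps (2 fs) for 6 ns
print(steps_for(Fraction(6), Fraction("0.002")))   # 3000000
# Same 6 ns with the suggested 1.5 fs step
print(steps_for(Fraction(6), Fraction("0.0015")))  # 4000000
```

With dt=0.0015, keeping ntpr=1500 and ntwx=1500 would also change the output interval from every 3 ps to every 2.25 ps.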
> >>
> >> All the best
> >> Ross
> >>
> >>
> >> > -----Original Message-----
> >> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> >> > Sent: Tuesday, December 13, 2011 8:04 PM
> >> > To: AMBER Mailing List
> >> > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >> >
> >> > Dear Dr. Walker,
> >> >
> >> > I sent the files and the information to your Gmail address. Thank you.
> >> >
> >> > Chinh
> >> >
> >> > On Tue, Dec 13, 2011 at 12:09 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >> >
> >> > > Hi Chinh,
> >> > >
> >> > > Can you please send me (off-list) all of your input files, along with
> >> > > details of your computer system: OS, NVIDIA compiler and driver
> >> > > version, and hardware spec (especially the GPU model). I need to be
> >> > > able to replicate this in order to investigate what is going wrong.
> >> > >
> >> > > Please also include the output from the run that gave what looked like
> >> > > a disk error.
> >> > >
> >> > > Thank you.
> >> > >
> >> > > All the best
> >> > > Ross
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> >> > > > Sent: Monday, December 12, 2011 8:04 PM
> >> > > > To: AMBER Mailing List
> >> > > > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >> > > >
> >> > > > Dear Dr. Walker,
> >> > > >
> >> > > > As you suggested, we ran the checks on the hard disk, and both were
> >> > > > clean. I ran the same input with the CPU-only pmemd, and it was
> >> > > > fine: no crash, no error.
> >> > > >
> >> > > > But when I went back to pmemd.cuda, it crashed again.
> >> > > >
> >> > > > I also noticed two problems when using pmemd.cuda:
> >> > > >
> >> > > > 1. With iwrap=0 (in the input file below), it reported a
> >> > > >    "segmentation fault" immediately. I knew this was an old error I
> >> > > >    had encountered before, but I wanted to try it to test pmemd.cuda
> >> > > >    on its own.
> >> > > > 2. I then switched to iwrap=1, with a modification to gpu.cpp (a
> >> > > >    solution I found on the AMBER forum), and it crashed. (However,
> >> > > >    it also crashed before this modification.)
> >> > > >
> >> > > > Please help. We do not know what is wrong.
> >> > > >
> >> > > > The input is:
> >> > > >
> >> > > > &cntrl
> >> > > > imin=0,
> >> > > > iwrap=1,   ! (I also tried iwrap=0)
> >> > > > irest=0,
> >> > > > ntx=1,
> >> > > > ntb=1,
> >> > > > cut=10.0,
> >> > > > ntr=0,
> >> > > > ntc=2,
> >> > > > ntf=2,
> >> > > > tempi=500.0,
> >> > > > temp0=500.0,
> >> > > > ntt=3,
> >> > > > gamma_ln=1.0,
> >> > > > nstlim=3000000, dt=0.002,
> >> > > > ntpr=1500, ntwx=1500,ntwr=1000
> >> > > > /
> >> > > >
> >> > > >
> >> > > > Thank you.
> >> > > > Chinsu
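Some background on the iwrap option discussed above: with iwrap=1, AMBER writes the restart and trajectory coordinates "wrapped" into the primary periodic box, choosing for each molecule the periodic image closest to the middle of that box, so molecules that diffuse across the boundary do not drift ever further from the origin. The Python sketch below illustrates the idea for one molecule under simplifying assumptions (orthorhombic box, geometric center used to place the molecule); `wrap_molecule` is a hypothetical helper and not pmemd's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def wrap_molecule(coords: Sequence[Vec], box: Vec) -> List[Vec]:
    """Shift one molecule by whole box lengths so its geometric center lies
    inside the primary box [0, a) x [0, b) x [0, c).

    Assumes an orthorhombic box. Every atom gets the same shift, so the
    molecule is never split across the periodic boundary.
    """
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    # Whole number of box lengths to subtract along each axis.
    shift = [math.floor(center[k] / box[k]) * box[k] for k in range(3)]
    return [tuple(c[k] - shift[k] for k in range(3)) for c in coords]


# A two-atom molecule that has drifted past the +x face of a 30 A box.
mol = [(31.0, 5.0, 5.0), (32.0, 5.0, 5.0)]
print(wrap_molecule(mol, (30.0, 30.0, 30.0)))
# [(1.0, 5.0, 5.0), (2.0, 5.0, 5.0)]
```

Shifting whole molecules rather than individual atoms is what keeps bonded atoms together in the written coordinates.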
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >> > > >
> >> > > > > Hi Chinsu,
> >> > > > >
> >> > > > > This looks like a hard drive failure to me (or a pending hard drive
> >> > > > > failure). Please try things with the CPU version of the code and see
> >> > > > > what happens. I can't see how this could be generated by the GPU
> >> > > > > code. You might want to try booting the machine in single-user (or
> >> > > > > recovery) mode and running fsck on the file system. You could also
> >> > > > > try running a smartctl check on the hard drive to see what its
> >> > > > > diagnostics are reporting.
> >> > > > >
> >> > > > > All the best
> >> > > > > Ross
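Alongside fsck and smartctl, the kernel log itself shows how often the filesystem is reporting problems. The Python sketch below is a hypothetical helper that counts EXT4-fs kernel messages per device; it assumes the log has first been saved to a text file (for example with `dmesg > dmesg.txt`) and that messages follow the EXT4-fs formats quoted in the original report later in this thread. It counts every EXT4-fs message, including routine ones such as mount notices, so treat a non-zero count as a prompt to read those lines, not as a diagnosis.

```python
import re
import sys
from collections import Counter

# Matches kernel messages such as:
#   EXT4-fs (sda1): previous I/O error to superblock detected
#   EXT4-fs (device sda1): ext4_find_entry:933: ...
#   EXT4-fs error (device sda1): ...
EXT4_MSG = re.compile(r"EXT4-fs (?:error )?\((?:device )?(?P<dev>[^)]+)\)")


def ext4_messages(log_text: str) -> Counter:
    """Count EXT4-fs kernel messages per block device (hypothetical helper)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = EXT4_MSG.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ext4_scan.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for dev, n in sorted(ext4_messages(fh.read()).items()):
            print(f"{dev}: {n} EXT4-fs message(s)")
```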
> >> > > > >
> >> > > > > > -----Original Message-----
> >> > > > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> >> > > > > > Sent: Monday, December 05, 2011 7:33 PM
> >> > > > > > To: AMBER Mailing List
> >> > > > > > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >> > > > > >
> >> > > > > > Dear AMBER users,
> >> > > > > >
> >> > > > > > I was running pmemd.cuda using Amber11 on a GPU installed in a
> >> > > > > > workstation. The job consisted of two short minimizations followed
> >> > > > > > by 6 ns of heating the protein (270 residues) at 500K.
> >> > > > > >
> >> > > > > > The minimizations were fine, but when I ran the heating, my system
> >> > > > > > "crashed". The errors are as below:
> >> > > > > >
> >> > > > > > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
> >> > > > > > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
> >> > > > > >
> >> > > > > > I rebooted the system and tried to run it again. The same errors
> >> > > > > > came back. My input file is:
> >> > > > > >
> >> > > > > > &cntrl
> >> > > > > > imin=0,
> >> > > > > > iwrap=1,   ! (I also tried iwrap=0)
> >> > > > > > irest=0,
> >> > > > > > ntx=1,
> >> > > > > > ntb=1,
> >> > > > > > cut=10.0,
> >> > > > > > ntr=0,
> >> > > > > > ntc=2,
> >> > > > > > ntf=2,
> >> > > > > > tempi=500.0,
> >> > > > > > temp0=500.0,
> >> > > > > > ntt=3,
> >> > > > > > gamma_ln=1.0,
> >> > > > > > nstlim=3000000, dt=0.002,
> >> > > > > > ntpr=1500, ntwx=1500,ntwr=1000
> >> > > > > > /
> >> > > > > >
> >> > > > > > We tried to find out what was going on, but we could not tell where
> >> > > > > > the crash came from.
> >> > > > > > Please help.
> >> > > > > >
> >> > > > > > Thank you.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Chinsu
> >> > > > > > _______________________________________________
> >> > > > > > AMBER mailing list
> >> > > > > > AMBER.ambermd.org
> >> > > > > > http://lists.ambermd.org/mailman/listinfo/amber
> >> > > > >
> >> > > > >
> >> > > > >
> >> > >
> >> > >
> >> > >
> >>
> >>
> >>
> >
> >
>
Received on Wed Jan 11 2012 - 09:00:03 PST