Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.

From: Chinh Su Tran To <chinh.sutranto.gmail.com>
Date: Wed, 14 Dec 2011 12:03:50 +0800

Dear Dr. Walker,

I have sent the files and the information to your Gmail address. Thank you.

Chinh

On Tue, Dec 13, 2011 at 12:09 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Chinh,
>
> Can you send me (offlist) all of your input files, please, along with details
> of your computer system: OS, NVIDIA compiler and driver versions, and hardware
> spec (especially the GPU model). I need to be able to replicate this in
> order to investigate what is going wrong.
>
> Please also include the output from the run that gave what looked like a
> disk error.
>
> Thank you.
>
> All the best
> Ross
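One way to gather the requested system details in a single pass is sketched below. It assumes a Linux machine where nvidia-smi and nvcc are on the PATH if they are installed. The names run_if_available and collect_system_report are hypothetical helpers, and a missing tool is reported as text rather than treated as an error.

```python
import platform
import shutil
import subprocess


def run_if_available(command):
    """Run a command and return its output, or a note if it cannot be run."""
    if shutil.which(command[0]) is None:
        return f"{command[0]}: not found on PATH"
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{command[0]}: failed to run ({exc})"
    # Some tools print their report on stderr; fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip()


def collect_system_report():
    """Collect OS, kernel, GPU/driver and CUDA compiler details as strings."""
    return {
        "os": platform.platform(),
        "kernel": platform.release(),
        "nvidia_driver_and_gpu": run_if_available(["nvidia-smi"]),
        "cuda_compiler": run_if_available(["nvcc", "--version"]),
    }


report = collect_system_report()
for name, value in report.items():
    print(f"== {name} ==")
    print(value)
```

The printed report can be pasted into the offlist message together with the input files and the output of the failed run.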
>
> > -----Original Message-----
> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > Sent: Monday, December 12, 2011 8:04 PM
> > To: AMBER Mailing List
> > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >
> > Dear Dr. Walker,
> >
> > As you suggested, we ran the hard disk checks, and both came back clean.
> > I then ran the same job using pmemd only, and it was fine: no crash and
> > no error.
> >
> > But when I went back to pmemd.cuda, it crashed again.
> >
> > I also noticed two problems while using pmemd.cuda:
> >
> > 1. With iwrap=0 (in the input file below), it gave a "segmentation fault"
> >    immediately. I knew this was an old error that I had encountered before,
> >    but I wanted to try it to test pmemd.cuda on its own.
> > 2. I then switched to iwrap=1, with some modifications to gpu.cpp (the
> >    solution that I found in the AMBER forum), and it crashed. (However, it
> >    also crashed before these modifications.)
> >
> > Please help. We do not know what is wrong.
> >
> > The input is:
> >
> > &cntrl
> > imin=0,
> > iwrap=1,  ! I also tried iwrap=0
> > irest=0,
> > ntx=1,
> > ntb=1,
> > cut=10.0,
> > ntr=0,
> > ntc=2,
> > ntf=2,
> > tempi=500.0,
> > temp0=500.0,
> > ntt=3,
> > gamma_ln=1.0,
> > nstlim=3000000, dt=0.002,
> > ntpr=1500, ntwx=1500,ntwr=1000
> > /
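The run length and output volume implied by this input can be checked with a few lines of Python. This is a minimal sketch: parse_cntrl and run_summary are hypothetical helper names, the namelist is retyped here with iwrap=1, and the parser only handles simple numeric key=value pairs, not the full namelist syntax.

```python
import re

# The &cntrl block from the message, retyped with iwrap=1.
MDIN = """
&cntrl
  imin=0, iwrap=1, irest=0, ntx=1, ntb=1,
  cut=10.0, ntr=0, ntc=2, ntf=2,
  tempi=500.0, temp0=500.0, ntt=3, gamma_ln=1.0,
  nstlim=3000000, dt=0.002,
  ntpr=1500, ntwx=1500, ntwr=1000
/
"""


def parse_cntrl(text):
    """Return numeric key=value pairs from a simple namelist as floats."""
    pairs = re.findall(r"(\w+)\s*=\s*([-+0-9.eE]+)", text)
    return {key: float(value) for key, value in pairs}


def run_summary(params):
    """Derive total simulated time and the number of outputs written."""
    total_ps = params["nstlim"] * params["dt"]  # dt is in picoseconds
    return {
        "total_ns": total_ps / 1000.0,
        "energy_prints": int(params["nstlim"] // params["ntpr"]),
        "trajectory_frames": int(params["nstlim"] // params["ntwx"]),
        "restart_writes": int(params["nstlim"] // params["ntwr"]),
    }


summary = run_summary(parse_cntrl(MDIN))
for name, value in summary.items():
    print(f"{name}: {value}")
```

With nstlim=3000000 and dt=0.002, the run covers 6 ns, which matches the heating length described in the first message, and the restart file is rewritten every 1000 steps throughout the run.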
> >
> >
> > Thank you.
> > Chinsu
> >
> >
> >
> >
> > On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> > > Hi Chinsu,
> > >
> > > This looks like a hard drive failure to me (or a pending hard drive
> > > failure). Please try things with the CPU version of the code and see what
> > > happens. I can't see how this could be generated by the GPU code. You might
> > > want to try booting the machine in single user (or recovery) mode and
> > > running an fsck on the file system. You could also try running a smartctl
> > > check on the hard drive to see what its diagnostics are reporting.
> > >
> > > All the best
> > > Ross
> > >
> > > > -----Original Message-----
> > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > > > Sent: Monday, December 05, 2011 7:33 PM
> > > > To: AMBER Mailing List
> > > > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> > > >
> > > > Dear AMBER users,
> > > >
> > > > I was running pmemd.cuda from Amber11 on a GPU installed in a
> > > > workstation. The procedure was two short minimizations followed by
> > > > 6 ns of heating the protein (270 residues) at 500 K.
> > > >
> > > > The minimizations were fine, but when I ran the heating, my system
> > > > "crashed". The errors are below:
> > > >
> > > > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
> > > > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
> > > >
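When scanning a longer kernel log for messages like the two above, a small parser can pull out the timestamp, device and message text. This is a minimal sketch, assuming lines follow the "[timestamp] EXT4-fs (device): message" shape seen here. The name parse_ext4_line is a hypothetical helper, and the regular expression is fitted to these two lines only.

```python
import re

# Kernel lines copied from the report above, rejoined after mail wrapping.
LOG_LINES = [
    "[2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected",
    "[2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: "
    "inode#9699494: comm init: reading directory lblock 0",
]

EXT4_PATTERN = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+EXT4-fs\s+"
    r"\((?:device\s+)?(?P<device>\w+)\):\s*(?P<message>.*)$"
)


def parse_ext4_line(line):
    """Split one EXT4-fs kernel line into its parts, or return None."""
    match = EXT4_PATTERN.match(line)
    if match is None:
        return None
    return {
        "uptime_s": float(match.group("uptime")),  # seconds since boot
        "device": match.group("device"),
        "message": match.group("message"),
    }


events = [e for e in map(parse_ext4_line, LOG_LINES) if e is not None]
devices = sorted({e["device"] for e in events})

for event in events:
    print(f'{event["uptime_s"]:.6f}s {event["device"]}: {event["message"]}')
```

Both messages here name the same device, sda1, which is the partition an fsck or smartctl check would target.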
> > > > I rebooted the system and tried to run it again. The same errors
> > > > appeared. My input file is:
> > > >
> > > > &cntrl
> > > > imin=0,
> > > > iwrap=1,  ! I also tried iwrap=0
> > > > irest=0,
> > > > ntx=1,
> > > > ntb=1,
> > > > cut=10.0,
> > > > ntr=0,
> > > > ntc=2,
> > > > ntf=2,
> > > > tempi=500.0,
> > > > temp0=500.0,
> > > > ntt=3,
> > > > gamma_ln=1.0,
> > > > nstlim=3000000, dt=0.002,
> > > > ntpr=1500, ntwx=1500,ntwr=1000
> > > > /
> > > >
> > > > We tried to find out what was going on, but we do not know where the
> > > > crash comes from.
> > > > Please help.
> > > >
> > > > Thank you.
> > > >
> > > > Regards,
> > > > Chinsu
> > > > _______________________________________________
> > > > AMBER mailing list
> > > > AMBER.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Dec 13 2011 - 20:30:03 PST