Re: [AMBER] AMBER11, pmemd.cuda: the system crashed. from Chinh Su Tran To on 2011-12-12 (Amber Archive Dec 2011)

From: Chinh Su Tran To <chinh.sutranto.gmail.com>
Date: Tue, 13 Dec 2011 12:04:10 +0800

Dear Dr. Walker,

As you suggested, we ran the hard disk checks, and both came back clean.
I also ran the same simulation with the CPU version (pmemd) only, and it was
fine: no crash, no error.

But when I went back to pmemd.cuda, it crashed again.

I also noticed two problems when using pmemd.cuda:

1. When I used *iwrap=0* (in the input file below), it reported a
"segmentation fault" immediately. I know this is an old error that I have
encountered before, but I wanted to try it to test pmemd.cuda on its own.
2. Then I switched to *iwrap=1*, with some modifications to *gpu.cpp* (a
solution that I found on the AMBER forum), and it crashed. (*However, it
also crashed before these modifications.*)

Please help. We do not know what is wrong.

The input is:

&cntrl
  imin=0,
  iwrap=1,   ! I also tried iwrap=0
  irest=0,
  ntx=1,
  ntb=1,
  cut=10.0,
  ntr=0,
  ntc=2,
  ntf=2,
  tempi=500.0,
  temp0=500.0,
  ntt=3,
  gamma_ln=1.0,
  nstlim=3000000, dt=0.002,
  ntpr=1500, ntwx=1500,ntwr=1000
/
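
As a side check, the run length and output counts implied by these settings can be worked out from nstlim, dt (in ps) and the ntpr/ntwx/ntwr intervals. The sketch below is plain Python and does not call AMBER; the function name summarize_run and the dictionary keys are only illustrative, and the counts are approximate, since the exact number of records written may differ by one depending on how the first and last steps are handled.

```python
def summarize_run(nstlim, dt_ps, ntpr, ntwx, ntwr):
    """Summarize the run length and output counts implied by &cntrl values.

    nstlim : number of MD steps
    dt_ps  : time step in picoseconds (the mdin 'dt' value)
    ntpr   : energy print interval, in steps
    ntwx   : trajectory write interval, in steps
    ntwr   : restart write interval, in steps
    """
    total_ps = nstlim * dt_ps
    return {
        "total_ns": total_ps / 1000.0,          # 1 ns = 1000 ps
        "energy_prints": nstlim // ntpr,        # approximate count
        "trajectory_frames": nstlim // ntwx,    # approximate count
        "restart_writes": nstlim // ntwr,       # approximate count
    }


# Values taken from the input file above.
summary = summarize_run(nstlim=3000000, dt_ps=0.002,
                        ntpr=1500, ntwx=1500, ntwr=1000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the values above this comes to about 6 ns of dynamics, which matches the 6 ns run described in the original message.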

Thank you.
Chinsu

On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Chinsu,
>
> This looks like a hard drive failure to me (or pending hard drive failure).
> Please try things with the CPU version of the code and see what happens. I
> can't see how this could be generated by the GPU code. You might want to try
> booting the machine in single user (or recovery mode) and running an fsck on
> the file system. You could also try running a smartctl check on the hard
> drive to see what its diagnostics are reporting.
>
> All the best
> Ross
>
> > -----Original Message-----
> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > Sent: Monday, December 05, 2011 7:33 PM
> > To: AMBER Mailing List
> > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >
> > Dear AMBER users,
> >
> > I was running pmemd.cuda using Amber11 on a GPU which is installed in a
> > workstation.
> > The protocol consisted of 2 short minimization steps followed by 6 ns of
> > heating the protein (270 residues) at 500 K.
> >
> > The minimizations were fine, but when I ran the heating, my system
> > "crashed". The errors are as below:
> >
> > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
> > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
> >
> > I rebooted the system and tried to run it again. The same errors
> > appeared.
> > My input file is:
> >
> > &cntrl
> > imin=0,
> > * iwrap=1 => I also tried iwrap=0*
> > irest=0,
> > ntx=1,
> > ntb=1,
> > cut=10.0,
> > ntr=0,
> > ntc=2,
> > ntf=2,
> > tempi=500.0,
> > temp0=500.0,
> > ntt=3,
> > gamma_ln=1.0,
> > nstlim=3000000, dt=0.002,
> > ntpr=1500, ntwx=1500,ntwr=1000
> > /
> >
> > We tried to find out what was going on, but we do not know where the
> > crash came from.
> > Please help.
> >
> > Thank you.
> >
> > Regards,
> > Chinsu
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Mon Dec 12 2011 - 20:30:04 PST