Re: [AMBER] AMBER11, pmemd.cuda: the system crashed. from Chinh Su Tran To on 2011-12-05 (Amber Archive Dec 2011)

From: Chinh Su Tran To <chinh.sutranto.gmail.com>
Date: Tue, 6 Dec 2011 13:18:38 +0800

Dear Dr. Walker,

I also ran the same input files with sander on the server (4 nodes,
8 ppn), and there were no problems.
Because the errors hung the system, I forced a reboot. The reboot
succeeded, but the system crashed again when I repeated the job.
I will run the checks you suggested and let you know the results.

Thank you.
Chinsu

On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Chinsu,
>
> This looks like a hard drive failure to me (or pending hard drive failure).
> Please try things with the CPU version of the code and see what happens. I
> can't see how this could be generated by the GPU code. You might want to try
> booting the machine in single user (or recovery mode) and running an fsck on
> the file system. You could also try running a smartctl check on the hard
> drive to see what its diagnostics are reporting.
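
The smartctl check mentioned above could be scripted roughly as follows. This is only a minimal sketch under stated assumptions: smartmontools is installed, the drive is /dev/sda (the parent disk of the sda1 partition named in the kernel messages), and the drive prints the ATA-style "overall-health" line. The function names parse_smart_health and check_drive are hypothetical.

```python
import re
import subprocess


def parse_smart_health(report: str):
    """Return the overall-health verdict (e.g. 'PASSED') found in
    `smartctl -H` output, or None if no such line is present."""
    match = re.search(
        r"overall-health self-assessment test result:\s*(\S+)", report
    )
    return match.group(1) if match else None


def check_drive(device: str = "/dev/sda"):
    """Run `smartctl -H` on the device (assumed path; usually needs root)
    and return the parsed verdict."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_smart_health(result.stdout)
```

Calling check_drive() as root would return the verdict string, or None when the expected line is missing. Anything other than "PASSED" would support the suspicion of a failing disk.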
>
> All the best
> Ross
>
> > -----Original Message-----
> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
> > Sent: Monday, December 05, 2011 7:33 PM
> > To: AMBER Mailing List
> > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
> >
> > Dear AMBER users,
> >
> > I was running pmemd.cuda from Amber11 on a GPU installed in a workstation.
> > The job consisted of two short minimization steps followed by 6 ns of
> > heating the protein (270 residues) at 500 K.
> >
> > The minimizations were fine, but when I ran the heating, my system
> > "crashed". The errors were as follows:
> >
> > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
> > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
> >
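
Kernel messages like the two above can be pulled out of a longer log with a small filter, which makes it easier to see which device is reporting errors. The sketch below is illustrative only: the regular expression is fitted to the two lines quoted in this message, and the names EXT4_LINE and ext4_events are hypothetical.

```python
import re

# Matches lines of the form "[timestamp] EXT4-fs (dev): message" or
# "[timestamp] EXT4-fs (device dev): message", as quoted in this thread.
EXT4_LINE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+EXT4-fs\s+"
    r"\((?:device\s+)?(?P<dev>\w+)\):\s*(?P<msg>.*)"
)


def ext4_events(lines):
    """Return (timestamp, device, message) tuples for EXT4-fs log lines."""
    events = []
    for line in lines:
        m = EXT4_LINE.search(line)
        if m:
            events.append(
                (float(m.group("time")), m.group("dev"), m.group("msg").strip())
            )
    return events


log = [
    "[2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected",
    "[2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: "
    "inode#9699494: comm init: reading directory lblock 0",
]
for timestamp, device, message in ext4_events(log):
    print(device, message)
```

Both quoted lines point at the same partition, sda1, which is the file system to examine first.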
> > I rebooted the system and tried to run the job again, but the same errors
> > appeared.
> > My input file is:
> >
> > &cntrl
> > imin=0,
> > iwrap=1    (I also tried iwrap=0)
> > irest=0,
> > ntx=1,
> > ntb=1,
> > cut=10.0,
> > ntr=0,
> > ntc=2,
> > ntf=2,
> > tempi=500.0,
> > temp0=500.0,
> > ntt=3,
> > gamma_ln=1.0,
> > nstlim=3000000, dt=0.002,
> > ntpr=1500, ntwx=1500,ntwr=1000
> > /
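
As a quick sanity check on the input above, the run length and the number of file writes it implies can be worked out from nstlim, dt, ntpr, ntwx and ntwr. This is a minimal sketch using only the values in the input file; the helper name run_summary is hypothetical, and dt is taken to be in picoseconds as in the Amber input conventions.

```python
def run_summary(nstlim, dt_ps, ntpr, ntwx, ntwr):
    """Summarise simulated time and output frequency for an MD run."""
    return {
        "total_ns": nstlim * dt_ps / 1000.0,  # ps converted to ns
        "energy_prints": nstlim // ntpr,      # energy records written
        "trajectory_frames": nstlim // ntwx,  # coordinate frames written
        "restart_writes": nstlim // ntwr,     # restart file rewrites
    }


# Values copied from the &cntrl block above.
summary = run_summary(nstlim=3_000_000, dt_ps=0.002,
                      ntpr=1500, ntwx=1500, ntwr=1000)
for key, value in summary.items():
    print(key, value)
```

With these settings the run covers about 6 ns, matching the description, and writes 2000 trajectory frames and 3000 restart files over its length, so the job touches the disk regularly throughout.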
> >
> > We tried to work out what was going on, but we could not determine where
> > the crash came from.
> > Please help.
> >
> > Thank you.
> >
> > Regards,
> > Chinsu
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>
Received on Mon Dec 05 2011 - 21:30:04 PST