Re: [AMBER] AMBER11, pmemd.cuda: the system crashed. from Chinh Su Tran To on 2012-01-11 (Amber Archive Jan 2012)

From: Chinh Su Tran To <chinh.sutranto.gmail.com>
Date: Wed, 11 Jan 2012 16:04:48 +0800

Dear Dr. Walker,

As you suggested, we applied all of the bugfixes to both Amber11 and
AmberTools 1.5. However, the crash still occurs.
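
For reference, the banner check described in the quoted reply below (a
patched build prints a "Version" line in the GPU information block of the
output file) could be automated with a minimal sketch like the following. The
function name and the sample strings are illustrative assumptions and not
part of AMBER; the matching relies only on the banner layout shown in this
thread.

```python
import re


def gpu_banner_version(mdout_text):
    """Return the version string from the pmemd.cuda banner, or None.

    Assumes the banner layout quoted in this thread: a line containing
    'GPU (CUDA) Version of PMEMD in use', optionally followed by a
    '| Version X.Y' line before '| Implementation by:'.
    """
    in_banner = False
    for line in mdout_text.splitlines():
        if "GPU (CUDA) Version of PMEMD in use" in line:
            in_banner = True
            continue
        if in_banner:
            match = re.match(r"\s*\|\s*Version\s+(\S+)", line)
            if match:
                return match.group(1)
            if "Implementation by" in line:
                # Reached the author list without seeing a version line.
                return None
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_banner.py mdout_file  (file name is hypothetical)
    with open(sys.argv[1]) as handle:
        version = gpu_banner_version(handle.read())
    print("GPU code version:", version if version else "not reported")
```

A result of "not reported" would correspond to the first banner quoted below,
i.e. a build without the version line.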

The first stage (6 ns) ran to completion and wrote its results, but the
system then crashed. I rebooted and ran the second stage (the next 6 ns),
starting from the previous results. It also wrote its results and then
crashed. When I reduced the time step to 1.5 fs, the system crashed within a
few minutes of the job being submitted.
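
As a quick check on the run length: dt in the &cntrl namelist is given in
ps, so the original input (nstlim=3000000, dt=0.002) covers 6 ns. The sketch
below works out the step count that would keep a stage at 6 ns with a 1.5 fs
step. The helper name is hypothetical, and whether nstlim was actually
changed together with dt in the run above is not stated here.

```python
def steps_for(total_ns, dt_ps):
    """Number of MD steps (nstlim) needed to cover total_ns at time step dt_ps."""
    total_ps = total_ns * 1000.0
    return round(total_ps / dt_ps)


if __name__ == "__main__":
    # Original input: dt=0.002 ps (2 fs), 6 ns per stage.
    print("nstlim at 2.0 fs:", steps_for(6, 0.002))
    # Reduced time step: dt=0.0015 ps (1.5 fs), same 6 ns.
    print("nstlim at 1.5 fs:", steps_for(6, 0.0015))
```

Note that ntpr, ntwx and ntwr are counted in steps, so leaving them unchanged
with a smaller dt shortens the simulated time between outputs.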

We suspect there may be a compatibility problem with the system. We are
using a Dell workstation running Ubuntu 11.04 (GNU/Linux 2.6.38-8 server
x86_64) with a GeForce GTX 580 GPU (CUDA version/runtime version 4.0/4.0)
and the nvcc NVIDIA CUDA compiler driver.
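
The same system details (OS, kernel, compiler and driver information) could
be gathered in one go with a small script such as the sketch below. It
assumes that nvcc and nvidia-smi are on the PATH, and reports a note instead
of failing if they are not. The function names are illustrative only.

```python
import platform
import shutil
import subprocess


def run_if_present(cmd):
    """Run cmd (a list of strings) and return its output as text.

    Returns a short note instead if the tool is missing or fails to run.
    """
    if shutil.which(cmd[0]) is None:
        return "(%s not found on PATH)" % cmd[0]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "(%s failed: %s)" % (cmd[0], exc)
    return (result.stdout or result.stderr).strip()


def collect_system_info():
    """Collect the OS, kernel, CUDA compiler and GPU driver reports."""
    return {
        "os": platform.platform(),
        "kernel": platform.release(),
        "nvcc": run_if_present(["nvcc", "--version"]),
        "gpu": run_if_present(["nvidia-smi"]),
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print("== %s ==\n%s\n" % (key, value))
```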

Could you please let us know whether there are any "special" requirements
that the computer system must meet to be compatible with the GPU card?

Thank you, and looking forward to your help.

Regards,
Chinh

On Thu, Dec 15, 2011 at 11:19 AM, Chinh Su Tran To <chinh.sutranto.gmail.com> wrote:

> Dear Dr. Walker,
>
> Thank you so much.
>
> Regards,
> Chinh
>
>
> On Thu, Dec 15, 2011 at 6:06 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi Chinh,
>>
>> This works fine on my system with an up-to-date, patched version of
>> AMBER 11. Looking at the output you sent me, it looks like you are running
>> with an unpatched version of AMBER 11. Your output has:
>>
>> |--------------------- INFORMATION ----------------------
>> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>> |
>> | Implementation by:
>> | Ross C. Walker (SDSC)
>> | Scott Le Grand (nVIDIA)
>> | Duncan Poole (nVIDIA)
>> |
>> | CAUTION: The CUDA code is currently experimental.
>> | You use it at your own risk. Be sure to
>> | check ALL results carefully.
>> |
>> | Precision model in use:
>> | [SPDP] - Hybrid Single/Double Precision (Default).
>> |
>> |--------------------------------------------------------
>>
>> While it should have:
>>
>> |--------------------- INFORMATION ----------------------
>> | GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
>> | Version 2.2
>> |
>> | 08/16/2011
>> |
>> |
>> | Implementation by:
>> | Ross C. Walker (SDSC)
>> | Scott Le Grand (nVIDIA)
>> | Duncan Poole (nVIDIA)
>> |
>> | CAUTION: The CUDA code is currently experimental.
>> | You use it at your own risk. Be sure to
>> | check ALL results carefully.
>> |
>> | Precision model in use:
>> | [SPDP] - Hybrid Single/Double Precision (Default).
>> |
>> |--------------------------------------------------------
>>
>> Note the version number. Hence you are running with an out-of-date AMBER 11
>> and this is almost certainly leading to your issues. Start from a completely
>> clean Amber 11 directory created by untarring the original Amber11.tar.bz2
>> file and patch it with the AMBER bugfixes - see here:
>> http://ambermd.org/bugfixes11.html - Make sure you get AmberTools 1.5 and
>> patch that as well.
>>
>> I would also note that:
>>
>> #heating in 6ns at 500K without restraint on the model to unfold the protein
>> &cntrl
>> imin=0,
>> iwrap=0,
>> irest=0,
>> ntx=1,
>> ntb=1,
>> cut=10.0,
>> ntr=0,
>> ntc=2,
>> ntf=2,
>> tempi=500.0,
>> temp0=500.0,
>> ntt=3,
>> gamma_ln=1.0,
>> nstlim=3000000, dt=0.002,
>> ntpr=1500, ntwx=1500,ntwr=1000
>> /
>>
>> You are running at 500K but with a 2fs time step. You probably need to
>> reduce the time step to 1.5fs or so to run at such an elevated
>> temperature.
>>
>> All the best
>> Ross
>>
>>
>> > -----Original Message-----
>> > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
>> > Sent: Tuesday, December 13, 2011 8:04 PM
>> > To: AMBER Mailing List
>> > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
>> >
>> > Dear Dr. Walker,
>> >
>> > I sent you the files and the info to your gmail. Thank you.
>> >
>> > Chinh
>> >
>> > On Tue, Dec 13, 2011 at 12:09 PM, Ross Walker <ross.rosswalker.co.uk>
>> > wrote:
>> >
>> > > Hi Chinh,
>> > >
>> > > Can you send me (offlist) all of your input files please, along with
>> > > details of your computer system: OS, NVIDIA compiler and driver version,
>> > > hardware spec (especially the GPU version). I need to be able to replicate
>> > > this in order to investigate what is going wrong.
>> > >
>> > > Please also include the output from the run that gave what looked like a
>> > > disk error.
>> > >
>> > > Thank you.
>> > >
>> > > All the best
>> > > Ross
>> > >
>> > > > -----Original Message-----
>> > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
>> > > > Sent: Monday, December 12, 2011 8:04 PM
>> > > > To: AMBER Mailing List
>> > > > Subject: Re: [AMBER] AMBER11, pmemd.cuda: the system crashed.
>> > > >
>> > > > Dear Dr. Walker,
>> > > >
>> > > > As you suggested, we ran the check on the hard disks, and both were
>> > > > clean. I tried running the same job using pmemd only, and it was fine,
>> > > > i.e. no crash and no error.
>> > > >
>> > > > But when I went back to pmemd.cuda, it crashed again.
>> > > >
>> > > > I also noticed two problems when using pmemd.cuda:
>> > > >
>> > > > 1. When I used iwrap=0 (in the input file below), it reported a
>> > > > "segmentation fault" immediately. I knew this was an old error that I
>> > > > had encountered before (but I wanted to try it to test pmemd.cuda alone).
>> > > > 2. Then I switched to iwrap=1 with some modifications to gpu.cpp (the
>> > > > solution that I found in the AMBER forum), and it crashed. (However, it
>> > > > also crashed before these modifications.)
>> > > >
>> > > > Please help. We do not know what is wrong.
>> > > >
>> > > > The input is:
>> > > >
>> > > > &cntrl
>> > > > imin=0,
>> > > > iwrap=1,   ! I also tried iwrap=0
>> > > > irest=0,
>> > > > ntx=1,
>> > > > ntb=1,
>> > > > cut=10.0,
>> > > > ntr=0,
>> > > > ntc=2,
>> > > > ntf=2,
>> > > > tempi=500.0,
>> > > > temp0=500.0,
>> > > > ntt=3,
>> > > > gamma_ln=1.0,
>> > > > nstlim=3000000, dt=0.002,
>> > > > ntpr=1500, ntwx=1500,ntwr=1000
>> > > > /
>> > > >
>> > > >
>> > > > Thank you.
>> > > > Chinsu
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Tue, Dec 6, 2011 at 1:01 PM, Ross Walker <ross.rosswalker.co.uk>
>> > > > wrote:
>> > > >
>> > > > > Hi Chinsu,
>> > > > >
>> > > > > This looks like a hard drive failure to me (or pending hard drive
>> > > > > failure). Please try things with the CPU version of the code and see
>> > > > > what happens. I can't see how this could be generated by the GPU code.
>> > > > > You might want to try booting the machine in single user (or recovery
>> > > > > mode) and running an fsck on the file system. You could also try
>> > > > > running a smartctl check on the hard drive to see what its diagnostics
>> > > > > are reporting.
>> > > > >
>> > > > > All the best
>> > > > > Ross
>> > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Chinh Su Tran To [mailto:chinh.sutranto.gmail.com]
>> > > > > > Sent: Monday, December 05, 2011 7:33 PM
>> > > > > > To: AMBER Mailing List
>> > > > > > Subject: [AMBER] AMBER11, pmemd.cuda: the system crashed.
>> > > > > >
>> > > > > > Dear AMBER users,
>> > > > > >
>> > > > > > I was running pmemd.cuda using Amber11 on a GPU installed in a
>> > > > > > workstation. The process consisted of two short minimization steps
>> > > > > > and 6 ns of heating the protein (270 residues) at 500 K.
>> > > > > >
>> > > > > > The minimizations were fine, but when I ran the heating, my system
>> > > > > > "crashed". The errors are below:
>> > > > > >
>> > > > > > [2932683.628873] EXT4-fs (sda1): previous I/O error to superblock detected
>> > > > > > [2932683.646789] EXT4-fs (device sda1): ext4_find_entry:933: inode#9699494: comm init: reading directory lblock 0
>> > > > > >
>> > > > > > I rebooted the system and tried to run it again. The same errors
>> > > > > > occurred. My input file is:
>> > > > > >
>> > > > > > &cntrl
>> > > > > > imin=0,
>> > > > > > iwrap=1,   ! I also tried iwrap=0
>> > > > > > irest=0,
>> > > > > > ntx=1,
>> > > > > > ntb=1,
>> > > > > > cut=10.0,
>> > > > > > ntr=0,
>> > > > > > ntc=2,
>> > > > > > ntf=2,
>> > > > > > tempi=500.0,
>> > > > > > temp0=500.0,
>> > > > > > ntt=3,
>> > > > > > gamma_ln=1.0,
>> > > > > > nstlim=3000000, dt=0.002,
>> > > > > > ntpr=1500, ntwx=1500,ntwr=1000
>> > > > > > /
>> > > > > >
>> > > > > > We tried to find out what was going on, but we do not know where
>> > > > > > the crash came from. Please help.
>> > > > > >
>> > > > > > Thank you.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Chinsu
>> > > > > > _______________________________________________
>> > > > > > AMBER mailing list
>> > > > > > AMBER.ambermd.org
>> > > > > > http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jan 11 2012 - 00:30:02 PST