[AMBER] Anomalous Termination of PMEMD.CUDA jobs from kurisaki on 2013-02-13 (Amber Archive Feb 2013)

From: kurisaki <kurisaki.ncube.human.nagoya-u.ac.jp>
Date: Wed, 13 Feb 2013 19:02:23 +0900

Dear Amber developers and users,

Thank you for your kind support.

I have been having trouble with anomalous termination of PMEMD.CUDA
when I run Amber12 on a GTX680 with the SPFP precision model on my machine.

Although an MD job normally runs for several hours,
I often encounter anomalous termination of MD jobs
in which a "segmentation fault" occurs.

Curiously, such anomalous termination never happens
on another GPU machine (which has exactly the same machine spec
as the first one).

I would be glad if anyone who has had a similar experience
could tell me how to overcome this situation.

Sincerely yours,

Ikuo KURISAKI

PS. I have attached the messages saved in /var/log/messages for reference.
Is this a system problem, e.g. s

Feb 12 11:37:02 gps102 kernel: imklog 4.6.2, log source = /proc/kmsg started.
Feb 12 11:37:02 gps102 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="2051" x-info="http://www.rsyslog.com"] (re)start
Feb 12 12:28:30 gps102 kernel: pmemd.cuda[32307]: segfault at 2e3000002eb7 ip 00007f2939843248 sp 00007ffff927ef40 error 4 in libgfortran.so.3.0.0[7f2939784000+f0000]
Feb 12 12:28:31 gps102 abrt[32309]: saved core dump of pid 32307 (/home/kurisaki/amber/amber12gpu/amber12/bin/pmemd.cuda_SPFP) to /var/spool/abrt/ccpp-2013-02-12-12:28:30-32307.new/coredump (84373504 bytes)
Feb 12 12:28:31 gps102 abrtd: Directory 'ccpp-2013-02-12-12:28:30-32307' creation detected
Feb 12 12:28:31 gps102 abrtd: Executable '/home/kurisaki/amber/amber12gpu/amber12/bin/pmemd.cuda_SPFP' doesn't belong to any package
Feb 12 12:28:31 gps102 abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2013-02-12-12:28:30-32307 (res:2), deleting
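
As an aid to reading the kernel segfault entry in the log above, here is a minimal Python sketch that pulls out the faulting library, the offset of the instruction pointer inside that library, and the page-fault error bits. It is not part of Amber. The function name decode_segfault is hypothetical, and the sketch assumes the x86 Linux message format shown in the log, where the error value is printed in hexadecimal.

```python
import re

# Assumed x86 Linux kernel format:
#   segfault at <addr> ip <ip> sp <sp> error <code> in <lib>[<base>+<size>]
# \s+ between fields lets the pattern match even if the mail client wrapped the line.
SEGFAULT_RE = re.compile(
    r"segfault\s+at\s+(?P<addr>[0-9a-f]+)\s+ip\s+(?P<ip>[0-9a-f]+)\s+"
    r"sp\s+(?P<sp>[0-9a-f]+)\s+error\s+(?P<err>[0-9a-f]+)\s+in\s+"
    r"(?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(text):
    """Return a dict describing the first segfault entry in text, or None."""
    m = SEGFAULT_RE.search(text)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "library": m["lib"],
        "fault_address": hex(int(m["addr"], 16)),
        # Offset of the faulting instruction relative to the library mapping.
        "offset": hex(ip - base),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write_access": bool(err & 2),  # 0 means the access was a read
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
    }


# The segfault entry from the log above, joined onto one line.
LINE = (
    "Feb 12 12:28:30 gps102 kernel: pmemd.cuda[32307]: segfault at 2e3000002eb7 "
    "ip 00007f2939843248 sp 00007ffff927ef40 error 4 in "
    "libgfortran.so.3.0.0[7f2939784000+f0000]"
)

if __name__ == "__main__":
    for key, value in decode_segfault(LINE).items():
        print(f"{key}: {value}")
```

For this entry, error 4 decodes to a user-mode read of an unmapped address, and the instruction pointer lies inside the libgfortran mapping. If debug symbols for the library are available, the computed offset could be given to a tool such as addr2line to look up the function involved.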

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Feb 13 2013 - 02:30:02 PST