[AMBER] pmemd.cuda.MPI error - integer divide by zero

From: Bill Miller III <brmilleriii.gmail.com>
Date: Fri, 10 Jul 2015 17:36:38 -0400


I am trying to get pmemd.cuda.MPI to run on two GTX-980s in parallel on a
workstation running RedHat 6.6. I have re-compiled with an updated Amber14
(the release version with all patches as of today, not the developers' tree)
using openmpi 1.6.5 and gnu (gcc/gfortran v. 4.4.7-11). Whenever I try to run
an MD simulation in parallel, I get the following error messages immediately.
I tried googling several of the messages, but nothing seemed to apply to my
particular situation. Any ideas?
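For reference, this is the standard build sequence I understand to apply for a
parallel CUDA build of Amber14 with the GNU toolchain (a sketch of the usual
procedure, not necessarily exactly what I ran):

```shell
# Standard Amber14 parallel-CUDA build sequence (sketch).
# Assumes AMBERHOME, CUDA_HOME, and the openmpi wrappers are already on PATH.
cd $AMBERHOME

# Serial CUDA build first (useful as a sanity check):
./configure -cuda gnu
make install

# Then the MPI-enabled CUDA binaries (pmemd.cuda_SPFP.MPI etc.):
./configure -cuda -mpi gnu
make install
```
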

mpirun -np 2 $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i md.mdin -c
dna_mod3_solv_md003.rst7 -p dna_mod3_solv.prmtop

error messages printed to screen:
[mayer:05251] *** Process received signal ***
[mayer:05251] Signal: Floating point exception (8)
[mayer:05251] Signal code: Integer divide-by-zero (1)
[mayer:05251] Failing at address: 0x45b935
[mayer:05251] [ 0] /lib64/libpthread.so.0() [0x3823c0f710]
[mayer:05251] [ 1]
[mayer:05251] [ 2]
[mayer:05251] [ 3]
/usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI(MAIN__+0xb0) [0x5110d0]
[mayer:05251] [ 4]
/usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI(main+0x2a) [0x66eb0a]
[mayer:05251] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd) [0x382381ed5d]
[mayer:05251] [ 6] /usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI()
[mayer:05251] *** End of error message ***
mpirun noticed that process rank 0 with PID 5251 on node mayer.richmond.edu
exited on signal 8 (Floating point exception).

Mayer is the name of the workstation.

The same set of Amber files works successfully in serial on the same
computer, just not in parallel. So I am assuming there is something I did
incorrectly with the installation, or perhaps I am not linking against the
proper library files? I don't know what to do at this point.
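In case it helps narrow things down, here are the checks I have been thinking
through (a diagnostic sketch; the device indices and file names are just
examples from my run, not verified fixes):

```shell
# 1. Confirm both GTX-980s are visible to the driver and not set to an
#    exclusive compute mode that would block two ranks.
nvidia-smi

# 2. Make sure the mpirun found at runtime is the same openmpi 1.6.5
#    that Amber was compiled against (a mismatch can crash at startup).
which mpirun
mpirun --version

# 3. Explicitly expose both GPUs to the run; Amber uses
#    CUDA_VISIBLE_DEVICES when assigning GPUs to MPI ranks.
export CUDA_VISIBLE_DEVICES=0,1

# 4. Re-run with explicit output/restart files so any partial mdout
#    survives the crash (file names here are from my example run).
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i md.mdin \
    -c dna_mod3_solv_md003.rst7 -p dna_mod3_solv.prmtop \
    -o md.out -r md.rst7
```
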

If you have any suggestions or insights, please let me know.



Bill Miller III
University of Richmond
AMBER mailing list
Received on Fri Jul 10 2015 - 15:00:08 PDT