[AMBER] pmemd.cuda.MPI error - integer divide by zero from Bill Miller III on 2015-07-10 (Amber Archive Jul 2015)

From: Bill Miller III <brmilleriii.gmail.com>
Date: Fri, 10 Jul 2015 17:36:38 -0400

Hi,

I am trying to get pmemd.cuda.MPI to run on two GTX-980s in parallel on a
workstation running RedHat 6.6. I have re-compiled an updated Amber14 (the
release version with all patches applied as of today, not the developers'
tree) using openmpi 1.6.5 and the GNU compilers (gcc/gfortran v. 4.4.7-11).
Whenever I try to run an MD simulation in parallel, I immediately get the
error messages below. I tried googling several of the messages, but nothing
I found seemed to apply to my particular situation. Any ideas?

command:
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda_SPFP.MPI -O -i md.mdin \
    -c dna_mod3_solv_md003.rst7 -p dna_mod3_solv.prmtop

error messages printed to screen:
[mayer:05251] *** Process received signal ***
[mayer:05251] Signal: Floating point exception (8)
[mayer:05251] Signal code: Integer divide-by-zero (1)
[mayer:05251] Failing at address: 0x45b935
[mayer:05251] [ 0] /lib64/libpthread.so.0() [0x3823c0f710]
[mayer:05251] [ 1] /usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI(__mdin_ewald_dat_mod_MOD_init_mdin_ewald_dat+0xb45) [0x45b935]
[mayer:05251] [ 2] /usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI(__master_setup_mod_MOD_master_setup+0xb7f) [0x5315ff]
[mayer:05251] [ 3] /usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI(MAIN__+0xb0) [0x5110d0]
[mayer:05251] [ 4] /usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI(main+0x2a) [0x66eb0a]
[mayer:05251] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd) [0x382381ed5d]
[mayer:05251] [ 6] /usr/local/amber/amber14/bin/pmemd.cuda_SPFP.MPI() [0x4482d9]
[mayer:05251] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 5251 on node mayer.richmond.edu exited on signal 8 (Floating point exception).

Mayer is the name of the workstation.

The same set of Amber files works in serial on the same computer, just not
in parallel. So I assume I either did something wrong during the
installation or am not linking against the proper library files. I don't
know what else to try at this point.
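
One way to test the library-linking guess is to look at which shared
libraries the parallel binary actually resolves at run time, and which
mpirun is first on the PATH. A minimal sketch, assuming ldd is available;
the helper name mpi_libs is made up for illustration, and the fallback path
is simply the install location shown in the backtrace above:

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): print the MPI- and CUDA-related
# shared libraries that a binary resolves to, or a note if none are found.
mpi_libs() {
    ldd "$1" 2>/dev/null | grep -i -E 'mpi|cuda' \
        || echo "no MPI/CUDA libraries resolved for $1"
}

# Assumption: fall back to the install path from the backtrace if AMBERHOME is unset.
mpi_libs "${AMBERHOME:-/usr/local/amber/amber14}/bin/pmemd.cuda_SPFP.MPI"

# Show which mpirun would be launched, to compare against the libraries above.
command -v mpirun || echo "mpirun not found on PATH"
```

If the libmpi path listed belongs to a different Open MPI installation than
the mpirun reported, that mismatch would be worth ruling out first.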

If you have any suggestions or insights, please let me know.

Thanks.

-Bill

-- 
Bill Miller III
Post-doc
University of Richmond
417-549-0952
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Jul 10 2015 - 15:00:08 PDT