Hi Ross,
I submitted a QMMM test job via PBS on 4 CPUs across 4 different nodes. The test crashed at the step cd qmmm2/1NLN_test_diagonalizers && ./Run.1NLN_dspev.
I then carried out a set of parallel QMMM test runs on the server as follows:
cd $AMBERHOME/test
export DO_PARALLEL="mpirun -np 2"    (and again with -np 4)
lamboot
make -f Makefile test.parallel.QMMM>&out&
lamhalt
using 2 and 4 CPUs. Both tests failed at the dspev step as follows:
cd qmmm2/1NLN_test_diagonalizers && ./Run.1NLN_dspev
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 9899 failed on node n0 (127.0.0.1) due to signal 6.
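
For reference when reading the message above: on Linux, signal 6 is SIGABRT, which a process usually raises on itself through abort() (for example after a failed runtime check or a memory error detected by the C library). A minimal sketch for decoding a shell exit status follows. The helper name describe_status is hypothetical, and it assumes the launcher passes the child's status through as 128 + signal number, which mpirun may not do.

```shell
# Hypothetical helper: interpret a shell exit status.
# Assumption: statuses above 128 mean "killed by signal (status - 128)",
# which is the convention used by sh and bash.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        # kill -l N prints the name of signal N without the SIG prefix
        echo "killed by SIG$(kill -l $((status - 128)))"
    else
        echo "exited with code $status"
    fi
}

describe_status 134   # on Linux prints: killed by SIGABRT
```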
In addition, I ran two QMMM jobs on the server with the "verbosity=4" option as follows:
cd $AMBERHOME/test/qmmm2/1NLN_test_diagonalizers
mpirun -np 4 sander.MPI -i 1NLN_dspev.mdin -o 1NLN_dspev.4cpus.mdout -p 1NLN_15A_solv.prmtop -c 1NLN_15A_solv_min.rst
mpirun -np 2 sander.MPI -i 1NLN_dspev.mdin -o 1NLN_dspev.2cpus.mdout -p 1NLN_15A_solv.prmtop -c 1NLN_15A_solv_min.rst
These output files are attached.
It appears that the manual run of 1NLN_dspev with 4 CPUs on the server crashes, while the one with 2 CPUs runs to completion.
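
The two manual runs above could be wrapped in a small loop so that the exit status of each CPU count is recorded side by side. This is only a sketch: the function name run_dspev and the MPIRUN/SANDER variables are hypothetical conveniences, and the file names are the ones from the commands above.

```shell
# Assumption: MPIRUN and SANDER default to the commands used in the manual runs.
MPIRUN=${MPIRUN:-mpirun}
SANDER=${SANDER:-sander.MPI}

# Run the dspev test on the given number of CPUs and report the exit status.
run_dspev() {
    np=$1
    "$MPIRUN" -np "$np" "$SANDER" -i 1NLN_dspev.mdin \
        -o "1NLN_dspev.${np}cpus.mdout" \
        -p 1NLN_15A_solv.prmtop -c 1NLN_15A_solv_min.rst
    status=$?
    echo "np=$np exit status=$status"
    return "$status"
}

# Usage, from $AMBERHOME/test/qmmm2/1NLN_test_diagonalizers after lamboot:
#   for np in 2 4; do run_dspev "$np"; done
```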
cheers,
jenk.
--- On Thu, 5/22/08, Ross Walker <ross.rosswalker.co.uk> wrote:
> From: Ross Walker <ross.rosswalker.co.uk>
> Subject: RE: Fw: RE: AMBER: MKL libraries/Amber10
> To: amber.scripps.edu
> Date: Thursday, May 22, 2008, 5:40 PM
> Hi Cenk,
>
> Can you possibly try something for me.
>
> Can you run the parallel test cases on both one of the nodes in parallel and
> on the login node in parallel (if it will let you). Specifically I want to
> find out if the problem is processor specific. I only have access to Intel
> chips at the moment and everything works fine. It is possible that MKL's
> dspev crashes on AMD chips (I wouldn't put it past Intel ;-)).
>
> Thus I am guessing that this may be a problem specific to AMD chips. I'll
> try and get on Ranger to try this myself but it has been down for the last
> few days.
>
> All the best
> Ross
>
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
> to majordomo.scripps.edu
Received on Sun May 25 2008 - 06:07:52 PDT