Marc Cozzi wrote:
> rank 0 in job 12 ndrl.secure.net_46053 caused collective abort of all
> ranks
> exit status of rank 0: return code 174
Does it fail on the same test every time? What is the topology of your
computer setup?
Off the top of my head, the SIGSEGV signal *could* be indicative of a
memory hardware problem. If this is failing at random points, I'd
suggest burning an image of memtest86+ (http://www.memtest.org/) to a
CD and then booting the machine in question from that CD. This will run
a series of memory tests on the machine and should give an indication if
there is a memory problem.
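
Before pulling the machine out of service, it may help to check whether the failure is reproducible or random by rerunning the failing test a few times and comparing exit codes. A minimal Python sketch of that idea follows; the helper name `tally_exit_codes` and the example command are placeholders for illustration, not part of AMBER or the MPI launcher, and you would substitute your actual test invocation.

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, runs=5):
    """Run `cmd` `runs` times and count how often each exit code occurs.

    `cmd` is a list of arguments (hypothetical here; substitute the
    command that launches the failing test).
    """
    codes = Counter()
    for _ in range(runs):
        # Discard the program's output; only the exit status matters here.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        codes[result.returncode] += 1
    return codes


# Placeholder command that always exits with status 0.
example_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
example_counts = tally_exit_codes(example_cmd, runs=3)
```

If the same test fails with the same code on every run, a software or input problem seems more likely; failures that move around between runs would point more toward hardware such as memory.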
regards,
Mark
Received on Sun Feb 03 2008 - 06:07:09 PST