I am trying to run the `rt` benchmark with Amber (sander) on 8 processors,
where it appears to go into an infinite loop. It runs to completion
with fewer than 8 processors.
The machine is an 8-way x86_64 NUMA system running 64-bit Linux 2.6.13-rc7,
with mpich-1.2.7 compiled with gcc-3.2.3 and ifort-8.1.025.
`top` shows all 8 threads actively running.
The stack trace is:
#0 0x0000003f04aae259 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00007fffff911ba0 in ?? ()
#2 0x000000000054b1a0 in MPID_SHMEM_ReadControl ()
#3 0x000000000054b1a0 in MPID_SHMEM_ReadControl ()
#4 0x0000000000545827 in MPID_SHMEM_Check_incoming ()
#5 0x000000000053cc67 in MPID_DeviceCheck ()
#6 0x000000000054a6c1 in MPID_SHMEM_Rndvn_send_wait_ack ()
#7 0x000000000053c401 in MPID_SendComplete ()
#8 0x000000000051f9f7 in PMPI_Waitall ()
#9 0x00000000005207e5 in PMPI_Sendrecv ()
#10 0x000000000051e54b in pmpi_sendrecv_ ()
#11 0x0000000000498d5b in fsum_ ()
#12 0x000000000049894b in fdist_ ()
#13 0x000000000046d36e in force_ ()
#14 0x000000000048ae7c in runmd_ ()
#15 0x000000000043df57 in sander_ ()
#16 0x000000000043a0f3 in MAIN__ ()
#17 0x00000000004064f0 in main ()
I also stepped through the run in gdb:
(gdb) break runmd_
Breakpoint 1 at 0x489f8c
(gdb) break force_
Breakpoint 2 at 0x46c3e4
(gdb) c
[Switching to Thread 46912499318208 (LWP 4030)]
Breakpoint 2, 0x000000000046c3e4 in force_ ()
(gdb) clear force_
Deleted breakpoint 2
(gdb) c
After this the run never finishes, although it is expected to complete in 4-6
Can anyone help me with this?
Thanks in advance.
The AMBER Mail Reflector
Received on Thu Oct 13 2005 - 18:53:00 PDT