Hi,
I am trying to run the `rt` benchmark with Amber on 8 processors,
where it appears to go into an infinite loop. However, it runs successfully
to completion on fewer than 8 processors. This is for sander runs.
I am using 64-bit Linux 2.6.13-rc7 on an 8-way NUMA machine (x86_64
architecture), with mpich-1.2.7 compiled with gcc-3.2.3 and ifort-8.1.025.
The `top` output shows all 8 threads actively running.
The stack trace shows the following:
#0 0x0000003f04aae259 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00007fffff911ba0 in ?? ()
#2 0x000000000054b1a0 in MPID_SHMEM_ReadControl ()
#3 0x000000000054b1a0 in MPID_SHMEM_ReadControl ()
#4 0x0000000000545827 in MPID_SHMEM_Check_incoming ()
#5 0x000000000053cc67 in MPID_DeviceCheck ()
#6 0x000000000054a6c1 in MPID_SHMEM_Rndvn_send_wait_ack ()
#7 0x000000000053c401 in MPID_SendComplete ()
#8 0x000000000051f9f7 in PMPI_Waitall ()
#9 0x00000000005207e5 in PMPI_Sendrecv ()
#10 0x000000000051e54b in pmpi_sendrecv_ ()
#11 0x0000000000498d5b in fsum_ ()
#12 0x000000000049894b in fdist_ ()
#13 0x000000000046d36e in force_ ()
#14 0x000000000048ae7c in runmd_ ()
#15 0x000000000043df57 in sander_ ()
#16 0x000000000043a0f3 in MAIN__ ()
#17 0x00000000004064f0 in main ()
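To make the trace above easier to read, here is a minimal Python sketch that turns a gdb backtrace into an outermost-to-innermost call chain and reports the last application routine before the code enters the MPI library. The helper names (`parse_frames`, `call_chain`, `last_application_frame`) are hypothetical, and treating symbols that start with `MPID_`, `PMPI_` or `MPI_` as MPI frames is an assumption based on the names in this trace, not a documented rule.

```python
import re

# Excerpt of the backtrace above (frames copied verbatim, others omitted).
TRACE = """\
#0 0x0000003f04aae259 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00007fffff911ba0 in ?? ()
#2 0x000000000054b1a0 in MPID_SHMEM_ReadControl ()
#3 0x000000000054b1a0 in MPID_SHMEM_ReadControl ()
#6 0x000000000054a6c1 in MPID_SHMEM_Rndvn_send_wait_ack ()
#9 0x00000000005207e5 in PMPI_Sendrecv ()
#10 0x000000000051e54b in pmpi_sendrecv_ ()
#11 0x0000000000498d5b in fsum_ ()
#17 0x00000000004064f0 in main ()
"""

# Matches frame lines such as "#8 0x000000000051f9f7 in PMPI_Waitall ()".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def parse_frames(backtrace):
    """Return [(frame_number, function_name), ...] in the order gdb printed them."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def call_chain(frames):
    """Function names from outermost to innermost frame.

    Unresolved '??' frames are skipped and consecutive duplicates collapsed.
    """
    chain = []
    for _, name in sorted(frames, reverse=True):
        if name == "??":
            continue
        if chain and chain[-1] == name:
            continue
        chain.append(name)
    return chain


def is_mpi_frame(name):
    # Assumption: MPI bindings and MPICH internals use these prefixes.
    return name.lower().startswith(("mpid_", "pmpi_", "mpi_"))


def last_application_frame(chain):
    """Deepest non-MPI routine before the chain first enters the MPI library."""
    for index, name in enumerate(chain):
        if is_mpi_frame(name):
            return chain[index - 1] if index > 0 else None
    return None


chain = call_chain(parse_frames(TRACE))
print(" -> ".join(chain))
print("last application routine:", last_application_frame(chain))
```

On the excerpt, this reports `fsum_` as the last application routine, which matches reading the trace by hand: the process is waiting inside the `MPI_Sendrecv` issued from `fsum_`, in the shared-memory rendezvous send.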
(gdb) break runmd_
Breakpoint 1 at 0x489f8c
(gdb) break force_
Breakpoint 2 at 0x46c3e4
(gdb) c
Continuing.
[Switching to Thread 46912499318208 (LWP 4030)]
Breakpoint 2, 0x000000000046c3e4 in force_ ()
(gdb) clear force_
Deleted breakpoint 2
(gdb) c
Continuing.
After this the run continues forever, although it is expected to complete in
4-6 minutes.
Can anyone help me with this?
Thanks in advance,
Imran
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Thu Oct 13 2005 - 18:53:00 PDT