Re: Utilizing Amber6 on Linux Cluster

From: David Konerding <dek_at_cgl.ucsf.edu>
Date: Wed 22 Aug 2001 12:41:13 -0700

Jung-Hsing Lin writes:
>
>Hi! Margaret,
>
>
>The benchmarks have been done for NAMD 2.2b3 and sander 6, from 2 to 256
>processors. You can see the results here:
>
>http://www.ks.uiuc.edu/Research/namd/performance.html

I got very different results running this on an Intel/Linux cluster with
Ethernet interconnect; note, however, that I ran my tests on my *own* system
rather than the DHFR benchmark. The system is a 12-mer DNA duplex in a box
of water, 12,000 atoms total. I used PME with an 8 A direct-space cutoff,
the MPICH 1.2.1 library, and gcc/g++/g77 2.95.3.
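
For reference, the AMBER numbers below are just the "| Nonsetup" timing line
from each sander output file. A small Python sketch along these lines could
collect them (the nonsetup_times helper and the exact line format inside the
files are assumptions based on the listing below, not my actual scripts):

    # Sketch: grab the "| Nonsetup" timing line from each sander output file
    # (the .N suffix is the number of CPUs used for that run).
    import re

    def nonsetup_times(basename="OUTPUT/OKAstart_rna.out", max_cpus=10):
        """Return {ncpus: (seconds, percent_of_total)} parsed from sander output."""
        times = {}
        for ncpu in range(1, max_cpus + 1):
            with open(f"{basename}.{ncpu}") as fh:
                for line in fh:
                    # assumed line format, e.g. "| Nonsetup 258.40 99.14%"
                    m = re.search(r"Nonsetup\s+([\d.]+)\s+\(?\s*([\d.]+)%", line)
                    if m:
                        times[ncpu] = (float(m.group(1)), float(m.group(2)))
                        break
        return times

    if __name__ == "__main__":
        for ncpu, (secs, pct) in sorted(nonsetup_times().items()):
            print(f"{ncpu:2d} CPUs: {secs:8.2f} s  ({pct:5.2f}% of total)")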


Times in seconds for AMBER ("| Nonsetup" line from OUTPUT/OKAstart_rna.out.N,
where N is the number of CPUs):

 CPUs   Nonsetup (s)   % of total
   1      258.40         99.14
   2      143.62         97.34
   3      107.50         96.49
   4       90.48         95.86
   5       79.57         95.33
   6       71.34         94.82
   7       67.21         94.50
   8       63.77         94.22
   9       60.91         93.92
  10       57.62         93.66

Times in seconds for NAMD (per run, by number of CPUs):

 CPUs   WallClock (s)   CPUTime (s)   Memory (kB)
   1     182.003571     180.940002      26023
   2     119.089050     115.169998      22007
   3      91.004341      89.889999      16319
   4      84.837715      75.639999      15567
   5      69.677429      66.760002      13823
   6      65.209862      63.450001      12711
   7      67.956848      66.510002      12111
   8      68.002617      66.059998      12015
   9      60.205994      56.130001      11071
  10      57.304428      39.380001       9415

As you can see, I get somewhat different results:
1) on 1 CPU, NAMD is faster than AMBER
2) above 1 CPU, the results rapidly converge to the same values (a quick
   speedup/efficiency calculation is sketched below).
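
To put point 2 in numbers, here is a small Python sketch that turns the
per-CPU times above into speedups and parallel efficiencies relative to the
1-CPU run; the values are copied straight from the two tables (AMBER Nonsetup
time, NAMD WallClock), and the script itself is just an illustration, not
part of my benchmark setup:

    # Speedup and parallel efficiency relative to the 1-CPU run.
    # Times copied from the tables above (list index i corresponds to i+1 CPUs).
    amber_nonsetup = [258.40, 143.62, 107.50, 90.48, 79.57,
                      71.34, 67.21, 63.77, 60.91, 57.62]
    namd_wallclock = [182.003571, 119.089050, 91.004341, 84.837715, 69.677429,
                      65.209862, 67.956848, 68.002617, 60.205994, 57.304428]

    def report(name, times):
        t1 = times[0]
        for ncpu, t in enumerate(times, start=1):
            speedup = t1 / t
            efficiency = speedup / ncpu
            print(f"{name:5s} {ncpu:2d} CPUs: {t:7.2f} s  "
                  f"speedup {speedup:4.2f}  efficiency {efficiency:3.0%}")

    report("AMBER", amber_nonsetup)
    report("NAMD", namd_wallclock)

On these numbers, both codes end up at roughly 57 s of wall time on 10 CPUs,
which is the convergence I mean.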

I made a few guesses:

1) the bottleneck at a high number of CPUs was the interconnect (Ethernet)
2) with gcc/g++/g77, the g++ compiler produces better-optimized code than g77
   (on the SGI, I would expect the Fortran compiler to be highly tuned for
   optimal performance, and they also have very highly tuned math libraries)

Note: I would expect NAMD to do better on longer runs with a high number of
processors (and I have seen this), because the load balancer takes a while
to converge.