Thank you, David.
I was able to run 4096wat properly with sander compiled against MPICH
built with gcc/g77, but the speed is slow.
With sander compiled against MPICH built with the PGI compiler, I could
only run 4096wat on ONE CPU.
My machine has two CPUs, so it is an SMP Linux machine.
I will try the program below and "export P4_GLOBALMEMSIZE=1000000".
Thank you.
David Konerding wrote:
>
> MPICH on Linux with PGI can be very tricky to set up properly. There are a few things you
> may want to try.
>
> 1) Do the simple control experiment of testing a basic program. AMBER is a demanding
> MPI application, and can break MPICH in some interesting ways. But if
> you can't even run a simple MPI application, then you know to look elsewhere.
> 2) Try it with AMBER and MPICH compiled with gcc/g77 (make sure both are compiled using the same compiler)
> 3) If you're using the shared-memory communicator (i.e., on a dual-processor machine),
> try setting the following in .bashrc:
> export P4_GLOBALMEMSIZE=1000000
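For step 1, any tiny MPI program will do as the control experiment. A minimal sketch (a plain hello-world, not part of AMBER or of David's test below) might be:

```c
/* Minimal MPI control program: if this fails under the MPICH/PGI build,
   the problem is in the MPI installation, not in AMBER. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int me, np;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Each rank reports itself; with 2 processes you should see two lines. */
    printf("Hello from rank %d of %d\n", me, np);

    MPI_Finalize();
    return 0;
}
```

If this runs on both CPUs but sander does not, the MPI layer itself is probably fine.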
>
> I use the following app to test my cluster. It basically just saturates the network
> by doing send/recv between pairs of machines. Run it with an even number of processors, 2 or more.
>
> #include <stdio.h>
> #include <stdlib.h>   /* calloc() */
> #include "mpi.h"
>
> #define SIZE 4096
>
> int main(int argc, char *argv[]) {
>     MPI_Status status;
>     int me, np;
>     double t0, t1;
>     char *data;
>     int num_cycles;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &me);
>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>
>     if (np % 2 != 0) {
>         if (me == 0)
>             printf("Error: # cpus must be a multiple of 2!\n");
>         MPI_Abort(MPI_COMM_WORLD, 1);   /* terminate all ranks, not just this one */
>     }
>
>     if (me % 2 == 0) {
>         t0 = MPI_Wtime();
>         num_cycles = 0;
>     }
>
>     /* calloc(nmemb, size): the original call passed 0 as the element size,
>        which allocated no memory at all */
>     data = (char *)calloc(SIZE, sizeof(char));
>
>     if (me == 0) printf("Starting\n");
>
>     /* Runs until interrupted; even ranks send, odd ranks receive. */
>     while (1) {
>         if (me % 2 == 0)
>             MPI_Send(data, SIZE, MPI_CHAR, me + 1, 1, MPI_COMM_WORLD);
>         else
>             MPI_Recv(data, SIZE, MPI_CHAR, me - 1, 1, MPI_COMM_WORLD, &status);
>
>         if (me % 2 == 0) {
>             t1 = MPI_Wtime();
>             num_cycles++;
>             if (t1 - t0 > 5.0) {
>                 printf("Node %d to %d: throughput %5.2f Mbytes/sec\n",
>                        me, me + 1, ((double)num_cycles * SIZE) / (t1 - t0) / 1.e6);
>                 fflush(stdout);
>                 num_cycles = 0;
>                 t0 = MPI_Wtime();
>             }
>         }
>     }
>
>     /* Never reached because of the infinite loop above. */
>     MPI_Finalize();
>     return 0;
> }
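To build and run the test above with MPICH (assuming `mpicc` and `mpirun` are on your PATH; the filename `mpitest.c` is only an example name):

```shell
# compile with MPICH's wrapper compiler
mpicc -O2 -o mpitest mpitest.c

# run with an even number of processes, as the program requires
mpirun -np 2 ./mpitest
```

On a dual-CPU SMP box, `-np 2` exercises the shared-memory communicator, which is exactly the path that P4_GLOBALMEMSIZE affects.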
Received on Fri May 26 2000 - 08:36:50 PDT