[AMBER] If this is normal for 3 or 4 sander.MPI jobs appearing on one node from MUHAMMAD IMTIA SHAFIQ on 2010-02-02 (Amber Archive Feb 2010)

From: MUHAMMAD IMTIA SHAFIQ <imtiazshafiq.gmail.com>
Date: Tue, 2 Feb 2010 19:46:31 +0000

Dear All,

Our cluster has 36 processors on 9 nodes. I am running sander.MPI with mpirun, requesting 16 processors.

I received an email from our network administrator saying that I am running 3 to 4 sander.MPI jobs on each node. According to him this is not correct, and it is preventing other users' jobs from running.

Here is the output of showq:
414039 mis9 Running 16 2:03:42:46 Sun Jan 31 19:24:12
1 Active Job 16 of 36 Processors Active (44.44%)
4 of 9 Nodes Active (44.44%)

I logged in to the cluster and the nodes via SSH and can see 3 or 4 sander.MPI processes running on the different nodes, each with a CPU usage of about 30% to 45%.

As far as I understand, this seems normal: I specified 16 processors for the sander.MPI run, so on a single node (which has 4 processors) I would expect to see 4 processes. I have not noticed any problems; the output files match the tutorial and everything else looks fine to me.
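The expectation above is simple arithmetic, sketched below. This is a minimal illustration, assuming that mpirun (or the batch scheduler) places ranks by filling all 4 cores of one node before moving on to the next; the real placement policy depends on the local MPI and scheduler configuration. The function `place_ranks` and the node names are hypothetical.

```python
def place_ranks(n_ranks, cores_per_node, node_names):
    """Assign MPI ranks to nodes by filling every core of one node before
    using the next one (assumed block placement). Returns {node: [ranks]}."""
    nodes_needed = -(-n_ranks // cores_per_node)  # ceiling division
    if nodes_needed > len(node_names):
        raise ValueError("not enough nodes for the requested number of ranks")
    placement = {}
    for rank in range(n_ranks):
        node = node_names[rank // cores_per_node]
        placement.setdefault(node, []).append(rank)
    return placement


# 9 hypothetical nodes with 4 cores each, as described in the post.
nodes = [f"node{i:02d}" for i in range(1, 10)]
layout = place_ranks(16, 4, nodes)

for node, ranks in layout.items():
    print(node, len(ranks), "ranks:", ranks)
print(len(layout), "of", len(nodes), "nodes used")
```

Under this assumed policy, 16 ranks on 4-core nodes give 4 ranks on each of 4 nodes, which is consistent with the "16 of 36 Processors" and "4 of 9 Nodes Active" lines reported by showq.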

Please advise whether it is normal for 3 or 4 sander.MPI processes to appear on one node, or whether something is wrong. If something is wrong, please suggest how to correct it.

Here is the output of the top command:

Tasks: 80 total, 4 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.0% us, 7.2% sy, 0.0% ni, 59.5% id, 0.0% wa, 0.1% hi, 3.2% si
Mem: 3990416k total, 1573272k used, 2417144k free, 152048k buffers
Swap: 3911788k total, 0k used, 3911788k free, 375496k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6137 mis9 16 0 354m 207m 3880 S 51 5.3 414:57.95 sander.MPI
6139 mis9 16 0 354m 206m 3356 S 37 5.3 413:06.71 sander.MPI
6138 mis9 16 0 354m 206m 3880 R 36 5.3 414:25.87 sander.MPI
6136 mis9 16 0 354m 208m 4972 R 29 5.3 353:33.69 sander.MPI
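To put the "30% to 45%" figure in context, the sketch below sums the %CPU column of the four sander.MPI rows copied from the listing above. It assumes top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and the 4 cores per node stated in the post; the helper `cpu_by_pid` is hypothetical.

```python
# The four sander.MPI rows from the top listing above.
TOP_ROWS = """\
6137 mis9 16 0 354m 207m 3880 S 51 5.3 414:57.95 sander.MPI
6139 mis9 16 0 354m 206m 3356 S 37 5.3 413:06.71 sander.MPI
6138 mis9 16 0 354m 206m 3880 R 36 5.3 414:25.87 sander.MPI
6136 mis9 16 0 354m 208m 4972 R 29 5.3 353:33.69 sander.MPI
"""


def cpu_by_pid(text, command="sander.MPI"):
    """Return {pid: %CPU} for rows whose COMMAND column equals `command`.
    Assumes top's default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[11] == command:
            usage[int(fields[0])] = float(fields[8])  # column 8 is %CPU
    return usage


usage = cpu_by_pid(TOP_ROWS)
total = sum(usage.values())
cores = 4  # cores per node, as stated in the post
print(f"{len(usage)} processes using {total:.0f}% CPU out of {cores * 100}% available")
print(f"mean per process: {total / len(usage):.1f}%")
```

Together the four processes account for 153% CPU, well under the 400% that 4 fully busy cores would show, which agrees with the 59.5% idle figure on the Cpu(s) line.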
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Feb 02 2010 - 12:00:03 PST