Re: [AMBER] Error in PMEMD run

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 8 May 2009 14:29:48 +0100

Okay, what is the interconnect? If it is gigabit ethernet, you are not
going to do well with pmemd or even sander. This particular quad core
configuration (the 5300 series is Clovertown, so I believe this is what you
have) is widely known not to scale very well at all, even with the best
interconnects, and it is hard to use all the cpu's on one node efficiently
(so the scaling on the node itself is bad - insufficient memory bandwidth,
basically). One thing folks do is to not use all the cores on each node;
that way they actually do get better performance and scaling. The good news
is that the next generation chips (Nehalem) should be better; I don't yet
have my hands on one though. So, first critical question - do you have a
gigabit ethernet interconnect? If so, you have ignored all kinds of advice
on the list not to expect this to work well. If you do have infiniband,
then there are possible configuration issues (lots of different bandwidth
options there), possible issues with the bandwidth on the system bus, etc.
There are many, many ways to not get optimal performance out of this sort
of hardware, and it has not been getting easier to get it right.
Okay, the 4 node results. What you want from a system under increasing
load is called "graceful degradation". This basically means that things
never get worse in terms of work throughput as the load increases. But
unbalanced hardware systems don't do that, and what you are probably seeing
with pmemd is that, because it can generate so much more interconnect data
traffic per core, it is just driving the system into a more completely
degraded state. Ethernet is really bad about this because it is a CSMA/CD
protocol (carrier sense multiple access with collision detection): when two
interconnect nics try to transmit at the same time (a collision), they both
do a random exponential "backoff" (wait and try again later); if a
collision occurs the next time, they wait a random but longer time, and so
on. So the effect of too many nodes trying to communicate over an ethernet
at the same time is that the bandwidth drops way down.

Anyway, look at the amber 10 benchmarks that Ross put out on
www.ambermd.org - this is what happens, even with really GOOD
interconnects, with these darn 8 cpu nodes you have. Then look at my amber
9 benchmarks for things like the ibm sp5, for the cray xt3 (when it had 1
cpu/node), for Topsail before it was "upgraded" to quad core 8 cpu/node,
etc. We are in the valley of the shadow now with the current generation of
supercomputers, and should hopefully be crawling back out over the next two
years as folks realize what they have done (you can get near or to a
petaflop with these machines, but it is a meaningless petaflop - the
systems cannot communicate adequately, even at the level of the node, to
solve a real-world parallel problem).

So: 1) more info about your system is required to understand whether you
can do any better than this, and 2) you may be able to get more throughput
(effective work per unit time) by not using all the cpu's in each node
(currently I think the best results for Kraken, a monster XT4 at ORNL, are
obtained using only 1 of the 4 cpu's in each node - what a mess).
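
If you want to try the "fewer cores per node" experiment, something roughly
like this should work with Intel MPI (which it looks like you are running) -
treat it as a sketch only, since the exact flags depend on your MPI version
and batch system, and the input file names below are just the usual defaults:

  # 8 MPI tasks spread across 2 nodes, 4 per node (half the cores left idle)
  mpiexec -np 8 -perhost 4 $AMBERHOME/exe/pmemd -O -i mdin -o mdout \
      -p prmtop -c inpcrd -r restrt

Then compare the wallclock time against the same 8 tasks packed onto a
single node.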
Regards - Bob

----- Original Message -----
From: "Marek Malý" <maly.sci.ujep.cz>
To: "AMBER Mailing List" <amber.ambermd.org>
Sent: Friday, May 08, 2009 8:54 AM
Subject: Re: [AMBER] Error in PMEMD run


Hi Bob,

I don't know what to say ...

Today I ran a test to investigate SANDER/PMEMD scaling on our cluster
after reinstalling SANDER (using ifort 11), together with the old
compilation of PMEMD (built with ifort 10.1.012), which just uses the new
cc/MKL dynamic libraries. I obtained these results:

SANDER

1 node   197.79 s
2 nodes  186.47 s
4 nodes  218 s

PMEMD

1 node   145 s
2 nodes   85 s
4 nodes  422 s

1 node = 2 x Xeon Quad-core 5365 (3.00 GHz) = 8 CPU cores

The molecular test system has 60,000 atoms (explicit water), with a 10 A
cut-off and 1000 time steps.


As you can see, SANDER scaling is bad and it is really a waste of CPUs to
use more than 1 node per job.

With PMEMD the situation is different. As you can see, there are two
pretty strange opposite extremes: an extremely nice result for 2 nodes and
an extremely bad one for 4 nodes. I repeated it and checked twice ...

All tests were performed using the same 4 empty nodes.
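
(In case it matters, the jobs are launched with Intel MPI in more or less
this form - the file names here are just placeholders, not my real ones:

  mpiexec -np 8  $AMBERHOME/exe/pmemd      -O -i md.in -o md.out -p sys.prmtop -c sys.inpcrd   # 1 node
  mpiexec -np 16 $AMBERHOME/exe/pmemd      -O -i md.in -o md.out -p sys.prmtop -c sys.inpcrd   # 2 nodes
  mpiexec -np 8  $AMBERHOME/exe/sander.MPI -O -i md.in -o md.out -p sys.prmtop -c sys.inpcrd   # SANDER, 1 node

with all 8 cores per node in use.)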

This behaviour is pretty strange, but if I use 2 nodes for the whole job -
SANDER (minimisation + heating + density), PMEMD (NPT equilibration, NVT
production) - I could get pretty fine results, at least from the timing
point of view. But there is another issue regarding the reliability of the
computed data, not just here but also later when I compute the binding
energies with MM_PBSA/NAB.

So I prefer to follow your suggestion and recompile the whole of Amber
(including PMEMD) with ifort 10.1.021.

Could you please advise me where it is possible to download this old
version (ideally together with the matching MKL and cc libraries)?

Thank you very much in advance !

   Best,

     Marek


On Fri, 08 May 2009 02:11:25 +0200, Robert Duke <rduke.email.unc.edu>
wrote:

> Marek,
> There has been a whole other thread running about how ifort 11, various
> versions, will hang if you try to use it to compile pmemd (actual mails
> on the reflector right around yours...). I have recommended using ifort
> 10.1.021 because I know it works fine. As far as ifort 11.*, I have no
> experience, but there are reports of it hanging (this is a compiler bug -
> the compiler is defective). I also have coworkers that have tried to
> build gaussian 03 with ifort 11.*, and it compiles, but the executables
> don't pass tests. I think German Ecklenberg (I am guessing at the name -
> I unfortunately cleaned up some mail and the May amber reflector postings
> are not available yet) did get some version of 11 to work (might have
> been 11.0.084, but we are dealing with a very dim recollection here), but
> I would still prefer to just trust 10.1.021... Boy, you are getting to
> hit all the speed bumps... These days I would not trust any software
> intel releases for about 6 months after it is released - let other guys
> do the bleeding on the bleeding edge... Ross concurs with me on this
> one.
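>
> If it helps, roughly what I have in mind (the 10.1.021 install path below
> is just a guess - point it at wherever your admin actually puts that
> compiler, and the environment script name may differ on your install):
>
>   source /opt/intel/fce/10.1.021/bin/ifortvars.sh
>   ifort -V                  # confirm it now reports version 10.1
>   cd $AMBERHOME/src/pmemd
>   ./configure linux_em64t ifort intelmpi
>   make install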
> Best Regards - Bob
> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
> To: "AMBER Mailing List" <amber.ambermd.org>
> Sent: Thursday, May 07, 2009 7:59 PM
> Subject: Re: [AMBER] Error in PMEMD run
>
>
> Dear Ross and Bob,
>
> first of all, thank you very much for your time and effort, which really
> brought a good result; however, some problem is still present ...
>
> OK,
>
>
> Our admin today installed ifort version 11, including the corresponding
> cc and MKL.
>
> Here is the actual LD_LIBRARY_PATH:
>
> LD_LIBRARY_PATH=/opt/intel/impi/3.2.0.011/lib64:/opt/intel/mkl/10.1.0.015/lib/em64t:/opt/intel/cc/11.0.074/lib/intel64:/opt/intel/fc/11.0.074/lib/intel64::/opt/intel/impi/3.2/lib64
>
> Of course, first of all I tried just to compile pmemd with these new
> settings, but I didn't succeed :((
>
> Here is my configuration statement:
>
> ./configure linux_em64t ifort intelmpi
>
>
> Compilation started fine, but after some time it "stopped" - that is, the
> progress stopped but not the compilation process itself; after about 1
> hour the process was still alive, see this part of the "top" list:
>
> 30599 mmaly 20 0 61496 16m 7012 R 50 0.1 58:51.64 fortcom
>
>
> But for almost the whole hour it was jammed here:
>
> .....
>
> runmin.f90(465): (col. 11) remark: LOOP WAS VECTORIZED.
> runmin.f90(482): (col. 3) remark: LOOP WAS VECTORIZED.
> runmin.f90(486): (col. 3) remark: LOOP WAS VECTORIZED.
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC veclib.fpp veclib.f90
> ifort -c -auto -tpp7 -xP -ip -O3 veclib.f90
> ifort: command line remark #10148: option '-tp' not supported
> gcc -c pmemd_clib.c
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC gb_alltasks_setup.fpp gb_alltasks_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 gb_alltasks_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC pme_alltasks_setup.fpp pme_alltasks_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 pme_alltasks_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC pme_setup.fpp pme_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 pme_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
> pme_setup.f90(145): (col. 17) remark: LOOP WAS VECTORIZED.
> pme_setup.f90(159): (col. 22) remark: LOOP WAS VECTORIZED.
> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC get_cmdline.fpp get_cmdline.f90
> ifort -c -auto -tpp7 -xP -ip -O3 get_cmdline.f90
> ifort: command line remark #10148: option '-tp' not supported
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC master_setup.fpp master_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 master_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
>
> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC pmemd.fpp pmemd.f90
> ifort -c -auto -tpp7 -xP -ip -O3 pmemd.f90
> ifort: command line remark #10148: option '-tp' not supported
> <<<<-HERE IS THE LAST LINE OF THE COMPILATION PROCESS
>
> After this one hour I killed the compilation and obtained these typical
> messages:
>
> make[1]: *** Deleting file `pmemd.o'
> make[1]: *** [pmemd.o] Error 2
> make: *** [install] Interrupt
>
> I really do not understand how it is possible that the compiler uses 50%
> of a CPU for one hour and stays jammed on one line ...
>
> I have to say that compilation with the old version of the ifort package
> was a question of a few minutes.
>
> It seems to me like a typical case of an infinite loop ...
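>
> (Presumably one could confirm that it is really the compiler, and not
> make, by rerunning just that last line by hand from the pmemd source
> directory - I have not tried this yet:
>
>   ifort -c -auto -tpp7 -xP -ip -O3 pmemd.f90
>
> and perhaps also see whether it finishes with the optimization turned
> down, e.g. -O0 in place of -ip -O3.)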
>
> But nevertheless, I then got the idea to just use the old pmemd
> compilation with the newly installed libraries (cc ...) and it works :)) !!!
>
> The situation with SANDER was different, but after a complete
> recompilation of AmberTools and Amber, everything is OK (at least for
> now :)) ).
>
> So I think that my problem is solved, but there is still the strange
> question of why the PMEMD compilation cannot finish in any reasonable
> time with the ifort 11 package. Just to be complete, I have to say that I
> tried the pmemd installation with the original "configure" file but also
> with Bob's late-night one. The result is the same. Fortunately it is not
> a crucial problem for me now ...
>
> So thank you both again !!!
>
> Best,
>
> Marek
>
>
> On Thu, 07 May 2009 01:04:10 +0200, Robert Duke <rduke.email.unc.edu>
> wrote:
>
>> Oh, very good find Ross; I have not had the experience of mixing these,
>> but I bet you are right! - Bob
>> ----- Original Message ----- From: "Ross Walker" <ross.rosswalker.co.uk>
>> To: "'AMBER Mailing List'" <amber.ambermd.org>
>> Sent: Wednesday, May 06, 2009 5:53 PM
>> Subject: RE: [AMBER] Error in PMEMD run
>>
>>
>>> Hi Marek,
>>>
>>>> here is the content of the LD_LIBRARY_PATH variable:
>>>>
>>>> LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/e
>>>> m64t:/opt
>>>> /intel/cce/9.1.043/lib:/opt/intel/fce/10.1.012/lib::/opt/intel/impi/3.1
>>>> /lib64
>>>
>>> I suspect this is the origin of your problems... You have cce v9.1.043
>>> defined and fce v10.1.012 defined. I bet these are not compatible. Note
>>> that there is a libsvml.so in /opt/intel/cce/9.1.043/lib/ and this comes
>>> first in your LD path, so it will get picked up before the Fortran one.
>>> This is probably leading to all sorts of problems.
>>>
>>> My advice would be to remove the old cce library spec from the path so
>>> it picks up the correct libsvml. Or upgrade your cce to match the fce
>>> compiler version - this should probably always be done, and I am
>>> surprised Intel let you have mixed versions this way, but alas..... <sigh>
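>>>
>>> Something along these lines should do it (paths taken from your earlier
>>> mail, just dropping the cce 9.1.043 entry - adjust if your layout
>>> differs, and point ldd at wherever your pmemd binary actually lives):
>>>
>>>   export LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/fce/10.1.012/lib
>>>   ldd $AMBERHOME/exe/pmemd | grep svml   # check which libsvml.so is picked up now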
>>>
>>> All the best
>>> Ross
>>>
>>>
>>> /\
>>> \/
>>> |\oss Walker
>>>
>>> | Assistant Research Professor |
>>> | San Diego Supercomputer Center |
>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>
>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>> not
>>> be read every day, and should not be used for urgent or sensitive
>>> issues.
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>>
>

--
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 20 2009 - 15:11:06 PDT