Re: [AMBER] Error in PMEMD run

From: Marek Malý <maly.sci.ujep.cz>
Date: Fri, 8 May 2009 19:02:53 +0100

Hi Bob,

If I am not mistaken, I have finally found the right download link for your
recommended version of ifort + MKL.

https://registrationcenter.intel.com/RegCenter/RegisterSNInfo.aspx?sn=N3KT-C5SGT7FD&EmailID=bruno.phas.ubc.ca&Sequence=722173

Am I right?

   Best,

       Marek


On Fri, 08 May 2009 15:29:48 +0200 Robert Duke <rduke.email.unc.edu>
wrote:

> Okay, what is the interconnect? If it is gigabit ethernet, you are not
> going to do well with pmemd or even sander. This particular quad-core
> configuration (the 5300 series is Clovertown, so I believe this is what
> you have) is widely known to not scale very well at all, even with the
> best interconnects, and it is hard to use all the cpu's on one node
> efficiently (so the scaling on the node itself is bad - insufficient
> memory bandwidth, basically). One thing folks do is to not use all the
> cores on each node; that way they actually get better performance and
> scaling. The good news is that the next-generation chips (Nehalem)
> should be better; I don't yet have my hands on one though.
>
> So, first critical question - do you have a gigabit ethernet
> interconnect? If so, you all have ignored all kinds of advice on the
> list not to expect this to work well. If you do have infiniband, then
> there are possible configuration issues (lots of different bandwidth
> options there), possible issues with the bandwidth on the system bus,
> etc. There are many, many ways to not get optimal performance out of
> this sort of hardware, and it has not been getting easier to get it
> right.
> Okay, the 4-node results. What you want from a system under increasing
> load is called "graceful degradation": work throughput should never get
> worse as the load increases. But unbalanced hardware systems don't do
> that, and what you are probably seeing with pmemd is that, because it
> can generate so much more interconnect data traffic per core, it is
> just driving the system into a more completely degraded state.
> Ethernet is really bad about this because it is a CSMA/CD protocol
> (carrier sense multiple access with collision detection): when two
> interconnect nics try to transmit at the same time (a collision), they
> both do a random exponential "backoff" (wait and try again later); if
> another collision occurs, they wait a random but longer time, and so
> on. So the effect of too many nodes trying to communicate over an
> ethernet at the same time is that the bandwidth drops way down.
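>
> For a feel of how fast that backoff blows up, here is a purely
> illustrative shell sketch (nothing AMBER-specific; the window size is
> the standard 802.3 truncated binary exponential backoff, capped at 10
> doublings):
>
>   # After k collisions a nic waits a random number of slot times in
>   # [0, 2^min(k,10) - 1], so the worst-case wait grows geometrically.
>   for k in 1 2 4 8 12; do
>     cap=$(( k < 10 ? k : 10 ))
>     echo "collision $k: wait up to $(( (1 << cap) - 1 )) slot times"
>   done
>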
> Anyway, look at the amber 10 benchmarks that Ross put out on
> www.ambermd.org - this is what happens, even with really GOOD
> interconnects, with these darn 8-cpu nodes you have. Then look at my
> amber 9 benchmarks for things like the ibm sp5, for the cray xt3 (when
> it had 1 cpu/node), for topsail before it was "upgraded" to quad-core,
> 8 cpu/node, etc. We are in the valley of the shadow now with the
> current generation of supercomputers, and should hopefully be crawling
> back out over the next two years as folks realize what they have done
> (you can get near or to a petaflop with these machines, but it is a
> meaningless petaflop - the systems cannot communicate adequately, even
> at the level of the node, to be able to solve a real-world parallel
> problem). Anyway, 1) more info about your system is required to
> understand whether you can do any better than this, and 2) you may be
> able to get more throughput (effective work per unit time) by not using
> all the cpu's in each node (currently I think the best results for
> kraken, a monster xt4 at ORNL, are obtained using only 1 of the 4 cpu's
> in each node - what a mess).
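>
> To try the "don't use every core" route, a rough sketch (the exact flag
> spelling depends on your MPI version; -perhost is the Intel MPI option
> for tasks per node, and the file names below are just the usual pmemd
> arguments):
>
>   # 4 MPI tasks per 8-core node, 16 tasks total across 4 nodes
>   mpirun -perhost 4 -np 16 $AMBERHOME/exe/pmemd -O \
>          -i mdin -o mdout -p prmtop -c inpcrd -r restrt
>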
> Regards - Bob
>
> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
> To: "AMBER Mailing List" <amber.ambermd.org>
> Sent: Friday, May 08, 2009 8:54 AM
> Subject: Re: [AMBER] Error in PMEMD run
>
>
> Hi Bob,
>
> I don't know what to say ...
>
> Today I made a test to investigate SANDER/PMEMD scaling on our cluster
> after reinstalling SANDER (using ifort 11) and with the old compilation of
> PMEMD (10.1.012), which just uses the new cc/MKL dynamic libraries.
> I have obtained these results:
>
> SANDER
>
> 1 node   197.79 s
> 2 nodes  186.47 s
> 4 nodes  218 s
>
> PMEMD
>
> 1 node   145 s
> 2 nodes   85 s
> 4 nodes  422 s
>
> 1 node = 2 x Xeon quad-core 5365 (3.00 GHz) = 8 CPU cores
>
> The molecular test system has 60,000 atoms (explicit water) with a 10 A
> cut-off, 1000 time steps.
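>
> As a quick sanity check on these timings (just the speedup ratios,
> t(1 node) / t(N nodes), computed with awk):
>
>   awk 'BEGIN {
>     printf "SANDER 2 nodes: %.2fx\n", 197.79/186.47;  # ~1.06x
>     printf "PMEMD  2 nodes: %.2fx\n", 145/85;         # ~1.71x
>     printf "PMEMD  4 nodes: %.2fx\n", 145/422;        # ~0.34x, i.e. slower than 1 node
>   }'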
>
>
> As you can see, SANDER scaling is bad, and it is really a waste of CPUs
> to use more than 1 node per job.
>
> Regarding PMEMD, the situation is different. As you see, there are two
> pretty strange opposite extremes: an extremely nice result for 2 nodes
> and an extremely bad one for 4 nodes. I repeated and checked it twice ...
>
> All tests were performed using the same 4 empty nodes.
>
> This behaviour is pretty strange, but if I use 2 nodes for the whole job
> - SANDER (minimisation + heating + density), PMEMD (NPT equil, NVT
> production) - I could have pretty fine results, at least from the time
> point of view. But there is another issue regarding the reliability of
> the computed data, not just here but also later when I will compute
> binding energies with MM_PBSA/NAB.
>
> So I prefer to recompile the whole of Amber (including PMEMD) with ifort
> v. 10.1.021, as you suggested.
>
> Could you please advise me where it is possible to download this old
> version (if possible, with the matching MKL and cc libraries)?
>
> Thank you very much in advance !
>
> Best,
>
> Marek
>
>
>
>
>
>
>
>
>
>
> On Fri, 08 May 2009 02:11:25 +0200 Robert Duke <rduke.email.unc.edu>
> wrote:
>
>> Marek,
>> There has been a whole other thread running about how ifort 11, various
>> versions, will hang if you try to use it to compile pmemd (actual mails
>> on the reflector right around yours...). I have recommended using
>> ifort 10.1.021 because I know it works fine. As far as ifort 11.*, I
>> have no experience, but there are reports of it hanging (this is a
>> compiler bug - the compiler is defective). I also have coworkers who
>> have tried to build gaussian 03 with ifort 11.*, and it compiles, but
>> the executables don't pass tests. I think German Ecklenberg (I am
>> guessing at the name - I unfortunately cleaned up some mail and the
>> May amber reflector postings are not available yet) did get some
>> version of 11 to work (might have been 11.0.084, but we are dealing
>> with a very dim recollection here), but I would still prefer to just
>> trust 10.1.021... Boy, you are getting to hit all the speed
>> bumps... These days I would not trust any software intel releases for
>> about 6 months after it is released - let other guys do the bleeding
>> on the bleeding edge... Ross concurs with me on this one.
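>>
>> As a quick sanity check before rebuilding (just a sketch; adjust paths
>> to wherever 10.1.021 ends up installed), confirm which compiler the
>> build will actually pick up:
>>
>>   which ifort && ifort -V   # the banner should report 10.1.021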
>> Best Regards - Bob
>> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
>> To: "AMBER Mailing List" <amber.ambermd.org>
>> Sent: Thursday, May 07, 2009 7:59 PM
>> Subject: Re: [AMBER] Error in PMEMD run
>>
>>
>> Dear Ross and Bob,
>>
>> first of all, thank you very much for your time and effort, which
>> really brought good results; however, some problem is still
>> present ...
>>
>> OK,
>>
>>
>> Our admin today installed ifort version 11, including the corresponding
>> cc and MKL.
>>
>> Here is the actual LD_LIBRARY_PATH:
>>
>> LD_LIBRARY_PATH=/opt/intel/impi/3.2.0.011/lib64:/opt/intel/mkl/10.1.0.015/lib/em64t:/opt/intel/cc/11.0.074/lib/intel64:/opt/intel/fc/11.0.074/lib/intel64::/opt/intel/impi/3.2/lib64
>>
>> Of course, first of all I tried just to compile pmemd with these new
>> settings, but I didn't succeed :((
>>
>> Here is my configuration statement:
>>
>> ./configure linux_em64t ifort intelmpi
>>
>>
>> Compilation started fine, but after some time it "stopped" - that is,
>> the progress stopped but not the compilation process; after ca. 1 hour
>> the process was still alive, see this part of the "top" listing:
>>
>> 30599 mmaly 20 0 61496 16m 7012 R 50 0.1 58:51.64 fortcom
>>
>>
>> But for almost the whole hour it was jammed here:
>>
>> .....
>>
>> runmin.f90(465): (col. 11) remark: LOOP WAS VECTORIZED.
>> runmin.f90(482): (col. 3) remark: LOOP WAS VECTORIZED.
>> runmin.f90(486): (col. 3) remark: LOOP WAS VECTORIZED.
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC veclib.fpp veclib.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 veclib.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> gcc -c pmemd_clib.c
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC gb_alltasks_setup.fpp gb_alltasks_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 gb_alltasks_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC pme_alltasks_setup.fpp pme_alltasks_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 pme_alltasks_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC pme_setup.fpp pme_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 pme_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> pme_setup.f90(145): (col. 17) remark: LOOP WAS VECTORIZED.
>> pme_setup.f90(159): (col. 22) remark: LOOP WAS VECTORIZED.
>> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC get_cmdline.fpp get_cmdline.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 get_cmdline.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC master_setup.fpp master_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 master_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>> -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC pmemd.fpp pmemd.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 pmemd.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> <<<<-HERE IS THE LAST LINE OF THE COMPILATION PROCESS
>>
>> After this one hour I killed the compilation and obtained these typical
>> messages:
>>
>> make[1]: *** Deleting file `pmemd.o'
>> make[1]: *** [pmemd.o] Error 2
>> make: *** [install] Interrupt
>>
>> I really do not understand how it is possible that the compiler uses 50%
>> of a CPU for one hour and stays jammed on one line ...
>>
>> I have to say that compilation with the old version of the ifort package
>> was a question of a few minutes.
>>
>> It seems to me like a typical case of an "infinite" loop ...
>>
>> But nevertheless I then got the idea to just use the old pmemd compilation
>> with the newly installed libraries (cc ...), and it works :)) !!!
>>
>> The situation with SANDER was different, but after a complete
>> recompilation of Amber Tools and Amber, everything is OK (at least for
>> now :)) ).
>>
>> So I think that my problem is solved, but there is still the strange
>> question of why the compilation of PMEMD with the ifort 11 package
>> cannot finish in a reasonable time. Just to be complete, I have to say
>> that I tried the pmemd installation with the original "configure" file
>> but also with Bob's late-night one. The result is the same. Fortunately
>> it is not a crucial problem for me now ...
>>
>> So thank you both again !!!
>>
>> Best,
>>
>> Marek
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, 07 May 2009 01:04:10 +0200 Robert Duke <rduke.email.unc.edu>
>> wrote:
>>
>>> Oh, very good find Ross; I have not had the experience of mixing
>>> these, but I bet you are right! - Bob
>>> ----- Original Message ----- From: "Ross Walker"
>>> <ross.rosswalker.co.uk>
>>> To: "'AMBER Mailing List'" <amber.ambermd.org>
>>> Sent: Wednesday, May 06, 2009 5:53 PM
>>> Subject: RE: [AMBER] Error in PMEMD run
>>>
>>>
>>>> Hi Marek,
>>>>
>>>>> here is the content of the LD_LIBRARY_PATH variable:
>>>>>
>>>>> LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/cce/9.1.043/lib:/opt/intel/fce/10.1.012/lib::/opt/intel/impi/3.1/lib64
>>>>
>>>> I suspect this is the origin of your problems... You have cce v9.1.043
>>>> defined and fce v10.1.012 defined. I bet these are not compatible.
>>>> Note
>>>> there is a libsvml.so in /intel/cce/9.1.043/lib/, and this comes first
>>>> in your LD path, so it will get picked up before the Fortran one. This
>>>> is probably leading to all sorts of problems.
>>>>
>>>> My advice would be to remove the old cce library spec from the path
>>>> so it
>>>> picks up the correct libsvml. Or upgrade your cce to match the fce
>>>> compiler
>>>> version - this should probably always be done and I am surprised
>>>> Intel let
>>>> you have mixed versions this way but alas..... <sigh>
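>>>>
>>>> As a sketch of what I mean (the directories are copied from your
>>>> LD_LIBRARY_PATH above; the pmemd binary location is just an example),
>>>> you can check which libsvml.so the loader actually resolves and then
>>>> drop the cce 9.1 entry so the fce one wins:
>>>>
>>>>   ldd $AMBERHOME/exe/pmemd | grep libsvml
>>>>   export LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/fce/10.1.012/lib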
>>>>
>>>> All the best
>>>> Ross
>>>>
>>>>
>>>> /\
>>>> \/
>>>> |\oss Walker
>>>>
>>>> | Assistant Research Professor |
>>>> | San Diego Supercomputer Center |
>>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>>
>>>> Note: Electronic Mail is not secure, has no guarantee of delivery,
>>>> may not
>>>> be read every day, and should not be used for urgent or sensitive
>>>> issues.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> AMBER mailing list
>>>> AMBER.ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>

--
Using Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 20 2009 - 15:13:02 PDT