Re: [AMBER] Error in PMEMD run

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 8 May 2009 19:06:17 +0100

Hi Marek,
That looks right when I click on it to download. I am still sorting
through what Intel has done recently that makes it harder to get
whatever release you want, though. In the meantime you may want to just
pick up this one, to be sure. As to whether the ifort 11 build you have
really works, well, you would have to run a bunch of the tests to be
sure; I have no other way of knowing.
I will try to answer some of your other questions shortly.
- Bob
----- Original Message -----
From: "Marek Malý" <maly.sci.ujep.cz>
To: "AMBER Mailing List" <amber.ambermd.org>
Sent: Friday, May 08, 2009 2:02 PM
Subject: Re: [AMBER] Error in PMEMD run


Hi Bob,

If I am not mistaken, I have finally found the right download link for
the version of ifort + MKL you recommended.

https://registrationcenter.intel.com/RegCenter/RegisterSNInfo.aspx?sn=N3KT-C5SGT7FD&EmailID=bruno.phas.ubc.ca&Sequence=722173

Am I right?

   Best,

       Marek


On Fri, 08 May 2009 15:29:48 +0200 Robert Duke <rduke.email.unc.edu>
wrote:

> Okay, what is the interconnect? If it is gigabit ethernet, you are not
> going to do well with pmemd or even sander. This particular quad core
> configuration (the 5300 series is clovertown, so I believe this is what
> you have) is widely known to not scale very well at all, even with the
> best interconnects, and it is hard to use all the cpu's on one node
> efficiently (so the scaling on the node itself is bad - insufficient
> memory bandwidth, basically). One thing folks do is not use all the
> cores on each node; that way they actually do get better performance
> and scaling. The good news is that the next generation chips (nehalem)
> should be better; I don't yet have my hands on one though. So, first
> critical question - do you have a gigabit ethernet interconnect? If so,
> you have ignored all kinds of advice on the list not to expect this to
> work well. If you do have infiniband, then there are possible
> configuration issues (lots of different bandwidth options there),
> possible issues with the bandwidth on the system bus, etc. There are
> many, many ways to not get optimal performance out of this sort of
> hardware, and it has not been getting easier to get it right.
>
> Okay, the 4 node results. What you want from a system under increasing
> load is called "graceful degradation". This basically means that work
> throughput never gets worse as the load increases. But unbalanced
> hardware systems don't do that, and what you are probably seeing with
> pmemd is that, because it can generate so much more interconnect data
> traffic per core, it is just driving the system to a more completely
> degraded state. Ethernet would be really bad about this because it is a
> CSMA/CD protocol (carrier sense multiple access with collision
> detection): when two interconnect nics try to transmit at the same time
> (a collision), they both do a random exponential "backoff" (wait and
> try again later); if a collision occurs the next time, they wait a
> random but longer time, and so on. So the effect of too many nodes
> trying to communicate over an ethernet at the same time is that the
> bandwidth drops way down.
>
> Anyway, look at the amber 10 benchmarks that Ross put out on
> www.ambermd.org - this is what happens, even with really GOOD
> interconnects, with these darn 8 cpu nodes you have. Then look at my
> amber 9 benchmarks for things like the ibm sp5, the cray xt3 (when it
> had 1 cpu/node), topsail before it was "upgraded" to quad core 8
> cpu/node, etc. We are in the valley of the shadow now with the current
> generation of supercomputers, and should be crawling back out,
> hopefully, over the next two years as folks realize what they have done
> (you can get near or to a petaflop with these machines, but it is a
> meaningless petaflop - the systems cannot communicate adequately, even
> at the level of the node, to solve a real-world parallel problem).
>
> Anyway, 1) more info about your system is required to understand
> whether you can do any better than this, and 2) you may be able to get
> more throughput (effective work per unit time) by not using all the
> cpu's in each node (currently I think the best results for kraken, a
> monster xt4 at ORNL, are obtained using only 1 of the 4 cpu's in each
> node - what a mess).
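>
> As a concrete illustration of using fewer cores per node (a sketch
> only - the launcher flags assume Intel MPI, which has a
> processes-per-node option, and the input/output file names are
> placeholders), 4 nodes with only 4 of the 8 cores each might be run as:
>
>   # 16 ranks spread 4 per node instead of 8 per node; check whether
>   # your mpirun accepts -ppn or -perhost for this
>   mpirun -np 16 -ppn 4 $AMBERHOME/exe/pmemd -O -i mdin -o mdout \
>       -p prmtop -c inpcrd -r restrt
>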
> Regards - Bob
>
> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
> To: "AMBER Mailing List" <amber.ambermd.org>
> Sent: Friday, May 08, 2009 8:54 AM
> Subject: Re: [AMBER] Error in PMEMD run
>
>
> Hi Bob,
>
> I don't know what to say ...
>
> Today I ran a test to investigate SANDER/PMEMD scaling on our cluster
> after reinstalling SANDER (using ifort 11), together with the old
> PMEMD build (compiled with ifort 10.1.012) which just uses the new
> cc/MKL dynamic libraries. I obtained these results:
>
> SANDER
>
> 1 node  197.79 s
> 2 nodes 186.47 s
> 4 nodes 218 s
>
> PMEMD
>
> 1 node  145 s
> 2 nodes  85 s
> 4 nodes 422 s
>
> 1 node = 2 x quad-core Xeon 5365 (3.00 GHz) = 8 CPU cores
>
> The test system has 60,000 atoms (explicit water), a 10 Å cutoff, and
> 1,000 time steps.
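>
> For reference, the PMEMD numbers above give roughly these speedups
> relative to 1 node (a quick sketch, assuming bc is available):
>
>   t1=145; t2=85; t4=422
>   echo "scale=2; $t1/$t2" | bc   # 2 nodes: ~1.70x
>   echo "scale=2; $t1/$t4" | bc   # 4 nodes: ~0.34x, i.e. slower than 1 node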
>
>
> As you can see, SANDER scaling is bad, and it is really a waste of
> CPUs to use more than 1 node per job.
>
> With PMEMD the situation is different. As you can see, there are two
> rather strange opposite extremes: an extremely nice result for 2 nodes
> and an extremely bad one for 4 nodes. I repeated the runs and checked
> twice ...
>
> All tests were performed on the same 4 empty nodes.
>
> This behaviour is pretty strange, but if I use 2 nodes for the whole
> job - SANDER (minimisation + heating + density), PMEMD (NPT
> equilibration, NVT production) - I could get pretty good results, at
> least as far as time is concerned. But there is another issue
> regarding the reliability of the computed data, not just here but also
> later when I compute binding energies with MM_PBSA/NAB.
>
> So I prefer to follow your suggestion and recompile the whole of Amber
> (including PMEMD) with ifort 10.1.021.
>
> Could you please advise me where it is possible to download this old
> version (if possible with the matching MKL and cc libraries)?
>
> Thank you very much in advance !
>
> Best,
>
> Marek
>
> On Fri, 08 May 2009 02:11:25 +0200 Robert Duke <rduke.email.unc.edu>
> wrote:
>
>> Marek,
>> There has been a whole other thread running about how ifort 11, various
>> versions, will hang if you try to use it to compile pmemd (actual mails
>> on the reflector right around yours...). I have recommended using ifort
>> 10.1.021 because I know it works fine. As for ifort 11.*, I have no
>> experience, but there are reports of it hanging (this is a compiler
>> bug - the compiler is defective). I also have coworkers who have tried
>> to build gaussian 03 with ifort 11.*, and it compiles, but the
>> executables don't pass tests. I think German Ecklenberg (I am guessing
>> at the name - I unfortunately cleaned up some mail and the May amber
>> reflector postings are not available yet) did get some version of 11
>> to work (it might have been 11.0.084, but we are dealing with a very
>> dim recollection here), but I would still prefer to just trust
>> 10.1.021... Boy, you are getting to hit all the speed bumps... These
>> days I would not trust anything Intel releases until about 6 months
>> after it comes out - let other guys do the bleeding on the bleeding
>> edge... Ross
>> concurs with me on this one.
>> Best Regards - Bob
>> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
>> To: "AMBER Mailing List" <amber.ambermd.org>
>> Sent: Thursday, May 07, 2009 7:59 PM
>> Subject: Re: [AMBER] Error in PMEMD run
>>
>>
>> Dear Ross and Bob,
>>
>> first of all, thank you very much for your time and effort, which
>> really brought a good result; however, some problem is still
>> present ...
>>
>> OK,
>>
>>
>> Our admin today installed ifort version 11, including the
>> corresponding cc and MKL.
>>
>> Here is the current LD_LIBRARY_PATH:
>>
>> LD_LIBRARY_PATH=/opt/intel/impi/3.2.0.011/lib64:/opt/intel/mkl/10.1.0.015/lib/em64t:/opt/intel/cc/11.0.074/lib/intel64:/opt/intel/fc/11.0.074/lib/intel64::/opt/intel/impi/3.2/lib64
>>
>> Of course, first of all I tried just to compile pmemd with these new
>> settings, but I did not succeed :((
>>
>> Here is my configuration statement:
>>
>> ./configure linux_em64t ifort intelmpi
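>>
>> (For completeness, the full build sequence was roughly the following -
>> a sketch that assumes the standard Amber 10 source tree layout:)
>>
>>   cd $AMBERHOME/src/pmemd
>>   ./configure linux_em64t ifort intelmpi
>>   make install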
>>
>>
>> Compilation started fine, but after some time it "stopped" - that is,
>> progress stopped, though the compilation process did not; after about
>> an hour the process was still alive, see this part of the "top"
>> output:
>>
>> 30599 mmaly 20 0 61496 16m 7012 R 50 0.1 58:51.64 fortcom
>>
>>
>> But for almost the whole hour it was stuck here:
>>
>> .....
>>
>> runmin.f90(465): (col. 11) remark: LOOP WAS VECTORIZED.
>> runmin.f90(482): (col. 3) remark: LOOP WAS VECTORIZED.
>> runmin.f90(486): (col. 3) remark: LOOP WAS VECTORIZED.
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC veclib.fpp veclib.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 veclib.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> gcc -c pmemd_clib.c
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC gb_alltasks_setup.fpp gb_alltasks_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 gb_alltasks_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC pme_alltasks_setup.fpp pme_alltasks_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 pme_alltasks_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC pme_setup.fpp pme_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 pme_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> pme_setup.f90(145): (col. 17) remark: LOOP WAS VECTORIZED.
>> pme_setup.f90(159): (col. 22) remark: LOOP WAS VECTORIZED.
>> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC get_cmdline.fpp get_cmdline.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 get_cmdline.f90
>> ifort: command line remark #10148: option '-tp' not supported
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC master_setup.fpp master_setup.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 master_setup.f90
>> ifort: command line remark #10148: option '-tp' not supported
>>
>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT -DBINTRAJ
>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>> -DFFTLOADBAL_2PROC pmemd.fpp pmemd.f90
>> ifort -c -auto -tpp7 -xP -ip -O3 pmemd.f90
>> ifort: command line remark #10148: option '-tp' not supported
>> <<<<-HERE IS THE LAST LINE OF THE COMPILATION PROCESS
>>
>> After this hour I killed the compilation and got these typical
>> messages:
>>
>> make[1]: *** Deleting file `pmemd.o'
>> make[1]: *** [pmemd.o] Error 2
>> make: *** [install] Interrupt
>>
>> I really do not understand how the compiler can use 50% of a CPU for
>> an hour while staying stuck on a single line ...
>>
>> I have to say that compilation with the old version of the ifort
>> package was a matter of a few minutes.
>>
>> It seems to me like a typical case of an "infinite" loop ...
>>
>> But nevertheless I then got the idea to just use the old pmemd build
>> with the newly installed libraries (cc ...), and it works :)) !!!
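>>
>> (If it helps, a quick way to check which runtime libraries the old
>> binary now resolves against - just a suggestion using the standard ldd
>> tool; the executable path assumes the usual Amber 10 location:)
>>
>>   ldd $AMBERHOME/exe/pmemd | grep -i -e svml -e mkl -e intel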
>>
>> The situation with SANDER was different, but after a complete
>> recompilation of AmberTools and Amber everything is OK (at least for
>> now :)) ).
>>
>> So I think my problem is solved, but there is still the strange
>> question of why the PMEMD compilation cannot finish in any reasonable
>> time with the ifort 11 package. Just to be complete, I should say
>> that I tried the pmemd installation with the original "configure"
>> file and also with Bob's late-night one. The result is the same.
>> Fortunately it is not a crucial problem for me now ...
>>
>> So thank you both again !!!
>>
>> Best,
>>
>> Marek
>>
>> On Thu, 07 May 2009 01:04:10 +0200 Robert Duke <rduke.email.unc.edu>
>> wrote:
>>
>>> Oh, very good find Ross; I have not had the experience of mixing these,
>>> but I bet you are right! - Bob
>>> ----- Original Message ----- From: "Ross Walker"
>>> <ross.rosswalker.co.uk>
>>> To: "'AMBER Mailing List'" <amber.ambermd.org>
>>> Sent: Wednesday, May 06, 2009 5:53 PM
>>> Subject: RE: [AMBER] Error in PMEMD run
>>>
>>>
>>>> Hi Marek,
>>>>
>>>>> here is the content of the LD_LIBRARY_PATH variable:
>>>>>
>>>>> LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/cce/9.1.043/lib:/opt/intel/fce/10.1.012/lib::/opt/intel/impi/3.1/lib64
>>>>
>>>> I suspect this is the origin of your problems... You have cce
>>>> v9.1.043 defined and fce v10.1.012 defined. I bet these are not
>>>> compatible. Note there is a libsvml.so in /opt/intel/cce/9.1.043/lib/
>>>> and this comes first in your LD path, so it will get picked up before
>>>> the Fortran one. This is probably leading to all sorts of problems.
>>>>
>>>> My advice would be to remove the old cce library spec from the path
>>>> so it picks up the correct libsvml. Or upgrade your cce to match the
>>>> fce compiler version - this should probably always be done, and I am
>>>> surprised Intel lets you have mixed versions this way, but alas.....
>>>> <sigh>
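>>>>
>>>> As a sketch, a cleaned-up path might look like the following (the
>>>> cce 10.1.012 directory below is only an assumption - use whatever
>>>> matching cce version is actually installed):
>>>>
>>>>   export LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/cce/10.1.012/lib:/opt/intel/fce/10.1.012/lib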
>>>>
>>>> All the best
>>>> Ross
>>>>
>>>>
>>>> /\
>>>> \/
>>>> |\oss Walker
>>>>
>>>> | Assistant Research Professor |
>>>> | San Diego Supercomputer Center |
>>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>>
>>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>>> not
>>>> be read every day, and should not be used for urgent or sensitive
>>>> issues.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 20 2009 - 15:13:07 PDT