Re: [AMBER] Error in PMEMD run

From: Marek Malý <maly.sci.ujep.cz>
Date: Fri, 8 May 2009 19:27:20 +0100

Hi Bob,

thanks for the link, but the one I found seems more direct to me (I have no
problems with the redirections).

Anyway, regarding the reliability of my latest Amber installation, I am going
to run all the tests. I have also started
comparison calculations on systems that I computed some time ago on
different clusters (with Amber 9).

Thanks for now!

    Best,

      Marek


On Fri, 08 May 2009 20:14:12 +0200, Robert Duke <rduke.email.unc.edu> wrote:

> Hi Marek,
> The best approach, to me, seems to be to go to:
> https://registrationcenter.intel.com/RegCenter/FileSearch.aspx
> Here I get rerouted because of a cookie, but there should be a login
> dialog that gets you to the list of your own products. Then if
> you select one of the products (it will show a limited number of
> versions), you will get another screen whose lower half lets you
> select from a wide variety of different versions (all the
> way back to 8 for ifort, I believe). This was not that difficult in the
> past; more "improvements" from Intel.
> Regards - Bob
>
> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
> To: "AMBER Mailing List" <amber.ambermd.org>
> Sent: Friday, May 08, 2009 2:02 PM
> Subject: Re: [AMBER] Error in PMEMD run
>
>
> Hi Bob,
>
> If I am not mistaken, I have finally found the right download link for your
> recommended version of ifort + MKL.
>
> https://registrationcenter.intel.com/RegCenter/RegisterSNInfo.aspx?sn=N3KT-C5SGT7FD&EmailID=bruno.phas.ubc.ca&Sequence=722173
>
> Am I right?
>
> Best,
>
> Marek
>
>
> On Fri, 08 May 2009 15:29:48 +0200, Robert Duke <rduke.email.unc.edu> wrote:
>
>> Okay, what is the interconnect? If it is gigabit ethernet, you are not
>> going to do well with pmemd or even sander. This particular quad core
>> configuration (the 5300 series is Clovertown, so I believe that is what
>> you have) is widely known not to scale very well at all, even with the
>> best interconnects, and it is hard to use all the cpu's on one node
>> efficiently (so the scaling on the node itself is bad - insufficient
>> memory bandwidth, basically). One thing folks do is to not use all the
>> cores on each node; that way they actually do get better performance
>> and scaling. The good news is that the next-generation chips
>> (Nehalem) should be better; I don't have my hands on one yet though.
>> So the first critical question - do you have a gigabit ethernet
>> interconnect? If so, you have ignored all kinds of advice on the list
>> not to expect this to work well. If you do have infiniband, then
>> there are possible configuration issues (lots of different bandwidth
>> options there), possible issues with the bandwidth on the system bus,
>> etc. There are many many ways to not get optimal performance out of
>> this sort of hardware, and it has not been getting easier to get it
>> right. Okay, the 4 node results. What you want from a system under
>> increasing load is called "graceful degradation". This basically means
>> that things never get worse in terms of the work throughput as the
>> load increases. But unbalanced hardware systems don't do that, and
>> probably what you are seeing with pmemd is that, because it can generate so
>> much more interconnect data traffic per core, it is just driving the
>> system to a more completely degraded state - ethernet would be really
>> bad about this because it is a CSMA/CD protocol (carrier sense
>> multiple access with collision detection; when two interconnect NICs
>> try to transmit at the same time (a collision), they both do a random
>> exponential "backoff" (wait and try again later); if a collision occurs
>> again the next time, they wait a random but longer time, and so on - so
>> the effect of too many nodes trying to communicate over an ethernet at
>> the same time is that the bandwidth drops way down). Anyway, look at
>> the amber 10 benchmarks that Ross put out on www.ambermd.org - this is
>> what happens, even with really
>> GOOD interconnects, with these darn 8 cpu nodes you have. Then look
>> at my amber 9 benchmarks for things like the ibm sp5, for the cray xt3
>> (when it had 1 cpu/node), for topsail before it was "upgraded" to quad
>> core 8 cpu/node, etc. We are in the valley of the shadow now with the
>> current generation of supercomputers, and should be crawling back out
>> hopefully over the next two years as folks realize what they have done
>> (you can get near or to a petaflop with these machines, but it is a
>> meaningless petaflop - the systems cannot communicate adequately, even
>> at the level of the node, to be able to solve a real world parallel
>> problem). Anyway, 1) more info about your system is required to
>> understand whether you can do any better than this and 2) you may be
>> able to get more throughput (effective work per unit time) by not
>> using all the cpu's in each node (currently I think the best results
>> for Kraken, a monster XT4 at ORNL, are obtained using only 1 of the 4
>> cpu's in each node - what a mess).
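>>
>> For what it's worth, a minimal sketch of the "fewer ranks per node" idea
>> with Intel MPI (the -ppn option and the input file names are illustrative;
>> the host list normally comes from your batch system or a machinefile, so
>> check mpirun's help on your own installation):
>>
>> # spread 8 MPI ranks over 4 nodes, 2 per node, instead of packing one node;
>> # with Intel MPI, -ppn limits the number of processes placed on each node
>> mpirun -np 8 -ppn 2 $AMBERHOME/exe/pmemd -O -i mdin -p prmtop -c inpcrd -o mdout
>>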
>> Regards - Bob
>>
>> ----- Original Message -----
>> From: "Marek Malý" <maly.sci.ujep.cz>
>> To: "AMBER Mailing List" <amber.ambermd.org>
>> Sent: Friday, May 08, 2009 8:54 AM
>> Subject: Re: [AMBER] Error in PMEMD run
>>
>>
>> Hi Bob,
>>
>> I don't know what to say ...
>>
>> Today I ran a test to investigate SANDER/PMEMD scaling on our cluster
>> after reinstalling SANDER (using ifort 11) and with the old build of
>> PMEMD (compiled with 10.1.012),
>> which just uses the new cc/MKL dynamic libraries.
>> I obtained these results:
>>
>> SANDER
>>
>> 1 node   197.79 s
>> 2 nodes  186.47 s
>> 4 nodes  218 s
>>
>> PMEMD
>>
>> 1 node   145 s
>> 2 nodes   85 s
>> 4 nodes  422 s
>>
>> 1 node = 2 x quad-core Xeon 5365 (3.00 GHz) = 8 CPU cores
>>
>> The molecular test system has 60,000 atoms (explicit water), a 10 A cut-off,
>> and 1000 time steps.
>>
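>> To put rough numbers on the scaling, a quick awk sketch (times hard-coded
>> from the PMEMD runs above; speedup = T(1 node)/T(N nodes), efficiency =
>> speedup/N):
>>
>> # compute speedup and parallel efficiency for the PMEMD runs above
>> awk 'BEGIN {
>>   t[1] = 145; t[2] = 85; t[4] = 422   # PMEMD wall-clock times in seconds
>>   for (n = 1; n <= 4; n *= 2)
>>     printf "%d node(s): speedup %.2f, efficiency %3.0f%%\n", n, t[1]/t[n], 100*t[1]/(n*t[n])
>> }'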
>>
>> As you can see, SANDER scaling is bad, and it is really a waste of
>> CPUs to use more than 1 node per job.
>>
>> Regarding PMEMD, the situation is different. As you see, there are two
>> pretty strange opposite extremes:
>> an extremely nice result for 2 nodes and an extremely bad one for 4 nodes. I
>> repeated and checked it twice ...
>>
>> All tests were performed using the same 4 empty nodes.
>>
>> This behaviour is pretty strange, but if I use 2 nodes for the whole job =
>> SANDER (minimisation + heating + density), PMEMD (NPT equilibration, NVT
>> production), I could get pretty fine results, at least from the timing
>> point of view. But there is another issue regarding the reliability
>> of the computed data, not just here but also later when I compute
>> binding energies with MM_PBSA/NAB.
>>
>> So I prefer your suggestion to recompile the whole of Amber (including
>> PMEMD)
>> with ifort v. 10.1.021.
>>
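>> (Once that compiler is in place, a quick way to confirm the build will pick
>> it up - just a sketch; the -V flag simply prints the compiler's version
>> banner, and the paths depend on how the Intel tools are installed on the
>> cluster:)
>>
>> # confirm which ifort is first in PATH and which version it reports
>> which ifort
>> ifort -V        # should report 10.1.021 once the older package is installed
>> # list LD_LIBRARY_PATH entries one per line; only the matching cc/fc/MKL
>> # directories should appear
>> echo $LD_LIBRARY_PATH | tr ':' '\n'
>>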
>> Could you please advise me where it is possible to download this old
>> version
>> (if possible, with the matching MKL and cc libraries)?
>>
>> Thank you very much in advance!
>>
>> Best,
>>
>> Marek
>>
>>
>> On Fri, 08 May 2009 02:11:25 +0200, Robert Duke <rduke.email.unc.edu> wrote:
>>
>>> Marek,
>>> There has been a whole other thread running about how ifort 11,
>>> various versions, will hang if you try to use it to compile pmemd
>>> (actual mails on the reflector right around yours...). I have
>>> recommended using ifort 10.1.021 because I know it works fine. As
>>> far as ifort 11.*, I have no experience, but there are reports of it
>>> hanging (this is a compiler bug - the compiler is defective). I also
>>> have coworkers that have tried to build gaussian 03 with ifort 11.*,
>>> and it compiles, but the executables don't pass tests. I think
>>> German Ecklenberg (I am guessing at the name - I unfortunately
>>> cleaned up some mail and the May amber reflector postings are not
>>> available yet) did get some version of 11 to work (might have been
>>> 11.0.084, but we are dealing with a very dim recollection here), but
>>> I would still prefer to just trust 10.1.021... Boy, you are hitting
>>> all the speed bumps... These days I would not trust any
>>> software Intel releases until about 6 months after it is released - let
>>> other guys do the bleeding on the bleeding edge... Ross concurs with
>>> me on this one.
>>> Best Regards - Bob
>>> ----- Original Message ----- From: "Marek Malý" <maly.sci.ujep.cz>
>>> To: "AMBER Mailing List" <amber.ambermd.org>
>>> Sent: Thursday, May 07, 2009 7:59 PM
>>> Subject: Re: [AMBER] Error in PMEMD run
>>>
>>>
>>> Dear Ross and Bob,
>>>
>>> first of all, thank you very much for your time and effort,
>>> which really brought good results; however, some problem
>>> is still present ...
>>>
>>> OK,
>>>
>>>
>>> Today our admin installed ifort version 11, including the corresponding cc
>>> and MKL.
>>>
>>> Here is the actual LD_LIBRARY_PATH:
>>>
>>> LD_LIBRARY_PATH=/opt/intel/impi/3.2.0.011/lib64:/opt/intel/mkl/10.1.0.015/lib/em64t:/opt/intel/cc/11.0.074/lib/intel64:/opt/intel/fc/11.0.074/lib/intel64::/opt/intel/impi/3.2/lib64
>>>
>>> Of course, first of all I tried just to compile pmemd with these new
>>> settings, but I didn't succeed :((
>>>
>>> Here is my configuration statement:
>>>
>>> ./configure linux_em64t ifort intelmpi
>>>
>>>
>>> Compilation started fine, but after some time it "stopped" - I mean the
>>> progress stopped, but not
>>> the compilation process itself; after ca. 1 hour the process was still
>>> alive, see this part
>>> of the "top" listing:
>>>
>>> 30599 mmaly 20 0 61496 16m 7012 R 50 0.1 58:51.64 fortcom
>>>
>>>
>>> But for almost the whole hour it was jammed here:
>>>
>>> .....
>>>
>>> runmin.f90(465): (col. 11) remark: LOOP WAS VECTORIZED.
>>> runmin.f90(482): (col. 3) remark: LOOP WAS VECTORIZED.
>>> runmin.f90(486): (col. 3) remark: LOOP WAS VECTORIZED.
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC veclib.fpp veclib.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 veclib.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> gcc -c pmemd_clib.c
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC gb_alltasks_setup.fpp gb_alltasks_setup.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 gb_alltasks_setup.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC pme_alltasks_setup.fpp pme_alltasks_setup.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 pme_alltasks_setup.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC pme_setup.fpp pme_setup.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 pme_setup.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> pme_setup.f90(145): (col. 17) remark: LOOP WAS VECTORIZED.
>>> pme_setup.f90(159): (col. 22) remark: LOOP WAS VECTORIZED.
>>> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>>> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC get_cmdline.fpp get_cmdline.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 get_cmdline.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC master_setup.fpp master_setup.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 master_setup.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> ib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
>>> -DBINTRAJ
>>> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
>>> -DFFTLOADBAL_2PROC pmemd.fpp pmemd.f90
>>> ifort -c -auto -tpp7 -xP -ip -O3 pmemd.f90
>>> ifort: command line remark #10148: option '-tp' not supported
>>> <<<<-HERE IS THE LAST LINE OF THE COMPILATION PROCESS
>>>
>>> After this one hour I killed the compilation and got these typical
>>> messages:
>>>
>>> make[1]: *** Deleting file `pmemd.o'
>>> make[1]: *** [pmemd.o] Error 2
>>> make: *** [install] Interrupt
>>>
>>> I really do not understand how it is possible that the compiler uses 50%
>>> of a CPU for one hour and stays jammed on one line ...
>>>
>>> I have to say that compilation with the old version of the ifort package
>>> was a question of a few minutes.
>>>
>>> It seems to me like a typical case of an "infinite" loop ...
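>>>
>>> (If anyone wants to chase it down, one possible diagnostic - only a sketch,
>>> run in the directory where the build stalled, with whatever flags your
>>> configure step actually produced - is to recompile just the offending file
>>> by hand with less aggressive optimization:)
>>>
>>> # try the stalled file without optimization first
>>> ifort -c -auto -O0 pmemd.f90        # does it finish now?
>>> ifort -c -auto -xP -O3 pmemd.f90    # then add options back (here: without -ip)
>>> # if only the full "-ip -O3" combination hangs, the optimizer is the suspect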
>>>
>>> But nevertheless I then got the idea to just use the old pmemd build
>>> with the newly installed libraries (cc ...), and it works :)) !!!
>>>
>>> The situation was different with SANDER, but after a complete recompilation
>>> of AmberTools and Amber, everything is OK (at least for now :)) ).
>>>
>>> So I think that my problem is solved, but there is still the strange
>>> question of why the compilation of PMEMD with the ifort 11 package
>>> cannot finish in any reasonable time. Just to be complete, I have to say
>>> that I tried the pmemd installation with the original "configure" file
>>> but also with Bob's late-night one. The result is the same. Fortunately it
>>> is not a crucial problem for me now ...
>>>
>>> So thank you both again !!!
>>>
>>> Best,
>>>
>>> Marek
>>>
>>> On Thu, 07 May 2009 01:04:10 +0200, Robert Duke <rduke.email.unc.edu> wrote:
>>>
>>>> Oh, very good find Ross; I have not had the experience of mixing
>>>> these, but I bet you are right! - Bob
>>>> ----- Original Message ----- From: "Ross Walker"
>>>> <ross.rosswalker.co.uk>
>>>> To: "'AMBER Mailing List'" <amber.ambermd.org>
>>>> Sent: Wednesday, May 06, 2009 5:53 PM
>>>> Subject: RE: [AMBER] Error in PMEMD run
>>>>
>>>>
>>>>> Hi Marek,
>>>>>
>>>>>> here is the content of the LD_LIBRARY_PATH variable:
>>>>>>
>>>>>> LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/cce/9.1.043/lib:/opt/intel/fce/10.1.012/lib::/opt/intel/impi/3.1/lib64
>>>>>
>>>>> I suspect this is the origin of your problems... You have cce v9.1.043
>>>>> defined and fce v10.1.012 defined. I bet these are not compatible. Note
>>>>> there is a libsvml.so in /intel/cce/9.1.043/lib/ and this comes first in
>>>>> your LD path, so it will get picked up before the Fortran one. This is
>>>>> probably leading to all sorts of problems.
>>>>>
>>>>> My advice would be to remove the old cce library spec from the path so it
>>>>> picks up the correct libsvml. Or upgrade your cce to match the fce
>>>>> compiler version - this should probably always be done, and I am
>>>>> surprised Intel lets you have mixed versions this way, but alas..... <sigh>
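>>>>>
>>>>> (A quick way to check which library actually gets resolved - illustrative
>>>>> only; the executable path below assumes the standard $AMBERHOME layout:)
>>>>>
>>>>> # show which libsvml.so / MKL libraries the dynamic linker picks up
>>>>> ldd $AMBERHOME/exe/pmemd | grep -i -E 'svml|mkl'
>>>>> # list LD_LIBRARY_PATH entries one per line to spot the stale cce directory
>>>>> echo $LD_LIBRARY_PATH | tr ':' '\n' | grep -E 'cce|fce'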
>>>>>
>>>>> All the best
>>>>> Ross
>>>>>
>>>>>
>>>>> /\
>>>>> \/
>>>>> |\oss Walker
>>>>>
>>>>> | Assistant Research Professor |
>>>>> | San Diego Supercomputer Center |
>>>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>>>
>>>>> Note: Electronic Mail is not secure, has no guarantee of delivery,
>>>>> may not
>>>>> be read every day, and should not be used for urgent or sensitive
>>>>> issues.
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

--
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed May 20 2009 - 15:13:19 PDT