Re: [AMBER] Intel Compilers, SSE_TYPES, and auto CPU dispatch

From: Novosielski, Ryan <novosirj.ca.rutgers.edu>
Date: Wed, 14 Jan 2015 11:05:29 -0500

Well, as it turns out, the only hassle is the confusing -xHost setting that configure puts in there. Otherwise it is very easy to get a binary that should, in theory, perform best on whatever targets you specify (Intel compilers only, obviously). Since I've built Amber so many times in the last week or so, it would be easy for me to run some tests to see whether there is any performance difference between the permutations, and I can share the results with the list. I'm thinking I'll test four builds: configure -nosse; -xHost changed to -xSSE3 together with -axSSE4.1,SSE4.2; -xSSE4.1; and -xSSE4.2.

I think a couple of percent is worth it, considering the solution is just changing a couple of instances of -xHost to -xSSE3 or whatever else you pick.

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj.rutgers.edu - 973/972.0922 (2x0922)
|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

On Jan 14, 2015, at 10:58, Brent Krueger <kruegerb.hope.edu> wrote:

Ryan,

I did test the performance difference at one point and I think it is pretty
small -- just a couple percent. Only you can decide whether it is truly
worth all of the hassle.

I should say that I did that measurement quite some time ago, so probably
with AMBER12 or maybe even AMBER11. AMBER14 might take advantage of the
newer SSE instructions to a larger extent and maybe the performance gain is
more significant now.

Brent



On Wed, Jan 14, 2015 at 10:33 AM, Novosielski, Ryan <novosirj.ca.rutgers.edu> wrote:

I had thought of that, but I would have the same question there though:
wouldn't the code compiled with SSE4.2 run faster on the machines that
support it? After all, I am trying to get the maximum performance out of
this stuff. But I am not advanced enough to know whether we are talking
about any serious gain between those two instruction sets. Compiling with
-xSSE4.1 was going to be my fallback. But I am happy to report that setting
SSE_TYPES and the resultant -ax flags does seem to work fine provided it is
not being defeated by -xHost. It is clear something should be changed here,
because otherwise the behavior is counterintuitive.
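
The interaction described above can be illustrated with a toy model. This is only a sketch under assumptions: it treats the instruction sets as a simple ordered list, assumes the -x option sets the baseline code path that every node must be able to run, and assumes -ax only adds optional paths on top of that baseline. The names `LEVELS`, `rank` and `selected_path` are made up for illustration and are not part of Amber or the Intel compilers.

```python
# Instruction-set levels in increasing order; a simplification for illustration.
LEVELS = ["SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "AVX"]


def rank(level):
    """Position of a level in the simplified ordering above."""
    return LEVELS.index(level)


def selected_path(baseline, alternates, node_level):
    """Return the code path a node would run, or None if it cannot run at all.

    baseline   -- level requested with -x (or whatever -xHost resolved to)
    alternates -- levels listed in -ax
    node_level -- newest level the node supports
    """
    if rank(node_level) < rank(baseline):
        # The baseline itself needs instructions the node lacks.
        return None
    usable = [a for a in alternates
              if rank(baseline) < rank(a) <= rank(node_level)]
    return max(usable, key=rank) if usable else baseline


# Assumed case: -xHost resolved to SSE4.2 on the build host.
print(selected_path("SSE4.2", ["SSE4.1", "SSE4.2"], "SSE4.1"))
# Baseline lowered to SSE3, same -ax list.
print(selected_path("SSE3", ["SSE4.1", "SSE4.2"], "SSE4.1"))
```

In this model the first call returns None (the SSE4.1 node cannot start the binary, whatever -ax says) and the second returns "SSE4.1". Real dispatch is more subtle, since the compiler only emits an alternate path where it expects a benefit.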

In any case, on our cluster, the machine with the least hardware features
does not have a full build environment. It is just a diskless node.


On Jan 14, 2015, at 07:27, Jan-Philip Gehrcke <jgehrcke.googlemail.com> wrote:

Let me propose another quite simple and yet efficient solution on a
heterogeneous cluster:

Compile on the machine with the fewest hardware features, without changing
any of the configure options (i.e. keeping -xHost).

The resulting build works and uses the intersection of advanced hardware
features on all machines in your cluster (SSE 4.1 in your case). This is
what I have been doing many times now, and it really saves a lot of work.
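
To see which level such a build would end up with, one can intersect the CPU flags of the nodes. A minimal sketch, assuming the flag names used by Linux in /proc/cpuinfo ("pni" is SSE3); the node names and the abbreviated flag sets are hypothetical, and `common_features` and `highest_common_level` are illustrative helpers, not part of any tool mentioned here.

```python
def common_features(flags_by_node):
    """Return the flags that every node in the mapping advertises."""
    flag_sets = [set(flags) for flags in flags_by_node.values()]
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


# Linux /proc/cpuinfo flag names, ordered from oldest to newest level.
SSE_LEVELS = ["sse2", "pni", "ssse3", "sse4_1", "sse4_2"]


def highest_common_level(flags_by_node):
    """Return the newest SSE level supported by all nodes, or None."""
    shared = common_features(flags_by_node)
    best = None
    for level in SSE_LEVELS:
        if level not in shared:
            break
        best = level
    return best


# Hypothetical node names and abbreviated flag sets, for illustration only.
cluster = {
    "harpertown-node": {"sse2", "pni", "ssse3", "sse4_1"},
    "nehalem-node": {"sse2", "pni", "ssse3", "sse4_1", "sse4_2", "popcnt"},
}
print(highest_common_level(cluster))
```

For the two sample nodes this prints sse4_1, matching the SSE 4.1 intersection mentioned above.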

JP

On 13.01.2015 18:20, Novosielski, Ryan wrote:
Wouldn't there be performance tradeoffs there? I'd think that at least
supporting 4.1, in my case, would result in faster execution.


On Jan 12, 2015, at 20:45, Daniel Roe <daniel.r.roe.gmail.com> wrote:

Hi,

Personally, on clusters like that I usually just configure with the
'-nosse' flag to avoid the issue altogether.

-Dan

On Monday, January 12, 2015, Novosielski, Ryan <novosirj.ca.rutgers.edu> wrote:

Hi all,

I recently ran into the following error at runtime:

"Please verify that both the operating system and the processor support
Intel(R) SSE4_2 and POPCNT instructions."

The reason for this is that our cluster has some nodes with Nehalem
processors and some with Harpertown processors, and the master node is a
newer machine that supports some of the newer instructions. Looking through
the --full-help configure output, I saw that I could define SSE_TYPES to say
which CPU instruction sets should be supported at runtime. However, I tried
building with SSE_TYPES set to SSE4.2,SSE4.1, and with the reverse order even
though I doubted that would make any difference. It does appear to add
-axSSE4.2,SSE4.1 in the appropriate places during the build, which from my
reading of the Intel documentation is exactly what it should do. However, the
resulting code behaves the same way, with the same error message, on the
Harpertown nodes. I don’t see a way to disable POPCNT, but it appears that
SSE4.1 machines don’t support it, so I would assume it is not used when the
runtime dispatcher selects SSE4.1.

I was not aware of auto CPU dispatch before today, so I’m not sure if I’m
doing something wrong somehow. Amber version is 14, Intel compilers are
15.0.1.
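
A quick way to confirm which nodes lack the instructions named in the error is to inspect the CPU flags each node reports. The sketch below assumes Linux-style /proc/cpuinfo text, where the relevant flags are spelled "sse4_2" and "popcnt"; the sample string is an abbreviated, made-up example rather than real output, and `cpu_flags` and `missing_features` are illustrative helper names.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("sse4_2", "popcnt")):
    """Return the required flags the CPU does not advertise, sorted by name."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


# Abbreviated, made-up sample resembling a node without SSE4.2 or POPCNT.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 pni ssse3 sse4_1\n"
print(missing_features(sample))
```

On a real node the same function could be given the contents of /proc/cpuinfo instead of the sample string; an empty list would mean both features are present.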



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber



--
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)




--
_______________________________________________
Brent P. Krueger.....................phone: 616 395 7629
Professor................................fax: 616 395 7118
Hope College..........................Schaap Hall 2120
Department of Chemistry
Holland, MI 49423
Received on Wed Jan 14 2015 - 08:30:03 PST