Re: [AMBER] Troubleshooting long idle times in pmemd.MPI

From: Kevin Keane <kkeane.sandiego.edu>
Date: Sat, 1 May 2021 11:42:39 -0700

Thank you!

Yes, I can confirm that RedHat has the same problematic version of OpenMPI.
It's ancient, and probably also does not support your resource manager.
Don't use it.

I'll ask the researcher to re-run a job with the serial version. We
compiled our own version of OpenMPI (3.1 in this case; we also have 3.0 and
4.0 on the system). We did it slightly differently from the suggested
method, though: OpenMPI also supports building your own RPM, and we went
that route.

The previous system also had our own build of OpenMPI.

Meanwhile, is there a way to profile pmemd.MPI to see where it may actually
be stalling?
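
As a rough first check before any real profiling, something like the
sketch below could compare wall time against CPU time for a short test run.
It is not a profiler and will not show where pmemd.MPI stalls; it only
quantifies how much of the wall time the processes were actually busy. The
function names are made up for illustration, and it assumes Linux, Python 3,
and that all MPI ranks run on the local node as descendants of the launched
command (ranks started on other nodes by a resource manager would not be
counted).

```python
import os
import subprocess
import time


def cpu_vs_wall(cmd):
    """Run cmd (a list of arguments) and return (wall_s, cpu_s, returncode).

    cpu_s is the user + system CPU time accumulated by child processes
    that were waited for while cmd ran, as reported by os.times().
    """
    before = os.times()
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    wall = time.perf_counter() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu, result.returncode


def busy_fraction(wall_s, cpu_s, nprocs):
    """Fraction of the available CPU time (wall_s * nprocs) actually used."""
    if wall_s <= 0 or nprocs <= 0:
        return 0.0
    return cpu_s / (wall_s * nprocs)


# Hypothetical usage; the input and output file names are placeholders:
#   wall, cpu, rc = cpu_vs_wall(
#       ["mpirun", "-np", "2", "pmemd.MPI", "-O", "-i", "md.in", "-o", "md.out"]
#   )
#   print(f"wall={wall:.1f}s cpu={cpu:.1f}s busy={busy_fraction(wall, cpu, 2):.1%}")
```

With the numbers from this thread (70 minutes of wall time, two CPUs, and
assuming the 9 minutes of CPU time is the total over both ranks),
busy_fraction(4200, 540, 2) comes out at roughly 6%. A value that low would
suggest the ranks spend most of their time blocked rather than computing.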

Thanks!

_______________________________________________________________________
Kevin Keane | Systems Architect | University of San Diego ITS |
kkeane.sandiego.edu

*Pronouns: he/him/his*
Maher Hall, 162 | 5998 Alcalá Park | San Diego, CA
92110-2492 | 619.260.6859 | Text: 760-721-8339


On Sat, May 1, 2021 at 10:02 AM David A Case <david.case.rutgers.edu> wrote:

> On Thu, Apr 29, 2021, Kevin Keane wrote:
>
> >We recently upgraded our cluster from RedHat 7 to RedHat 8, and we use
> >Amber 16/AmberTools 17. We recompiled Amber during the migration.
> >
> >Ever since the upgrade, we noticed that Amber jobs take dramatically
> >longer than they used to.
> >
> >The CPU times remain normal (in the example below, about 9 minutes), but
> >the wall time needed was about 70 minutes. This job ran on two CPUs...
>
> Do you see the same degradation on serial jobs? That might help provide a
> clue.
>
> I'm not sure how relevant this is, but the supplied MPI libraries for
> CentOS don't work well with Amber, and we recommend building MPI from
> source there:
>
> https://ambermd.org/InstCentOS.php
>
> It's possible that similar limitations apply to RedHat as well. (We don't
> have a regular testing scheme for RedHat.)
>
> ....dac
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Sat May 01 2021 - 12:00:02 PDT