Re: [AMBER] Fully random MPI error after updating CentOS from release 6.5 to 6.6

From: Jason Swails <jason.swails.gmail.com>
Date: Mon, 03 Nov 2014 07:36:57 -0500

On Mon, 2014-11-03 at 11:54 +0100, Massimiliano Porrini wrote:
> Hi everyone,
>
> I have been encountering a weird (considering my poor MPI knowledge)
> problem with sander.MPI of Amber12.
>
> As the subject says, after updating CentOS from release 6.5 to 6.6
> (command: yum update), I get the MPI error reported below in a totally
> random fashion. By totally random I mean that this error sometimes
> occurs and sometimes does not.
>
> It must be added that the problem has been happening on 7 out of 9
> blades, all with identical installations of Amber12 and MPICH 3.1 (with
> regard to the remaining 2 blades, I have not yet checked if this issue
> appears, but I am quite sure it would).
>
> Any suggestion/comment to get this problem sorted out is very welcome.
>
> Thanks in advance,
> Massimiliano
>
>
>
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(467)..............:
> MPID_Init(177).....................: channel initialization failed
> MPIDI_CH3_Init(70).................:
> MPID_nem_init(319).................:
> MPID_nem_tcp_init(171).............:
> MPID_nem_tcp_get_business_card(418):
> MPID_nem_tcp_init(377).............: gethostbyname failed, vg02 (errno 1)

Looks like one of the nodes is having a hard time resolving the hostname
"vg02" (the gethostbyname call failed). See the SO thread describing a
similar error message here:
http://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed
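
One way to narrow this down is to test name resolution directly on each
blade, outside of MPI. The following is only a minimal sketch in Python,
not anything shipped with Amber or MPICH: the function name check_hosts
and its resolver parameter are hypothetical names chosen for this
illustration. It calls the standard library's socket.gethostbyname for
each name and records which ones fail.

```python
import socket


def check_hosts(hostnames, resolver=socket.gethostbyname):
    """Try to resolve each hostname; map it to its address, or None on failure.

    `resolver` is an illustrative hook (an assumption of this sketch) so the
    lookup function can be swapped out, e.g. for testing.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError
            results[name] = None
    return results
```

Running something like check_hosts([socket.gethostname(), "vg02"]) on each
blade should show whether a node fails to resolve its own name or a
neighbour's. If a lookup returns None only some of the time, that would
point at the resolver configuration (for instance /etc/hosts or DNS)
rather than at sander.MPI.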

>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 7942 RUNNING AT vg02
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
> Error opening unit 30: File "../Dynamics_3/substate81_dyn3.rst" is
> missing or unreadable

This is a fairly straightforward error message. The file
"../Dynamics_3/substate81_dyn3.rst" could not be found. If you are on a
cluster and you are sure it exists on the node you submitted the job
from, it's possible that the filesystem it resides on is not mounted on
the compute nodes.
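
To check this from a compute node, something along these lines could be
run from the same working directory the job uses. Note that a relative
path such as "../Dynamics_3/substate81_dyn3.rst" is resolved against the
current working directory, so the job's starting directory matters too.
This is just a sketch; check_input_file is a hypothetical helper name,
and it uses only os.path and os.access from the Python standard library.

```python
import os


def check_input_file(path):
    """Report whether `path` is visible and readable from this process."""
    abs_path = os.path.abspath(path)  # resolved against the current working directory
    return {
        "absolute_path": abs_path,
        "exists": os.path.isfile(abs_path),
        "readable": os.access(abs_path, os.R_OK),
        # A missing parent directory often hints at an unmounted filesystem
        # or a different working directory on this node.
        "parent_dir_exists": os.path.isdir(os.path.dirname(abs_path)),
    }
```

For example, printing check_input_file("../Dynamics_3/substate81_dyn3.rst")
on the submit node and on a compute node and comparing the two results
should show whether the file is only visible from one of them.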

These problems look more like cluster configuration issues than anything
related to Amber.

Good luck,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Nov 03 2014 - 05:00:02 PST