Thanks for replying, Jason.
My comments are below, in case they are useful for solving this issue.
On 3 November 2014 13:36, Jason Swails <jason.swails.gmail.com> wrote:
> On Mon, 2014-11-03 at 11:54 +0100, Massimiliano Porrini wrote:
> > Hi everyone,
> >
> > I have been encountering a weird (considering my poor MPI knowledge)
> > problem with sander.MPI of Amber12.
> >
> > As the subject says, after updating CentOS from release 6.5 to 6.6
> > (command: yum update), I get the MPI error reported below in a totally
> > random fashion. By totally random I mean that the error sometimes
> > occurs and sometimes does not.
> >
> > It must be added that the problem has been happening on 7 out of 9
> > blades, all with identical installations of Amber12 and MPICH 3.1 (as
> > for the remaining 2 blades, I have not yet checked whether this issue
> > appears, but I am quite sure it would).
> >
> > Any suggestion/comment to get this problem sorted out is very welcome.
> >
> > Thanks in advance,
> > Massimiliano
> >
> >
> >
> > Fatal error in MPI_Init: Other MPI error, error stack:
> > MPIR_Init_thread(467)..............:
> > MPID_Init(177).....................: channel initialization failed
> > MPIDI_CH3_Init(70).................:
> > MPID_nem_init(319).................:
> > MPID_nem_tcp_init(171).............:
> > MPID_nem_tcp_get_business_card(418):
> > MPID_nem_tcp_init(377).............: gethostbyname failed, vg02 (errno 1)
>
> Looks like one of the nodes is having a hard time connecting to the
> machine "vg02". See the SO thread describing a similar error message
> here:
> http://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed
>
>
At first glance the link might seem useful; however, I am *not* running on
a cluster but on a single blade server, whose name is vg02.
vg02 has 2 sockets with 8 cores each and 2 threads per core, for a total
of 32 logical processors (I am using 24, as benchmarking showed this to be
the processor count giving the highest speed for pmemd.MPI and
sander.MPI). Everything has thus been run "in situ", with no connection to
any other machine/node via ssh or anything else.
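Incidentally, the failing call is exactly a hostname lookup
(gethostbyname) for vg02, so a quick standalone check, independent of
Amber and MPICH, can tell whether the blade intermittently fails to
resolve its own name. A minimal sketch in Python (the script itself is
mine, not part of any package):

#!/usr/bin/env python
# Quick standalone check: perform the same hostname lookup that
# MPID_nem_tcp_init does when building its "business card".
import socket

name = socket.gethostname()
print("hostname: %s" % name)
try:
    print("resolved to: %s" % socket.gethostbyname(name))
except socket.gaierror as err:
    print("gethostbyname(%s) failed: %s" % (name, err))

If this also fails now and then, the culprit would be the resolver
configuration (e.g. /etc/hosts or /etc/nsswitch.conf, which the 6.6
update may have touched) rather than Amber or MPICH.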
>
> >
> ===================================================================================
> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > = PID 7942 RUNNING AT vg02
> > = EXIT CODE: 1
> > = CLEANING UP REMAINING PROCESSES
> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >
> ===================================================================================
> >
> > Error opening unit 30: File "../Dynamics_3/substate81_dyn3.rst" is
> > missing or unreadable
>
> This is a fairly straightforward error message. The file
> "../Dynamics_3/substate81_dyn3.rst" could not be found. If you are on a
> cluster and you are sure it exists on the node you submitted the job
> from, it's possible that the filesystem it resides on is not mounted on
> the compute nodes.
>
You are right, and I should not have pasted this part of the error message
here: it is just a consequence of the first MPI error. My simulation is
divided into 50 ns chunks, and the 3rd chunk (Dynamics_3) cannot find the
final restart file of the 2nd chunk because MPI aborted, that's all.
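Just to make the chaining explicit, the chunks run roughly like the sketch
below (file and input names are simplified placeholders, not the real
ones):

#!/usr/bin/env python
# Rough sketch of how the 50 ns chunks are chained: chunk N reads the
# restart written by chunk N-1, so an MPI abort in chunk 2 leaves chunk 3
# with no restart file to open.
import os
import subprocess
import sys

for n in range(2, 4):  # Dynamics_2, Dynamics_3, ...
    rst_in = "substate81_dyn%d.rst" % (n - 1)  # written by previous chunk
    rst_out = "substate81_dyn%d.rst" % n
    if not os.path.isfile(rst_in):
        sys.exit("chunk %d: %s is missing (previous chunk aborted?)"
                 % (n, rst_in))
    subprocess.check_call(["mpirun", "-np", "24", "sander.MPI", "-O",
                           "-i", "md.in", "-p", "system.prmtop",
                           "-c", rst_in, "-r", rst_out,
                           "-o", "dyn%d.out" % n])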
> These problems look more like cluster configuration issues than anything
> related to Amber.
>
What I will try next is to reinstall Amber12 and MPICH 3.1 from scratch on
one blade server, to see whether this fixes the error.
>
> Good luck,
> Jason
>
Thanks,
Max
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
>
>
--
Dr Massimiliano Porrini
Valérie Gabelica Team
U869 ARNA - Inserm / Bordeaux University
Institut Européen de Chimie et Biologie (IECB)
2, rue Robert Escarpit
33607 Pessac Cedex
FRANCE
Tel : 33 (0)5 40 00 63 31
http://www.iecb.u-bordeaux.fr/teams/GABELICA
Emails: massimiliano.porrini.inserm.fr
m.porrini.iecb.u-bordeaux.fr
mozz76.gmail.com
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber