Dear Ross,
Thanks for your suggestions.
I initially didn't mention that our Fedora Core nodes were running OpenMosix
(kernel 2.4.24), which turned out to be part of the problem! We
eventually switched from OpenMPI to MPICH2 (compiled with
--enable-threads=single) and stopped OpenMosix on our nodes. This has
been our most stable configuration so far.
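For the record, the MPICH2 build itself was nothing special; roughly the
following, where the install prefix is just an example:

  ./configure --enable-threads=single --prefix=/opt/mpich2   # prefix is just an example
  make
  make install

sander.MPI was then recompiled against the new MPI libraries.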
I am now testing another configuration using the latest OpenMosix
(2.4.26) under CentOS 3.8, which looks fine too!
Christophe
Ross Walker wrote:
> Dear Christophe,
>
> This is my first experience with openmpi. Which openmpi test suite are
> you referring to? Where is it documented?
> I have never used Openmpi myself either. I tend to use mpich2. There
> should be some kind of test suite distributed with the source code
> though. Check the install docs. Typically you do something like:
> ./configure; make; make test; make install
>
> It is the make test bit that you need to look up.
>
> Unfortunately, the error is not always from the same node!
>
> Hmmm, then it could be the switch, but it could also be an issue with the
> openmpi installation. Try downloading mpich2 and see if that works
> instead.
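>
> A quick way to exercise all the nodes outside of Amber is the little cpi
> example that comes with the MPICH2 source (paths, host file name and
> process counts below are just examples):
>
>   mpdboot -n 4 -f mpd.hosts    # start the MPD ring on 4 nodes (example count)
>   mpiexec -n 8 ./examples/cpi  # run the bundled cpi test across the nodes
>   mpdallexit                   # shut the ring down again
>
> If that runs cleanly everywhere but sander.MPI still fails, the MPI stack
> and network are less likely to be at fault.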
>
> You could also try building pmemd in $AMBERHOME/src/pmemd and then
> testing this. If you see similar problems then it is definitely an
> issue with the openmpi installation or the hardware.
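>
> For pmemd the build is roughly along these lines (the exact configure
> arguments depend on your platform, compiler and MPI; the README in that
> directory lists them):
>
>   cd $AMBERHOME/src/pmemd
>   ./configure <platform> <compiler> <mpi>   # placeholders; see the pmemd README
>   make install
>
> The resulting pmemd binary can then be run on the same inputs that fail
> with sander.MPI.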
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ------------------------------------------------------------------------
> From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu] On
> Behalf Of Christophe Deprez
> Sent: Thursday, October 12, 2006 06:55
> To: amber.scripps.edu
> Subject: Re: AMBER: problems for running sander.MPI
>
> Ross Walker wrote:
>
>>Hi Qizhi
>>
>>>enode05:03662] mca_btl_tcp_frag_send: writev failed with errno=104
>>>
>>>(enode05 is one of the node names of the cluster.)
>>>
>>>Normally, there is no problem with minimization and constant
>>>NVT steps.
>>>The problems often occur during constant NPT and production runs.
>>>
> Hi Ross, and thanks for your reply.
> I'm working as sysadmin with Qizhi to troubleshoot this issue.
>
>>This looks like a hardware problem to me. Unfortunately a Google search
>>sheds little light. E.g.:
>>http://www.open-mpi.org/community/lists/users/2006/02/0684.php
>>
>>Have you seen this with any other codes? Can you run the openmpi test suite
>>successfully?
>>
> This is my first experience with openmpi. Which openmpi test suite
> are you referring to? Where is it documented?
>
>>I would check to see if the error is always from the same node. If you
>>unplug that node and use the remaining nodes, do you see the problem?
>>
> Unfortunately, the error is not always from the same node!
>
>>I would also try compiling with g95 instead of gfortran. While it appears
>>that gfortran is now mature enough to compile Amber, I don't know if it has
>>been thoroughly tested. You will probably have to recompile openmpi with g95
>>as well.
>>
> I'll give this a try.
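>
> Presumably the OpenMPI rebuild would go something like this (the prefix is
> just an example; FC/F77 are the usual configure variables for the Fortran
> 90/77 compilers):
>
>   ./configure FC=g95 F77=g95 --prefix=/opt/openmpi-g95   # example prefix
>   make all install
>
> with Amber then reconfigured to pick up that installation.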
>
> Thanks for your suggestions
>
--
Christophe Deprez christophe.deprez.bri.nrc.ca
----------------------------------------------------------------------
Institut de Recherche en Biotechnologies / Biotech. Research Institute
6100 Royalmount, Montréal (QC) H4P 2R2, Canada Tel: (514) 496-6164
-----------------------------------------------------------------------