Re: [AMBER] sander.MPI problem

From: peker milas <pekermilas.gmail.com>
Date: Fri, 21 May 2010 08:47:45 -0400

Thank you, Gustavo,

Actually, this is not a new problem; I ran into it before (about 3-4
months ago). I thought I had fixed it, but apparently it has shown up
again. Unfortunately, after weeks of debugging, my collaborator and I
concluded that sander.MPI may have a race-condition type of bug. Of
course, that is very hard to pin down, and we could not verify that it
really was a race condition. We sent an email to this group and nobody
answered. Anyway, I have installed OpenMPI 1.4.1 and 1.4.2 locally and
I am currently reconfiguring Amber for them. If I can find the bug this
time, or if those new versions of OpenMPI fix it, I will let you know.
The very unfortunate and frustrating thing is that it looks like nobody
wants to know about this bug or to fix it.
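
For reference, here is a minimal sketch of how one locally installed
OpenMPI could be selected before reconfiguring Amber. The install prefix
$HOME/opt/openmpi-1.4.2 and the helper names use_openmpi and
mpirun_from_prefix are hypothetical and are not taken from the actual
setup. The sketch assumes a Linux system, where the dynamic linker reads
LD_LIBRARY_PATH, and only checks which mpirun is found first on PATH.

```shell
#!/bin/sh
# Hypothetical install prefix (an assumption); OpenMPI would have been built
# there beforehand with its usual steps, for example:
#   ./configure --prefix="$HOME/opt/openmpi-1.4.2" && make all install
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/opt/openmpi-1.4.2}"

# Put one OpenMPI installation first on the executable and library paths.
use_openmpi() {
    prefix="$1"
    PATH="$prefix/bin:$PATH"
    # Append the old value only if LD_LIBRARY_PATH was already set.
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Succeed only if the mpirun found on PATH belongs to the given prefix.
mpirun_from_prefix() {
    prefix="$1"
    found=$(command -v mpirun) || return 1
    [ "$found" = "$prefix/bin/mpirun" ]
}

use_openmpi "$OMPI_PREFIX"
if mpirun_from_prefix "$OMPI_PREFIX"; then
    echo "mpirun comes from $OMPI_PREFIX"
else
    echo "mpirun does NOT come from $OMPI_PREFIX" >&2
fi
```

Running the check once per installed version, before rebuilding and before
each test run, would at least rule out a mix of the mpirun from one version
with the libraries from another.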

Thank you so much again,
peker milas

On Thu, May 20, 2010 at 4:35 PM, Gustavo Seabra
<gustavo.seabra.gmail.com> wrote:
> Hi Peker,
>
> I have experienced exactly the same symptoms. It appears to be related
> to a bug involving OpenMPI and Ubuntu 9.10, as described here:
>
> https://bugs.launchpad.net/ubuntu/+source/openmpi/+bug/504659
>
> Apparently, OpenMPI v1.4.1 works, but I believe that is not the version
> available from apt-get in Ubuntu. I haven't tried installing OpenMPI
> 1.4.1 yet.
>
> HTH,
> --
> Gustavo Seabra
> Professor Adjunto
> Departamento de Química Fundamental
> Universidade Federal de Pernambuco
> Fone: +55-81-2126-7417
>
>
>
> On Thu, May 20, 2010 at 5:22 PM, peker milas <pekermilas.gmail.com> wrote:
>> Thank you so much for your response,
>>
>> Here is the detailed information:
>>
>> Hardware: Two Intel Nehalem processors, 8 physical / 16 logical cores in total
>> OS: Ubuntu Karmic Koala with gcc 4.4.1 and gfortran
>> Parallelization: OpenMPI 1.4
>>
>> I applied all bugfixes. The AmberTools installation and tests were fine,
>> with just two minor failures. The serial Amber installation was also
>> fine, with just 4 rounding-off type failures. As already discussed here
>> at various times, I pointed /bin/sh at /bin/bash with a symbolic link.
>> The parallel installation was fine as well. With 2 processors, the
>> parallel tests gave me the same failures as the serial tests. After that
>> I tried 4 and 8 processors; unfortunately, in both cases the test suite
>> stalled, and at a different test each time. In other words, the results
>> are not reproducible at all: once it stalled at Run.cap, the next time
>> at Run.tip4p_nve, another time at Run.dip, and so on.
>>
>> I have two locally installed OpenMPI versions, OpenMPI 1.4 and OpenMPI
>> 1.2.8, whose lib and bin folders I add to PATH and LD_PATH manually in
>> my .bashrc file. I tried both and nothing changed. Also, I was using
>> PCGAMESS in parallel mode before, and it worked just fine even with 8
>> processors. As a last piece of information, the stalled processes in
>> all of the cases above belong to sander.MPI. One last thing: I skipped
>> all the PIMD tests because I will not use them, so what I described
>> above applies to all the other tests.
>>
>> thank you so much
>> peker milas
>>
>> On Thu, May 20, 2010 at 3:56 PM, Jason Swails <jason.swails.gmail.com> wrote:
>>> Hello,
>>>
>>> This doesn't provide much information helpful for debugging.  What are your
>>> system specs (OS, compilers, etc.)?  What test is it specifically failing on
>>> (or stalling on)?  Did the serial tests pass?  Have you applied all bug
>>> fixes?  The more details we have regarding system setup, the better chance
>>> someone will be able to help.
>>>
>>> All the best,
>>> Jason
>>>
>>> On Thu, May 20, 2010 at 3:19 PM, peker milas <pekermilas.gmail.com> wrote:
>>>
>>>> Dear Amber users and developers,
>>>>
>>>> My parallel (with OpenMPI 1.4) Amber10 installation has a strange
>>>> problem. Let me try to explain it briefly. If I run the parallel tests
>>>> with only 2 processors (mpirun -np 2), everything goes fine except for
>>>> a couple of failures. If I run them with 4 or more processors (mpirun
>>>> -np 4), the run stalls in an arbitrary test. My computer has 8 physical
>>>> CPUs and uses shared-memory parallelization. I have used it for other
>>>> programs without any problem. So I really need help with this
>>>> number-of-processors issue, and any help will be greatly appreciated.
>>>>
>>>> thank you so much
>>>> peker milas
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>

Received on Fri May 21 2010 - 06:00:15 PDT