Re: [AMBER] sander.MPI problem

From: peker milas <pekermilas.gmail.com>
Date: Fri, 21 May 2010 15:33:53 -0400

One more thing I forgot to mention: after installing mpich2 from the
repositories, you may want to create a new folder and put symbolic links
to /usr/include, /usr/bin, and /usr/lib inside it, and then set that
folder as $MPI_HOME. Otherwise Amber may not be able to find libmpich.a
and mpif.h (or other related files).
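
For example, a minimal sketch of what I mean (the folder name
$HOME/mpich2-home is just a placeholder; point the links at wherever your
distribution actually installed mpich2):

  mkdir -p $HOME/mpich2-home
  ln -s /usr/include $HOME/mpich2-home/include
  ln -s /usr/bin     $HOME/mpich2-home/bin
  ln -s /usr/lib     $HOME/mpich2-home/lib
  export MPI_HOME=$HOME/mpich2-home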

best
peker

On Fri, May 21, 2010 at 3:30 PM, peker milas <pekermilas.gmail.com> wrote:
> Hello Gustavo,
>
> I have some good news and some bad news. I will start with the bad
> news. From my past installation experience and from the results I got
> recently, with OpenMPI versions from 1.3.8 through 1.4.2 the problem
> can appear anywhere among the tests. It looks like it is not only an
> OpenMPI issue but also an Ubuntu Karmic Koala issue. None of the
> versions listed above works, unfortunately. Now the good news: I
> installed mpich2 from the Ubuntu repositories and then configured Amber
> with it, and it worked fine. The only trick in the configuration was
> that, after installation, mpich2 automatically created a ".mpd.conf"
> file with an additional "MPD_USE_ROOT_MPD=1" line in it (second line
> from the top). I commented that line out and then ran mpd. After that I
> installed the parallel version of Amber.
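>
> In case it is useful, this is roughly what that step looks like (a
> sketch only; the sed edit just comments out that one line, and mpd is
> mpich2's process manager, which needs to be running before parallel
> jobs are launched):
>
>   sed -i 's/^MPD_USE_ROOT_MPD=1/#MPD_USE_ROOT_MPD=1/' ~/.mpd.conf
>   mpd &    # start the mpd daemon in the background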
>
> There is one more thing I need to mention. We previously found, without
> any doubt, what looks like another race condition in at least one of
> the PIMD tests (specifically cd PIMD/full_cmd_water/equilib &&
> ./Run.full_cmd). Therefore I skipped those tests when I ran the suite
> this time; it looks like that one would fail again. Anyway, the others
> are fine and reproducible (both the successes and the failures). I
> would like to thank you for your help again...
>
> best
> peker milas
>
> On Fri, May 21, 2010 at 11:28 AM, Gustavo Seabra
> <gustavo.seabra.gmail.com> wrote:
>> Hi Peker,
>>
>> Thanks a lot. I'll be waiting to know if it works for you.
>>
>> The problem with fixing this bug is that, as happened when I first
>> reported it on the dev list, people just can't reproduce it (see the
>> thread here: http://dev-archive.ambermd.org/201005/0002.html). Apart
>> from a somewhat similar report from Lachele Foley (but with the
>> stalling always happening at the same point), no one else saw anything
>> similar. Some people there using the same system didn't see the
>> errors, so it becomes really hard to find. And, it may really be an
>> OpenMPI bug, not Amber's, since something similar has been reported
>> before by other MPI users, in programs other than Amber...
>>
>> Cheers,
>> Gustavo.
>>
>> On Fri, May 21, 2010 at 9:47 AM, peker milas <pekermilas.gmail.com> wrote:
>>> Thank you Gustavo,
>>>
>>> Actually this is not a new problem; I ran into it before (about 3-4
>>> months ago). I thought I had fixed it, but apparently it has shown up
>>> again. The unfortunate thing was that, after weeks of debugging, my
>>> collaborator and I concluded that sander.MPI may have a
>>> race-condition type bug. Of course it is very hard to pin down, and
>>> we could not verify whether it really was a race condition. We sent
>>> an email to this list and nobody answered. Anyway, I have installed
>>> OpenMPI 1.4.1 and 1.4.2 locally and I am currently re-configuring
>>> Amber for them. If I can find the bug this time, or if I can fix it
>>> with those newer versions of OpenMPI, I will let you know. The very
>>> unfortunate and frustrating thing is that it looks like nobody wants
>>> to know about this bug and nobody wants to fix it.
>>>
>>> thank you so much again
>>> peker milas
>>>
>>> On Thu, May 20, 2010 at 4:35 PM, Gustavo Seabra
>>> <gustavo.seabra.gmail.com> wrote:
>>>> Hi Peker,
>>>>
>>>> I have experienced exactly the same symptoms. It appears to be related
>>>> to a bug involving OpenMPI and Ubuntu 9.10, as described here:
>>>>
>>>> https://bugs.launchpad.net/ubuntu/+source/openmpi/+bug/504659
>>>>
>>>> Apparently, OpenMPI v1.4.1 works, but that is not what is available
>>>> from apt-get in Ubuntu, I believe. I haven't tried installing
>>>> OpenMPI 1.4.1 yet.
>>>>
>>>> HTH,
>>>> --
>>>> Gustavo Seabra
>>>> Professor Adjunto
>>>> Departamento de Química Fundamental
>>>> Universidade Federal de Pernambuco
>>>> Fone: +55-81-2126-7417
>>>>
>>>>
>>>>
>>>> On Thu, May 20, 2010 at 5:22 PM, peker milas <pekermilas.gmail.com> wrote:
>>>>> Thank you so much for your response.
>>>>>
>>>>> Here is the detailed information:
>>>>>
>>>>> Hardware: two Intel Nehalem processors, 8 physical / 16 logical cores in total
>>>>> OS: Ubuntu Karmic Koala with gcc 4.4.1 and gfortran
>>>>> Parallelization: OpenMPI 1.4
>>>>>
>>>>> I applied all bugfixes. The AmberTools installation and tests were
>>>>> fine, with just two minor failures. The serial Amber installation
>>>>> was also fine, with just four rounding-off type failures. As already
>>>>> discussed at different times, I created a symbolic link so that
>>>>> /bin/sh points to /bin/bash. The parallel installation was fine as
>>>>> well. With 2 processors, the parallel tests gave me the same
>>>>> failures as the serial tests. When I then tried 4 and 8 processors,
>>>>> the tests unfortunately stalled in both cases, and at different
>>>>> tests each time; the results are simply not reproducible. Once they
>>>>> stalled at Run.cap, the next time at Run.tip4p_nve, another time at
>>>>> Run.dip, and so on... I have two locally installed OpenMPI versions,
>>>>> 1.4 and 1.2.8, whose lib and bin folders I added to PATH and
>>>>> LD_LIBRARY_PATH manually in my .bashrc file (see the sketch below);
>>>>> I tried both, and nothing changed. Also, I was using PCGAMESS in
>>>>> parallel mode before, and even with 8 processors it worked just
>>>>> fine. As a last piece of information, all of the stalled processes
>>>>> above belong to sander.MPI. One last thing: I cancelled all PIMD
>>>>> tests because I will not be using them, and what I described above
>>>>> applies to all the other tests.
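>>>>>
>>>>> For completeness, a rough sketch of that shell-level setup (the
>>>>> install prefix $HOME/openmpi-1.4 is just a placeholder for my local
>>>>> OpenMPI build; adjust it to your own paths):
>>>>>
>>>>>   # run once, as root: make /bin/sh point to bash
>>>>>   sudo ln -sf /bin/bash /bin/sh
>>>>>
>>>>>   # in ~/.bashrc: pick up the locally installed OpenMPI
>>>>>   export PATH=$HOME/openmpi-1.4/bin:$PATH
>>>>>   export LD_LIBRARY_PATH=$HOME/openmpi-1.4/lib:$LD_LIBRARY_PATH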
>>>>>
>>>>> thank you so much
>>>>> peker milas
>>>>>
>>>>> On Thu, May 20, 2010 at 3:56 PM, Jason Swails <jason.swails.gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This doesn't provide much information helpful for debugging.  What are your
>>>>>> system specs (OS, compilers, etc.)?  What test is it specifically failing on
>>>>>> (or stalling on)?  Did the serial tests pass?  Have you applied all bug
>>>>>> fixes?  The more details we have regarding system setup, the better chance
>>>>>> someone will be able to help.
>>>>>>
>>>>>> All the best,
>>>>>> Jason
>>>>>>
>>>>>> On Thu, May 20, 2010 at 3:19 PM, peker milas <pekermilas.gmail.com> wrote:
>>>>>>
>>>>>>> Dear Amber user and developers,
>>>>>>>
>>>>>>> My parallel Amber10 installation (with OpenMPI 1.4) has a strange
>>>>>>> problem. Let me try to explain it briefly: if I run the parallel
>>>>>>> tests with only 2 processors (mpirun -np 2), everything goes fine
>>>>>>> except for a couple of failures. If I run them with 4 or more
>>>>>>> processors (e.g. mpirun -np 4), the suite stalls in an arbitrary
>>>>>>> test. My computer has 8 physical CPUs with shared-memory
>>>>>>> parallelization, and I have used it for other programs without any
>>>>>>> problem. So I really need help with this number-of-processors
>>>>>>> issue, and any help will be greatly appreciated.
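>>>>>>>
>>>>>>> For reference, this is roughly how I run the parallel tests (a
>>>>>>> sketch; as far as I know the Amber test suite takes the MPI
>>>>>>> launcher from the DO_PARALLEL environment variable, so only the
>>>>>>> processor count changes between runs):
>>>>>>>
>>>>>>>   cd $AMBERHOME/test
>>>>>>>   export DO_PARALLEL="mpirun -np 2"   # completes, with a couple of failures
>>>>>>>   make test.parallel
>>>>>>>   export DO_PARALLEL="mpirun -np 4"   # stalls at an arbitrary test
>>>>>>>   make test.parallel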
>>>>>>>
>>>>>>> thank you so much
>>>>>>> peker milas
>>>>
>>>
>>
>>
>>
>> --
>> Gustavo Seabra
>> Professor Adjunto
>> Departamento de Química Fundamental
>> Universidade Federal de Pernambuco
>> Fone: +55-81-2126-7417
>>
>

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri May 21 2010 - 13:00:04 PDT