RE: AMBER: Problem of QM/MM calculation with amber 9 parallel version

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 2 Nov 2006 09:17:42 -0800

Dear Rafi,

The problem you are seeing is a very different bug. You see it with
regular classical MD as well, so it is not just related to QM/MM MD.

I really don't know what is causing the problem you are seeing, although I
wouldn't be surprised if it were hardware related. I have also seen similar
weird, unpredictable errors when people run load-balancing software such as
MOSIX and then run MPI jobs over it. You should check the exact
configuration of your system and see if any such software (Condor, MOSIX,
etc.) is installed. I would also consider running some burn-in tests to
see if these work okay. A good set of examples is here:

http://www.clustermonkey.net//content/view/38/34/

I would seriously consider running the NAS Parallel Benchmarks to see if
these all work on your system. If any of these crash then it is certainly an
issue with either your hardware or your MPI software stack. If they all pass
then it could indeed be a problem with Amber, and we can try to track it
down further. The issue here is that these sorts of sporadic problems in
parallel are so difficult to track down that one should make sure the
hardware and system software are working perfectly before attempting to
debug the user software.
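As a very quick first check (just a sketch of my own, not one of the NAS
benchmarks; the iteration count and message size below are arbitrary), a
minimal MPI ping-pong program compiled with mpicc and run with mpirun -np 2
will often expose a flaky interconnect or a broken MPI install before you
even get to the full suite. If even this hangs or errors out, Amber is not
the problem:

/* mpi_pingpong.c - minimal MPI send/recv burn-in sketch.
 * Compile:  mpicc mpi_pingpong.c -o pingpong
 * Run:      mpirun -np 2 ./pingpong
 * NITER and MSGLEN are arbitrary choices for illustration. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define NITER  100000   /* number of round trips */
#define MSGLEN 4096     /* message size in doubles */

int main(int argc, char **argv)
{
    int rank, size, i;
    double buf[MSGLEN];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Run with at least 2 processes.\n");
        MPI_Finalize();
        return 1;
    }

    memset(buf, 0, sizeof(buf));

    /* Rank 0 and rank 1 bounce the same buffer back and forth. */
    for (i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSGLEN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSGLEN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, MSGLEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSGLEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("Completed %d round trips without hanging.\n", NITER);

    MPI_Finalize();
    return 0;
}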

All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

> -----Original Message-----
> From: owner-amber.scripps.edu
> [mailto:owner-amber.scripps.edu] On Behalf Of Rafi Ahmad
> Sent: Thursday, November 02, 2006 01:51
> To: amber.scripps.edu
> Subject: RE: AMBER: Problem of QM/MM calculation with amber 9
> parallel version
>
> Hi Lee and Ross,
>
> I have been following your discussion about the problem with running
> sander.MPI in Amber 9.
>
> I am having the same problem and I posted this to the amber mailing
> list on 20.10.06.
>
> David Case replied to me and asked me to run the jac benchmarks, which
> I did, but the same problem exists and it crashes after some steps.
>
> Regards
>
> Rafi
>
> -----Original Message-----
> From: owner-amber.scripps.edu
> [mailto:owner-amber.scripps.edu] On Behalf
> Of Ross Walker
> Sent: 2 November 2006 01:04
> To: amber.scripps.edu
> Subject: RE: AMBER: Problem of QM/MM calculation with amber 9 parallel
> version
>
> Dear Lee,
>
> >I installed the parallel version of amber9 on an Intel Itanium server
> >(usually called a white box) with Intel compiler 9.
> >Installing amber9 went well without an error message.
> >When I ran a QM/MM calculation with 2 processes (mpich option is -np 2),
> >it did not work. For a long time, sander did not stop by itself and
> >did not produce any data.
>
> >When running with 1 CPU (mpich option is -np 1), sander works normally.
>
>
> This is definitely not right; it shouldn't hang. I suspect this may be
> related to a bug I have been seeing with Intel's 9.1 compilers on my
> x86_64 box with our amber10 development code. There was an issue with
> the Intel compiler generating incorrect machine code for a loop over
> MPI_Send calls in the ewald setup routines in the QM/MM code.
>
> A couple of questions. Can you post the input file that you are using
> for QM/MM - are you using ewald or pme?
>
> Can you post the output file up to the point where it hangs?
>
> Did you run the QM/MM test cases in parallel? Do the first test cases
> (crambin_2) run correctly but the later ones (1NLN_periodic_lnk_atoms)
> hang? Or do they all hang?
>
> If it is the former then I have a potential workaround that you can
> try; get back to me and let me know. If it is the latter (i.e. they
> all hang) then I will have to investigate it further. Having the
> output file up to the point where the calculation hung would be very
> useful here.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may
> not be read every day, and should not be used for urgent or sensitive
> issues.
>


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Sun Nov 05 2006 - 06:07:30 PST