Re: [AMBER] pmemd.cuda.MPI on Comet- MPI dying

From: Ross Walker <rosscwalker.gmail.com>
Date: Thu, 22 Oct 2015 09:16:53 -0700

Hi Kenneth,

> Yes, both the inputs and systems themselves are almost identical- 06B has a
> ligand that 06A doesn't have, so the only difference in the inputs is the
> nmr restraint file that they refer to.
>

So they are not the same. There is no such thing as 'almost' identical, just as there is no such thing as 'almost' unique. The terms identical and unique are absolute adjectives. They can be true or false but nothing in between. The same is true of the word 'perfect' - although I note that even the US Constitution gets this wrong with the phrase "...in order to form a more perfect union..."

The first thing to do is to run jobs with 'IDENTICAL' input on both sets of GPUs. If you see it fail on one set but not the other, then it is a machine configuration / BIOS / etc. issue and I can escalate it to SDSC support.

If it fails on both (or runs fine on both), then the problem is something with your job, and we can attempt to find out whether there is a bug in the GPU code or something weird about your input. To do this, though, I need input that fails on any combination of 2 GPUs.
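
As a rough illustration of the test described above, here is a minimal Python sketch. It is not part of AMBER; the function names and the file names in the usage note are hypothetical. It first checks that two job directories really do hold byte-identical input, then maps the outcome of running that input on both sets of GPUs to the next step.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_files(dir_a, dir_b, names):
    """List the file names whose contents differ between two job directories.

    An empty list means the named inputs are identical, not 'almost' identical.
    """
    return [n for n in names
            if sha256_of(Path(dir_a) / n) != sha256_of(Path(dir_b) / n)]


def next_step(fails_on_first, fails_on_second):
    """Map the result of running IDENTICAL input on both GPU sets to a next step."""
    if fails_on_first != fails_on_second:
        # Fails on one set only: machine configuration / BIOS issue, escalate.
        return "machine"
    # Fails on both, or runs fine on both: look at the job or the GPU code.
    return "job"
```

For example, differing_files("06A", "06B", ["mdin", "restraints.RST"]) would list any of those (hypothetically named) files that differ between the two directories. Only once that list is empty does a failure on one set of GPUs but not the other say anything about the machine.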

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Associate Research Professor |
| San Diego Supercomputer Center |
| Adjunct Associate Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not be read every day, and should not be used for urgent or sensitive issues.


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Oct 22 2015 - 09:30:04 PDT