[AMBER] parallel cuda test errors

From: Gard Nelson <Gard.Nelson.NantBio.com>
Date: Wed, 3 Sep 2014 19:49:38 +0000

Hi all,

I've recently installed Amber14 on my local cluster. The serial and parallel CPU versions both pass all of the included tests without any errors or failures. The serial GPU version reports a few possible failures, but manual inspection shows that these are all infrequent and likely harmless (maximum relative errors <= 1e-3). The parallel GPU code passes the tests (with results similar to the serial GPU version) if I use 2 GPUs. However, when I run the same tests with 4 GPUs, I see frequent differences with relative errors of around 1-2. These often correspond to energy differences on the order of tens to hundreds of kcal/mol.
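
To put those two thresholds side by side, here is a minimal Python sketch (illustrative values only; it is not Amber's test-comparison script, and the energy terms are hypothetical). It assumes single-precision accumulation purely for illustration and shows that merely reordering a sum, as a parallel reduction might, changes the result at the round-off level, while a relative error of 1 means the value itself is off by 100%.

```python
import random
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate values one at a time in the given order, emulating float32 addition."""
    acc = 0.0
    for v in values:
        # Both operands are float32, so rounding the double sum gives the float32 result.
        acc = to_f32(acc + to_f32(v))
    return acc

def max_relative_error(reference, computed):
    """Largest |computed - reference| / |reference| over paired values (references nonzero)."""
    return max(abs(c - r) / abs(r) for r, c in zip(reference, computed))

# Hypothetical energy terms in kcal/mol; illustrative values, not Amber output.
rng = random.Random(0)
terms = [rng.uniform(1.0, 50.0) for _ in range(1000)]
shuffled = terms[:]
rng.shuffle(shuffled)

# Same terms, different summation order: only round-off-level disagreement is expected.
baseline = sum_f32(terms)
reorder_rel = abs(sum_f32(shuffled) - baseline) / abs(baseline)
print(f"relative change from reordering the sum: {reorder_rel:.1e}")

# Compare a round-off-sized discrepancy with one of the size seen on 4 GPUs.
print(max_relative_error([100.0, -250.0], [100.05, -250.1]))  # small, tolerance-level
print(max_relative_error([100.0, -250.0], [200.0, -250.0]))   # relative error of 1
```

With 1000 terms the float32 rounding bound alone keeps the reordering difference well below 1e-3, so errors near 1-2 point to something other than summation order.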

I realize that the highly parallel nature of GPU calculations will result in some test differences, but what I'm seeing seems too large to be caused by order of operations or round-off errors. Does anyone have any idea what could be causing this behavior?

I'm running this on Tesla S2050 GPUs with driver version 331.62. The code was built with the GNU 4.8 compilers and CUDA 6.0.

Thanks for your help,

Gard

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 03 2014 - 13:00:03 PDT