[AMBER] Reply: [sent on behalf of ambermd.org] Re: Max relative errors questions in parallel and cuda.serial test

From: 石谷沁 <guqin.shi.qilu-pharma.com>
Date: Wed, 28 Apr 2021 00:54:26 +0000

Hi All,


I think I need to bring this up again. I am seeing large errors in the CUDA tests in DPFP precision mode: the maximum relative errors are typically on the order of e-01.

I have attached the .diff and .log files from test_amber_cuda and test_at_cuda, respectively, at the link below.
https://drive.google.com/drive/folders/1wAUz_8538PwKDSVbM1ZR4wuDQsuybbKG?usp=sharing
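
For readers unfamiliar with the term, a maximum relative error can be sketched as below. This is only a minimal illustration in Python, not the comparison performed by the AMBER test scripts; the function name, the `tiny` floor, and the sample numbers are all hypothetical and are not taken from the attached logs.

```python
def max_relative_error(reference, test, tiny=1e-12):
    """Return the largest |test - ref| / |ref| over paired values.

    `tiny` is a hypothetical floor that avoids dividing by zero when a
    reference value is exactly 0.
    """
    if len(reference) != len(test):
        raise ValueError("value lists differ in length")
    worst = 0.0
    for ref, val in zip(reference, test):
        denom = max(abs(ref), tiny)
        worst = max(worst, abs(val - ref) / denom)
    return worst


# Made-up energies for illustration only (not from the real test output).
cpu_reference = [-1000.0, 250.0, 4.0]
gpu_output = [-1000.0, 250.0, 4.4]
print(f"max relative error: {max_relative_error(cpu_reference, gpu_output):.1e}")
```

In this made-up case the third value differs by 10%, which is the "e-01" scale mentioned above.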

The forwarded message below describes my hardware and how I installed AMBER20.

Since I'm in a real hurry to run a few MD jobs, does anyone know whether I can safely run classical MD in SPFP mode (which is the default)? Thank you!


Best,
Guqin

===============================================================================================================
Hi David,

Thanks for the reply. I described my system in another reply, but it probably got lost among the other messages.

Here is my system:
CentOS 7, 32 Xeon Gold CPUs at 3.3 GHz, 1 Quadro RTX 5000; the CUDA toolkit is 11.1.

I followed the instructions in the manual.
According to https://ambermd.org/pmwiki/index.php/Main/CMake-Quick-Start, cmake3 is recommended for CentOS 7, so I installed cmake3 and use it instead of cmake. For example, in the run_cmake script, cmake is changed to cmake3. On my system, a symbolic link named cmake was also created that points to cmake3 in /usr/bin.

I installed devtoolset-8 so that all GNU compilers are version 8.3.1.
I downloaded the MPICH 3.3.2 package for the parallel installation.

These are all the changes I made for the AMBER20 serial installation. For the parallel and GPU installations, nothing changed except for setting the -DMPI and -DCUDA flags to TRUE, respectively.

I hope this helps. Please let me know what else you need for troubleshooting. The differences in the GPU DPFP tests are quite large, and I'm not sure whether I can run any MD now, even with the default SPFP.


Thanks a lot for the help!
-Guqin

=================================================================================================================


-----Original Message-----
From: David A Case [mailto:david.case.rutgers.edu]
Sent: April 24, 2021 21:32
To: AMBER Mailing List <amber.ambermd.org>
Subject: [sent on behalf of ambermd.org] Re: [AMBER] Max relative errors questions in parallel and cuda.serial test

On Thu, Apr 22, 2021, 石谷沁 wrote:

>I double-checked the cuda.serial log. The possible failures all came out at DPFP:
>==============================================================
>cd myoglobin/ && ./Run_md_myoglobin_igb7 DPFP yes
>Note: The following floating-point exceptions are signalling:
>IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
>diffing myoglobin_md_igb7.out.GPU_DPFP with myoglobin_md_igb7.out
>possible FAILURE: check myoglobin_md_igb7.out.dif

This is concerning.

>
>According to the manual, SPFP is the default precision model for
>pmemd.cuda, and I didn't get any failure reports with SPFP. So I guess I
>can ignore those errors with DPFP in most general situations?

Again, very odd: the expectation is that DPFP tests will all pass, when compared to a CPU output, but that a fair number of SPFP tests will fail with roundoff errors.
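
To give a sense of the scale of a roundoff error: storing a single value in IEEE single precision changes it by a relative amount of at most about 6e-8 (2^-24), many orders of magnitude below e-01. The Python sketch below only illustrates that bound. It is not how pmemd.cuda or the test scripts work, the helper names and the energy value are made up, and roundoff accumulated over many operations can be larger than this single-value bound.

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relative_error(reference, value):
    """Relative difference of `value` from a nonzero `reference`."""
    return abs(value - reference) / abs(reference)


energy = -1234.56789012345  # made-up number, for illustration only
stored = to_single(energy)
print(f"relative error from single-precision storage: {relative_error(energy, stored):.1e}")
print(f"bound for round-to-nearest (2**-24): {2.0 ** -24:.1e}")
```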

So, I think you have genuine problems. Can you say (again?) what GPU you have, and what CUDA SDK version? Maybe the GPU gurus on the list (I am not one of them) will spot something.

Also, please say what changes you made to run_cmake (if you used that), and what command(s) you used to run the tests.

....dac


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber


Received on Tue Apr 27 2021 - 18:00:03 PDT