Thanks, dac! The outputs indicate different nodes (v006 and v011) but the same type of GPU (Tesla V100-SXM2-32GB). I’ve run many additional calculations, and this issue happens only very rarely. When I get a chance, I’ll run your suggested test of sending many repeats of this job to v006 and v011 to see how reproducible the problem is.
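For the record, here is a rough sketch of how the repeat test could be tabulated, assuming the 50 runs write their outputs to files named mdout.1, mdout.2, etc. (those names, and the exact "CUDA Device Name" / "Final Performance Info" marker strings, are assumptions; the header wording can differ between Amber versions, so check one of your own mdout files first):

```python
import glob
import re
from collections import Counter

def tabulate_runs(pattern="mdout.*"):
    """Report which GPU each run used and whether it completed.

    Assumes pmemd.cuda-style output files; the marker strings below
    are illustrative and may vary between Amber versions.
    """
    totals = Counter()    # runs seen per GPU
    failures = Counter()  # runs that did not finish, per GPU
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            text = fh.read()
        m = re.search(r"CUDA Device Name:\s*(.+)", text)
        gpu = m.group(1).strip() if m else "unknown"
        ok = "Final Performance Info" in text  # printed only on a normal exit
        totals[gpu] += 1
        if not ok:
            failures[gpu] += 1
        print(f"{path}\t{gpu}\t{'ok' if ok else 'FAILED'}")
    for gpu, n in totals.items():
        print(f"{gpu}: {failures[gpu]}/{n} failed")
    return totals, failures
```

Run in the directory holding the outputs; if the failures cluster on one device or node, that card is the suspect.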
Best,
Matthew
> On Nov 28, 2022, at 9:22 AM, David A Case via AMBER <amber.ambermd.org> wrote:
>
> On Sun, Nov 27, 2022, Matthew Guberman-Pfeffer via AMBER wrote:
>>
>> I ran two identical calculations on (presumably different) GPUs and got
>> completely different results. In the first run, the system blew up; in the
>> second, everything looked fine.
>
> It's certainly possible that there could be some overflow/crash on one GPU
> but not on another. Given that you seem not to know which GPUs
> were involved, it's not likely that any remote person will be able to help
> much.
>
> Try running the job 50 times, and see how many times it fails, and whether
> you always get the same output in the successful runs. The output files
> give lots of information about which GPU is being used, so see if there is
> some correlation there.
>
> ....dac
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Nov 28 2022 - 08:00:02 PST