On Fri, Aug 26, 2022, Rybenkov, Valentin V. via AMBER wrote:
>
>I have an odd problem: the same pmemd.MPI job crashes on one computer but
>not another. This is a new installation of Amber22 on CentOS 7.9. The node
>that crashes has the better CPU: 128 cores with 125 GB of memory. The nodes
>that work have 20-32 cores with ~40 GB of memory. Diagnostics on the failing
>machine do not reveal any problems. Crashes happen at various points when
>the job is rerun, but seem to occur whenever it is time to write data to
>disk, though not necessarily the first time data are written. At a low
>tasks_per_node (e.g., 6), crashes happen later. Could this difference in
>memory per core be the issue? Is there a parameter in Amber that controls
>how much memory is allocated per core?
Use the "limit" command to check your StackSize limit; Using "unlimit
stacksize" (exact command may vary with your OS and shell) can often help
prevent segfaults.
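A minimal sketch of that check, assuming bash or tcsh (adjust for your
shell; for MPI jobs the limit may also need to be raised on every node,
e.g. in your shell startup file or batch script):

  # bash: print the current stack limit, then remove it
  ulimit -s
  ulimit -s unlimited

  # tcsh: the equivalent commands
  limit stacksize
  unlimit stacksize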
If this doesn't help, can you provide more info on exactly what messages you
see when the program "crashes"? Does this happen during the middle of a
run, or (say) at or before the first step? Are you using the same number of
MPI threads on each machine? Is there any indication of bad energetics
before the crash occurs? Do the simulations appear to be successful using
the serial version of pmemd?
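For that last check, something like the following on both machines (the
input, topology, and coordinate file names here are placeholders for your
own files):

  # serial run
  pmemd -O -i mdin -o mdout.serial -p prmtop -c inpcrd -r restrt.serial

  # MPI run with the same inputs (the launcher may differ on your cluster)
  mpirun -np 8 pmemd.MPI -O -i mdin -o mdout.mpi -p prmtop -c inpcrd -r restrt.mpi

If the serial run succeeds where the MPI run fails at the same write, that
points at the MPI stack or environment rather than the inputs.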
...thx...dac