Re: [AMBER] pmemd.MPI of amber14: run MD twice with the same input but got different output (within statistical error)

From: sun <sun.ntnu.edu.tw>
Date: Tue, 5 Aug 2014 01:06:14 +0800

On Mon, 04 Aug 2014 09:23:06 -0700, Ross Walker wrote
> Hi Ying-chieh,
>
> It's not clear what you are asking here - or what you actually want to do.
> Are you trying to rerun the exact same simulation?

- Yes. At first we expected to get exactly the same results but did not, so we dug into the old Amber Q&A archives and found the two possible causes mentioned in my last email.

- What we did was run a 10 ps NVT simulation followed by a 200 ps NPT simulation (a sketch of the kind of input is at the end of this list).

- The very first frame gave the same energy, but after 5 ps (we only printed every 5 ps in this case) the energies differed. The numbers were still reasonable, though.

- After the 200 ps NPT run, the "Largest sphere to fit in unit cell has radius", "Direct force subcell size", and "Box X, Y, Z" values in the output files are slightly different.

- I think these results are reasonable then.
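- For reference, a rough sketch of the kind of mdin settings involved (values illustrative, not our exact input; Langevin thermostat and dt = 0.002 ps assumed):

   10 ps NVT stage:
    &cntrl
     ntb=1, ntp=0,                   ! constant volume
     ntt=3, gamma_ln=2.0, temp0=300.0,
     ig=71277,                       ! fixed random seed, same thermostat noise in both runs
     nstlim=5000, dt=0.002,          ! 5000 x 0.002 ps = 10 ps
     ntpr=2500,                      ! print energies every 5 ps
    /

   200 ps NPT stage:
    &cntrl
     irest=1, ntx=5,                 ! restart from the NVT stage
     ntb=2, ntp=1,                   ! constant pressure
     ntt=3, gamma_ln=2.0, temp0=300.0, ig=71277,
     nstlim=100000, dt=0.002,        ! 200 ps
     ntpr=2500,
    /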

> If yes, this is NOT
> possible with the CPU MPI code due to the load balancer.

- Got it. I don't know much about the load balancer and need to study it more.
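- If I understand correctly, when the load balancer moves work between MPI ranks, the per-atom force contributions end up summed in a different order, and floating-point addition is not associative, so the totals can differ in the last bits and then drift apart over many steps. A toy illustration (plain Python, not Amber code):

    a, b, c = 1.0e16, -1.0e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 - same numbers, different summation order, different result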

> You can either 1)
> hack the code to turn off the load balancer (you may still get divergence
> due to order-of-summation issues in other parts of the code, vector
> libraries, etc.), 2) use the serial CPU code, or 3) use the GPU code - this is
> deterministic for both single and multi-GPU.

- OK.
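- So for a bit-for-bit rerun we would use the same input with the serial or GPU engine instead of pmemd.MPI, e.g. (file names illustrative):

    # serial CPU engine
    pmemd -O -i md.in -p prmtop -c inpcrd -o md.out -r md.rst -x md.nc

    # GPU engine (deterministic, per Ross)
    pmemd.cuda -O -i md.in -p prmtop -c inpcrd -o md.out -r md.rst -x md.nc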

- Ross, thanks for the prompt reply.

- Best.

- Ying-chieh
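
P.S. Regarding the &ewald pointer quoted below: if I read the manual correctly, the PME FFT grid can be pinned in the mdin &ewald namelist, e.g. (grid sizes illustrative only; they have to suit the actual box):

    &ewald
     nfft1=64, nfft2=64, nfft3=64,   ! fix the FFT grid dimensions
    /

The manual apparently also documents a switch for forcing the slab rather than column FFT decomposition; I still need to look up its exact name.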
>
> The FFT is completely controllable through the ewald namelist - you can
> set the FFT dimensions. NFFT1, NFFT2, NFFT3. You can also specify that it
> should always use a slab fft instead of switching to columns at a set MPI
> task count - see the PMEMD section of the manual.
>
> With more details about what you actually want to do we can try to help
> more.
>
> All the best
> Ross
>
> On 8/4/14, 8:39 AM, "sun" <sun.ntnu.edu.tw> wrote:
>
> >Hi,
> >
> >I know this is not new; it was discussed years back and is NOT considered a
> >problem. But any further comments would help us.
> >
> >I think in our case, possible causes are:
> >
> >1) The order in which the forces are added differs due to network
> >indeterminacy, and the resulting rounding error causes the difference.
> >2) A version of the FFT with automatic algorithm selection depending on
> >network conditions is used in the PME calculation.
> >
> >For 1), we are running on 24 cores, using CPUs with 12 cores each.
> >
> >Further comments are appreciated.
> >
> >For 2), can someone describe this in more detail or point out where the FFT
> >source code is?
> >
> >Thank you very much.
> >
> >Ying-chieh
> >
> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Aug 04 2014 - 10:30:02 PDT