Re: [AMBER] H-REMD, pmemd fails to calculate potential energy of neighbor coordinates

From: Jiri Wiesner <wiesner.chemi.muni.cz>
Date: Mon, 17 Dec 2012 00:06:27 +0100

Dear Jason,
Thank you for your suggestions. Both binaries pass the REM tests,
including the h_rem test. I ran the tests with sander on 4, 8 and 16
CPUs and with pmemd on 8 and 16 CPUs.

I have updated my force-dumping code with the mpi_reduce call; see the
attached patches. Forces and coordinates are dumped only the first time
the force or pme_force subroutine is executed, once for the replica's
own coordinates (x1) and once for the coordinates of its first assigned
neighbor (x2). The mpi_reduce call does not seem to work in sander (it
probably combines the force arrays in a different way), so I will focus
on pmemd only. The forces calculated on x1 agree to the first 14
significant digits between the 32 CPU and 64 CPU runs. The forces
calculated on x2, however, differ between the two runs, with an average
absolute difference of circa 2.5 internal units per array element.
Dumps for replica 1 are attached. In any case, I think this is only a
side effect and not the cause of the failure, which occurs when pmemd
uses 4 CPUs per replica.
Best Regards,
Jiri

On 16/12/12 16:20, Jason Swails wrote:
> On Sat, Dec 15, 2012 at 11:26 PM, Jiri Wiesner <wiesner.chemi.muni.cz> wrote:
>
>
>> Dear Amber developers and users:
>> I use the Hamiltonian replica exchange method to calculate the free energy
>> of the perturbation between the first and the last replica (REFEP). My
>> Amber source tree is updated with the latest set of patches (29 for
>> AmberTools, 13 for Amber). I have compiled the sources with the Intel compiler 12 and
>> I use OpenMPI 1.6.3. I have 16 replicas and I have tested both sander and
>> pmemd. I have tried to run sander on 16, 32 and 64 CPUs (all on a single
>> machine) and pmemd on 32 and 64 CPUs (sander can use 1 or more CPUs per
>> replica, whereas pmemd requires 2 or more CPUs per replica).
>> My system is solvated, has PBC and thus PME is switched on.
>>
>>
> I have to ask -- does the $AMBERHOME/test/h_rem test pass with both sander
> and pmemd? You can check this explicitly by setting DO_PARALLEL="mpirun
> -np <2,4,8>" and running "make test.sander.REM" and "make test.pmemd.REM".
>
>
>
>> The problem is that whereas sander on 16 CPUs gives this result:
>> [snip]
>>
>>
>
>
>> Please note that the potential energy of the neighbor's coordinates in the
>> pmemd run is substantially higher than in the sander run. Some more rem.log
>> files are attached (sander on 16 CPUs, sander on 64 CPUs, pmemd on 32 CPUs,
>> pmemd on 64 CPUs). There is exactly the same issue under the GNU 4.3.2
>> compiler and a greater number of CPUs per replica basically makes the
>> calculation fail.
>>
>>
> This is obviously not good...
>
>> I was also trying to investigate the situation on my own and modified the
>
>> code of both sander and pmemd to obtain a dump of some arrays - the forces
>> in the case of sander (the ftemp array in remd.F90, line 2447, produced by
>> call force(x,ix,ih,...)), and in the case of pmemd most of the arguments of
>> the pme_force subroutine (the frc_temp array in remd_exch.F90, line 523,
>> call pme_force(atm_cnt, crd_temp, frc_temp, ...)). I am attaching the
>> forces calculated by replica 1 the first time the above-mentioned
>> subroutines were executed - the (neighbor) coordinates are those of replica
>> 16. I think that the values in
>> the files should be the same, because the forces are from the very start of
>> the simulation, but they are not. It is notable that the file
>> pmemd_run_64_forces.001 (pmemd run on 64 CPUs) contains circa 75% zero
>> components and 25% non-zero components, which would correspond to the
>> part done by the master process of that replica. I have no knowledge of the
>> internals of the pme_force subroutine, therefore I am quite helpless at
>> this point. I am not sure if my findings about the forces are of any
>> relevance to the failure to calculate the potential energy of neighbor
>> coordinates.
>>
>>
> You nailed the reason why most forces are 0 on the master process.
> However, unless you synchronize the forces on each thread, I don't think
> they're comparable. Especially with PME, the same processor won't
> necessarily compute all components of the force for the same atoms -- some
> cores are responsible for the reciprocal space part of the PME force while
> other cores do the direct space sum. If you want an 'updated' force array,
> you need to allocate a separate NATOM*3-length array (let's say f_buf), and
> run the command:
>
> call MPI_Reduce(ftemp, f_buf, natom*3, MPI_DOUBLE_PRECISION, MPI_SUM, 0,
> pmemd_comm, err_code_mpi)
>
> Also, the only steps that can really be accurately compared are the first
> steps, or any steps where the coordinates are identical. Can you set
> ntwx=1, ntpr=1 and track where they start to diverge?
>
> I'll take a look at your files when I get a chance.
>
> Thanks for the report!
> Jason
>
>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Sun Dec 16 2012 - 15:30:03 PST