Re: AMBER: pmemd - bug report from Petr Kulhanek on 2005-07-05 (Amber Archive Jul 2005)

From: Petr Kulhanek <kulhanek.chemi.muni.cz>
Date: Tue, 05 Jul 2005 18:29:54 +0200

Hello Bob,

Answers to your questions:

> In the meantime, as I said earlier, it would be helpful if you gave
me more info on how your prmtop was made.

solsol.top parm file is not constructed by xleap. It is reconstructed
from original topology used in MD simulation. For this purpose, I use
program that I wrote myself. Program is capable to cut any part from
topology and save this part as new topology. In solsol.top, only solute
and waters were kept (ie, ions were stripped away).
parm99 + glycam04 were used to construct original topology. 10-12
interaction terms are from glycam part. Only one problem that could be
connected with topology is that SOLVENT_POINTERS and ATOMS_PER_MOLECULE
sections are not built correctly (they are simply empty or contain
zeros). After testing, I found that this is not problem for minimization
runs either in sander or pmemd.

> I suspect the 10-12 vdw code may have problems in pmemd, as it is
pretty much untested (there are no standard amber test cases for this,
as far as I know).

10-12 vdw terms calculated by pmemd and sander are exactly the same.

> 1) You are doing a minimization with ntx = 5; ie., your input coords
are a molecular dynamics restart file with ntx set to 5. This specifies
input of both coordinates and velocities. Now, use of ntx = 5 with
minimization is listed in the documentation as illegal, though I think
the code (pmemd, and probably sander too) is deficient in not having a
check for attempted use of ntx other than 1 or 2 in combination with
minimization. Okay, this is not a cause of horrible grief in sander.
In pmemd, it is a cause of horrible grief because the velocity array is
not allocated when you are doing minimizations, but the code will
attempt to store velocities if ntx = 5. SO I should include a check, so
that the error is a little less obscure, but that is the source of your
segmentation violation.
>My earlier assertion about the seg violation for test case 1 being due
to an unallocated velocity array in minimization is incorrect. This was
probably true for earlier pmemd code, and then I changed it to prevent
just this sort of thing from occurring.

I think that ntx = 5 is correct in my case, because I need box size from
restart file and not from topology. In my opinion, all ntx options
should be valid for both minimization and dynamics.

> I have not yet thoroughly looked at everything yet, but it looks to
me like the whole problem hinges on whether a fractional coordinate
calculation can produce a result outside the range of 0.0 to 0.999...
(ie., if this assertion is correct, bad things will follow elsewhere; no
need to even check).

Actually, there are two problems. First one is with fractional
coordinates. The second problem is that there are two different
approaches to calculate subcell indexes.

> Now, the basic fractional coordinate calculation is of the form:
> fract = f1 - dnint(f1) + 0.5d0

> Have you actually trapped a value outside the range, or is this a
postulate to explain results?

YES, I trapped it. (see below)

> According to the definition of dnint() (f95 reference standard)
dnint(a) has the value of dint(a + 0.5) (actually the ref just defines
anint/aint, but dnint/dint are just the specific double precision
intrinsics), so for instance dnint(1.5) ALWAYS should round up 2.0;
given that 0.5 is precisely representable internally in binary, there
should be no rounding errors that can produce a 1.0 result from this
calculation (for negative values it is the same story, but dnint(a) has
the value dint(a - 0.5). I have not thought about this in a while, but
remember looking it over fairly closely when the code was first done;
the basic calculation and assumptions also exist in sander; I think this
stuff all derives from Tom Darden's original pbc code, but maybe I am
wrong about that. I'll do some checks with ifort and your samples in
the near future, but unless a fractional is producing 1.0, this is not
the problem.

PROBLEM ONE - fractional coordinates (ew_direct_cit.f90 -> subroutine
get_fract_crds)

When I debug program I just found that some fractional coordinate is
1.0. I thought that round errors are responsible for this. OK, I
analysed the problem in more details and I think that the calculation of
fractions are not fully correct.

I will assume only one-dimensional case in following consideration.

* fractional coordinate has to be in range from 0.00 to 0.9999..

* dnint(0.5) is 1.0 ( dnint(0.5) = dint(0.5 + 0.5) = dint(1.0) = 1.0 )
* dnint(0.0) is 0.0 ( dnint(0.0) = dint(0.0 - 0.5) = dint(-0.5) = 0.0 )
* dnint(-0.5) is -1.0 ( dint(-0.5) = dint(-0.5 - 0.5) = dint(-1.0) = -1.0 )
(I checked result for dnint(-0.5) also by g77)

* fractional coordinate is calculated by fract = f1 - dnint(f1) + 0.5d0

* if f1 = 0.5
fract = f1 - dnint(f1) + 0.5d0 = 0.5 - dnint(0.5) + 0.5 = 0.5 - 1.0 +
0.5 = 0.0
this is correct

* if f1 = 0.0
fract = f1 - dnint(f1) + 0.5d0 = 0.0 - dnint(0.0) + 0.5 = 0.0 - 0.0 +
0.5 = 0.5
this is correct

* if f1 = -0.5
fract = f1 - dnint(f1) + 0.5d0 = -0.5 - dnint(-0.5) + 0.5 = -0.5 -
(-1.0) + 0.5 = 1.0
this is NOT correct, fract has to be 0.0

It is my guess that fract could be calculated correctly only with
floor() or ceiling() functions and the problem could be still
complicated by round errors.

Now promissed backtrace for tst1:

segmentation fault occurs in get_nb_list subroutine

(idb) run -O -i lrt_pmemd.in -p solsol.top -c solsol00215.crd
Thread received signal SEGV
stopped at [subroutine get_nb_list(integer*4, type cit_tbl_rec (0:0),
type cit_img_rec (0:0), integer*4 (0:0), integer*4 (0:0), type
maskdata_rec (0:0), integer*4 (0:0), real*8 (0:0,0:0), type maskdata_rec
(0:0), integer*4 (0:0), integer*4 (0:0), logical*4, integer*4):231
0x0806f6d2]
     231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))

Actually, the error occures one line before:

     230 do i = 1, atm_maskdata(atm_i)%nummask
> 231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))

The reason is that atm_i is zero and atm_maskdata array is not accessed
correctly (this array starts from index one).

atm_i is initialized in the same subroutine from img_atm_map array

     212 atm_i = img_atm_map(img_i)

This array is initialized in setup_cit subroutine.

[1] stopped at [subroutine `EW_DIRECT_CIT_MODULE`setup_cit(integer*4,
real*8 (:,:), type cit_tbl_rec (:,:,:), type cit_img_rec (:), integer*4
(:), integer*4 (:)):437 0x08062ea0]
     437 return
(idb) print img_id
45820

only 45820 atoms are processed at the end of setup_cit subroutine but
the system contain 45821 atoms.
img_atm_map is initialized from atom lists for each subcells (atm_lst
array).

This array represents linked lists of atoms in each subcell. It is
constructed in setup_crd_idx_lst_tbl subroutine.

[3] stopped at [subroutine
`EW_DIRECT_CIT_MODULE`setup_crd_idx_lst_tbl(integer*4, real*8 (:,:),
integer*4 (:,:,:), type atm_lst_rec (:)):253 0x0806236b]
     253 nxt_atm_lst_idx = 1
(idb) list
     254
     255 scale_fac_x = dble(cit_tbl_x_dim)
     256 scale_fac_y = dble(cit_tbl_y_dim)
     257 scale_fac_z = dble(cit_tbl_z_dim)
     258
     259 crd_idx_lst_tbl(:,:,:) = 0 ! Marks empty entries
     260
     261 ! Load the atom id's:
     262
     263 do atm_id = 1, atm_cnt
     264
     265 x_idx = int(fraction(1, atm_id) * scale_fac_x)
     266 y_idx = int(fraction(2, atm_id) * scale_fac_y)
     267 z_idx = int(fraction(3, atm_id) * scale_fac_z)
     268
     269 if (crd_idx_lst_tbl(x_idx, y_idx, z_idx) .eq. 0) then !
New list.
     270
     271 crd_idx_lst_tbl(x_idx, y_idx, z_idx) = nxt_atm_lst_idx
     272
     273 else ! List already started. Follow the chain and
add a node:
     274
(idb) print cit_tbl_z_dim
15
(idb) print fraction(3, 35125)
1

(idb) print fraction(3, 35125) * scale_fac_z
15.0

crd_idx_lst_tbl array is indexed from 0 to cit_tbl_z_dim - 1, e.g. z_idx
index for atom 35125 is out of legal range. As the result, linked list
is not constructed correctly. In this particular case, one item is not
accesible when list is used in setup_cit. It is seen that z fractional
coordinate of 35125 atom is 1.0.

Fractional coordinates are calculated in get_fract_crds.

Real coordinates of atom 35125 are:
-3.2440000 50.6819992 -38.5530014
Box dimensions are:
78.3040009 75.9800034 77.1060028

so z coordinate is exatly -0.5 * box_z, this leads to f3 = -0.5

     637 if (is_orthog .ne. 0) then
     638
     639 recip_11 = recip(1, 1)
     640 recip_22 = recip(2, 2)
     641 recip_33 = recip(3, 3)
     642
     643 do i = 1, atm_cnt
     644 f1 = recip_11 * crd(1, i)
     645 f2 = recip_22 * crd(2, i)
     646 f3 = recip_33 * crd(3, i)
     647 fraction(1, i) = f1 - dnint(f1) + 0.5d0
     648 fraction(2, i) = f2 - dnint(f2) + 0.5d0
     649 fraction(3, i) = f3 - dnint(f3) + 0.5d0

and then fraction(3, 35125) is 1.0

NOTES: line numbers are from _*_.f90 files, it is necessary to unlimit
stack size (e.g. for bash: ulimit -s unlimited)

BACKTRACE for tst2 ( PROBLEM TWO )

In this case, infinity electrostatic energy is obtained. The reason is
that NB list is not constructed correctly. It contains NB pair between
the same atom, distance is than zero and energy goes to infinity.
The problem is located in get_nb_list subroutine. Index to z_bkts array
overflow legal range for some atom. This is not problem of fractional
coordinates but round errors when z_bkts_hi index is calculated. I can
prepare the same detailed backtrace as in previous case if it would be
helpful.

I think that the problems are localized. First problem is with
fractional coordinates and the second one is in round errors that could
lead to incorrect subcell indexes in get_nb_list subroutine.

Best regards,
        Petr

-- 
##################################################
   Petr Kulhanek
  ------------------------------------------------
   kulhanek.chemi.muni.cz
  ------------------------------------------------
   National Centre for Biomolecular Research
   Masaryk University Brno
   Kotlarska 2, CZ-611 37 Brno
   Czech Republic
##################################################
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu

Received on Tue Jul 05 2005 - 17:53:00 PDT