Re: AMBER: pmemd - bug report from Robert Duke on 2005-07-05 (Amber Archive Jul 2005)

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 5 Jul 2005 12:47:19 -0400

Petr -
Thanks for the more complete analysis on your end. At the moment I am
unfortunately tending to agree with you; I am not sure now where the idea of
fractionals bounded in the range 0.0 to < 1.0 came from, but it really looks
to me like for negative values with a fractional part 0.5, you will indeed
get 1.0 for the fractional, and this could put you off the end of some data
structures. I am a bit dumbfounded as to why this has not wreaked havoc; I
will thoroughly study the problem and issue a fix. I think if there are
problems in sander, they may be less severe, but the sander guys should
check. The basic "fractional coordinate" algorithm is based back in sander
6 code. As far as whether a fractional is 0.0 or 1.0, the real effect on a
simulation should be nil, unless you happen to be using data structures that
depend on < 1.0. Thanks very much for finding this; I am still puzzled as
to why it has not been previously observed and will try to work that out.
Regards - Bob

----- Original Message -----
From: "Petr Kulhanek" <kulhanek.chemi.muni.cz>
To: <amber.scripps.edu>
Sent: Tuesday, July 05, 2005 12:29 PM
Subject: Re: AMBER: pmemd - bug report

> Hello Bob,
>
> Answers to your questions:
>
> > In the meantime, as I said earlier, it would be helpful if you gave
> me more info on how your prmtop was made.
>
> solsol.top parm file is not constructed by xleap. It is reconstructed from
> original topology used in MD simulation. For this purpose, I use program
> that I wrote myself. Program is capable to cut any part from topology and
> save this part as new topology. In solsol.top, only solute and waters were
> kept (ie, ions were stripped away).
> parm99 + glycam04 were used to construct original topology. 10-12
> interaction terms are from glycam part. Only one problem that could be
> connected with topology is that SOLVENT_POINTERS and ATOMS_PER_MOLECULE
> sections are not built correctly (they are simply empty or contain zeros).
> After testing, I found that this is not problem for minimization runs
> either in sander or pmemd.
>
> > I suspect the 10-12 vdw code may have problems in pmemd, as it is
> pretty much untested (there are no standard amber test cases for this, as
> far as I know).
>
> 10-12 vdw terms calculated by pmemd and sander are exactly the same.
>
> > 1) You are doing a minimization with ntx = 5; ie., your input coords
> are a molecular dynamics restart file with ntx set to 5. This specifies
> input of both coordinates and velocities. Now, use of ntx = 5 with
> minimization is listed in the documentation as illegal, though I think the
> code (pmemd, and probably sander too) is deficient in not having a check
> for attempted use of ntx other than 1 or 2 in combination with
> minimization. Okay, this is not a cause of horrible grief in sander. In
> pmemd, it is a cause of horrible grief because the velocity array is not
> allocated when you are doing minimizations, but the code will attempt to
> store velocities if ntx = 5. SO I should include a check, so that the
> error is a little less obscure, but that is the source of your
> segmentation violation.
> >My earlier assertion about the seg violation for test case 1 being due
> to an unallocated velocity array in minimization is incorrect. This was
> probably true for earlier pmemd code, and then I changed it to prevent
> just this sort of thing from occurring.
>
> I think that ntx = 5 is correct in my case, because I need box size from
> restart file and not from topology. In my opinion, all ntx options should
> be valid for both minimization and dynamics.
>
> > I have not yet thoroughly looked at everything yet, but it looks to
> me like the whole problem hinges on whether a fractional coordinate
> calculation can produce a result outside the range of 0.0 to 0.999...
> (ie., if this assertion is correct, bad things will follow elsewhere; no
> need to even check).
>
> Actually, there are two problems. First one is with fractional
> coordinates. The second problem is that there are two different approaches
> to calculate subcell indexes.
>
> > Now, the basic fractional coordinate calculation is of the form:
> > fract = f1 - dnint(f1) + 0.5d0
>
> > Have you actually trapped a value outside the range, or is this a
> postulate to explain results?
>
> YES, I trapped it. (see below)
>
> > According to the definition of dnint() (f95 reference standard)
> dnint(a) has the value of dint(a + 0.5) (actually the ref just defines
> anint/aint, but dnint/dint are just the specific double precision
> intrinsics), so for instance dnint(1.5) ALWAYS should round up 2.0; given
> that 0.5 is precisely representable internally in binary, there should be
> no rounding errors that can produce a 1.0 result from this calculation
> (for negative values it is the same story, but dnint(a) has the value
> dint(a - 0.5). I have not thought about this in a while, but remember
> looking it over fairly closely when the code was first done; the basic
> calculation and assumptions also exist in sander; I think this stuff all
> derives from Tom Darden's original pbc code, but maybe I am wrong about
> that. I'll do some checks with ifort and your samples in the near future,
> but unless a fractional is producing 1.0, this is not the problem.
>
> PROBLEM ONE - fractional coordinates (ew_direct_cit.f90 -> subroutine
> get_fract_crds)
>
> When I debug program I just found that some fractional coordinate is 1.0.
> I thought that round errors are responsible for this. OK, I analysed the
> problem in more details and I think that the calculation of fractions are
> not fully correct.
>
> I will assume only one-dimensional case in following consideration.
>
> * fractional coordinate has to be in range from 0.00 to 0.9999..
>
> * dnint(0.5) is 1.0 ( dnint(0.5) = dint(0.5 + 0.5) = dint(1.0) = 1.0 )
> * dnint(0.0) is 0.0 ( dnint(0.0) = dint(0.0 - 0.5) = dint(-0.5) = 0.0 )
> * dnint(-0.5) is -1.0 ( dint(-0.5) = dint(-0.5 - 0.5) = dint(-1.0)
> = -1.0 )
> (I checked result for dnint(-0.5) also by g77)
>
> * fractional coordinate is calculated by fract = f1 - dnint(f1) + 0.5d0
>
> * if f1 = 0.5
> fract = f1 - dnint(f1) + 0.5d0 = 0.5 - dnint(0.5) + 0.5 = 0.5 - 1.0 + 0.5
> = 0.0
> this is correct
>
> * if f1 = 0.0
> fract = f1 - dnint(f1) + 0.5d0 = 0.0 - dnint(0.0) + 0.5 = 0.0 - 0.0 + 0.5
> = 0.5
> this is correct
>
> * if f1 = -0.5
> fract = f1 - dnint(f1) + 0.5d0 = -0.5 - dnint(-0.5) + 0.5 = -0.5 - (-1.0)
> + 0.5 = 1.0
> this is NOT correct, fract has to be 0.0
>
> It is my guess that fract could be calculated correctly only with floor()
> or ceiling() functions and the problem could be still complicated by round
> errors.
>
> Now promissed backtrace for tst1:
>
> segmentation fault occurs in get_nb_list subroutine
>
> (idb) run -O -i lrt_pmemd.in -p solsol.top -c solsol00215.crd
> Thread received signal SEGV
> stopped at [subroutine get_nb_list(integer*4, type cit_tbl_rec (0:0), type
> cit_img_rec (0:0), integer*4 (0:0), integer*4 (0:0), type maskdata_rec
> (0:0), integer*4 (0:0), real*8 (0:0,0:0), type maskdata_rec (0:0),
> integer*4 (0:0), integer*4 (0:0), logical*4, integer*4):231 0x0806f6d2]
> 231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))
>
> Actually, the error occures one line before:
>
> 230 do i = 1, atm_maskdata(atm_i)%nummask
> > 231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))
>
> The reason is that atm_i is zero and atm_maskdata array is not accessed
> correctly (this array starts from index one).
>
> atm_i is initialized in the same subroutine from img_atm_map array
>
> 212 atm_i = img_atm_map(img_i)
>
> This array is initialized in setup_cit subroutine.
>
> [1] stopped at [subroutine `EW_DIRECT_CIT_MODULE`setup_cit(integer*4,
> real*8 (:,:), type cit_tbl_rec (:,:,:), type cit_img_rec (:), integer*4
> (:), integer*4 (:)):437 0x08062ea0]
> 437 return
> (idb) print img_id
> 45820
>
> only 45820 atoms are processed at the end of setup_cit subroutine but the
> system contain 45821 atoms.
> img_atm_map is initialized from atom lists for each subcells (atm_lst
> array).
>
> This array represents linked lists of atoms in each subcell. It is
> constructed in setup_crd_idx_lst_tbl subroutine.
>
> [3] stopped at [subroutine
> `EW_DIRECT_CIT_MODULE`setup_crd_idx_lst_tbl(integer*4, real*8 (:,:),
> integer*4 (:,:,:), type atm_lst_rec (:)):253 0x0806236b]
> 253 nxt_atm_lst_idx = 1
> (idb) list
> 254
> 255 scale_fac_x = dble(cit_tbl_x_dim)
> 256 scale_fac_y = dble(cit_tbl_y_dim)
> 257 scale_fac_z = dble(cit_tbl_z_dim)
> 258
> 259 crd_idx_lst_tbl(:,:,:) = 0 ! Marks empty entries
> 260
> 261 ! Load the atom id's:
> 262
> 263 do atm_id = 1, atm_cnt
> 264
> 265 x_idx = int(fraction(1, atm_id) * scale_fac_x)
> 266 y_idx = int(fraction(2, atm_id) * scale_fac_y)
> 267 z_idx = int(fraction(3, atm_id) * scale_fac_z)
> 268
> 269 if (crd_idx_lst_tbl(x_idx, y_idx, z_idx) .eq. 0) then ! New
> list.
> 270
> 271 crd_idx_lst_tbl(x_idx, y_idx, z_idx) = nxt_atm_lst_idx
> 272
> 273 else ! List already started. Follow the chain and add
> a node:
> 274
> (idb) print cit_tbl_z_dim
> 15
> (idb) print fraction(3, 35125)
> 1
>
> (idb) print fraction(3, 35125) * scale_fac_z
> 15.0
>
> crd_idx_lst_tbl array is indexed from 0 to cit_tbl_z_dim - 1, e.g. z_idx
> index for atom 35125 is out of legal range. As the result, linked list is
> not constructed correctly. In this particular case, one item is not
> accesible when list is used in setup_cit. It is seen that z fractional
> coordinate of 35125 atom is 1.0.
>
> Fractional coordinates are calculated in get_fract_crds.
>
> Real coordinates of atom 35125 are:
> -3.2440000 50.6819992 -38.5530014
> Box dimensions are:
> 78.3040009 75.9800034 77.1060028
>
> so z coordinate is exatly -0.5 * box_z, this leads to f3 = -0.5
>
> 637 if (is_orthog .ne. 0) then
> 638
> 639 recip_11 = recip(1, 1)
> 640 recip_22 = recip(2, 2)
> 641 recip_33 = recip(3, 3)
> 642
> 643 do i = 1, atm_cnt
> 644 f1 = recip_11 * crd(1, i)
> 645 f2 = recip_22 * crd(2, i)
> 646 f3 = recip_33 * crd(3, i)
> 647 fraction(1, i) = f1 - dnint(f1) + 0.5d0
> 648 fraction(2, i) = f2 - dnint(f2) + 0.5d0
> 649 fraction(3, i) = f3 - dnint(f3) + 0.5d0
>
> and then fraction(3, 35125) is 1.0
>
> NOTES: line numbers are from _*_.f90 files, it is necessary to unlimit
> stack size (e.g. for bash: ulimit -s unlimited)
>
> BACKTRACE for tst2 ( PROBLEM TWO )
>
> In this case, infinity electrostatic energy is obtained. The reason is
> that NB list is not constructed correctly. It contains NB pair between the
> same atom, distance is than zero and energy goes to infinity.
> The problem is located in get_nb_list subroutine. Index to z_bkts array
> overflow legal range for some atom. This is not problem of fractional
> coordinates but round errors when z_bkts_hi index is calculated. I can
> prepare the same detailed backtrace as in previous case if it would be
> helpful.
>
> I think that the problems are localized. First problem is with fractional
> coordinates and the second one is in round errors that could lead to
> incorrect subcell indexes in get_nb_list subroutine.
>
> Best regards,
> Petr
>
> --
> ##################################################
> Petr Kulhanek
> ------------------------------------------------
> kulhanek.chemi.muni.cz
> ------------------------------------------------
> National Centre for Biomolecular Research
> Masaryk University Brno
> Kotlarska 2, CZ-611 37 Brno
> Czech Republic
> ##################################################
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Tue Jul 05 2005 - 17:53:00 PDT