- Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]

From: Petr Kulhanek <kulhanek.chemi.muni.cz>

Date: Tue, 05 Jul 2005 18:29:54 +0200

Hello Bob,

Answers to your questions:

* > In the meantime, as I said earlier, it would be helpful if you gave
*

me more info on how your prmtop was made.

solsol.top parm file is not constructed by xleap. It is reconstructed

from original topology used in MD simulation. For this purpose, I use

program that I wrote myself. Program is capable to cut any part from

topology and save this part as new topology. In solsol.top, only solute

and waters were kept (ie, ions were stripped away).

parm99 + glycam04 were used to construct original topology. 10-12

interaction terms are from glycam part. Only one problem that could be

connected with topology is that SOLVENT_POINTERS and ATOMS_PER_MOLECULE

sections are not built correctly (they are simply empty or contain

zeros). After testing, I found that this is not problem for minimization

runs either in sander or pmemd.

* > I suspect the 10-12 vdw code may have problems in pmemd, as it is
*

pretty much untested (there are no standard amber test cases for this,

as far as I know).

10-12 vdw terms calculated by pmemd and sander are exactly the same.

* > 1) You are doing a minimization with ntx = 5; ie., your input coords
*

are a molecular dynamics restart file with ntx set to 5. This specifies

input of both coordinates and velocities. Now, use of ntx = 5 with

minimization is listed in the documentation as illegal, though I think

the code (pmemd, and probably sander too) is deficient in not having a

check for attempted use of ntx other than 1 or 2 in combination with

minimization. Okay, this is not a cause of horrible grief in sander.

In pmemd, it is a cause of horrible grief because the velocity array is

not allocated when you are doing minimizations, but the code will

attempt to store velocities if ntx = 5. SO I should include a check, so

that the error is a little less obscure, but that is the source of your

segmentation violation.

* >My earlier assertion about the seg violation for test case 1 being due
*

to an unallocated velocity array in minimization is incorrect. This was

probably true for earlier pmemd code, and then I changed it to prevent

just this sort of thing from occurring.

I think that ntx = 5 is correct in my case, because I need box size from

restart file and not from topology. In my opinion, all ntx options

should be valid for both minimization and dynamics.

* > I have not yet thoroughly looked at everything yet, but it looks to
*

me like the whole problem hinges on whether a fractional coordinate

calculation can produce a result outside the range of 0.0 to 0.999...

(ie., if this assertion is correct, bad things will follow elsewhere; no

need to even check).

Actually, there are two problems. First one is with fractional

coordinates. The second problem is that there are two different

approaches to calculate subcell indexes.

* > Now, the basic fractional coordinate calculation is of the form:
*

* > fract = f1 - dnint(f1) + 0.5d0
*

* > Have you actually trapped a value outside the range, or is this a
*

postulate to explain results?

YES, I trapped it. (see below)

* > According to the definition of dnint() (f95 reference standard)
*

dnint(a) has the value of dint(a + 0.5) (actually the ref just defines

anint/aint, but dnint/dint are just the specific double precision

intrinsics), so for instance dnint(1.5) ALWAYS should round up 2.0;

given that 0.5 is precisely representable internally in binary, there

should be no rounding errors that can produce a 1.0 result from this

calculation (for negative values it is the same story, but dnint(a) has

the value dint(a - 0.5). I have not thought about this in a while, but

remember looking it over fairly closely when the code was first done;

the basic calculation and assumptions also exist in sander; I think this

stuff all derives from Tom Darden's original pbc code, but maybe I am

wrong about that. I'll do some checks with ifort and your samples in

the near future, but unless a fractional is producing 1.0, this is not

the problem.

PROBLEM ONE - fractional coordinates (ew_direct_cit.f90 -> subroutine

get_fract_crds)

When I debug program I just found that some fractional coordinate is

1.0. I thought that round errors are responsible for this. OK, I

analysed the problem in more details and I think that the calculation of

fractions are not fully correct.

I will assume only one-dimensional case in following consideration.

* fractional coordinate has to be in range from 0.00 to 0.9999..

* dnint(0.5) is 1.0 ( dnint(0.5) = dint(0.5 + 0.5) = dint(1.0) = 1.0 )

* dnint(0.0) is 0.0 ( dnint(0.0) = dint(0.0 - 0.5) = dint(-0.5) = 0.0 )

* dnint(-0.5) is -1.0 ( dint(-0.5) = dint(-0.5 - 0.5) = dint(-1.0) = -1.0 )

(I checked result for dnint(-0.5) also by g77)

* fractional coordinate is calculated by fract = f1 - dnint(f1) + 0.5d0

* if f1 = 0.5

fract = f1 - dnint(f1) + 0.5d0 = 0.5 - dnint(0.5) + 0.5 = 0.5 - 1.0 +

0.5 = 0.0

this is correct

* if f1 = 0.0

fract = f1 - dnint(f1) + 0.5d0 = 0.0 - dnint(0.0) + 0.5 = 0.0 - 0.0 +

0.5 = 0.5

this is correct

* if f1 = -0.5

fract = f1 - dnint(f1) + 0.5d0 = -0.5 - dnint(-0.5) + 0.5 = -0.5 -

(-1.0) + 0.5 = 1.0

this is NOT correct, fract has to be 0.0

It is my guess that fract could be calculated correctly only with

floor() or ceiling() functions and the problem could be still

complicated by round errors.

Now promissed backtrace for tst1:

segmentation fault occurs in get_nb_list subroutine

(idb) run -O -i lrt_pmemd.in -p solsol.top -c solsol00215.crd

Thread received signal SEGV

stopped at [subroutine get_nb_list(integer*4, type cit_tbl_rec (0:0),

type cit_img_rec (0:0), integer*4 (0:0), integer*4 (0:0), type

maskdata_rec (0:0), integer*4 (0:0), real*8 (0:0,0:0), type maskdata_rec

(0:0), integer*4 (0:0), integer*4 (0:0), logical*4, integer*4):231

0x0806f6d2]

231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))

Actually, the error occures one line before:

230 do i = 1, atm_maskdata(atm_i)%nummask

* > 231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))
*

The reason is that atm_i is zero and atm_maskdata array is not accessed

correctly (this array starts from index one).

atm_i is initialized in the same subroutine from img_atm_map array

212 atm_i = img_atm_map(img_i)

This array is initialized in setup_cit subroutine.

[1] stopped at [subroutine `EW_DIRECT_CIT_MODULE`setup_cit(integer*4,

real*8 (:,:), type cit_tbl_rec (:,:,:), type cit_img_rec (:), integer*4

(:), integer*4 (:)):437 0x08062ea0]

437 return

(idb) print img_id

45820

only 45820 atoms are processed at the end of setup_cit subroutine but

the system contain 45821 atoms.

img_atm_map is initialized from atom lists for each subcells (atm_lst

array).

This array represents linked lists of atoms in each subcell. It is

constructed in setup_crd_idx_lst_tbl subroutine.

[3] stopped at [subroutine

`EW_DIRECT_CIT_MODULE`setup_crd_idx_lst_tbl(integer*4, real*8 (:,:),

integer*4 (:,:,:), type atm_lst_rec (:)):253 0x0806236b]

253 nxt_atm_lst_idx = 1

(idb) list

254

255 scale_fac_x = dble(cit_tbl_x_dim)

256 scale_fac_y = dble(cit_tbl_y_dim)

257 scale_fac_z = dble(cit_tbl_z_dim)

258

259 crd_idx_lst_tbl(:,:,:) = 0 ! Marks empty entries

260

261 ! Load the atom id's:

262

263 do atm_id = 1, atm_cnt

264

265 x_idx = int(fraction(1, atm_id) * scale_fac_x)

266 y_idx = int(fraction(2, atm_id) * scale_fac_y)

267 z_idx = int(fraction(3, atm_id) * scale_fac_z)

268

269 if (crd_idx_lst_tbl(x_idx, y_idx, z_idx) .eq. 0) then !

New list.

270

271 crd_idx_lst_tbl(x_idx, y_idx, z_idx) = nxt_atm_lst_idx

272

273 else ! List already started. Follow the chain and

add a node:

274

(idb) print cit_tbl_z_dim

15

(idb) print fraction(3, 35125)

1

(idb) print fraction(3, 35125) * scale_fac_z

15.0

crd_idx_lst_tbl array is indexed from 0 to cit_tbl_z_dim - 1, e.g. z_idx

index for atom 35125 is out of legal range. As the result, linked list

is not constructed correctly. In this particular case, one item is not

accesible when list is used in setup_cit. It is seen that z fractional

coordinate of 35125 atom is 1.0.

Fractional coordinates are calculated in get_fract_crds.

Real coordinates of atom 35125 are:

-3.2440000 50.6819992 -38.5530014

Box dimensions are:

78.3040009 75.9800034 77.1060028

so z coordinate is exatly -0.5 * box_z, this leads to f3 = -0.5

637 if (is_orthog .ne. 0) then

638

639 recip_11 = recip(1, 1)

640 recip_22 = recip(2, 2)

641 recip_33 = recip(3, 3)

642

643 do i = 1, atm_cnt

644 f1 = recip_11 * crd(1, i)

645 f2 = recip_22 * crd(2, i)

646 f3 = recip_33 * crd(3, i)

647 fraction(1, i) = f1 - dnint(f1) + 0.5d0

648 fraction(2, i) = f2 - dnint(f2) + 0.5d0

649 fraction(3, i) = f3 - dnint(f3) + 0.5d0

and then fraction(3, 35125) is 1.0

NOTES: line numbers are from _*_.f90 files, it is necessary to unlimit

stack size (e.g. for bash: ulimit -s unlimited)

BACKTRACE for tst2 ( PROBLEM TWO )

In this case, infinity electrostatic energy is obtained. The reason is

that NB list is not constructed correctly. It contains NB pair between

the same atom, distance is than zero and energy goes to infinity.

The problem is located in get_nb_list subroutine. Index to z_bkts array

overflow legal range for some atom. This is not problem of fractional

coordinates but round errors when z_bkts_hi index is calculated. I can

prepare the same detailed backtrace as in previous case if it would be

helpful.

I think that the problems are localized. First problem is with

fractional coordinates and the second one is in round errors that could

lead to incorrect subcell indexes in get_nb_list subroutine.

Best regards,

Petr

Date: Tue, 05 Jul 2005 18:29:54 +0200

Hello Bob,

Answers to your questions:

me more info on how your prmtop was made.

solsol.top parm file is not constructed by xleap. It is reconstructed

from original topology used in MD simulation. For this purpose, I use

program that I wrote myself. Program is capable to cut any part from

topology and save this part as new topology. In solsol.top, only solute

and waters were kept (ie, ions were stripped away).

parm99 + glycam04 were used to construct original topology. 10-12

interaction terms are from glycam part. Only one problem that could be

connected with topology is that SOLVENT_POINTERS and ATOMS_PER_MOLECULE

sections are not built correctly (they are simply empty or contain

zeros). After testing, I found that this is not problem for minimization

runs either in sander or pmemd.

pretty much untested (there are no standard amber test cases for this,

as far as I know).

10-12 vdw terms calculated by pmemd and sander are exactly the same.

are a molecular dynamics restart file with ntx set to 5. This specifies

input of both coordinates and velocities. Now, use of ntx = 5 with

minimization is listed in the documentation as illegal, though I think

the code (pmemd, and probably sander too) is deficient in not having a

check for attempted use of ntx other than 1 or 2 in combination with

minimization. Okay, this is not a cause of horrible grief in sander.

In pmemd, it is a cause of horrible grief because the velocity array is

not allocated when you are doing minimizations, but the code will

attempt to store velocities if ntx = 5. SO I should include a check, so

that the error is a little less obscure, but that is the source of your

segmentation violation.

to an unallocated velocity array in minimization is incorrect. This was

probably true for earlier pmemd code, and then I changed it to prevent

just this sort of thing from occurring.

I think that ntx = 5 is correct in my case, because I need box size from

restart file and not from topology. In my opinion, all ntx options

should be valid for both minimization and dynamics.

me like the whole problem hinges on whether a fractional coordinate

calculation can produce a result outside the range of 0.0 to 0.999...

(ie., if this assertion is correct, bad things will follow elsewhere; no

need to even check).

Actually, there are two problems. First one is with fractional

coordinates. The second problem is that there are two different

approaches to calculate subcell indexes.

postulate to explain results?

YES, I trapped it. (see below)

dnint(a) has the value of dint(a + 0.5) (actually the ref just defines

anint/aint, but dnint/dint are just the specific double precision

intrinsics), so for instance dnint(1.5) ALWAYS should round up 2.0;

given that 0.5 is precisely representable internally in binary, there

should be no rounding errors that can produce a 1.0 result from this

calculation (for negative values it is the same story, but dnint(a) has

the value dint(a - 0.5). I have not thought about this in a while, but

remember looking it over fairly closely when the code was first done;

the basic calculation and assumptions also exist in sander; I think this

stuff all derives from Tom Darden's original pbc code, but maybe I am

wrong about that. I'll do some checks with ifort and your samples in

the near future, but unless a fractional is producing 1.0, this is not

the problem.

PROBLEM ONE - fractional coordinates (ew_direct_cit.f90 -> subroutine

get_fract_crds)

When I debug program I just found that some fractional coordinate is

1.0. I thought that round errors are responsible for this. OK, I

analysed the problem in more details and I think that the calculation of

fractions are not fully correct.

I will assume only one-dimensional case in following consideration.

* fractional coordinate has to be in range from 0.00 to 0.9999..

* dnint(0.5) is 1.0 ( dnint(0.5) = dint(0.5 + 0.5) = dint(1.0) = 1.0 )

* dnint(0.0) is 0.0 ( dnint(0.0) = dint(0.0 - 0.5) = dint(-0.5) = 0.0 )

* dnint(-0.5) is -1.0 ( dint(-0.5) = dint(-0.5 - 0.5) = dint(-1.0) = -1.0 )

(I checked result for dnint(-0.5) also by g77)

* fractional coordinate is calculated by fract = f1 - dnint(f1) + 0.5d0

* if f1 = 0.5

fract = f1 - dnint(f1) + 0.5d0 = 0.5 - dnint(0.5) + 0.5 = 0.5 - 1.0 +

0.5 = 0.0

this is correct

* if f1 = 0.0

fract = f1 - dnint(f1) + 0.5d0 = 0.0 - dnint(0.0) + 0.5 = 0.0 - 0.0 +

0.5 = 0.5

this is correct

* if f1 = -0.5

fract = f1 - dnint(f1) + 0.5d0 = -0.5 - dnint(-0.5) + 0.5 = -0.5 -

(-1.0) + 0.5 = 1.0

this is NOT correct, fract has to be 0.0

It is my guess that fract could be calculated correctly only with

floor() or ceiling() functions and the problem could be still

complicated by round errors.

Now promissed backtrace for tst1:

segmentation fault occurs in get_nb_list subroutine

(idb) run -O -i lrt_pmemd.in -p solsol.top -c solsol00215.crd

Thread received signal SEGV

stopped at [subroutine get_nb_list(integer*4, type cit_tbl_rec (0:0),

type cit_img_rec (0:0), integer*4 (0:0), integer*4 (0:0), type

maskdata_rec (0:0), integer*4 (0:0), real*8 (0:0,0:0), type maskdata_rec

(0:0), integer*4 (0:0), integer*4 (0:0), logical*4, integer*4):231

0x0806f6d2]

231 img_j = atm_img_map(atm_mask(atm_mask_idx + i))

Actually, the error occures one line before:

230 do i = 1, atm_maskdata(atm_i)%nummask

The reason is that atm_i is zero and atm_maskdata array is not accessed

correctly (this array starts from index one).

atm_i is initialized in the same subroutine from img_atm_map array

212 atm_i = img_atm_map(img_i)

This array is initialized in setup_cit subroutine.

[1] stopped at [subroutine `EW_DIRECT_CIT_MODULE`setup_cit(integer*4,

real*8 (:,:), type cit_tbl_rec (:,:,:), type cit_img_rec (:), integer*4

(:), integer*4 (:)):437 0x08062ea0]

437 return

(idb) print img_id

45820

only 45820 atoms are processed at the end of setup_cit subroutine but

the system contain 45821 atoms.

img_atm_map is initialized from atom lists for each subcells (atm_lst

array).

This array represents linked lists of atoms in each subcell. It is

constructed in setup_crd_idx_lst_tbl subroutine.

[3] stopped at [subroutine

`EW_DIRECT_CIT_MODULE`setup_crd_idx_lst_tbl(integer*4, real*8 (:,:),

integer*4 (:,:,:), type atm_lst_rec (:)):253 0x0806236b]

253 nxt_atm_lst_idx = 1

(idb) list

254

255 scale_fac_x = dble(cit_tbl_x_dim)

256 scale_fac_y = dble(cit_tbl_y_dim)

257 scale_fac_z = dble(cit_tbl_z_dim)

258

259 crd_idx_lst_tbl(:,:,:) = 0 ! Marks empty entries

260

261 ! Load the atom id's:

262

263 do atm_id = 1, atm_cnt

264

265 x_idx = int(fraction(1, atm_id) * scale_fac_x)

266 y_idx = int(fraction(2, atm_id) * scale_fac_y)

267 z_idx = int(fraction(3, atm_id) * scale_fac_z)

268

269 if (crd_idx_lst_tbl(x_idx, y_idx, z_idx) .eq. 0) then !

New list.

270

271 crd_idx_lst_tbl(x_idx, y_idx, z_idx) = nxt_atm_lst_idx

272

273 else ! List already started. Follow the chain and

add a node:

274

(idb) print cit_tbl_z_dim

15

(idb) print fraction(3, 35125)

1

(idb) print fraction(3, 35125) * scale_fac_z

15.0

crd_idx_lst_tbl array is indexed from 0 to cit_tbl_z_dim - 1, e.g. z_idx

index for atom 35125 is out of legal range. As the result, linked list

is not constructed correctly. In this particular case, one item is not

accesible when list is used in setup_cit. It is seen that z fractional

coordinate of 35125 atom is 1.0.

Fractional coordinates are calculated in get_fract_crds.

Real coordinates of atom 35125 are:

-3.2440000 50.6819992 -38.5530014

Box dimensions are:

78.3040009 75.9800034 77.1060028

so z coordinate is exatly -0.5 * box_z, this leads to f3 = -0.5

637 if (is_orthog .ne. 0) then

638

639 recip_11 = recip(1, 1)

640 recip_22 = recip(2, 2)

641 recip_33 = recip(3, 3)

642

643 do i = 1, atm_cnt

644 f1 = recip_11 * crd(1, i)

645 f2 = recip_22 * crd(2, i)

646 f3 = recip_33 * crd(3, i)

647 fraction(1, i) = f1 - dnint(f1) + 0.5d0

648 fraction(2, i) = f2 - dnint(f2) + 0.5d0

649 fraction(3, i) = f3 - dnint(f3) + 0.5d0

and then fraction(3, 35125) is 1.0

NOTES: line numbers are from _*_.f90 files, it is necessary to unlimit

stack size (e.g. for bash: ulimit -s unlimited)

BACKTRACE for tst2 ( PROBLEM TWO )

In this case, infinity electrostatic energy is obtained. The reason is

that NB list is not constructed correctly. It contains NB pair between

the same atom, distance is than zero and energy goes to infinity.

The problem is located in get_nb_list subroutine. Index to z_bkts array

overflow legal range for some atom. This is not problem of fractional

coordinates but round errors when z_bkts_hi index is calculated. I can

prepare the same detailed backtrace as in previous case if it would be

helpful.

I think that the problems are localized. First problem is with

fractional coordinates and the second one is in round errors that could

lead to incorrect subcell indexes in get_nb_list subroutine.

Best regards,

Petr

-- ################################################## Petr Kulhanek ------------------------------------------------ kulhanek.chemi.muni.cz ------------------------------------------------ National Centre for Biomolecular Research Masaryk University Brno Kotlarska 2, CZ-611 37 Brno Czech Republic ################################################## ----------------------------------------------------------------------- The AMBER Mail Reflector To post, send mail to amber.scripps.edu To unsubscribe, send "unsubscribe amber" to majordomo.scripps.eduReceived on Tue Jul 05 2005 - 17:53:00 PDT

Custom Search