Re: [AMBER] Unwrapping in MSD calculations with pytraj/cpptraj

From: Daniel Roe via AMBER <amber.ambermd.org>
Date: Tue, 9 Aug 2022 14:43:36 -0400

Hi,

Unfortunately I can't reproduce your problem. I tried diffusion on a
400 ns trajectory (orthogonal box) with 1699 water molecules before
unwrap and after unwrap with no imaging using the following input
(full output will be pasted at the end):

parm ../cb7_am1-bcc.parm7
trajin ../MD/md.000.nc
set mask = :WAT.O
diffusion out diff.1.dat time 0.5 diffout diff.1.out D1 $mask
unwrap
diffusion out diff.2.dat time 0.5 diffout diff.2.out D2 $mask noimage

The diffusion constants that are calculated are exactly the same:

$ cat diff.1.out diff.2.out
#Set D1[D] D1[Slope] D1[Intercept] D1[Corr] D1[Label]
       1 2.0469 1.2281 -5011.0599 0.9997 D1_AvgDr
       2 2.1459 0.4292 -2736.1236 0.9995 D1_AvgDx
       3 2.0444 0.4089 -2058.3774 0.9992 D1_AvgDy
       4 1.9502 0.3900 -216.5589 0.9996 D1_AvgDz
#Set D2[D] D2[Slope] D2[Intercept] D2[Corr] D2[Label]
       1 2.0469 1.2281 -5011.0599 0.9997 D2_AvgDr
       2 2.1459 0.4292 -2736.1236 0.9995 D2_AvgDx
       3 2.0444 0.4089 -2058.3774 0.9992 D2_AvgDy
       4 1.9502 0.3900 -216.5589 0.9996 D2_AvgDz

Is there any way you can provide me off-list with access to the
problematic topology/trajectory so I can try to reproduce what you are
seeing? Thanks,

-Dan

CPPTRAJ: Trajectory Analysis. V6.13.0 (GitHub)
    ___ ___ ___ ___
     | \/ | \/ | \/ |
    _|_/\_|_/\_|_/\_|_

| Date/time: 08/09/22 14:23:51
| Available memory: 38.139 GB

INPUT: Reading input from 'cpptraj.in'
  [parm ../cb7_am1-bcc.parm7]
Reading '../cb7_am1-bcc.parm7' as Amber Topology
Radius Set: modified Bondi radii (mbondi)
  [trajin ../MD/md.000.nc]
Reading '../MD/md.000.nc' as Amber NetCDF
  [set mask = :WAT.O]
Variable 'mask' set to ':WAT.O'
  [diffusion out diff.1.dat time 0.5 diffout diff.1.out D1 :WAT.O]
    DIFFUSION:
Atom Mask is [:WAT.O]
Only average diffusion will be calculated.
Data set base name: D1
Corrections for imaging enabled.
Output files:
  diff.1.dat: (x) Mean square displacement(s) in the X direction (in Ang.^2).
  diff.1.dat: (y) Mean square displacement(s) in the Y direction (in Ang.^2).
  diff.1.dat: (z) Mean square displacement(s) in the Z direction (in Ang.^2).
  diff.1.dat: (r) Overall mean square displacement(s) (in Ang.^2).
  diff.1.dat: (a) Total distance travelled (in Ang.).
The time between frames is 0.5 ps.
Calculating diffusion constants by fitting slope to MSD vs time
  and multiplying by 10.0/2*N (where N is # of dimensions), units
  are 1x10^-5 cm^2/s.
Diffusion constant output to 'diff.1.out'
  [unwrap]
    UNWRAP: By atom using all atoms
Reference is first frame.
  [diffusion out diff.2.dat time 0.5 diffout diff.2.out D2 :WAT.O noimage]
    DIFFUSION:
Atom Mask is [:WAT.O]
Only average diffusion will be calculated.
Data set base name: D2
Corrections for imaging disabled.
Output files:
  diff.2.dat: (x) Mean square displacement(s) in the X direction (in Ang.^2).
  diff.2.dat: (y) Mean square displacement(s) in the Y direction (in Ang.^2).
  diff.2.dat: (z) Mean square displacement(s) in the Z direction (in Ang.^2).
  diff.2.dat: (r) Overall mean square displacement(s) (in Ang.^2).
  diff.2.dat: (a) Total distance travelled (in Ang.).
The time between frames is 0.5 ps.
Calculating diffusion constants by fitting slope to MSD vs time
  and multiplying by 10.0/2*N (where N is # of dimensions), units
  are 1x10^-5 cm^2/s.
Diffusion constant output to 'diff.2.out'
---------- RUN BEGIN -------------------------------------------------

PARAMETER FILES (1 total):
 0: cb7_am1-bcc.parm7, 6922 atoms, 1700 res, box: Orthorhombic, 1700
mol, 1699 solvent

INPUT TRAJECTORIES (1 total):
 0: 'md.000.nc' is a NetCDF AMBER trajectory with coordinates, time,
box, Parm cb7_am1-bcc.parm7 (Orthorhombic box) (reading 800000 of
800000)
  Coordinate processing will occur on 800000 frames.

BEGIN TRAJECTORY PROCESSING:
.....................................................
ACTION SETUP FOR PARM 'cb7_am1-bcc.parm7' (3 actions):
  0: [diffusion out diff.1.dat time 0.5 diffout diff.1.out D1 :WAT.O]
Mask [:WAT.O] corresponds to 1699 atoms.
Imaging enabled.
  1: [unwrap]
Mask [*] corresponds to 6922 atoms.
Number of atoms to be unwrapped is 6922
  2: [diffusion out diff.2.dat time 0.5 diffout diff.2.out D2 :WAT.O noimage]
Mask [:WAT.O] corresponds to 1699 atoms.
Imaging disabled.
----- md.000.nc (1-800000, 1) -----
 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Complete.

Read 800000 frames and processed 800000 frames.
TIME: Avg. throughput= 1775.2305 frames / second.

ACTION OUTPUT:
    DIFFUSION: Calculating diffusion constants from slopes.
    DIFFUSION: Calculating diffusion constants from slopes.
TIME: Analyses took 0.0000 seconds.

DATASETS (21 total):
mask "mask" (string variable), size is 1 (0.014 kB)
D1[X] "D1[X]" (double), size is 800000 (6.400 MB)
D1[Y] "D1[Y]" (double), size is 800000 (6.400 MB)
D1[Z] "D1[Z]" (double), size is 800000 (6.400 MB)
D1[R] "D1[R]" (double), size is 800000 (6.400 MB)
D1[A] "D1[A]" (double), size is 800000 (6.400 MB)
D1[D] "D1[D]" (double), size is 4 (0.032 kB)
D1[Label] "D1[Label]" (string), size is 4 (0.032 kB)
D1[Slope] "D1[Slope]" (double), size is 4 (0.032 kB)
D1[Intercept] "D1[Intercept]" (double), size is 4 (0.032 kB)
D1[Corr] "D1[Corr]" (double), size is 4 (0.032 kB)
D2[X] "D2[X]" (double), size is 800000 (6.400 MB)
D2[Y] "D2[Y]" (double), size is 800000 (6.400 MB)
D2[Z] "D2[Z]" (double), size is 800000 (6.400 MB)
D2[R] "D2[R]" (double), size is 800000 (6.400 MB)
D2[A] "D2[A]" (double), size is 800000 (6.400 MB)
D2[D] "D2[D]" (double), size is 4 (0.032 kB)
D2[Label] "D2[Label]" (string), size is 4 (0.032 kB)
D2[Slope] "D2[Slope]" (double), size is 4 (0.032 kB)
D2[Intercept] "D2[Intercept]" (double), size is 4 (0.032 kB)
D2[Corr] "D2[Corr]" (double), size is 4 (0.032 kB)
    Total data set memory usage is at least 64.000 MB

DATAFILES (4 total):
  diff.1.out (Standard Data File): D1[D] D1[Slope] D1[Intercept]
D1[Corr] D1[Label]
  diff.1.dat (Standard Data File): D1[R] D1[X] D1[Y] D1[Z] D1[A]
  diff.2.out (Standard Data File): D2[D] D2[Slope] D2[Intercept]
D2[Corr] D2[Label]
  diff.2.dat (Standard Data File): D2[R] D2[X] D2[Y] D2[Z] D2[A]

RUN TIMING:
TIME: Init : 0.0001 s ( 0.00%)
TIME: Trajectory Process : 450.6457 s ( 98.14%)
TIME: Action Post : 0.1012 s ( 0.02%)
TIME: Analysis : 0.0000 s ( 0.00%)
TIME: Data File Write : 8.4603 s ( 1.84%)
TIME: Other : 0.0001 s ( 0.00%)
TIME: Run Total 459.2074 s
---------- RUN END ---------------------------------------------------
TIME: Total execution time: 459.2394 seconds.
--------------------------------------------------------------------------------
To cite CPPTRAJ use:
Daniel R. Roe and Thomas E. Cheatham, III, "PTRAJ and CPPTRAJ: Software for
  Processing and Analysis of Molecular Dynamics Trajectory Data". J. Chem.
  Theory Comput., 2013, 9 (7), pp 3084-3095.

On Tue, Aug 9, 2022 at 12:08 PM Cianna Calia <ccalia.ucsd.edu> wrote:
>
> It's orthogonal.
>
> Thanks.
>
> On Tue, Aug 9, 2022 at 7:27 AM Daniel Roe via AMBER <amber.ambermd.org> wrote:
>>
>> Well dang - I'll take another look at the code. Is this an orthogonal
>> box or non-orthogonal?
>>
>> Thanks for the report.
>>
>> -Dan
>>
>> On Tue, Aug 9, 2022 at 6:59 AM Cianna Calia <ccalia.ucsd.edu> wrote:
>> >
>> > Update: I tried the diffusion function from cpptraj version 6.13.0 with a different test case, now with a longer trajectory (100 ns), and this time I see big differences between the data from before unwrapping vs. after unwrapping - bigger differences than I saw with the version I was using before (V4.25.6). See attached plot; this is comparing the "R" MSD datasets obtained using V6.13.0 with this input:
>> >
>> > parm ../../water_box.prmtop
>> > trajin ../../water_box_100ns_1.nc
>> > diffusion out one_waterO_diff_cpptraj_test_before_unwrappingbyatom.dat time 2.0 ":1.O" nocalc
>> > unwrap byatom
>> > diffusion out one_waterO_diff_cpptraj_test_after_unwrappingbyatom.dat time 2.0 noimage ":1.O" nocalc
>> >
>> > Maybe the updated diffusion function is introducing another issue?
>> >
>> > Thanks,
>> > Cianna
>> >
>> > On Sun, Jul 31, 2022 at 10:11 AM Cianna Calia <ccalia.ucsd.edu> wrote:
>> >>
>> >> I tried it with the same test case I'd been using before - now the MSD data comes out exactly the same before and after unwrapping.
>> >>
>> >> Thanks again,
>> >> Cianna
>> >>
>> >> On Sun, Jul 31, 2022 at 7:54 AM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I've fixed the behavior (I think) as of GitHub version 6.13.0
>> >>> (https://urldefense.com/v3/__https://github.com/Amber-MD/cpptraj/pull/984__;!!Mih3wA!H_XyP0NIHpChvZKV0aeMYpqJniRDBIwzKlnh1jiLbVQkUCfMFUV0bY2uCxtsvE7AU-OR9GqldO8949WK-Tr9$ ).
>> >>>
>> >>> Let me know if you get a chance to try it out.
>> >>>
>> >>> -Dan
>> >>>
>> >>> On Fri, Jul 29, 2022 at 7:57 PM Cianna Calia <ccalia.ucsd.edu> wrote:
>> >>> >
>> >>> > Interesting, thanks. Right, the diffusion constants are pretty close; I had just been looking at these numbers since it's specifically the MSD that I needed for something.
>> >>> >
>> >>> > Thanks so much for the explanation; that helps a lot. Enjoy your vacation.
>> >>> >
>> >>> > Cianna
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Fri, Jul 29, 2022 at 4:20 PM Daniel Roe via AMBER <amber.ambermd.org> wrote:
>> >>> >>
>> >>> >> Yeah, I'm pretty sure I know what's going on here. The TL;DR
>> >>> >> explanation is that the current way I correct for imaging in diffusion
>> >>> >> will gradually diverge from the way 'unwrap' corrects for imaging.
>> >>> >> Neither is wrong per se, they just have different strategies to
>> >>> >> correct for imaging. Arguably the 'unwrap' way is more "correct"
>> >>> >> because it's not using a summation under the hood. You showed the raw
>> >>> >> numbers but I'm guessing the diffusion constants calculated from those
>> >>> >> files are probably really close right?
>> >>> >>
>> >>> >> Anyway, the fix (to make 'diffusion' correct imaging the way 'unwrap'
>> >>> >> does) is simple but will require some testing on my part, and I'm
>> >>> >> about to head out on vacation. Once I'm back I'll get the fix in, but
>> >>> >> until then your best bet is probably to "unwrap" first and then
>> >>> >> calculate diffusion.
>> >>> >>
>> >>> >> Thanks for the report.
>> >>> >>
>> >>> >> -Dan
>> >>> >>
>> >>> >> On Fri, Jul 29, 2022 at 6:56 PM Cianna Calia <ccalia.ucsd.edu> wrote:
>> >>> >> >
>> >>> >> > I'm using an orthorhombic box (roughly cubic since I'd used "iso" with solvatebox in tleap).
>> >>> >> > cpptraj with AmberTools20
>> >>> >> > pytraj version 2.0.5
>> >>> >> >
>> >>> >> > When I compare cpptraj's MSD data before vs. after unwrapping with byatom, the differences are ones to tens of Å^2. The maximum differences in x, y, z, and r datasets are:
>> >>> >> >
>> >>> >> > max X err: 7.819200000000137
>> >>> >> > max Y err: 13.486300000000028
>> >>> >> > max Z err: 18.626699999999914
>> >>> >> > max R err: 24.975800000000163
>> >>> >> >
>> >>> >> > It's off by more than tenths or hundredths.
>> >>> >> >
>> >>> >> > Tailing the two cpptraj data files shows the difference - End of the datafile from before unwrapping:
>> >>> >> >
>> >>> >> > 1995 2774.8698 1317.9164 699.3513 757.6022 52.6770
>> >>> >> > 1996 2590.9952 1182.8005 617.1339 791.0609 50.9018
>> >>> >> > 1997 2672.6255 1248.8877 684.0725 739.6653 51.6974
>> >>> >> > 1998 2666.3472 1224.3471 674.2470 767.7531 51.6367
>> >>> >> > 1999 2700.3229 1240.0459 721.4465 738.8306 51.9646
>> >>> >> >
>> >>> >> > End of the datafile from after unwrapping:
>> >>> >> >
>> >>> >> > 1995 2785.2869 1312.0513 705.1366 768.0990 52.7758
>> >>> >> > 1996 2602.1418 1177.2445 622.5693 802.3280 51.0112
>> >>> >> > 1997 2684.3411 1243.1784 689.7944 751.3682 51.8106
>> >>> >> > 1998 2677.9833 1218.6942 679.9278 779.3613 51.7492
>> >>> >> > 1999 2714.8978 1234.3569 727.3224 753.2185 52.1047
>> >>> >> >
>> >>> >> > Thanks for giving this your attention,
>> >>> >> > Cianna
>> >>> >> >
>> >>> >> >
>> >>> >> > On Fri, Jul 29, 2022 at 12:51 PM Daniel Roe via AMBER <amber.ambermd.org> wrote:
>> >>> >> >>
>> >>> >> >> Just one more question - just how different are the results? Are they
>> >>> >> >> pretty close (within a few tenths/hundreths of an Ang.) or are they
>> >>> >> >> way off?
>> >>> >> >>
>> >>> >> >> -Dan
>> >>> >> >>
>> >>> >> >> On Fri, Jul 29, 2022 at 3:49 PM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> > Hi,
>> >>> >> >> >
>> >>> >> >> > I think I see the behavior you are seeing. I need to look into it more
>> >>> >> >> > and I'll update you with what I find.
>> >>> >> >> >
>> >>> >> >> > What kind of unit cell are you using? Also, what versions of cpptraj
>> >>> >> >> > and pytraj are you using?
>> >>> >> >> >
>> >>> >> >> > Thanks!
>> >>> >> >> >
>> >>> >> >> > -Dan
>> >>> >> >> >
>> >>> >> >> > On Mon, Jul 25, 2022 at 8:58 AM Cianna Calia via AMBER
>> >>> >> >> > <amber.ambermd.org> wrote:
>> >>> >> >> > >
>> >>> >> >> > > Hello AMBER community!
>> >>> >> >> > >
>> >>> >> >> > > I am wondering if anyone can help me understand how the diffusion functions
>> >>> >> >> > > in pytraj/cpptraj handle unwrapping. When I try using pytraj to calculate
>> >>> >> >> > > an atom's MSD from a trajectory unwrapped in cpptraj (with "unwrap
>> >>> >> >> > > byatom"), I am unsure why it deviates from pytraj's result for the original
>> >>> >> >> > > wrapped trajectory, since I thought pytraj would deal with unwrapping
>> >>> >> >> > > automatically in a way that would give the same result as using the already
>> >>> >> >> > > unwrapped trajectory. But the X, Y, and Z datasets differ starting where
>> >>> >> >> > > unwrapping becomes relevant for each dimension.
>> >>> >> >> > >
>> >>> >> >> > > More detail - I have the original wrapped trajectory
>> >>> >> >> > > 1hg7_WT_cubic_100ns_1_last2ns.nc and an unwrapped trajectory
>> >>> >> >> > > unwrap_2ns_test1.nc made in cpptraj like this:
>> >>> >> >> > >
>> >>> >> >> > > parm 1hg7_WT_cubic.prmtop
>> >>> >> >> > > trajin 1hg7_WT_cubic_100ns_1_last2ns.nc
>> >>> >> >> > > unwrap byatom
>> >>> >> >> > > trajout unwrap_2ns_test1.nc
>> >>> >> >> > >
>> >>> >> >> > > I tried the following diffusion calculations with pytraj:
>> >>> >> >> > >
>> >>> >> >> > > import pytraj as pt
>> >>> >> >> > >
>> >>> >> >> > > original_traj = pt.load('1hg7_WT_cubic_100ns_1_last2ns.nc',
>> >>> >> >> > > '1hg7_WT_cubic.prmtop')
>> >>> >> >> > > pt_diffusion_orig_traj = pt.diffusion(original_traj, '.1730', dtype='dict',
>> >>> >> >> > > frame_indices=[i for i in range(len(original_traj))])
>> >>> >> >> > >
>> >>> >> >> > > unwrapped_traj = pt.load('unwrap_2ns_test1.nc', '1hg7_WT_cubic.prmtop')
>> >>> >> >> > > pt_diffusion_unw_traj = pt.diffusion(unwrapped_traj, '.1730', dtype='dict',
>> >>> >> >> > > frame_indices=[i for i in range(len(unwrapped_traj))])
>> >>> >> >> > >
>> >>> >> >> > > I am hoping for clarification on why pt_diffusion_orig_traj and
>> >>> >> >> > > pt_diffusion_unw_traj are not the same.
>> >>> >> >> > >
>> >>> >> >> > > I observe the same situation when I do analogous calculations with cpptraj
>> >>> >> >> > > as opposed to pytraj. I also tried seeing how the noimage option with the
>> >>> >> >> > > cpptraj diffusion function affects the results, and I am similarly confused
>> >>> >> >> > > about why the two diffusion calculations below give different results
>> >>> >> >> > > (again, differing starting from where there's unwrapping):
>> >>> >> >> > >
>> >>> >> >> > > parm 1hg7_WT_cubic.prmtop
>> >>> >> >> > > trajin 1hg7_WT_cubic_100ns_1_last2ns.nc
>> >>> >> >> > > diffusion out orig_traj_1730_diff_cpptraj.dat time 1.0 "@1730" nocalc
>> >>> >> >> > > unwrap byatom
>> >>> >> >> > > diffusion out unwrapped_traj_1730_diff_cpptraj.dat time 1.0 noimage "@1730"
>> >>> >> >> > > nocalc
>> >>> >> >> > >
>> >>> >> >> > > Taking out the "byatom" option makes no noticeable difference, nor does
>> >>> >> >> > > removing the "noimage" option from the diffusion command acting on the
>> >>> >> >> > > already unwrapped trajectory. The difference between the diffusion data
>> >>> >> >> > > before and after unwrapping here is not the same as the difference I
>> >>> >> >> > > observe when I use the diffusion command on the original wrapped trajectory
>> >>> >> >> > > with vs. without the "noimage" option.
>> >>> >> >> > >
>> >>> >> >> > > Thanks in advance for any help with understanding this!
>> >>> >> >> > > Cianna Calia
>> >>> >> >> > > _______________________________________________
>> >>> >> >> > > AMBER mailing list
>> >>> >> >> > > AMBER.ambermd.org
>> >>> >> >> > > https://urldefense.com/v3/__http://lists.ambermd.org/mailman/listinfo/amber__;!!Mih3wA!A3PyHn9pZqfoG08rMdV6_T2d61zGEtAPiOSJA7y1FkFtqo18BxS-4b4O4gvuDJdUuUQZn-r_1fI32A$
>> >>> >> >>
>> >>> >> >> _______________________________________________
>> >>> >> >> AMBER mailing list
>> >>> >> >> AMBER.ambermd.org
>> >>> >> >> https://urldefense.com/v3/__http://lists.ambermd.org/mailman/listinfo/amber__;!!Mih3wA!A3PyHn9pZqfoG08rMdV6_T2d61zGEtAPiOSJA7y1FkFtqo18BxS-4b4O4gvuDJdUuUQZn-r_1fI32A$
>> >>> >>
>> >>> >> _______________________________________________
>> >>> >> AMBER mailing list
>> >>> >> AMBER.ambermd.org
>> >>> >> https://urldefense.com/v3/__http://lists.ambermd.org/mailman/listinfo/amber__;!!Mih3wA!CpPOy-HcJwxbvZgAUfP1-CShCHpDMRwrECMAb5eqw5v2kZoT6_22TXWtiS6fgXbCq_LXhlyI8E-cYw$
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> https://urldefense.com/v3/__http://lists.ambermd.org/mailman/listinfo/amber__;!!Mih3wA!HieMSFYnn3I5C1njSxYpAsqq70DmduIGEpZ84RvGb0pTSTI_fLy_gdkrIdigj1b411qdWoq1wqqLgw$

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Aug 09 2022 - 12:00:03 PDT
Custom Search