Re: [AMBER] Count number of frames in any trajectory file

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Mon, 18 Jul 2011 13:10:11 -0400

Hi,

The number of frames is stored in NetCDF files, but not in formatted
ASCII trajectories. Ptraj and cpptraj read the number of frames
directly from NetCDF files, and calculate it from the file size for
ASCII trajectories.
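
A minimal Python sketch of the ASCII case, using the line-count formula
quoted further down in this thread rather than ptraj's own file-size
code. The function names are made up for illustration, and the file is
assumed to be an uncorrupted formatted mdcrd with one title line:

```python
def lines_per_frame(natom, has_box=False):
    """Lines used by one frame: 10 coordinate values per line, rounded up,
    plus one extra line when box coordinates are present."""
    coord_lines = (natom * 3 + 9) // 10  # integer ceiling of (natom * 3) / 10
    return coord_lines + 1 if has_box else coord_lines


def count_ascii_frames(path, natom, has_box=False):
    """Estimate the frame count of a formatted ASCII trajectory from its
    total number of lines (hypothetical helper, not part of AmberTools)."""
    with open(path, "rb") as fh:
        total_lines = sum(1 for _ in fh)
    per_frame = lines_per_frame(natom, has_box)
    # The first line of the file is the title, so it is not part of any frame.
    frames, leftover = divmod(total_lines - 1, per_frame)
    if leftover:
        raise ValueError("line count does not match a whole number of frames")
    return frames
```

The atom count has to come from elsewhere (e.g. the topology file),
since the ASCII trajectory itself does not record it.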

Unfortunately there is no simple way to extract the number of frames
from a netcdf file without using netcdf file routines since it is a
binary format. You could potentially write a simple C program that
will get the number of frames from a netcdf file using the AmberTools
code as a guide.
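
As a rough illustration of where that count lives, here is a Python
sketch that reads it straight from the header instead of going through
the netcdf library. It rests on two assumptions: the file is in the
classic or 64-bit-offset NetCDF-3 layout (magic "CDF" followed by a
version byte of 1 or 2 and a 4-byte big-endian record count), and the
"frame" dimension is the unlimited (record) dimension, as in Amber
NetCDF trajectories. The function name is hypothetical, and the netcdf
routines remain the robust way to do this:

```python
import struct

STREAMING = 0xFFFFFFFF  # header value meaning "record count not known"


def netcdf3_record_count(path):
    """Return the record count from a NetCDF-3 header, or None if the
    header marks the count as indeterminate.

    Assumption: for an Amber trajectory the record dimension is "frame",
    so this value is the number of frames.
    """
    with open(path, "rb") as fh:
        header = fh.read(8)
    if len(header) < 8 or header[:3] != b"CDF":
        raise ValueError("not a classic-format NetCDF file")
    version = header[3]
    if version not in (1, 2):
        raise ValueError("unsupported NetCDF version byte: %d" % version)
    # Bytes 4..7 hold the number of records as a big-endian unsigned int.
    (numrecs,) = struct.unpack(">I", header[4:8])
    return None if numrecs == STREAMING else numrecs
```

Only the first 8 bytes are read, so the cost does not depend on the
size of the trajectory.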

However, ptraj and cpptraj do print lines like:

Ptraj: "File (MDCRD) is an FORMAT trajectory with X sets (processing only Y)"
Cpptraj: "[MDCRD] is an FORMAT trajectory, Parm 0 (reading Y of X)"

So you could potentially extract the number of frames from that
output, right? For ptraj I can see how this might not be optimal since
you don't want to read the entire trajectory and something like:

trajin MDCRD 1 1

results in ptraj saying the trajectory only has 1 set. However,
cpptraj always reports the total number of frames in the trajectory no
matter what, so the "(reading Y of X)" part can be relied on to always
tell you the total # of frames in the trajectory.
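
A small Python sketch of pulling that number out of captured cpptraj
output. The regular expression is built from the message template shown
above; the function name and the sample line in the usage note are
made up for illustration and are not real cpptraj output:

```python
import re

# Matches the "(reading Y of X)" part of the cpptraj message.
READING_RE = re.compile(r"\(reading\s+(\d+)\s+of\s+(\d+)\)")


def total_frames_from_cpptraj(output):
    """Return X from the first "(reading Y of X)" found in the given
    text, or None if no such message is present."""
    match = READING_RE.search(output)
    if match is None:
        return None
    return int(match.group(2))
```

For a line such as
"[md_heatup.mdcrd] is an AMBER trajectory, Parm 0 (reading 1 of 100)"
the function returns 100, regardless of how many frames were actually
read.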

-Dan

On Mon, Jul 18, 2011 at 11:53 AM, Jan-Philip Gehrcke
<jgehrcke.googlemail.com> wrote:
> On 07/18/2011 05:32 PM, Daniel Roe wrote:
>> Hi,
>>
>> Assuming your mdcrd file has no box coords and is not corrupted you
>> can calculate the number of lines that each frame of your trajectory
>> should have based on this formula:
>>
>> #LinesPerFrame = ((#atoms * 3) / 10)
>>
>> If there is any fraction after the division, round up. If your
>> trajectory has box coordinates add 1. From this, you can calculate the
>> number of frames from the total number of lines in your trajectory:
>>
>> #Frames = (#LinesInTrajectory - 1) / #LinesPerFrame
>>
>> The "-1" in the first term is because every trajectory has a title
>> line. Hope this helps,
>
> Dan,
>
> thanks for your answer. First of all, I should clarify that I am looking
> for the most efficient programmatic approach to get this number directly
> from the mdcrd file (independent from its format).
>
> Your solution won't work for binary file formats, right?
>
> ptraj seems to be the right tool to get the number I am after, but I do
> not know how to do it best... consider the following ptraj input:
>
>    trajin md_heatup.mdcrd
>    trajout f netcdf
>
> It runs and prints to stdout:
>
>    PTRAJ: trajin md_heatup.mdcrd
>      Checking coordinates: md_heatup.mdcrd
>    NETCDF file:
>
>    PTRAJ: trajout f netcdf
>      md_heatup.mdcrd: 100 frames.
>
> There it is: 100.
>
> I could parse the output, but in order to make this efficient, I would
> need some non-time-consuming ptraj action to be run, because this number
> is only printed if there is a `trajout` statement -- usually connected
> to some time-consuming I/O operation. Is there a ptraj command executing
> very quickly even for big files? Going this way would be a (dirty) option...
>
> There should be a better way. I noticed this number being printed by
> ptraj very quickly even for input mdcrd files of several gigabytes. So,
> is this number stored in the mdcrd file header (of at least NetCDF files)?
>
> Thanks!
>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>

Received on Mon Jul 18 2011 - 10:30:02 PDT