Ouch.
Just a few problems I spotted with this:
1) grep ATOM: will return _any_ line with the string 'ATOM' occurring
_anywhere_ on it; some programs may also write CYX residues as 'HETATM'
records instead because CYX is non-standard, and those lines do not
contain 'ATOM' at all
2) grep CA: same as above; the match could be a calcium atom, part of a
residue name or segment name, some other abuse of the format, or simply
any occurrence on a non-ATOM/HETATM record
3) awk: the ancient PDB format is a _fixed-column_ format, which also
implies that columns can "run" into each other, while awk splits (by
default) on whitespace that may not be there; also, the residue
name is not the 4th datum in an ATOM/HETATM record.
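A minimal sketch of reading the fixed columns in a single awk pass instead of grepping and splitting on whitespace. It assumes the standard PDB layout (record name in columns 1-6, atom name in 13-16, residue name in 18-20) and the naming convention that an alpha carbon is written as " CA " while a calcium atom starts in column 13 as "CA  ". The function name cyx_indices is made up for illustration; like the original pipeline, it counts alpha carbons in ATOM/HETATM records and prints the running count for each CYX residue.

```shell
# Sketch: print the running alpha-carbon count for every CYX residue,
# reading fixed PDB columns in a single awk pass.
# cyx_indices is a hypothetical helper name, not an existing tool.
cyx_indices() {
    awk '
        {
            rec  = substr($0, 1, 6)    # record name: "ATOM  " or "HETATM"
            name = substr($0, 13, 4)   # atom name field; alpha carbon is " CA "
            res  = substr($0, 18, 3)   # residue name (standard columns 18-20)
        }
        (rec == "ATOM  " || rec == "HETATM") && name == " CA " {
            n++                        # assumes one CA per residue (no altLocs)
            if (res == "CYX") print n
        }
    ' "$1"
}
```

Usage: cyx_indices XXXX.pdb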
The two-line version doubles all of these problems, and why would one
need to parse the input twice anyway?
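For the two-variable case one pass is enough: awk can remember the first and the last CYX index (the equivalent of 'head -n 1' and 'tail -n 1') and print both at the end. Again only a sketch under the same standard-column assumptions; cyx_pair is a hypothetical helper that sets the shell variables cyx1 and cyx2.

```shell
# Sketch: set cyx1 and cyx2 from one pass over the file instead of
# running the grep | grep | awk pipeline twice (once for head, once for tail).
# cyx_pair is a hypothetical helper name; standard PDB columns are assumed.
cyx_pair() {
    # awk prints "first last"; the unquoted substitution is split into words
    set -- $(awk '
        (substr($0, 1, 6) == "ATOM  " || substr($0, 1, 6) == "HETATM") &&
        substr($0, 13, 4) == " CA " {
            n++
            if (substr($0, 18, 3) == "CYX") {
                if (first == "") first = n   # like head -n 1
                last = n                     # like tail -n 1
            }
        }
        END { print first, last }
    ' "$1")
    cyx1=$1
    cyx2=$2
}
```

Usage: cyx_pair XXXX.pdb; echo "$cyx1 $cyx2"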
Cheers,
Hannes.
On Wed, 24 Sep 2014 12:43:17 +0200
Anselm Horn <Anselm.Horn.biochem.uni-erlangen.de> wrote:
> Hi James,
>
> to obtain a list of all CYX residues in your pdb file in consecutive
> numbering, you could do something like the following:
>
> grep ATOM XXXX.pdb | grep CA | awk 'BEGIN{n=0}{n++; if ($4=="CYX"){print n}}'
>
> When you are sure that only two CYX entries exist in your file, you
> could simply use the 'head' and 'tail' commands to extract the two
> numbers:
>
> cyx1=`grep ATOM XXXX.pdb | grep CA | awk 'BEGIN{n=0}{n++; if ($4=="CYX"){print n}}' | head -n 1`
> cyx2=`grep ATOM XXXX.pdb | grep CA | awk 'BEGIN{n=0}{n++; if ($4=="CYX"){print n}}' | tail -n 1`
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 24 2014 - 05:30:02 PDT