There is nothing wrong with a one-off script, but there is a problem
with a, well, less than optimal suggestion made to someone who doesn't
understand what a given piece of code is actually doing, and which
breaks on the second example it is thrown at. As I tried to point out,
your example is not doing what it is expected to do. A real solution
should not just work coincidentally, and assumptions about the input
should be kept to a minimum. PDB parsing is so common in our field that
it is most likely worthwhile to learn how to use a "proper" parser
(ideally one that can handle all those PDB variants out there).
The 4th column in a PDB file is actually the fourth character of the
record name. I suggest having a close look at the documentation at
http://www.wwpdb.org/documentation/format33/v3.3.html . The residue
name is really in columns 18-20, but some "extensions" may use 18-21
unless column 21 is "abused" to extend the chain ID. So, starting
from column 18, is 'ATOMA' a residue named 'ATOM' and chain 'A', or a
residue named 'ATO' and chain 'MA', or does 'M' have no meaning
whatsoever? What about, starting from column 17(!), 'ATOM' = alternate
locator 'A' and residue name 'TOM' for a standard PDB, etc., etc.
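
To make the fixed-column point concrete, here is a minimal Python sketch
that counts residues by slicing columns instead of splitting on
whitespace. It assumes the strict v3.3 layout (record name in columns
1-6, residue name in 18-20, chain ID in 22, residue number in 23-26,
insertion code in 27) and does not try to handle the 4-character
residue name or 2-character chain ID "extensions" mentioned above. The
function name count_residues is made up for this illustration.

```python
def count_residues(lines, resname="CYX"):
    """Count distinct residues named `resname` in ATOM/HETATM records.

    Assumes the strict wwPDB v3.3 column layout; PDB "extensions" that
    widen the residue name or chain ID fields are NOT handled.
    """
    seen = set()
    for line in lines:
        # Columns 1-6 hold the record name (0-based slice 0:6).
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        # Columns 18-20 hold the residue name.
        if line[17:20].strip() != resname:
            continue
        # Identify a residue by chain ID (column 22), residue number
        # (columns 23-26) and insertion code (column 27), so that each
        # residue is counted once no matter how many atoms it has.
        seen.add((line[21:22], line[22:26], line[26:27]))
    return len(seen)
```

Since the function only needs an iterable of lines, it can be fed an
open file object directly, and the input is read once only.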
Cheers,
Hannes.
On Wed, 24 Sep 2014 14:45:11 +0200
Anselm Horn <Anselm.Horn.biochem.uni-erlangen.de> wrote:
> I totally agree that my proposal is far from being perfect.
>
> However, it should serve as an example of a potential solution to the
> problem posed (the number of CYX residues) with all the limitations in
> mind (for many PDB files the fourth column actually holds the residue
> name). However, there is of course a difference between a script for a
> specialized task and a more general one.
>
> And I further agree that this is not an AMBER-related topic.
>
> Regards,
>
> Anselm
>
>
> On 24.09.2014 14:18, Hannes Loeffler wrote:
> > Ouch.
> >
> > Just a few problems I spotted with this:
> > 1) grep ATOM: will return _any_ line with the string 'ATOM' occurring
> > _anywhere_ on a line; some codes may also decide to use 'HETATM'
> > instead because CYX is non-standard
> > 2) grep CA: same as above, could be a calcium atom, part of a residue
> > name, segment name or possibly some other abuse of the format or
> > simply any occurrence on a non-ATOM/HETATM record
> > 3) awk: the ancient PDB format is a _fixed-column_ format, which also
> > implies that columns can "run" into each other, while awk splits (by
> > default) on whitespace which may not be there; also, the residue
> > name is not the 4th datum in an ATOM/HETATM record.
> >
> > The two-line code doubles this, but why would one need to parse the
> > input twice anyway?
> >
> > Cheers,
> > Hannes.
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 24 2014 - 06:30:02 PDT