Hi David,
Thanks for the quick reply. We're still in the early days of parsing this
file, so I'm not yet familiar with all the data fields it contains. (We're
trying to figure out how to whittle it down to just residue-like entries,
so your ATOMP suggestion is really useful.)
Right now we're trying to extract "main chain residue subgraphs". For
example, we want to get the connectivity table that would let us identify
an ALA residue and name its atoms, starting from the entry in
components.cif with three-letter code "ALA". I'll paste the full text of
its entry below this message.
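As a rough Python sketch, a connectivity table can be built from the `_chem_comp_atom` and `_chem_comp_bond` loops. The sketch looks columns up by their header names rather than by position, so the code doesn't depend on column order. It assumes every loop row fits on a single line and that quoted values (e.g. atom names with primes) are quoted in a way `shlex.split` handles. `read_loop` and `connectivity` are hypothetical helpers, and a dedicated mmCIF reader would be more robust for the full file.

```python
import shlex


def read_loop(block_text, category):
    """Return the rows of a loop_ as dicts keyed by column name (prefix removed)."""
    lines = block_text.splitlines()
    prefix = f"_{category}."
    for i, line in enumerate(lines):
        if line.strip() != "loop_" or i + 1 >= len(lines):
            continue
        if not lines[i + 1].strip().startswith(prefix):
            continue  # a loop_, but for some other category
        headers, j = [], i + 1
        while j < len(lines) and lines[j].strip().startswith(prefix):
            headers.append(lines[j].strip()[len(prefix):])
            j += 1
        rows = []
        while j < len(lines):
            stripped = lines[j].strip()
            # A blank line, comment, new loop, tag or data block ends the loop.
            if not stripped or stripped.startswith(("#", "loop_", "_", "data_")):
                break
            rows.append(dict(zip(headers, shlex.split(stripped))))
            j += 1
        return rows
    return []


def connectivity(block_text):
    """Map each atom_id to the set of atom_ids it is bonded to."""
    atoms = [row["atom_id"] for row in read_loop(block_text, "chem_comp_atom")]
    graph = {atom: set() for atom in atoms}
    for row in read_loop(block_text, "chem_comp_bond"):
        a, b = row["atom_id_1"], row["atom_id_2"]
        graph[a].add(b)
        graph[b].add(a)
    return graph
```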
We initially made the mistake of trying to recognize instances of ALA by
the exact chemical graph described in this entry. However, the entry
describes the free amino acid, "capped" with an extra H (H2) at the N
terminus and an OH (OXT/HXT) at the C terminus. So it's necessary to
inspect the eighth column of each atom row, which holds
"_chem_comp_atom.pdbx_leaving_atom_flag", and check whether it is set to
"Y", since that marks atoms that are removed when the residue appears in a
polymer. Then, depending on which tool is handling the entry, there may be
additional bookkeeping needed to find and remove the bonds to those atoms,
and to make sure that the properties of the neighboring atoms (like formal
charge and hybridization) are handled correctly.
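The leaving-atom step can be sketched in Python using the flags and bonds copied by hand from the ALA entry below. `strip_leaving_atoms` is a hypothetical helper. It drops the atoms flagged "Y" and every bond that touches them, then reports the retained atoms that lost a neighbor (N and C for ALA). It only flags where the charge/hybridization bookkeeping applies. It does not perform that bookkeeping.

```python
# Leaving-atom flags and bonds copied from the ALA entry (atom_id -> flag).
ALA_LEAVING = {
    "N": "N", "CA": "N", "C": "N", "O": "N", "CB": "N", "OXT": "Y",
    "H": "N", "H2": "Y", "HA": "N", "HB1": "N", "HB2": "N", "HB3": "N",
    "HXT": "Y",
}
ALA_BONDS = [
    ("N", "CA"), ("N", "H"), ("N", "H2"), ("CA", "C"), ("CA", "CB"),
    ("CA", "HA"), ("C", "O"), ("C", "OXT"), ("CB", "HB1"), ("CB", "HB2"),
    ("CB", "HB3"), ("OXT", "HXT"),
]


def strip_leaving_atoms(leaving_flags, bonds):
    """Return (kept_atoms, kept_bonds, linking_atoms).

    linking_atoms are retained atoms that lost at least one bonded neighbor,
    i.e. the atoms whose charge/hybridization a downstream tool may revisit.
    """
    leaving = {atom for atom, flag in leaving_flags.items() if flag == "Y"}
    kept_atoms = [atom for atom in leaving_flags if atom not in leaving]
    kept_bonds = [(a, b) for a, b in bonds if a not in leaving and b not in leaving]
    linking = set()
    for a, b in bonds:
        # Only bonds that cross the kept/leaving boundary mark a linking atom;
        # a bond between two leaving atoms (OXT-HXT) is simply discarded.
        if a in leaving and b not in leaving:
            linking.add(b)
        elif b in leaving and a not in leaving:
            linking.add(a)
    return kept_atoms, kept_bonds, linking


kept_atoms, kept_bonds, linking = strip_leaving_atoms(ALA_LEAVING, ALA_BONDS)
print(sorted(linking))  # ['C', 'N']
```

For ALA this leaves 10 atoms and 9 bonds. N keeps one H and C keeps its carbonyl O, which matches the backbone pattern of a residue in the middle of a chain.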
Anyway, not all of our issues may apply to you, but there's likely to be
some overlap, so I'd love to stay up to date. I'm still slowly wrapping my
head around different molecule representations, but I'm optimistic about
adopting the CIF format, since it should be a nice Rosetta Stone for
interoperability.
Cheers,
Jeff
data_ALA
#
_chem_comp.id ALA
_chem_comp.name ALANINE
_chem_comp.type "L-PEPTIDE LINKING"
_chem_comp.pdbx_type ATOMP
_chem_comp.formula "C3 H7 N O2"
_chem_comp.mon_nstd_parent_comp_id ?
_chem_comp.pdbx_synonyms ?
_chem_comp.pdbx_formal_charge 0
_chem_comp.pdbx_initial_date 1999-07-08
_chem_comp.pdbx_modified_date 2011-06-04
_chem_comp.pdbx_ambiguous_flag N
_chem_comp.pdbx_release_status REL
_chem_comp.pdbx_replaced_by ?
_chem_comp.pdbx_replaces ?
_chem_comp.formula_weight 89.093
_chem_comp.one_letter_code A
_chem_comp.three_letter_code ALA
_chem_comp.pdbx_model_coordinates_details ?
_chem_comp.pdbx_model_coordinates_missing_flag N
_chem_comp.pdbx_ideal_coordinates_details ?
_chem_comp.pdbx_ideal_coordinates_missing_flag N
_chem_comp.pdbx_model_coordinates_db_code ?
_chem_comp.pdbx_subcomponent_list ?
_chem_comp.pdbx_processing_site RCSB
#
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.alt_atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.charge
_chem_comp_atom.pdbx_align
_chem_comp_atom.pdbx_aromatic_flag
_chem_comp_atom.pdbx_leaving_atom_flag
_chem_comp_atom.pdbx_stereo_config
_chem_comp_atom.model_Cartn_x
_chem_comp_atom.model_Cartn_y
_chem_comp_atom.model_Cartn_z
_chem_comp_atom.pdbx_model_Cartn_x_ideal
_chem_comp_atom.pdbx_model_Cartn_y_ideal
_chem_comp_atom.pdbx_model_Cartn_z_ideal
_chem_comp_atom.pdbx_component_atom_id
_chem_comp_atom.pdbx_component_comp_id
_chem_comp_atom.pdbx_ordinal
ALA N N N 0 1 N N N 2.281 26.213 12.804 -0.966 0.493 1.500 N ALA 1
ALA CA CA C 0 1 N N S 1.169 26.942 13.411 0.257 0.418 0.692 CA ALA 2
ALA C C C 0 1 N N N 1.539 28.344 13.874 -0.094 0.017 -0.716 C ALA 3
ALA O O O 0 1 N N N 2.709 28.647 14.114 -1.056 -0.682 -0.923 O ALA 4
ALA CB CB C 0 1 N N N 0.601 26.143 14.574 1.204 -0.620 1.296 CB ALA 5
ALA OXT OXT O 0 1 N Y N 0.523 29.194 13.997 0.661 0.439 -1.742 OXT ALA 6
ALA H H H 0 1 N N N 2.033 25.273 12.493 -1.383 -0.425 1.482 H ALA 7
ALA H2 HN2 H 0 1 N Y N 3.080 26.184 13.436 -0.676 0.661 2.452 H2 ALA 8
ALA HA HA H 0 1 N N N 0.399 27.067 12.613 0.746 1.392 0.682 HA ALA 9
ALA HB1 1HB H 0 1 N N N -0.247 26.699 15.037 1.459 -0.330 2.316 HB1 ALA 10
ALA HB2 2HB H 0 1 N N N 0.308 25.110 14.270 0.715 -1.594 1.307 HB2 ALA 11
ALA HB3 3HB H 0 1 N N N 1.384 25.876 15.321 2.113 -0.676 0.697 HB3 ALA 12
ALA HXT HXT H 0 1 N Y N 0.753 30.069 14.286 0.435 0.182 -2.647 HXT ALA 13
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
_chem_comp_bond.pdbx_stereo_config
_chem_comp_bond.pdbx_ordinal
ALA N CA SING N N 1
ALA N H SING N N 2
ALA N H2 SING N N 3
ALA CA C SING N N 4
ALA CA CB SING N N 5
ALA CA HA SING N N 6
ALA C O DOUB N N 7
ALA C OXT SING N N 8
ALA CB HB1 SING N N 9
ALA CB HB2 SING N N 10
ALA CB HB3 SING N N 11
ALA OXT HXT SING N N 12
#
loop_
_pdbx_chem_comp_descriptor.comp_id
_pdbx_chem_comp_descriptor.type
_pdbx_chem_comp_descriptor.program
_pdbx_chem_comp_descriptor.program_version
_pdbx_chem_comp_descriptor.descriptor
ALA SMILES ACDLabs 10.04 "O=C(O)C(N)C"
ALA SMILES_CANONICAL CACTVS 3.341 "C[C@H](N)C(O)=O"
ALA SMILES CACTVS 3.341 "C[CH](N)C(O)=O"
ALA SMILES_CANONICAL "OpenEye OEToolkits" 1.5.0 "C[C@@H](C(=O)O)N"
ALA SMILES "OpenEye OEToolkits" 1.5.0 "CC(C(=O)O)N"
ALA InChI InChI 1.03 "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
#
loop_
_pdbx_chem_comp_identifier.comp_id
_pdbx_chem_comp_identifier.type
_pdbx_chem_comp_identifier.program
_pdbx_chem_comp_identifier.program_version
_pdbx_chem_comp_identifier.identifier
ALA "SYSTEMATIC NAME" ACDLabs 10.04 L-alanine
ALA "SYSTEMATIC NAME" "OpenEye OEToolkits" 1.5.0 "(2S)-2-aminopropanoic acid"
#
loop_
_pdbx_chem_comp_audit.comp_id
_pdbx_chem_comp_audit.action_type
_pdbx_chem_comp_audit.date
_pdbx_chem_comp_audit.processing_site
ALA "Create component" 1999-07-08 RCSB
ALA "Modify descriptor" 2011-06-04 RCSB
#
On Fri, May 14, 2021 at 9:01 AM David A Case <david.case.rutgers.edu> wrote:
> On Fri, May 14, 2021, Jeffrey Wagner wrote:
> >
> >Sorry to run this off on a tangent, but OpenFF is also trying to
> >incorporate the standard definitions from components.cif into some of our
> >own work. It's turning out to be not-entirely-trivial -- basically, we're
> >struggling to distinguish when entries in that file are describing residue
> >_substructures_ (as they'd appear in the middle of a chain), versus just an
> >uncapped instance of the residue as would be found floating around in
> >solution.
>
> Hi Jeff:
>
> I'm guessing that just seeing if the chem_comp.type contains the word
> "LINKING" is not enough, then. Does it help to look at the
> chem_comp.pdbx_type field, searching for "ATOMP"?
>
> Examples of failures would be helpful, as we (along with many others) are
> trying to automate MM setup procedures.
>
> ...thx...dac
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Fri May 14 2021 - 11:00:03 PDT