Hi Jeff,
Thank you for the offer. I have not looked into the .cif file yet, but
when I do I would love to hear more about your progress on this project.
On Fri, May 14, 2021 at 1:58 PM Jeffrey Wagner <jwagnerjpl.gmail.com> wrote:
> Hi David,
>
> Thanks for the quick reply. We're still in our early days of trying to
> parse this file, so I'm still not up-to-date on all the data fields it
> contains (we're trying to figure out how to whittle this down to just
> residue-like entries, so your ATOMP suggestion is really useful).
>
> Right now we're trying to extract "main chain residue subgraphs". For
> example, we want to get the connectivity table that would let us identify
> an ALA residue and name its atoms, starting from the entry in
> components.cif with three-letter code "ALA". I'll paste the full text of
> its entry below this message.
>
> We initially made the mistake of trying to recognize instances of ALA by
> the exact chemical graph described in this entry. However, this instance of
> the residue is "capped" with an H at the N terminal, and an OH at the C
> terminal. So it's necessary to inspect the eighth column in each atom
> description to check whether the "_chem_comp_atom.pdbx_leaving_atom_flag"
> is set to "Y", since that indicates whether the atom is removed when the
> residue appears in a polymer. Then, depending on which tool is handling the
> entry, there may be additional bookkeeping needed to track down and remove
> the involved bonds, and to ensure that the properties of the neighboring
> atoms (like formal charge and hybridization) are handled correctly.
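
For what it's worth, the leaving-atom bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not anyone's actual implementation: the function and variable names are made up, the rows are the first eight columns of the ALA atom loop and the full bond loop pasted further down, and the parser assumes plain whitespace-separated rows with no quoted values. It does not attempt to fix formal charge or hybridization on the neighboring atoms; it only reports which atoms lost a neighbor.

```python
# Columns 1-8 of the _chem_comp_atom loop in the ALA entry (coordinates omitted).
ATOM_COLUMNS = ["comp_id", "atom_id", "alt_atom_id", "type_symbol", "charge",
                "pdbx_align", "pdbx_aromatic_flag", "pdbx_leaving_atom_flag"]
BOND_COLUMNS = ["comp_id", "atom_id_1", "atom_id_2", "value_order",
                "pdbx_aromatic_flag", "pdbx_stereo_config", "pdbx_ordinal"]

ATOM_ROWS = """
ALA N   N   N 0 1 N N
ALA CA  CA  C 0 1 N N
ALA C   C   C 0 1 N N
ALA O   O   O 0 1 N N
ALA CB  CB  C 0 1 N N
ALA OXT OXT O 0 1 N Y
ALA H   H   H 0 1 N N
ALA H2  HN2 H 0 1 N Y
ALA HA  HA  H 0 1 N N
ALA HB1 1HB H 0 1 N N
ALA HB2 2HB H 0 1 N N
ALA HB3 3HB H 0 1 N N
ALA HXT HXT H 0 1 N Y
"""

BOND_ROWS = """
ALA N   CA  SING N N 1
ALA N   H   SING N N 2
ALA N   H2  SING N N 3
ALA CA  C   SING N N 4
ALA CA  CB  SING N N 5
ALA CA  HA  SING N N 6
ALA C   O   DOUB N N 7
ALA C   OXT SING N N 8
ALA CB  HB1 SING N N 9
ALA CB  HB2 SING N N 10
ALA CB  HB3 SING N N 11
ALA OXT HXT SING N N 12
"""


def parse_rows(text, columns):
    """Turn whitespace-separated rows into dicts keyed by column name."""
    return [dict(zip(columns, line.split()))
            for line in text.splitlines() if line.strip()]


def strip_leaving_atoms(atoms, bonds):
    """Drop atoms flagged 'Y' and every bond that touches one of them."""
    leaving = {a["atom_id"] for a in atoms
               if a["pdbx_leaving_atom_flag"] == "Y"}
    kept_atoms = [a["atom_id"] for a in atoms if a["atom_id"] not in leaving]
    kept_bonds = []
    open_sites = set()  # kept atoms that lost a bonded neighbor
    for b in bonds:
        a1, a2 = b["atom_id_1"], b["atom_id_2"]
        if a1 in leaving and a2 in leaving:
            continue  # bond lies entirely inside the removed cap
        if a1 in leaving or a2 in leaving:
            open_sites.add(a2 if a1 in leaving else a1)
            continue
        kept_bonds.append((a1, a2, b["value_order"]))
    return leaving, kept_atoms, kept_bonds, sorted(open_sites)


atoms = parse_rows(ATOM_ROWS, ATOM_COLUMNS)
bonds = parse_rows(BOND_ROWS, BOND_COLUMNS)
leaving, kept_atoms, kept_bonds, open_sites = strip_leaving_atoms(atoms, bonds)

print("leaving atoms:", sorted(leaving))
print("kept atoms:   ", kept_atoms)
print("kept bonds:   ", len(kept_bonds))
print("open sites:   ", open_sites)
```

Looking the flag up by column name rather than by position keeps the sketch from depending on the exact column order, which seems safer if other entries ever list the items differently.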
>
> Anyway, not all of our issues may apply to you, but there's likely to be
> some overlap so I'd love to stay up to date. I'm still slowly wrapping my
> head around different molecule representations, but I'm optimistic about
> adopting the CIF format, since it should be a nice rosetta stone for
> interoperability.
>
> Cheers,
> Jeff
>
> data_ALA
> #
> _chem_comp.id ALA
> _chem_comp.name ALANINE
> _chem_comp.type "L-PEPTIDE LINKING"
> _chem_comp.pdbx_type ATOMP
> _chem_comp.formula "C3 H7 N O2"
> _chem_comp.mon_nstd_parent_comp_id ?
> _chem_comp.pdbx_synonyms ?
> _chem_comp.pdbx_formal_charge 0
> _chem_comp.pdbx_initial_date 1999-07-08
> _chem_comp.pdbx_modified_date 2011-06-04
> _chem_comp.pdbx_ambiguous_flag N
> _chem_comp.pdbx_release_status REL
> _chem_comp.pdbx_replaced_by ?
> _chem_comp.pdbx_replaces ?
> _chem_comp.formula_weight 89.093
> _chem_comp.one_letter_code A
> _chem_comp.three_letter_code ALA
> _chem_comp.pdbx_model_coordinates_details ?
> _chem_comp.pdbx_model_coordinates_missing_flag N
> _chem_comp.pdbx_ideal_coordinates_details ?
> _chem_comp.pdbx_ideal_coordinates_missing_flag N
> _chem_comp.pdbx_model_coordinates_db_code ?
> _chem_comp.pdbx_subcomponent_list ?
> _chem_comp.pdbx_processing_site RCSB
> #
> loop_
> _chem_comp_atom.comp_id
> _chem_comp_atom.atom_id
> _chem_comp_atom.alt_atom_id
> _chem_comp_atom.type_symbol
> _chem_comp_atom.charge
> _chem_comp_atom.pdbx_align
> _chem_comp_atom.pdbx_aromatic_flag
> _chem_comp_atom.pdbx_leaving_atom_flag
> _chem_comp_atom.pdbx_stereo_config
> _chem_comp_atom.model_Cartn_x
> _chem_comp_atom.model_Cartn_y
> _chem_comp_atom.model_Cartn_z
> _chem_comp_atom.pdbx_model_Cartn_x_ideal
> _chem_comp_atom.pdbx_model_Cartn_y_ideal
> _chem_comp_atom.pdbx_model_Cartn_z_ideal
> _chem_comp_atom.pdbx_component_atom_id
> _chem_comp_atom.pdbx_component_comp_id
> _chem_comp_atom.pdbx_ordinal
> ALA N N N 0 1 N N N 2.281 26.213 12.804 -0.966 0.493 1.500 N ALA 1
> ALA CA CA C 0 1 N N S 1.169 26.942 13.411 0.257 0.418 0.692 CA ALA 2
> ALA C C C 0 1 N N N 1.539 28.344 13.874 -0.094 0.017 -0.716 C ALA 3
> ALA O O O 0 1 N N N 2.709 28.647 14.114 -1.056 -0.682 -0.923 O ALA 4
> ALA CB CB C 0 1 N N N 0.601 26.143 14.574 1.204 -0.620 1.296 CB ALA 5
> ALA OXT OXT O 0 1 N Y N 0.523 29.194 13.997 0.661 0.439 -1.742 OXT ALA 6
> ALA H H H 0 1 N N N 2.033 25.273 12.493 -1.383 -0.425 1.482 H ALA 7
> ALA H2 HN2 H 0 1 N Y N 3.080 26.184 13.436 -0.676 0.661 2.452 H2 ALA 8
> ALA HA HA H 0 1 N N N 0.399 27.067 12.613 0.746 1.392 0.682 HA ALA 9
> ALA HB1 1HB H 0 1 N N N -0.247 26.699 15.037 1.459 -0.330 2.316 HB1 ALA 10
> ALA HB2 2HB H 0 1 N N N 0.308 25.110 14.270 0.715 -1.594 1.307 HB2 ALA 11
> ALA HB3 3HB H 0 1 N N N 1.384 25.876 15.321 2.113 -0.676 0.697 HB3 ALA 12
> ALA HXT HXT H 0 1 N Y N 0.753 30.069 14.286 0.435 0.182 -2.647 HXT ALA 13
> #
> loop_
> _chem_comp_bond.comp_id
> _chem_comp_bond.atom_id_1
> _chem_comp_bond.atom_id_2
> _chem_comp_bond.value_order
> _chem_comp_bond.pdbx_aromatic_flag
> _chem_comp_bond.pdbx_stereo_config
> _chem_comp_bond.pdbx_ordinal
> ALA N CA SING N N 1
> ALA N H SING N N 2
> ALA N H2 SING N N 3
> ALA CA C SING N N 4
> ALA CA CB SING N N 5
> ALA CA HA SING N N 6
> ALA C O DOUB N N 7
> ALA C OXT SING N N 8
> ALA CB HB1 SING N N 9
> ALA CB HB2 SING N N 10
> ALA CB HB3 SING N N 11
> ALA OXT HXT SING N N 12
> #
> loop_
> _pdbx_chem_comp_descriptor.comp_id
> _pdbx_chem_comp_descriptor.type
> _pdbx_chem_comp_descriptor.program
> _pdbx_chem_comp_descriptor.program_version
> _pdbx_chem_comp_descriptor.descriptor
> ALA SMILES ACDLabs 10.04 "O=C(O)C(N)C"
> ALA SMILES_CANONICAL CACTVS 3.341 "C[C@H](N)C(O)=O"
> ALA SMILES CACTVS 3.341 "C[CH](N)C(O)=O"
> ALA SMILES_CANONICAL "OpenEye OEToolkits" 1.5.0 "C[C@@H](C(=O)O)N"
> ALA SMILES "OpenEye OEToolkits" 1.5.0 "CC(C(=O)O)N"
> ALA InChI InChI 1.03 "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
>
> loop_
> _pdbx_chem_comp_identifier.comp_id
> _pdbx_chem_comp_identifier.type
> _pdbx_chem_comp_identifier.program
> _pdbx_chem_comp_identifier.program_version
> _pdbx_chem_comp_identifier.identifier
> ALA "SYSTEMATIC NAME" ACDLabs 10.04 L-alanine
> ALA "SYSTEMATIC NAME" "OpenEye OEToolkits" 1.5.0 "(2S)-2-aminopropanoic acid"
> #
> loop_
> _pdbx_chem_comp_audit.comp_id
> _pdbx_chem_comp_audit.action_type
> _pdbx_chem_comp_audit.date
> _pdbx_chem_comp_audit.processing_site
> ALA "Create component" 1999-07-08 RCSB
> ALA "Modify descriptor" 2011-06-04 RCSB
> #
>
> On Fri, May 14, 2021 at 9:01 AM David A Case <david.case.rutgers.edu> wrote:
>
> > On Fri, May 14, 2021, Jeffrey Wagner wrote:
> > >
> > >Sorry to run this off on a tangent, but OpenFF is also trying to
> > >incorporate the standard definitions from components.cif into some of
> > >our own work. It's turning out to be not-entirely-trivial -- basically,
> > >we're struggling to distinguish when entries in that file are describing
> > >residue _substructures_ (as they'd appear in the middle of a chain),
> > >versus just an uncapped instance of the residue as would be found
> > >floating around in solution.
> >
> > Hi Jeff:
> >
> > I'm guessing that just seeing if the chem_comp.type contains the word
> > "LINKING" is not enough, then. Does it help to look at the
> > chem_comp.pdbx_type field, searching for "ATOMP"?
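
As a rough illustration of the two checks suggested here, a Python sketch might look like the following. The helper names are invented for this example, the ALA lines are copied from the entry pasted above, and the second entry is a made-up non-polymer record used only to show a negative case. Whether the combined test is actually sufficient to pick out residue-like entries is exactly the open question in this thread, so treat it as a starting point rather than an answer.

```python
import shlex

# Header items copied from the ALA entry pasted earlier in this thread.
ALA_HEADER = '''\
_chem_comp.id ALA
_chem_comp.name ALANINE
_chem_comp.type "L-PEPTIDE LINKING"
_chem_comp.pdbx_type ATOMP
'''

# Hypothetical entry with illustrative values, not taken from components.cif.
HYPOTHETICAL_LIGAND = '''\
_chem_comp.id XXX
_chem_comp.name "MADE-UP LIGAND"
_chem_comp.type NON-POLYMER
_chem_comp.pdbx_type HETAIN
'''


def parse_items(text):
    """Collect single-line '_tag value' pairs; quotes are handled by shlex.

    Assumes each item sits on one line with balanced quotes. Loops and
    multi-line values are ignored.
    """
    items = {}
    for line in text.splitlines():
        tokens = shlex.split(line)
        if len(tokens) == 2 and tokens[0].startswith("_"):
            items[tokens[0]] = tokens[1]
    return items


def looks_like_polymer_residue(items):
    """True if the type mentions LINKING and pdbx_type is ATOMP."""
    comp_type = items.get("_chem_comp.type", "").upper()
    pdbx_type = items.get("_chem_comp.pdbx_type", "").upper()
    return "LINKING" in comp_type and pdbx_type == "ATOMP"


for name, header in [("ALA", ALA_HEADER), ("XXX", HYPOTHETICAL_LIGAND)]:
    print(name, looks_like_polymer_residue(parse_items(header)))
```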
> >
> > Examples of failures would be helpful, as we (along with many others) are
> > trying to automate MM setup procedures.
> >
> > ...thx...dac
> >
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
>
--
Kellon A. A. Belfon, Graduate Student
Carlos Simmerling Laboratory
The Laufer Center for Physical and Quantitative Biology
The Department of Chemistry, Stony Brook University
Stony Brook, New York 11794
Phone: (347) 546-4237 Email: kellon.belfon.stonybrook.edu
Received on Fri May 14 2021 - 12:00:03 PDT