Re: [AMBER] Generate a pdb with unique atom names

From: FyD <fyd.q4md-forcefieldtools.org>
Date: Tue, 15 Apr 2014 16:06:39 +0200

Quoting Jean-Patrick Francoia <jeanpatrick.francoia.gmail.com>:

>>> "If you do want to parameterize a polymer I would like to suggest you
>>> to use R.E.D. Server Development/R.E.D. Python at
>>> http://q4md-forcefieldtools.org/REDS-Development/"
>>>
>>> It is (almost..) what they do in this tutorial, isn't it ?
>>> http://ambermd.org/tutorials/basic/tutorial4b/
>> I do not think so...
>>
>> In this 'basic/tutorial4b' tutorial a small/basic molecule is
>> parameterized - this is totally different from parameterizing a
>> biopolymer, where the later has to be decomposed into elementary
>> building blocks, and where molecular fragments have to be
>> designed/generated...
>>
>> In this context the mol3 file format has some advantages (over the off
>> file format for instance)...
>>
>> R.E.D. Server Dev/R.E.D. Python allows force field generation for a
>> small molecule, a bunch of small molecules as well as for any type of
>> polymers.
>>
>> See once again the work of J. Sanders (Sanders et al. force field) in
>> REDDB for instance:
>> http://q4md-forcefieldtools.org/REDDB/projects/F-93/ where a force
>> field for the PNA biopolymer
>> (http://en.wikipedia.org/wiki/Peptide_nucleic_acid) is designed.
>>
>> --
>>
>> Concerning your PDB file; three points:
>> - when designing a force field library two atoms cannot share the same
>> name in a given residue; BUT FORTUNATELY two atoms can share the same
>> name if they belong to two different residues. Let's take an example:
>> NLYS-LYS-LYS-CLYS
>> there are four LYS residues and the four alpha-carbons have the same
>> name: CA because they belong to four different residues.
>>
>> - you need to know at which pH you want to use your force field; thus
>> the amino group of the side chain of Lysine is protonated at pH = 7.
>>
>> - your structure (names.pdb) just corresponds to the polymerization of
>> a small number of different residues; each one being represented by a
>> central, N-terminal and C-terminal fragments...
>>
>>> And what do you mean by splitting my polymer ? It is composed of 48
>>> Lysines right now (first generation), but in the end, it will be
>>> composed of around 1000. What I really need to do is a honest
>>> minimization, and maybe a bit of docking later. I don't need to be
>>> super accurate.
>> obviously you can design a polymer composed of 48 or 1000 units using
>> the same set of molecular fragments...
>>
>> regards, Francois
>>
>>
>>>> If you do want to parameterize a polymer I would like to suggest you
>>>> to use R.E.D. Server Development/R.E.D. Python at
>>>> http://q4md-forcefieldtools.org/REDS-Development/
>>>>
>>>> You need to split your polymer into elementary building blocks...
>>>>
>>>> See examples in R.E.DD.B.:
>>>> see http://q4md-forcefieldtools.org/REDDB/projects/F-93/
>>>> http://q4md-forcefieldtools.org/REDDB/projects/F-90/
>>>> http://q4md-forcefieldtools.org/REDDB/projects/F-60/
>>>>
>>>> Tutorials at http://q4md-forcefieldtools.org/Tutorial/
>>>>
>>>> Finally the OFF library file format has advantages, but also limitations;
>>>> See http://q4md-forcefieldtools.org/Tutorial/leap-mol3.php
>>>>
>>>> We are about to release a new version of REDS-Development/R.E.D.
>>>> Python and a new tutorial.
>>>>
>>>> If you are interested I can give you the address of the new web site
>>>> and tutorial...
>>>>
>>>> regards, Francois
>>>>
>>>>
>>>>> Le 14/04/2014 14:10, Jason Swails a écrit :
>>>>>> On Sun, 2014-04-13 at 21:40 +0200, Jean-Patrick Francoia wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am a master student, and one of my course is molecular
>>>>>>> dynamic. So, in
>>>>>>> class we use (the old) Amber 9 to minimize some structures,
>>>>>>> for example.
>>>>>>> Also, one of our exercices is to build the structure of a
>>>>>>> lysine polymer
>>>>>>> with Python (a programmation language), and then to minimize the
>>>>>>> structure with Amber.
>>>>>>>
>>>>>>> The important thing is my Python program generates a SMILE
>>>>>>> string. I can
>>>>>>> import this string with plenty of softwares, and then save it into the
>>>>>>> pdb format. But in this pdb, each atom does not have a unique name, a
>>>>>>> very common issue when using Amber.
>>>>>>>
>>>>>>> So, I would like to know if there is a solution to assign each atom a
>>>>>>> unique name. I have seen a lot of threads dealing about this problem on
>>>>>>> the mailing list, but I have never seen any solution. I can't obviously
>>>>>>> rename each atom by hand.
>>>>>> LEaP must know about every residue defined in an input structure so it
>>>>>> knows what atom types to apply to each atom as well as what charges to
>>>>>> apply. To that end, the atom names of every atom must match the names
>>>>>> in the library files so that LEaP can map the atom names to the atom
>>>>>> types, charges, and connectivities.
>>>>>>
>>>>>> The atom names of the amino acid and nucleic acid residues match the
>>>>>> atom names used by the PDB so that almost any PDB you download from the
>>>>>> official Protein Database will work. If you are creating your own PDB
>>>>>> from SMILES strings, you will need to fix the atom names yourself. If
>>>>>> you are already using Python to do some file manipulations, you can use
>>>>>> Python to write a script to rename your atoms.
>>>>>>
>>>>>> Alternatively, you can use LEaP to create your Lysine polymer with the
>>>>>> "sequence" command. If you want a 3-mer of lysine residues, the command
>>>>>>
>>>>>> lys3 = sequence {NLYS LYS CLYS}
>>>>>>
>>>>>> will generate a tripeptide with 3 lysine residues.
>>>>>>
>>>>>> HTH,
>>>>>> Jason
>>>>>>
>>>>> I can't use leap to generate the polymer, sadly. Mine is a complex one,
>>>>> branched on the epsilon and alpha amines, with several hundreds of amino
>>>>> acids.
>>>>>
>>>>> Ok I get it. But how can I know how the atoms are named in the library ?
>>>>> What is the naming algorythm ?
>>>>>
>>>>> And what about generating my own library for my molecule ? Like they do
>>>>> here:
>>>>> http://ambermd.org/tutorials/basic/tutorial4b/
>>>>
>>>> _______________________________________________
>>>> AMBER mailing list
>>>> AMBER.ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>> I'm answering to both of you, dear Jason and Francois:
>>>
>>> "If you do want to parameterize a polymer I would like to suggest you
>>> to use R.E.D. Server Development/R.E.D. Python at
>>> http://q4md-forcefieldtools.org/REDS-Development/"
>>>
>>> It is (almost..) what they do in this tutorial, isn't it ?
>>> http://ambermd.org/tutorials/basic/tutorial4b/
>>>
>>> And what do you mean by splitting my polymer ? It is composed of 48
>>> Lysines right now (first generation), but in the end, it will be
>>> composed of around 1000. What I really need to do is a honest
>>> minimization, and maybe a bit of docking later. I don't need to be
>>> super accurate.
>>>
>>>
>>> .Jason:
>>>
>>> Correct me if I'm wrong. I wrote a little Python script which "clean" a
>>> bit the pdb generated by my visualization program, which transforms my
>>> SMILES chain into a pdb. I now have a pdb, wherein each atom has a
>>> unique name (element + a random alpha-numeric chain). I attached this
>>> pdb with the name "names.pdb".
>>>
>>> I can load this pdb with Amber:
>>>
>>> G2 = loadpdb names.pdb
>>> Loading PDB file: ./names.pdb
>>> Created a new atom named: O5K3 within residue: .R<CLYS 0>
>>> Created a new atom named: CB6H within residue: .R<CLYS 0>
>>> Created a new atom named: OF0C within residue: .R<CLYS 0>
>>> [...]
>>> Added missing heavy atom: .R<CLYS 0>.A<OXT 23>
>>> Added missing heavy atom: .R<CLYS 0>.A<C 21>
>>> Added missing heavy atom: .R<CLYS 0>.A<CA 3>
>>> Added missing heavy atom: .R<CLYS 0>.A<O 22>
>>> Added missing heavy atom: .R<CLYS 0>.A<N 1>
>>> Added missing heavy atom: .R<CLYS 0>.A<CB 5>
>>> Added missing heavy atom: .R<CLYS 0>.A<CG 8>
>>> Added missing heavy atom: .R<CLYS 0>.A<CD 11>
>>> Added missing heavy atom: .R<CLYS 0>.A<CE 14>
>>> Added missing heavy atom: .R<CLYS 0>.A<NZ 17>
>>> total atoms in file: 1011
>>> Leap added 23 missing atoms according to residue templates:
>>> 10 Heavy
>>> 13 H / lone pairs
>>> The file contained 1011 atoms not in residue templates
>>>
>>> I can then save the pdb, and everything seems all right (no duplicate),
>>> except I think leap added a supplementary lysine. (And why the name of
>>> the residue is mentionned as CLYS ?)
>>>
>>> Of course you noticed in the pdb "names.pdb", atoms don't have a type,
>>> so when I do:
>>>
>>> saveamberparm G2 G2.prmtop G2.inpcrd
>>> Checking Unit.
>>> FATAL: Atom .R<CLYS 0>.A<O5K3 24> does not have a type.
>>> [...]
>>>
>>> I have something like this. So, I looked in the OFF library
>>> documentation (thanks for the link by the way):
>>>
>>> !entry.CLYS.unit.atoms table str name str type int typex int resx int
>>> flags int seq int elmnt dbl chg
>>>
>>> "N" "N" 0 1 131072 1 7 -0.463000
>>> "H" "H" 0 1 131072 2 1 0.252000
>>> "CA" "CT" 0 1 131072 3 6 0.035000
>>> "HA" "HC" 0 1 131072 4 1 0.048000
>>> "CB" "CT" 0 1 131072 5 6 -0.098000
>>> "HB2" "HC" 0 1 131072 6 1 0.038000
>>> "HB3" "HC" 0 1 131072 7 1 0.038000
>>> "CG" "CT" 0 1 131072 8 6 -0.210000
>>> "HG2" "HC" 0 1 131072 9 1 0.116000
>>> "HG3" "HC" 0 1 131072 10 1 0.116000
>>> "CD" "CT" 0 1 131072 11 6 -0.230000
>>> "HD2" "HC" 0 1 131072 12 1 0.122000
>>> "HD3" "HC" 0 1 131072 13 1 0.122000
>>> "CE" "CT" 0 1 131072 14 6 -0.138000
>>> "HE2" "HC" 0 1 131072 15 1 0.098000
>>> "HE3" "HC" 0 1 131072 16 1 0.098000
>>> "NZ" "N3" 0 1 131072 17 7 -0.138000
>>> "HZ1" "H3" 0 1 131072 18 1 0.094000
>>> "HZ2" "H3" 0 1 131072 19 1 0.094000
>>> "HZ3" "H3" 0 1 131072 20 1 0.094000
>>> "C" "C" 0 1 131072 21 6 0.624000
>>> "O" "O2" 0 1 131072 22 8 -0.356000
>>> "OXT" "O2" 0 1 131072 23 8 -0.356000
>>>
>>>
>>> So it means each atom in the residue has an unique name, of course, and
>>> a type. A type can be shared with other atoms. There are basically only
>>> these types:
>>> N
>>> H
>>> CT
>>> HC
>>> N3
>>> H3
>>> C
>>> O2
>>>
>>> I assume these types match the chemical types: aliphatic carbon,
>>> epsilon amine, alpha amine, etc. So now, I just have to modify my
>>> parser to assign the good type to each atom, right ?
>
> Just to be sure we focus on the same problem, because now I realize I
> did not mention that clearly. I hope you saw that in the pdb file:
> my polymer is not just a linear polymer of lysine, like traditional
> ones: Lys-Lys-Lys. Otherwise I wouldn't bother you. My residues are
> polymerized either on the alpha amine, or either on the epsilon amine.

I saw that - Yes ;-) the problem you mentioned is simply:
"is my polymer a linear polymer or a branched polymer?"

both types of polymer are handled by R.E.D. Python, and obviously the
second one is slightly more complex...

> That's why I use a Python program to generate the structure. I know two
> parameters before I generate the structure: the number of lysine
> residues involved in an alpha bound, and the number of residues involved
> in an epsilon bound. Each time I launch my program, I end up with a
> different structure respecting the ration alpha/epsilon, which is normal.
>
> So:
>
> "
> Concerning your PDB file; three points:
> - when designing a force field library two atoms cannot share the same
> name in a given residue; BUT FORTUNATELY two atoms can share the same
> name if they belong to two different residues. Let's take an example:
> NLYS-LYS-LYS-CLYS
> there are four LYS residues and the four alpha-carbons have the same
> name: CA because they belong to four different residues.
> "
>
> It will be hard to distinguish each residue (amino acid) from the
> smile string. But I can distinguish each atom type. So let's admit I
> have 50 carbons alpha. Can I name them from C1 to C50 and give them
> a type, without mentionning the residue they are from, or do I have
> to call them all "CA" and mention the residue ?

Reordering a PDB file (the polymer) in agreement with the FF libraries
(molecular fragment) is often a tedious task...

regards, Francois




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Apr 15 2014 - 07:30:02 PDT
Custom Search