Re: [AMBER] Generate a pdb with unique atom names from Jean-Patrick Francoia on 2014-04-15 (Amber Archive Apr 2014)

From: Jean-Patrick Francoia <jeanpatrick.francoia.gmail.com>
Date: Tue, 15 Apr 2014 15:35:49 +0200

Le 15/04/2014 14:51, FyD a écrit :
> Jean-Patrick,
>
>> "If you do want to parameterize a polymer I would like to suggest you
>> to use R.E.D. Server Development/R.E.D. Python at
>> http://q4md-forcefieldtools.org/REDS-Development/"
>>
>> It is (almost..) what they do in this tutorial, isn't it ?
>> http://ambermd.org/tutorials/basic/tutorial4b/
> I do not think so...
>
> In this 'basic/tutorial4b' tutorial a small/basic molecule is
> parameterized - this is totally different from parameterizing a
> biopolymer, where the later has to be decomposed into elementary
> building blocks, and where molecular fragments have to be
> designed/generated...
>
> In this context the mol3 file format has some advantages (over the off
> file format for instance)...
>
> R.E.D. Server Dev/R.E.D. Python allows force field generation for a
> small molecule, a bunch of small molecules as well as for any type of
> polymers.
>
> See once again the work of J. Sanders (Sanders et al. force field) in
> REDDB for instance:
> http://q4md-forcefieldtools.org/REDDB/projects/F-93/ where a force
> field for the PNA biopolymer
> (http://en.wikipedia.org/wiki/Peptide_nucleic_acid) is designed.
>
> --
>
> Concerning your PDB file; three points:
> - when designing a force field library two atoms cannot share the same
> name in a given residue; BUT FORTUNATELY two atoms can share the same
> name if they belong to two different residues. Let's take an example:
> NLYS-LYS-LYS-CLYS
> there are four LYS residues and the four alpha-carbons have the same
> name: CA because they belong to four different residues.
>
> - you need to know at which pH you want to use your force field; thus
> the amino group of the side chain of Lysine is protonated at pH = 7.
>
> - your structure (names.pdb) just corresponds to the polymerization of
> a small number of different residues; each one being represented by a
> central, N-terminal and C-terminal fragments...
>
>> And what do you mean by splitting my polymer ? It is composed of 48
>> Lysines right now (first generation), but in the end, it will be
>> composed of around 1000. What I really need to do is a honest
>> minimization, and maybe a bit of docking later. I don't need to be
>> super accurate.
> obviously you can design a polymer composed of 48 or 1000 units using
> the same set of molecular fragments...
>
> regards, Francois
>
>
>>> If you do want to parameterize a polymer I would like to suggest you
>>> to use R.E.D. Server Development/R.E.D. Python at
>>> http://q4md-forcefieldtools.org/REDS-Development/
>>>
>>> You need to split your polymer into elementary building blocks...
>>>
>>> See examples in R.E.DD.B.:
>>> see http://q4md-forcefieldtools.org/REDDB/projects/F-93/
>>> http://q4md-forcefieldtools.org/REDDB/projects/F-90/
>>> http://q4md-forcefieldtools.org/REDDB/projects/F-60/
>>>
>>> Tutorials at http://q4md-forcefieldtools.org/Tutorial/
>>>
>>> Finally the OFF library file format has advantages, but also limitations;
>>> See http://q4md-forcefieldtools.org/Tutorial/leap-mol3.php
>>>
>>> We are about to release a new version of REDS-Development/R.E.D.
>>> Python and a new tutorial.
>>>
>>> If you are interested I can give you the address of the new web site
>>> and tutorial...
>>>
>>> regards, Francois
>>>
>>>
>>>> Le 14/04/2014 14:10, Jason Swails a écrit :
>>>>> On Sun, 2014-04-13 at 21:40 +0200, Jean-Patrick Francoia wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I am a master student, and one of my course is molecular dynamic. So, in
>>>>>> class we use (the old) Amber 9 to minimize some structures, for example.
>>>>>> Also, one of our exercices is to build the structure of a lysine polymer
>>>>>> with Python (a programmation language), and then to minimize the
>>>>>> structure with Amber.
>>>>>>
>>>>>> The important thing is my Python program generates a SMILE string. I can
>>>>>> import this string with plenty of softwares, and then save it into the
>>>>>> pdb format. But in this pdb, each atom does not have a unique name, a
>>>>>> very common issue when using Amber.
>>>>>>
>>>>>> So, I would like to know if there is a solution to assign each atom a
>>>>>> unique name. I have seen a lot of threads dealing about this problem on
>>>>>> the mailing list, but I have never seen any solution. I can't obviously
>>>>>> rename each atom by hand.
>>>>> LEaP must know about every residue defined in an input structure so it
>>>>> knows what atom types to apply to each atom as well as what charges to
>>>>> apply. To that end, the atom names of every atom must match the names
>>>>> in the library files so that LEaP can map the atom names to the atom
>>>>> types, charges, and connectivities.
>>>>>
>>>>> The atom names of the amino acid and nucleic acid residues match the
>>>>> atom names used by the PDB so that almost any PDB you download from the
>>>>> official Protein Database will work. If you are creating your own PDB
>>>>> from SMILES strings, you will need to fix the atom names yourself. If
>>>>> you are already using Python to do some file manipulations, you can use
>>>>> Python to write a script to rename your atoms.
>>>>>
>>>>> Alternatively, you can use LEaP to create your Lysine polymer with the
>>>>> "sequence" command. If you want a 3-mer of lysine residues, the command
>>>>>
>>>>> lys3 = sequence {NLYS LYS CLYS}
>>>>>
>>>>> will generate a tripeptide with 3 lysine residues.
>>>>>
>>>>> HTH,
>>>>> Jason
>>>>>
>>>> I can't use leap to generate the polymer, sadly. Mine is a complex one,
>>>> branched on the epsilon and alpha amines, with several hundreds of amino
>>>> acids.
>>>>
>>>> Ok I get it. But how can I know how the atoms are named in the library ?
>>>> What is the naming algorythm ?
>>>>
>>>> And what about generating my own library for my molecule ? Like they do
>>>> here:
>>>> http://ambermd.org/tutorials/basic/tutorial4b/
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>> I'm answering to both of you, dear Jason and Francois:
>>
>> "If you do want to parameterize a polymer I would like to suggest you
>> to use R.E.D. Server Development/R.E.D. Python at
>> http://q4md-forcefieldtools.org/REDS-Development/"
>>
>> It is (almost..) what they do in this tutorial, isn't it ?
>> http://ambermd.org/tutorials/basic/tutorial4b/
>>
>> And what do you mean by splitting my polymer ? It is composed of 48
>> Lysines right now (first generation), but in the end, it will be
>> composed of around 1000. What I really need to do is a honest
>> minimization, and maybe a bit of docking later. I don't need to be
>> super accurate.
>>
>>
>> .Jason:
>>
>> Correct me if I'm wrong. I wrote a little Python script which "clean" a
>> bit the pdb generated by my visualization program, which transforms my
>> SMILES chain into a pdb. I now have a pdb, wherein each atom has a
>> unique name (element + a random alpha-numeric chain). I attached this
>> pdb with the name "names.pdb".
>>
>> I can load this pdb with Amber:
>>
>> G2 = loadpdb names.pdb
>> Loading PDB file: ./names.pdb
>> Created a new atom named: O5K3 within residue: .R<CLYS 0>
>> Created a new atom named: CB6H within residue: .R<CLYS 0>
>> Created a new atom named: OF0C within residue: .R<CLYS 0>
>> [...]
>> Added missing heavy atom: .R<CLYS 0>.A<OXT 23>
>> Added missing heavy atom: .R<CLYS 0>.A<C 21>
>> Added missing heavy atom: .R<CLYS 0>.A<CA 3>
>> Added missing heavy atom: .R<CLYS 0>.A<O 22>
>> Added missing heavy atom: .R<CLYS 0>.A<N 1>
>> Added missing heavy atom: .R<CLYS 0>.A<CB 5>
>> Added missing heavy atom: .R<CLYS 0>.A<CG 8>
>> Added missing heavy atom: .R<CLYS 0>.A<CD 11>
>> Added missing heavy atom: .R<CLYS 0>.A<CE 14>
>> Added missing heavy atom: .R<CLYS 0>.A<NZ 17>
>> total atoms in file: 1011
>> Leap added 23 missing atoms according to residue templates:
>> 10 Heavy
>> 13 H / lone pairs
>> The file contained 1011 atoms not in residue templates
>>
>> I can then save the pdb, and everything seems all right (no duplicate),
>> except I think leap added a supplementary lysine. (And why the name of
>> the residue is mentionned as CLYS ?)
>>
>> Of course you noticed in the pdb "names.pdb", atoms don't have a type,
>> so when I do:
>>
>> saveamberparm G2 G2.prmtop G2.inpcrd
>> Checking Unit.
>> FATAL: Atom .R<CLYS 0>.A<O5K3 24> does not have a type.
>> [...]
>>
>> I have something like this. So, I looked in the OFF library
>> documentation (thanks for the link by the way):
>>
>> !entry.CLYS.unit.atoms table str name str type int typex int resx int
>> flags int seq int elmnt dbl chg
>>
>> "N" "N" 0 1 131072 1 7 -0.463000
>> "H" "H" 0 1 131072 2 1 0.252000
>> "CA" "CT" 0 1 131072 3 6 0.035000
>> "HA" "HC" 0 1 131072 4 1 0.048000
>> "CB" "CT" 0 1 131072 5 6 -0.098000
>> "HB2" "HC" 0 1 131072 6 1 0.038000
>> "HB3" "HC" 0 1 131072 7 1 0.038000
>> "CG" "CT" 0 1 131072 8 6 -0.210000
>> "HG2" "HC" 0 1 131072 9 1 0.116000
>> "HG3" "HC" 0 1 131072 10 1 0.116000
>> "CD" "CT" 0 1 131072 11 6 -0.230000
>> "HD2" "HC" 0 1 131072 12 1 0.122000
>> "HD3" "HC" 0 1 131072 13 1 0.122000
>> "CE" "CT" 0 1 131072 14 6 -0.138000
>> "HE2" "HC" 0 1 131072 15 1 0.098000
>> "HE3" "HC" 0 1 131072 16 1 0.098000
>> "NZ" "N3" 0 1 131072 17 7 -0.138000
>> "HZ1" "H3" 0 1 131072 18 1 0.094000
>> "HZ2" "H3" 0 1 131072 19 1 0.094000
>> "HZ3" "H3" 0 1 131072 20 1 0.094000
>> "C" "C" 0 1 131072 21 6 0.624000
>> "O" "O2" 0 1 131072 22 8 -0.356000
>> "OXT" "O2" 0 1 131072 23 8 -0.356000
>>
>>
>> So it means each atom in the residue has an unique name, of course, and
>> a type. A type can be shared with other atoms. There are basically only
>> these types:
>> N
>> H
>> CT
>> HC
>> N3
>> H3
>> C
>> O2
>>
>> I assume these types match the chemical types: aliphatic carbon,
>> epsilon amine, alpha amine, etc. So now, I just have to modify my
>> parser to assign the good type to each atom, right ?
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Just to be sure we focus on the same problem, because now I realize I
did not mention that clearly. I hope you saw that in the pdb file:
my polymer is not just a linear polymer of lysine, like traditional
ones: Lys-Lys-Lys. Otherwise I wouldn't bother you. My residues are
polymerized either on the alpha amine, or either on the epsilon amine.
And randomly.
That's why I use a Python program to generate the structure. I know two
parameters before I generate the structure: the number of lysine
residues involved in an alpha bound, and the number of residues involved
in an epsilon bound. Each time I launch my program, I end up with a
different structure respecting the ration alpha/epsilon, which is normal.

So:

"
Concerning your PDB file; three points:
- when designing a force field library two atoms cannot share the same
name in a given residue; BUT FORTUNATELY two atoms can share the same
name if they belong to two different residues. Let's take an example:
NLYS-LYS-LYS-CLYS
there are four LYS residues and the four alpha-carbons have the same
name: CA because they belong to four different residues.
"

It will be hard to distinguish each residue (amino acid) from the smile string. But I can distinguish each atom type. So let's admit I have 50 carbons alpha. Can I name them from C1 to C50 and give them a type, without mentionning the residue they are from, or do I have to call them all "CA" and mention the residue ?

- you need to know at which pH you want to use your force field; thus
the amino group of the side chain of Lysine is protonated at pH = 7.

pH 7.4. But that's a minor issue for now.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Apr 15 2014 - 07:00:02 PDT