Re: [AMBER] CPPTRAJ, Counting Per Residue Pair H-Bonding

From: Kenneth Huang <khuang8.student.gsu.edu>
Date: Wed, 16 Mar 2022 20:48:10 +0000

Hi Matthew,

That's an easy fix! You just have to drop the last pipe
awk -F"_" '$1=$1' OFS=" " test | awk -F"." '$1=$1' OFS="\t" | awk '{print $2,$5,$7}'
To break it down, the first awk says find every _ and replace with a blank space, then does the same thing for @, and then finally just prints out the leftover cols 2, 5, and 7. The raw output after you get rid of the _ and @ (which is the harder part for other things to read in/deal with) in the first two awk passes would be
ASP 85 OD2 THR 66 OG1 0.9946
ASP 85 OD1 GLY 87 N 0.9598
ASP 85 OD2 GLN 79 N 0.5442
ASP 85 OD1 THR 66 N 0.1769
ASP 85 O TRP 88 N 0.0670
ASP 85 OD2 GLN 79 NE2 0.0563
ASP 85 OD2 LEU 78 N 0.0107
ASP 85 OD1 THR 66 OG1 0.0027
awk -F"_" '$1=$1' OFS=" " test | awk -F"@" '$1=$1' OFS="\t" | awk '{print $2,$5,$7}'
​85 66 0.9946
85 87 0.9598
85 79 0.5442
85 66 0.1769
85 88 0.0670
85 79 0.0563
85 78 0.0107
85 66 0.0027
Just comes from printing the relevant portions (col 2, 5, and 7 unless you want the atoms involved, that is).

awk -F"_" '$1=$1' OFS=" " test | awk -F"." '$1=$1' OFS="\t" | awk '{print $2,$5,$7}' | awk '!_[$1,$2]++’
What the last bit does is keep only the first instance of any pattern based on 1+2 of the residue# residue# format; I forgot in my case I wasn't looking for redundant Hbonds, and just wanted to track the dominant interactions between pairs so I tossed out the extra bits.


Best,
Kenneth
________________________________
From: Matthew Guberman-Pfeffer <matthew.guberman-pfeffer.uconn.edu>
Sent: Wednesday, March 16, 2022 12:57 PM
To: AMBER Mailing List <amber.ambermd.org>
Subject: Re: [AMBER] CPPTRAJ, Counting Per Residue Pair H-Bonding

Hi Kenneth,

Thanks for the clarification! One further question (my apologies to the ListServe for these scripting questions).

Right, I get the full list if I use: awk 'NR>1 {print $1,$3,$5}' All.UU.avg.dat | grep “ASP_85”,
However, in the second step, if I use: awk -F"_" '$1=$1' OFS=" " test | awk -F"@" '$1=$1' OFS="\t" | awk '{print $2,$5,$7}' | awk '!_[$1,$2]++’
I only get the first occurrence of a given pair. For example,

Output of first command:
ASP_85.OD2 THR_66.OG1 0.9946
ASP_85.OD1 GLY_87.N 0.9598
ASP_85.OD2 GLN_79.N 0.5442
ASP_85.OD1 THR_66.N 0.1769
ASP_85.O TRP_88.N 0.0670
ASP_85.OD2 GLN_79.NE2 0.0563
ASP_85.OD2 LEU_78.N 0.0107
ASP_85.OD1 THR_66.OG1 0.0027

Output of second command:
85 66 0.9946
85 87 0.9598
85 79 0.5442
85 88 0.0670
85 78 0.0107

Note that only the first "85 66" pair appears. Is there a way to fix this? I don’t understand the syntax of the second command very well.

Best,
Matthew




> On Mar 16, 2022, at 12:30 PM, Kenneth Huang <khuang8.student.gsu.edu> wrote:
>
> *Message sent from a system outside of UConn.*
>
>
> If I run the commands you mentioned, and sort the output from the last command by the first column, I only see for residue 85:
> 85 66 0.9946
> 85 87 0.9598
>
> Whoops, that's from the first awk portion actually-
>
> awk 'NR>1 && NR<279 {print $1,$3,$5}' hbond_avg.*.out | awk '$3 > .05 {print ;}' > test
>
> Where I had it drop anything that was under 5% (0.05) after pulling out columns 1, 3, and 5, assuming it wasn't worth looking at. So the version you'd want would just be
>
> awk 'NR>1 && NR<279 {print $1,$3,$5}' hbond_avg.*.out > test
>
> Which won't apply a cutoff unless you add it back in.
>
> the list from your commands is only ~273 lines long, whereas the original out file is >3000 for the entire protein
>
> That might be from the first part, which specifies to only look at lines between 1 and 279-
>
> awk 'NR>1 && NR<279
> Since in my case, there were solvent-solute bonds also listed (starting at line 279)- if you don't have solvent-solute bonds as part of the hbond command, you can just drop it down to
>
> awk 'NR>1 {print $1,$3,$5}' hbond_avg.*.out > test
>
> Best,
> Kenneth
> ________________________________
> From: Matthew Guberman-Pfeffer <matthew.guberman-pfeffer.uconn.edu>
> Sent: Wednesday, March 16, 2022 11:00 AM
> To: AMBER Mailing List <amber.ambermd.org>
> Subject: Re: [AMBER] CPPTRAJ, Counting Per Residue Pair H-Bonding
>
> Hi Kenneth,
>
> Thanks for the awk commands. I have a question about the second command.
>
> In my avg.out file, I have:
> ASP_85.OD2 THR_66.HG1 THR_66.OG1 371 0.9946 2.6259 161.8941
> ASP_85.OD1 GLY_87.H GLY_87.N 358 0.9598 2.8142 157.8910
> ASP_85.OD2 GLN_79.H GLN_79.N 203 0.5442 2.8921 162.1535
> ASP_85.OD1 THR_66.H THR_66.N 66 0.1769 2.8812 146.1349
> ASP_85.O TRP_88.H TRP_88.N 25 0.0670 2.9352 167.3955
> ASP_85.OD2 GLN_79.HE22 GLN_79.NE2 21 0.0563 2.8330 153.7837
> ASP_85.OD2 LEU_78.H LEU_78.N 4 0.0107 2.8733 148.6609
> ASP_85.OD1 THR_66.HG1 THR_66.OG1 1 0.0027 2.9728 142.6676
>
> If I run the commands you mentioned, and sort the output from the last command by the first column, I only see for residue 85:
> 85 66 0.9946
> 85 87 0.9598
>
> Why does it seem like H-bonding interactions are missing? (In Fact, the list from your commands is only ~273 lines long, whereas the original out file is >3000 for the entire protein.) What am I missing or not understanding about the second awk command?
>
> Best,
> Matthew
>
>
>
>
>
>
>
>> On Mar 16, 2022, at 10:37 AM, Kenneth Huang <khuang8.student.gsu.edu> wrote:
>>
>> *Message sent from a system outside of UConn.*
>>
>>
>> Hi,
>>
>> I think something like this might do the trick of what you're looking for if I'm understanding your question correctly?
>>
>> hbond DNA out hbond.DNA.dat :1-46 angle 135 dist 3.0 avgout hbond_avg.DNA.out bridgeout hbond_brid.DNA.dat solventdonor :WAT solventacceptor :WAT.O nointramol
>>
>> Should automatically get all the hbonds in the residue mask (1-46) so you don't have to do a pairwise matrix by loop, and will dump the breakdown fractions of each combination into the avg.out file, which you can make a lot more readable with
>>
>> #print out from input in range- defined to avoid solute solvent hbond; solvent starts at line 279 here
>> awk 'NR>1 && NR<279 {print $1,$3,$5}' hbond_avg.*.out | awk '$3 > .05 {print ;}' > test
>>
>> #make the output to residue# residue# and Hbond%
>> awk -F"_" '$1=$1' OFS=" " test | awk -F"@" '$1=$1' OFS="\t" | awk '{print $2,$5,$7}' | awk '!_[$1,$2]++' > test2a
>> Which will get you the fraction of the trajectory the hbond is present- not quite a 'count' per pair, but since the pairs are all unique, you can just count the number of entries to get total, and do some more awk/sort manipulation of the residue numbers to see what residue is involved in multiple hbonds, etc, which I think is what you want?
>>
>> Best,
>> Kenneth
>> ________________________________
>> From: Matthew Guberman-Pfeffer <matthew.guberman-pfeffer.uconn.edu>
>> Sent: Wednesday, March 16, 2022 9:45 AM
>> To: AMBER Mailing List <amber.ambermd.org>
>> Subject: [AMBER] CPPTRAJ, Counting Per Residue Pair H-Bonding
>>
>> Dear Amber Community,
>>
>> I need to quantify the total number of H-bonds for each pair of residues in my protein. That would mean a 1321 x 1321 matrix. I am having trouble in writing a bash script to process the CPPTRAJ hbond output (e.g. All.UU.avg.dat) to compute each matrix element. I’ve thought of running CPPTRAJ for every possible pair of donormask and acceptormask, but this is truly an ugly brute-force approach. Does anyone have suggestions on how I can compute the total number of H-bonds between each pair of residues?
>>
>> Best,
>> Matthew
>>
>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.ambermd.org%2Fmailman%2Flistinfo%2Famber&amp;data=04%7C01%7Ckhuang8%40student.gsu.edu%7Cdf01c204c49e40843f1708da076e250b%7C704d822c358a47849a1649e20b75f941%7C0%7C1%7C637830466921973076%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LSjfpuS9IeU87vqCdzkyL5D4Oh7q8nnTbwATUaf2h%2FA%3D&amp;reserved=0
>> CAUTION: This email was sent from someone outside of the university. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.ambermd.org%2Fmailman%2Flistinfo%2Famber&amp;data=04%7C01%7Ckhuang8%40student.gsu.edu%7Cdf01c204c49e40843f1708da076e250b%7C704d822c358a47849a1649e20b75f941%7C0%7C1%7C637830466921973076%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LSjfpuS9IeU87vqCdzkyL5D4Oh7q8nnTbwATUaf2h%2FA%3D&amp;reserved=0
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.ambermd.org%2Fmailman%2Flistinfo%2Famber&amp;data=04%7C01%7Ckhuang8%40student.gsu.edu%7Cdf01c204c49e40843f1708da076e250b%7C704d822c358a47849a1649e20b75f941%7C0%7C1%7C637830466921973076%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LSjfpuS9IeU87vqCdzkyL5D4Oh7q8nnTbwATUaf2h%2FA%3D&amp;reserved=0
> CAUTION: This email was sent from someone outside of the university. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.ambermd.org%2Fmailman%2Flistinfo%2Famber&amp;data=04%7C01%7Ckhuang8%40student.gsu.edu%7Cdf01c204c49e40843f1708da076e250b%7C704d822c358a47849a1649e20b75f941%7C0%7C1%7C637830466921973076%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LSjfpuS9IeU87vqCdzkyL5D4Oh7q8nnTbwATUaf2h%2FA%3D&amp;reserved=0

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
https://nam12.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.ambermd.org%2Fmailman%2Flistinfo%2Famber&amp;data=04%7C01%7Ckhuang8%40student.gsu.edu%7Cdf01c204c49e40843f1708da076e250b%7C704d822c358a47849a1649e20b75f941%7C0%7C1%7C637830466921973076%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LSjfpuS9IeU87vqCdzkyL5D4Oh7q8nnTbwATUaf2h%2FA%3D&amp;reserved=0
CAUTION: This email was sent from someone outside of the university. Do not click links or open attachments unless you recognize the sender and know the content is safe.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Mar 16 2022 - 14:00:02 PDT
Custom Search