Re: [AMBER] tleap/xleap performance enhancement from Himanshu Joshi on 2020-03-04 (Amber Archive Mar 2020)

From: Himanshu Joshi <himanshuphy87.gmail.com>
Date: Wed, 4 Mar 2020 12:13:11 -0600

Dear Matias,

Thanks for the suggestions.

I found an easy workaround: joining small fragments of the system with the
ParmEd utility, as described by Jason in the following thread:
http://archive.ambermd.org/201902/0176.html

Since generating the parm and rst files is much faster for smaller
systems, I divided the water box into 100 small fragments and joined the
respective topology/parameter and coordinate files with ParmEd, as Jason
suggested.
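Here is a minimal sketch of the splitting step. It assumes that every water in the box PDB has the same number of consecutive atom records: 3 for TIP3P, or 1 for the oxygen-only file described further down the thread. Residues are therefore counted by atoms rather than by parsing the residue-number columns, which overflow at these sizes (note the `*****` serials and 7-digit residue numbers in that sample). `split_water_pdb` is a hypothetical helper. Each fragment would then go through tleap on its own before the ParmEd join from Jason's thread.

```python
from typing import List


def split_water_pdb(lines: List[str], atoms_per_residue: int,
                    n_fragments: int, add_ter: bool = False) -> List[List[str]]:
    """Split ATOM/HETATM records into contiguous fragments of whole residues.

    Hypothetical helper. Assumes every residue has exactly
    `atoms_per_residue` consecutive atom records (3 for TIP3P, 1 for an
    oxygen-only file). Other records (TER, END, CRYST1, ...) are dropped;
    with add_ter=True a TER card is written after every residue.
    """
    atoms = [ln for ln in lines if ln.startswith(("ATOM", "HETATM"))]
    if len(atoms) % atoms_per_residue != 0:
        raise ValueError("atom count is not a multiple of atoms_per_residue")
    n_res = len(atoms) // atoms_per_residue
    if not 1 <= n_fragments <= n_res:
        raise ValueError("need 1 <= n_fragments <= number of residues")
    base, extra = divmod(n_res, n_fragments)
    fragments: List[List[str]] = []
    start = 0
    for i in range(n_fragments):
        # The first `extra` fragments get one residue more than the rest.
        n_here = base + (1 if i < extra else 0)
        frag: List[str] = []
        for r in range(n_here):
            lo = start + r * atoms_per_residue
            frag.extend(atoms[lo:lo + atoms_per_residue])
            if add_ter:
                frag.append("TER\n")
        fragments.append(frag)
        start += n_here * atoms_per_residue
    return fragments


if __name__ == "__main__":
    demo = [f"ATOM  {i:5d}  O   WAT  {i:4d}\n" for i in range(1, 11)]
    print([len(f) for f in split_water_pdb(demo, 1, 3)])  # [4, 3, 3]
```

Because the fragments are contiguous slices of the original file, the waters keep their original order, provided the fragments are joined in the same order.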

The assembled system looks fine when I glance through the topology and
coordinate files.
When I load it in VMD, it loads with the error
"error reading respointer records at residue 2552319"; the rest of the
structure appears OK.
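In the prmtop, RESIDUE_POINTER holds the index of the first atom of each residue in `%FORMAT(10I8)`, which means ten fixed 8-column fields per line. A value of 10,000,000 or more fills all eight columns, so neighbouring values touch with no space between them, much like the fused F12.7 coordinates in the restart file below. A reader that splits on whitespace instead of counting columns would misparse from that point on. Whether this is what VMD hits at residue 2552319 is an assumption that the thread does not confirm. The hypothetical helpers below only contrast fixed-width parsing with naive splitting and check that the parsed pointers increase.

```python
from typing import List


def parse_fixed_ints(lines: List[str], width: int = 8) -> List[int]:
    """Parse Fortran-style fixed-width integer fields, e.g. %FORMAT(10I8)."""
    values: List[int] = []
    for line in lines:
        text = line.rstrip("\r\n")
        for start in range(0, len(text), width):
            field = text[start:start + width]
            if field.strip():                 # skip padding at line end
                values.append(int(field))
    return values


def check_residue_pointers(pointers: List[int], natom: int) -> None:
    """Raise ValueError unless pointers are strictly increasing 1-based atom indices."""
    if not pointers or pointers[0] != 1:
        raise ValueError("the first residue must start at atom 1")
    for i in range(1, len(pointers)):
        if pointers[i] <= pointers[i - 1]:
            raise ValueError(f"pointer for residue {i + 1} does not increase")
    if pointers[-1] > natom:
        raise ValueError("the last residue starts beyond NATOM")


if __name__ == "__main__":
    # Three 8-column fields; the last two fill every column.
    line = "".join(f"{v:8d}" for v in (9999997, 10000000, 10000003))
    print(parse_fixed_ints([line]))  # [9999997, 10000000, 10000003]
    print(line.split())              # ['99999971000000010000003']
```

To apply this to a real file, collect the lines between `%FLAG RESIDUE_POINTER` and the next `%FLAG`, skip the `%FORMAT` line, and compare the result against NATOM, the first value under `%FLAG POINTERS`.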
When running an Amber minimization with sander or pmemd, I get the
following errors:

Sander
| Flags:
getting new box info from bottom of inpcrd
| INFO: Old style inpcrd file read

peek_ewald_inpcrd: SHOULD NOT BE HERE
  103.3630 -73.3210 281.1890 104.3202 -73.3210 281.1890

PMEMD

| ERROR: b must be in the range of 0.10000E+01 to 0.10000E+04!
| ERROR: c must be in the range of 0.10000E+01 to 0.10000E+04!
| ERROR: alpha must be in the range of 0.30000E+02 to 0.15000E+03!
| ERROR: beta must be in the range of 0.30000E+02 to 0.15000E+03!
| ERROR: gamma must be in the range of 0.30000E+02 to 0.15000E+03!

Input errors occurred. Terminating execution.

Here are the head and tail of the restart file:
default_name
26096497
-254.6070000 -55.0140000 5.5030000-254.7680000 -54.0720000 5.4140000
-254.0660000 -53.4060000 6.4700000-254.1820000 -54.0030000 7.4000000
-254.4780000 -52.3810000 6.5910000-252.5690000 -53.2610000 6.1180000
-252.0630000 -52.7050000 6.9370000-251.8510000 -54.5740000 6.0050000
-251.1510000 -54.5700000 4.7810000-250.1020000 -54.2140000 4.8680000
-251.1040000 -56.0100000 4.4520000-252.2340000 -56.7490000 4.1590000
-253.1660000 -56.2660000 3.8660000-252.2890000 -58.0680000 3.9050000
-253.5800000 -58.7380000 3.7040000-253.4940000 -59.8300000 3.5180000
....
.....

-242.8028000-187.7280000 232.2430000-243.9999880-186.8013730 232.2430000
-211.2780000 230.3370000-175.4930000-210.3208000 230.3370000-175.4930000
-211.5179880 231.2636270-175.4930000-228.6210000-160.8770000 228.6720000
-227.6638000-160.8770000 228.6720000-228.8609880-159.9503730 228.6720000
-225.7170000-234.6530000-309.4660000-224.7598000-234.6530000-309.4660000
-225.9569880-233.7263730-309.4660000-179.8640000 8.5540000 -13.3580000
-178.9068000 8.5540000 -13.3580000-180.1039880 9.4806270 -13.3580000
  37.7360000-113.5280000-227.4110000 38.6932000-113.5280000-227.4110000
  37.4960120-112.6013730-227.4110000
630.0000000 630.0000000 630.0000000 90.0000000 90.0000000 90.0000000

Any clue regarding this error will be helpful.
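The restart listing above can be checked for internal consistency with a short script. The layout it assumes is as follows: line 1 is the title, line 2 starts with NATOM, then coordinates follow at six 12-column values (two atoms) per line, optionally the same number of velocity lines, and optionally one final box line. The last coordinate line above holds a single atom, which is consistent with the odd count 26096497.

One plausible explanation for the errors is that the 8-digit atom count does not fit the fixed-width integer field (historically I5, later I6) that a reader may expect. The reader would then skip too few lines and take a coordinate line as the box line. This is an assumption the thread does not confirm, but it would fit the coordinate-like values sander printed after "SHOULD NOT BE HERE". The NetCDF format Matias recommends further down avoids such fixed-width fields. `check_ascii_restart` is a hypothetical helper, not an Amber tool.

```python
import sys
from typing import Dict, Iterable, Optional, Union


def _int_or_none(text: str) -> Optional[int]:
    """Return int(text), or None if the text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None


def check_ascii_restart(lines: Iterable[str]) -> Dict[str, Union[int, str, None]]:
    """Report the layout of an Amber-style ASCII inpcrd/restart file.

    A sketch, not a full parser. Lines are streamed, so a multi-GB file is
    never held in memory at once.
    """
    it = iter(lines)
    next(it)                                  # line 1: title
    header = next(it)                         # line 2: NATOM [TIME]
    natom = int(header.split()[0])
    n_body = sum(1 for ln in it if ln.strip())
    n_coord = (natom + 1) // 2                # 2 atoms (6 values) per line
    extra = n_body - n_coord
    if extra == 0:
        layout = "coordinates only"
    elif extra == 1:
        layout = "coordinates + box"
    elif extra == n_coord:
        layout = "coordinates + velocities"
    elif extra == n_coord + 1:
        layout = "coordinates + velocities + box"
    else:
        layout = "inconsistent"
    return {
        "natom": natom,
        "natom_digits": len(str(natom)),
        # What a reader taking only the first 5 or 6 columns of line 2 sees:
        "natom_if_read_as_i5": _int_or_none(header[:5]),
        "natom_if_read_as_i6": _int_or_none(header[:6]),
        "body_lines": n_body,
        "expected_coord_lines": n_coord,
        "layout": layout,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for key, value in check_ascii_restart(fh).items():
            print(f"{key}: {value}")
```

For the header shown above (`26096497`, with no leading spaces as in the archived copy), a six-column reader would see 260964 atoms and a five-column reader 26096.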

Thank you.
Sincerely
Himanshu

On Mon, Feb 10, 2020 at 9:35 AM Matias Machado <mmachado.pasteur.edu.uy>
wrote:

> Dear Himanshu,
>
> Dealing with huge systems is a challenging issue I would like to address
> as well...
>
> These are some tips that may help (or may not)...
>
> 1) "The most time-consuming process is when it prints "Starting new chain
> with *segname*". Will it help if we somehow disable the print commands?"
>
> +++ To avoid this behaviour, remove the SEGID field from your input PDB.
> I'm not a developer, but leap seems to process that field for some reason;
> however, the generated topology is identical with or without SEGID.
>
> +++ You can also save "some" I/O processing time by redirecting standard
> output to a file or /dev/null, e.g.:
>
> tleap -f input &> /dev/null   # avoid xleap (the GUI); rendering to X11 may be time consuming
>
> +++ You can avoid creating a log file by commenting out or removing the
> following line from the cmd files "leaprc.protein.ff14SB" and "leaprc.DNA.bsc1":
>
> logFile leap.log
>
> 2) "the tleap proceeding becomes even slower as the time proceeds"
>
> +++ This smells to me like a memory (RAM) issue or a nested-loop issue...
>
> 3) "I stripped hydrogen and kept only oxygen atoms in the pdb file before
> loading"
>
> +++ I see no gain in removing hydrogen atoms from the water molecules; in
> doing so you are just erasing the hydrogen-bond network, so what is the
> advantage of using a pre-equilibrated solvent?
>
> 4) "leap takes ~1 hour to read the pdb and create the prmtop and inpcrd"
>
> +++ I strongly recommend using the NetCDF format for coordinates: it is
> not restricted by system size, and it is lighter, more efficient and more
> accurate for computing than ASCII inpcrd, in particular for a system as
> huge as the one you are building...
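Point 2's nested-loop hypothesis can be made concrete with a simple cost model. Suppose that adding the i-th residue costs time proportional to the i residues already present. This is an assumption, because the thread does not establish how leap actually scales. Here c is a hypothetical per-residue constant and N is the number of residues. Under this model, the total time grows quadratically and the creation rate falls as the run proceeds. Building k equal fragments separately then cuts the total by about a factor of k:

```latex
\begin{align}
T(N) &\approx c \sum_{i=1}^{N} i = c\,\frac{N(N+1)}{2} \approx \frac{c}{2}\,N^{2},
  \qquad \frac{dN}{dt} \approx \frac{1}{c\,N} \\
T_{\mathrm{split}}(N,k) &\approx k \cdot \frac{c}{2}\left(\frac{N}{k}\right)^{2}
  = \frac{c}{2}\,\frac{N^{2}}{k} \\
\frac{T(N)}{T_{\mathrm{split}}(N,k)} &\approx k
  \qquad (k = 100 \text{ gives roughly a } 100\times \text{ reduction})
\end{align}
```

The model ignores per-fragment start-up cost and the ParmEd join, so under its own assumptions k is an upper bound on the gain rather than a prediction. It is merely consistent with the slowdown reported in February and with the speedup from the 100-fragment workaround.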
>
> Best,
>
> Matias
>
> ------------------------------------
> PhD.
> Researcher at Biomolecular Simulations Lab.
> Institut Pasteur de Montevideo | Uruguay
> [http://pasteur.uy/en/labs/biomolecular-simulations-laboratory]
> [http://www.sirahff.com]
>
> ----- Original Message -----
> From: "Himanshu Joshi" <himanshuphy87.gmail.com>
> To: "david case" <david.case.rutgers.edu>, "AMBER Mailing List" <amber.ambermd.org>
> Sent: Friday, 7 February 2020 17:12:44
> Subject: Re: [AMBER] tleap/xleap performance enhancement
>
> What intrigues me most is this: I have ~4 million solute atoms, and leap
> takes ~1 hour to read the PDB and create the prmtop and inpcrd files,
> whereas for a similar number of water molecules leap is ~50 times slower
> in creating the input files.
>
> Note: I stripped the hydrogens and kept only the oxygen atoms in the PDB
> file before loading:
>
> ATOM ***** O WAT 7160745 -179.864 8.554 -13.358 1.00 0.00 A TIP
> ATOM ***** O WAT 7160746 37.736-113.528-227.411 1.00 0.00 A TIP
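Matias's first suggestion, removing the SEGID field, can be sketched as follows. In the PDB format, the segment identifier occupies columns 73-76 of ATOM/HETATM records. The `TIP` that accompanies each record above is presumably that field, although this is an assumption, since column alignment is lost in the archived mail. The sketch assumes standard fixed-column records and uses a hypothetical helper name, `strip_segid`. It blanks columns 73-76, trims trailing whitespace, and leaves columns 77-80 (element, charge) intact.

```python
import sys
from typing import Iterable, Iterator

SEGID = slice(72, 76)  # PDB columns 73-76 (segment identifier), 0-based slice


def strip_segid(lines: Iterable[str]) -> Iterator[str]:
    """Yield PDB lines with the SEGID columns of ATOM/HETATM records blanked."""
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith(("ATOM", "HETATM")) and len(text) > SEGID.start:
            text = text[:SEGID.start] + " " * 4 + text[SEGID.stop:]
            yield text.rstrip() + "\n"
        else:
            yield line  # other records pass through unchanged


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_segid.py input.pdb output.pdb
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(strip_segid(src))
```

Streaming line by line keeps memory use flat, which matters for a PDB of several million atoms.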
>
> Am I missing something basic here?
>
> Sincerely
> Himanshu
>
> On Fri, Feb 7, 2020 at 11:08 AM Himanshu Joshi <himanshuphy87.gmail.com>
> wrote:
>
> > Dear Prof Case,
> >
> > Thanks for your response; I would appreciate it if you could suggest
> > any possible workaround.
> >
> > There are solutes such as DNA, protein, and counterions in the system.
> > I am using parmbsc1 and ff14 for the DNA and protein, and TIP3P as the
> > water model. However, the time-consuming part is the water in the system.
> >
> > One update: the tleap run becomes even slower as time goes on. Contrary
> > to my earlier estimate, after 12 hours it has created only 500,000 water
> > molecules, and now I am concerned that it might get even slower towards
> > the end.
> >
> > Thank you.
> > Sincerely
> > Himanshu
> >
> > On Fri, Feb 7, 2020 at 7:00 AM David A Case <david.case.rutgers.edu>
> > wrote:
> >
> >> On Thu, Feb 06, 2020, Himanshu Joshi wrote:
> >> >
> >> >Although I managed to tweak the PDB file format so that tleap recognizes
> >> >it, the process of generating the topology and coordinate files is slow
> >> >(approximately 12 hours for 1 million atoms).
> >>
> >> What force fields are you using? Is this just pure water, or are there
> >> solutes?
> >>
> >> I've seen behavior that sounds similar, but I'm trying to narrow down
> >> when it happens. No promises, however, that there will be any easy fix
> >> or workaround.
> >>
> >> ...dac
> >>
> >>
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >>
> >
> >
> > --
> >
> > With Regards, HIMANSHU JOSHI
> >
>
>
> --
> With Regards,
> HIMANSHU JOSHI
> Graduate Scholar, Center for Condensed Matter Theory
> Department of Physics, IISc., Bangalore, India 560012

-- 
With Regards, Himanshu Joshi

Received on Wed Mar 04 2020 - 10:30:02 PST