Re: [AMBER] Error running large number of trajectories with multisander

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Mon, 11 Feb 2013 11:30:42 -0700

Hi,

Without knowing more about your system, your input, and where you're
trying to run it, I can only speculate, but it seems like memory is being
blown during mask parsing. The eval() function allocates a stack for
use when converting the mask string into tokens, which are then parsed
to get the final mask selection. This stack is 256*natom*4 bytes
(assuming an integer size of 4 bytes), so the number of atoms in your
system determines how much memory each thread needs at that
point. For even a moderately large system (say 100k atoms) that's not
too much memory per thread (~100 MB), but if threads are sharing memory
this can add up (e.g. if all 192 groups have to share memory you'll
need roughly 19-20 GB free in addition to all other allocated memory).
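
As a rough sanity check, the estimate above can be scripted. This is only a
back-of-the-envelope sketch in Python, not anything taken from the Amber
source: the function names are made up for illustration, the 100k atom
count is the hypothetical system size used above, and the worst case
assumes every group holds its stack at the same moment on shared memory.

```python
# Estimate of the mask-parsing stack size, using the 256*natom*4 figure
# quoted in this message. Names here are illustrative, not Amber routines.
STACK_FACTOR = 256   # multiplier quoted above
INT_BYTES = 4        # assumed integer size in bytes


def eval_stack_bytes(natom, int_bytes=INT_BYTES):
    """Bytes needed for one mask-parsing stack: 256 * natom * int_bytes."""
    return STACK_FACTOR * natom * int_bytes


def total_stack_bytes(natom, ngroups, int_bytes=INT_BYTES):
    """Worst case: all groups allocate their stacks at once on shared memory."""
    return ngroups * eval_stack_bytes(natom, int_bytes)


if __name__ == "__main__":
    natom = 100_000  # hypothetical system size, as in the example above
    for ngroups in (12, 108, 192):
        per_stack = eval_stack_bytes(natom)
        total = total_stack_bytes(natom, ngroups)
        print(f"{ngroups:4d} groups: {per_stack / 1e6:.1f} MB each, "
              f"{total / 1e9:.2f} GB total")
```

With the unrounded per-stack size (102.4 MB for 100k atoms) the 192-group
total comes to about 19.7 GB. Substitute your actual atom count and the
number of groups that land on one node to see whether this is plausibly
the problem.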

-Dan

On Mon, Feb 11, 2013 at 8:42 AM, Brian Radak <radak004.umn.edu> wrote:
> I've been running a large number of (independent) QM/MM umbrella sampling
> simulations using multisander. I haven't done exhaustive tests, but running
> a small (12) or modestly large (108) number of jobs works just fine.
> However, using a larger number of jobs (192) results in a few errors:
>
> === STDERR ===
> out of dynamic memory in opal_show_help_yylex()
>
> <MPI_ABORT message omitted for brevity>
>
> [c314-113.ls4.tacc.utexas.edu:11115] 4 more processes have sent help
> message help-mpi-api.txt / mpi-abort
> [c314-113.ls4.tacc.utexas.edu:11115] Set MCA parameter
> "orte_base_help_aggregate" to 0 to see all help / error messages
> ======
>
> === STDOUT ===
> Error in group input::atommask.f::eval
> stack allocation error
> ======
>
> A typical groupfile entry looks like:
>
> -O -i r0/HEEP_QMMM_US_3.inp -o r0/HEEP_QMMM_US_3.out -p ../input/HEEP.parm7
> -c r0/HEEP_QMMM_US_2.rst7 -r r0/HEEP_QMMM_US_3.rst7 -x r0/HEEP_QMMM_US_3.nc
> -ref ../equilibration/minimize0.rst7
>
> The only thing here that seems not so smart is that ALL of the trajectories
> use the same file for the reference coordinates. Does that sound like it
> could be a problem? Does that make any sense in giving rise to the errors
> here? Other ideas?
>
> Thanks,
> Brian
>
> --
> ================================ Current Address =======================
> Brian Radak                           : BioMaPS Institute for Quantitative Biology
> PhD candidate - York Research Group   : Rutgers, The State University of New Jersey
> University of Minnesota - Twin Cities : Center for Integrative Proteomics Room 308
> Graduate Program in Chemical Physics  : 174 Frelinghuysen Road,
> Department of Chemistry               : Piscataway, NJ 08854-8066
> radak004.umn.edu                      : radakb.biomaps.rutgers.edu
> ====================================================================
> Sorry for the multiple e-mail addresses, just use the institute appropriate
> address.
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-9119 (Fax)
Received on Mon Feb 11 2013 - 11:00:02 PST