Re: AMBER: memory SC45

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 21 Oct 2003 22:12:31 -0400

Mu -
It has to be a system administration/configuration issue: either the
defaults are set up relatively restrictively, or you are not asking for
memory in the correct manner. I know that an unusual setup prevails at PSC
on the salk alphaserver, which is now a pair of big alphaservers (something
like 32 processors per box, I believe). Under PBS there (the queuing
system) you can ask for nodes and processors, or for some total amount of
memory, so on machines like this there can be issues with how the total
memory is requested; a minimal submission sketch is below.
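As a rough sketch only (the resource keywords and the pmemd command line
here are assumptions; check PSC's documentation for the actual syntax on
salk):

    #!/bin/csh
    #PBS -l nodes=1:ppn=4      # 1 node, 4 processors
    #PBS -l mem=2gb            # total memory for the job
    cd $PBS_O_WORKDIR
    # launch 4 pmemd processes under the SC45's prun
    prun -n 4 pmemd -O -i mdin -p prmtop -c inpcrd -o mdout -r restrt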
However, I think something more fundamental in terms of setup is going on.
Four processors with an 8 angstrom cutoff run in around 200 MB, and that is
nothing!!! So somehow you are not asking for memory correctly, or there are
some config issues.
Does anyone else out there know anything about this system? I don't have
access, and can't really begin to debug the job. Mu, if you would like, you
can send me a gzipped tarball of all the files used in the run (mdin,
prmtop, inpcrd), as well as the scripts you use to queue it. It probably
wouldn't hurt to send your MACHINE file either. I will confirm that the job
works on a PC, and then on an alphaserver, for you. Do you run anything
else on this machine? Does it work?
Regards - Bob

----- Original Message -----
From: "Mu Yuguang (Dr)" <YGMu.ntu.edu.sg>
To: <amber.scripps.edu>
Sent: Tuesday, October 21, 2003 9:19 PM
Subject: RE: AMBER: memory SC45


> Dear Rob,
> I have tried from 4 CPUs up to 28 CPUs; the same error.
> I use a 10 A cutoff; trying an 8 A cutoff is no better.
>
> I am quite surprised that this 58K-atom system runs quite well on a
> dual-CPU PC-Linux box; why can it not run on the "supercomputer"?
>
> How can I know the available physical memory on this machine?
>
>
> -----Original Message-----
> From: Robert Duke [mailto:rduke.email.unc.edu]
> Sent: Tuesday, October 21, 2003 8:55 PM
> To: amber.scripps.edu
> Subject: Re: AMBER: memory SC45
>
> Mu -
> A 58K atom system should run in about 51 MB on a single processor, which
> is nothing. Once again, HOW MANY PROCESSORS are you using? If you are
> using 128 on a shared-memory machine, then there could be global
> allocation maxima you are bumping into. If you are using 16, there should
> be no way you are having problems. Also, knowing how much physical memory
> is available per cpu, how it is shared, and what the maximum physical
> memory really is are all important if you push the limits. Also, it is
> possible there are issues with what else is running, though I just don't
> know enough about Tru64 SMPs to tell you. I have had problems with
> ulimits on machines where I did not have root privileges, so depending on
> how your facility is set up, you may be having problems with propagating
> the ulimit setting to the other nodes you are running on. You would have
> to talk to a sysadmin on that one, but I would think that doing the
> ulimit in a .login or .cshrc (assuming csh) ought to work, as in the
> sketch below (it actually didn't for me here on the UNC Linux cluster,
> which is why I never changed the ifc build to actually use stack memory
> under Linux - not that I couldn't get it fixed, but that I didn't want to
> imagine hordes of users dealing with sysadmins to get the stacksize
> ulimit changed).
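> A minimal sketch of what that would look like in ~/.cshrc (csh syntax;
> whether a login file is actually sourced on every compute node depends on
> how your site launches MPI jobs, so treat this as something to verify):
>
>     # raise per-process limits for every new csh, including remote shells
>     limit stacksize unlimited
>     limit memoryuse unlimited
>     limit vmemoryuse unlimited
>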
> I HAVE seen memory problems on Tru64 shared-memory machines, but that was
> with 91K atoms and 32 processors, and in a sense it is a measure of the
> unreasonableness of a shared-memory setup as you scale larger and larger
> (at least using MPI on these systems; but the world is going to MPI
> because it is the one standard everyone can get to run everywhere, and
> SMP systems ultimately bottleneck on cache-coherence problems, so you
> eventually have to shift to a networked cluster paradigm anyway). So the
> salient points:
> 1) You should be able to run at least 16 processors on this problem. If
> you can't at least do this, then there is some sort of memory-management
> issue at your facility. Actually, one other possibility is that you have
> set the direct force calculation cutoff value very high. It defaults to
> 8 angstroms, with a skin of 1 (the skin is a region of space from which
> atoms are put in the pairlist because they are apt to move inside the
> cutoff in the next few simulation steps). If you increased these to an
> arbitrarily large number, it could take lots of memory; a fragment
> showing where the cutoff lives is below. I expect this is not the case,
> but I am trying to think of everything.
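> For reference, the cutoff is the cut variable in the &cntrl namelist of
> the mdin file (this much is standard Amber input; check the manual for
> your version for the skin variable's name and namelist). A minimal
> fragment:
>
>     &cntrl
>       cut = 8.0,
>     /
>
> Since the pairlist holds everything within cutoff plus skin, its memory
> grows roughly as the cube of that distance.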
> 2) Talk to your sysadmin about ulimits and how much memory is really
> available; the commands sketched below are one way to look for yourself.
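> A rough sketch from csh: limit is a csh builtin and vmstat exists on
> Tru64, but the Tru64-specific flag and the sysconfig query below are
> assumptions worth verifying in the man pages.
>
>     limit                # per-process limits for the current shell
>     vmstat -P            # physical memory summary (Tru64-specific flag)
>     sysconfig -q vm      # kernel vm subsystem settings
>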
> 3) Please send me all the information requested.
> Regards - Bob
>
> ----- Original Message -----
> From: "Mu Yuguang (Dr)" <YGMu.ntu.edu.sg>
> To: <amber.scripps.edu>
> Sent: Monday, October 20, 2003 11:46 PM
> Subject: RE: AMBER: memory SC45
>
>
> > Thanks Bill, Bob
> > My system is 57999 atoms with PME using PMEMD. The machine is a Compaq
> > Tru64 SC45.
> >
> > FATAL global dynamic memory setup allocation error!
> >
> > I tried a smaller system, 331 atoms, and it works, with the memory use
> > printed in the mdout file:
> >
> > | Dynamic Memory, Types Used:
> > | Reals 95863
> > | Integers 6957
> >
> > | Nonbonded Pairs Initial Allocation: 3472
> >
> > Also I tried
> >
> > unlimit memoryuse
> >
> > with no effect.
> >
> > What is the Dynamic Memory, and how does it scale with the number of
> > atoms? Here 331 atoms ~ 100K reals. How about 57999 atoms?
> >
> >
> > -----Original Message-----
> > From: Robert Duke [mailto:rduke.email.unc.edu]
> > Sent: Tuesday, October 21, 2003 11:17 AM
> > To: amber.scripps.edu
> > Subject: Re: AMBER: memory SC45
> >
> > Mu -
> > Are you talking pmemd here, or sander 6, or sander 7? How many atoms?
> > How many cpu's? How many processors sharing memory? How much physical
> > memory total? You can run a 90906 atom problem in about 79 MB on a
> > single processor, and since the pairlist is divided when running in
> > parallel, memory requirements will grow less than linearly with
> > processor count. Thus, about 25 processes would run in 2 GB on a
> > shared-memory machine (rough estimate); that is half the memoryuse you
> > listed. It is possible, but unlikely, for weird things to happen with
> > mpi buffering. Without knowing more about your problem size and memory
> > configuration, it is not possible to determine whether it is reasonable
> > for you to be running out of memory.
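> > (The arithmetic behind that estimate, assuming per-process memory
> > stays flat at the single-processor figure: 2048 MB / 79 MB per process
> > is about 26 processes; since the divided pairlist actually shrinks the
> > per-process figure, 25 is on the conservative side.)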
> > Regards - Bob
> > ----- Original Message -----
> > From: "Mu Yuguang (Dr)" <YGMu.ntu.edu.sg>
> > To: <amber.scripps.edu>
> > Sent: Monday, October 20, 2003 10:39 PM
> > Subject: AMBER: memory SC45
> >
> >
> > > Dear all,
> > > Thank you very much for your help.
> > >
> > > Now I have been successful in
> > >
> > > prun -
> > >
> > > But when I ask for more memory, in case my program treats more
> > > atoms, the program fails to allocate memory.
> > >
> > > I checked my limits; the output reads:
> > >
> > >
> > > cputime       unlimited
> > > filesize      unlimited
> > > datasize      4194304 kbytes
> > > stacksize     3465214 kbytes
> > > coredumpsize  0 kbytes
> > > memoryuse     4089768 kbytes
> > > vmemoryuse    4194304 kbytes
> > > descriptors   4096
> > >
> > > Could I ask the system administrator to set memoryuse and
> > > vmemoryuse to unlimited?
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
Received on Wed Oct 22 2003 - 03:53:01 PDT