Hi Ping,
The limits in sander v10 are that nthreads must be less than 128 and that
there must be more residues than threads. For most periodic simulations
there are plenty of waters, so the residue count is large and this is not a
problem; for implicit solvent simulations, however, it can cause issues.
PMEMD has slightly more relaxed restrictions: as far as I know there is no
upper limit on the number of threads, except that you need 10x more atoms
than processors for implicit solvent and (I think) more residues than
processors for explicit solvent.
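If it helps as a quick sanity check before submitting, here is a short,
untested Python sketch that pulls NATOM and NRES out of a prmtop and
compares them against a requested processor count using the rules above. It
assumes the standard %FLAG POINTERS layout (NATOM is the 1st entry, NRES
the 12th) with whitespace-separated integers; the script name and command
line are just illustrative.

#!/usr/bin/env python
# check_procs.py <prmtop> <nprocs>: warn if the requested processor count
# falls outside the sander/pmemd heuristics described above.
import sys

def read_pointers(prmtop):
    """Return the list of integers in the %FLAG POINTERS section."""
    values, in_pointers = [], False
    for line in open(prmtop):
        if line.startswith('%FLAG'):
            in_pointers = (line.split()[1] == 'POINTERS')
        elif in_pointers and not line.startswith('%FORMAT'):
            values.extend(int(v) for v in line.split())
    return values

if __name__ == '__main__':
    prmtop, nprocs = sys.argv[1], int(sys.argv[2])
    ptrs = read_pointers(prmtop)
    natom, nres = ptrs[0], ptrs[11]   # NATOM is 1st, NRES is 12th pointer
    print('NATOM = %d, NRES = %d, nprocs = %d' % (natom, nres, nprocs))
    if nprocs >= 128 or nres <= nprocs:
        print('sander.MPI: violates nthreads < 128 and/or NRES > nthreads')
    if natom < 10 * nprocs:
        print('pmemd, implicit solvent: fewer than 10x atoms per processor')
    if nres <= nprocs:
        print('pmemd, explicit solvent: need more residues than processors')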
Note, though, that in either case the code should not segfault; it should
quit with an appropriate error message. It would therefore be helpful if you
could post an example (prmtop, inpcrd, mdin) that shows this error so that
we can try to reproduce it.
All the best
Ross
> -----Original Message-----
> From: amber-bounces.ambermd.org [mailto:amber-bounces.ambermd.org] On
> Behalf Of Yang, Ping
> Sent: Friday, June 19, 2009 5:21 PM
> To: amber.ambermd.org
> Subject: [AMBER] Hard limit in Amber10?
>
>
> Greetings,
>
> Is there any hard limit implemented in Amber10? The code was compiled
> using icc+mvapich+mkl and passed the tests successfully.
>
> A job submitted to the computer runs fine and finishes happily when
> using 16, 32, or 48 processors. However, once it uses 64 processors or
> more (more than 8 nodes), rank 0 gets a 'segmentation fault' and stops
> at the step of dividing atoms among the processors, leaving the
> remaining ranks hanging. This happens with both sander.MPI and pmemd.
>
> Could this issue be related to AmberTools being built as a serial
> version? I tried to recompile a parallel version of AmberTools;
> however, the configure script ignores the '-mpi' option when
> '-mpi icc' is provided. Did I miss something here? (I list more
> information at the end of the email.) I'd appreciate your kind help
> and any insight into this issue.
>
> Thanks much,
>
> -Ping
>
>
> ****************
> Details of how the code was built
> ****************
> intel/10.1.015
> mvapich/1.0.1-2533
> mkl/10.0.011
>
> ****************
> Below are the last two lines from the unsuccessful job.
> ****************
> "begin time read from input coords = 20.020 ps
> Number of triangulated 3-point waters found: 105855"
>
>
> ****************
> The corresponding error file contains:
> ****************
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image            PC                Routine  Line     Source
> libpthread.so.0  000000353C70C4F0  Unknown  Unknown  Unknown
> libc.so.6        000000353C0721E3  Unknown  Unknown  Unknown
> sander.MPI       00000000009919AC  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967E65FE  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967BF582  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967BDC3A  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967B2BBC  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967CFCFE  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967A5F79  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967A3730  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A9677B5C0  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A9677B773  Unknown  Unknown  Unknown
> sander.MPI       00000000005234CA  Unknown  Unknown  Unknown
> sander.MPI       00000000004CDA96  Unknown  Unknown  Unknown
> sander.MPI       00000000004C9334  Unknown  Unknown  Unknown
> sander.MPI       000000000041EE22  Unknown  Unknown  Unknown
> libc.so.6        000000353C01C3FB  Unknown  Unknown  Unknown
> sander.MPI       000000000041ED6A  Unknown  Unknown  Unknown
> srun: error: cu04n81: task0: Exited with exit code 174
> srun: Warning: first task terminated 60s ago
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image            PC                Routine  Line     Source
> libpthread.so.0  000000353C70C4F0  Unknown  Unknown  Unknown
> libc.so.6        000000353C0721E3  Unknown  Unknown  Unknown
> sander.MPI       00000000009919AC  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967E65FE  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967BF582  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967BDC3A  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967B2BBC  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967CFCFE  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967A5F79  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A967A3730  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A9677B5C0  Unknown  Unknown  Unknown
> libmpich.so.1.0  0000002A9677B773  Unknown  Unknown  Unknown
> sander.MPI       00000000005234CA  Unknown  Unknown  Unknown
> sander.MPI       00000000004CDA96  Unknown  Unknown  Unknown
> sander.MPI       00000000004C9334  Unknown  Unknown  Unknown
> sander.MPI       000000000041EE22  Unknown  Unknown  Unknown
> libc.so.6        000000353C01C3FB  Unknown  Unknown  Unknown
> sander.MPI       000000000041ED6A  Unknown  Unknown  Unknown
> srun: error: cu02n104: task0: Exited with exit code 174
> srun: Warning: first task terminated 60s ago
>
> **************
> **************
>
>
>
>
> __________________________________________________
> Ping Yang
> EMSL, Molecular Science Computing
> Pacific Northwest National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, MSIN K8-83
> Richland, WA 99352 USA
> Tel: 509-371-6405
> Fax: 509-371-6110
> ping.yang.pnl.gov
> www.emsl.pnl.gov
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber