Re: [AMBER] sander is running...

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 1 Oct 2010 11:55:09 -0700

Hi Nicholus,

I have seen this problem before when running NEB or REMD on very high
processor counts. A massive all-to-all communication occurs at the very
end of a run, when all the timings are calculated. This can be
excruciatingly slow for large runs (more than 512 cores), but I doubt you
are running on that many threads here. How many threads are you running?

Normally you can just kill the job and all is good, i.e. your trajectory
file is complete, as is your restart file. This doesn't help if you have the
job scripted, but at least you can use the output.
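
If the job is scripted, one possible way to automate that decision is to
inspect the output file before killing anything. The following is only a
minimal Python sketch, assuming the marker strings seen in the output quoted
further down ("NSTEP =", "R M S  F L U C T U A T I O N S", "5.  TIMINGS").
The function name and the three status labels are made up for illustration
and are not part of AMBER. It also only looks at the output file; it does not
verify the trajectory or restart files themselves.

```python
import re


def mdout_status(text: str, expected_nstep: int) -> str:
    """Classify sander output text (hypothetical helper, not part of AMBER).

    Returns "complete", "hung_at_timings", or "incomplete".
    """
    # Largest step count reported on any "NSTEP = ..." line.
    steps = [int(n) for n in re.findall(r"NSTEP\s*=\s*(\d+)", text)]
    reached_end = bool(steps) and max(steps) >= expected_nstep

    # Drop all whitespace so spaced-out banners such as
    # "R M S  F L U C T U A T I O N S" match regardless of spacing.
    squeezed = re.sub(r"\s+", "", text)
    has_summary = "RMSFLUCTUATIONS" in squeezed
    has_timings = "5.TIMINGS" in squeezed

    if reached_end and has_summary and has_timings:
        return "complete"
    if reached_end and has_summary:
        return "hung_at_timings"  # MD finished, timing report never printed
    return "incomplete"


# Abbreviated stand-ins for the tail of an output file.
FINISHED_TAIL = """
 NSTEP =  2000000   TIME(PS) =    7000.000  TEMP(K) =   315.15
      R M S  F L U C T U A T I O N S
 NSTEP =  2000000   TIME(PS) =    7000.000  TEMP(K) =     5.35
   5.  TIMINGS
"""
STUCK_TAIL = FINISHED_TAIL.replace("   5.  TIMINGS\n", "")

if __name__ == "__main__":
    print(mdout_status(FINISHED_TAIL, 2000000))  # complete
    print(mdout_status(STUCK_TAIL, 2000000))     # hung_at_timings
```

A wrapper script could poll the output file with something like this and
only kill sander once the status has stayed at "hung_at_timings" for a while.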

You could also try compiling with -DNO_DETAILED_TIMINGS added to the
config.h file. This will turn off the printing of timings and then things
should run okay. If the problem goes away this at least narrows down where
it is occurring.

What are the specs of your system? Nodes, interconnect, etc.

All the best
Ross

> -----Original Message-----
> From: nicholus bhattacharjee [mailto:nicholusbhattacharjee.gmail.com]
> Sent: Friday, October 01, 2010 9:42 AM
> To: AMBER Mailing List
> Subject: [AMBER] sander is running...
>
> Dear community,
> I am running some 7 ns simulations of a protein on a remote cluster. Some
> jobs have finished with the usual output file (the last few lines of the
> output are shown below):
>
> **********************************************************
> **********************************************************
> ****************************************
> R M S F L U C T U A T I O N S
>
>
> NSTEP = 2000000 TIME(PS) = 7000.000 TEMP(K) = 5.48 PRESS = 0.0
> Etot = 110.4493 EKtot = 27.4115 EPtot = 108.9311
> BOND = 17.1038 ANGLE = 25.4147 DIHED = 18.7004
> 1-4 NB = 10.4789 1-4 EEL = 68.9331 VDWAALS = 22.4115
> EELEC = 376.2388 EGB = 257.4596 RESTRAINT = 0.0000
> ------------------------------------------------------------------------------
> 
> 
> --------------------------------------------------------------------------------
>    5.  TIMINGS
> --------------------------------------------------------------------------------
> 
> |>>>>>>>>PROFILE of Average TIMES>>>>>>>>>
> |                Calc gb radii          16189.86 (16.22% of Gen B)
> |                Communicate gb radii    9742.64 ( 9.76% of Gen B)
> |                Calc gb diag           30491.24 (30.55% of Gen B)
> |                Calc gb off-diag       43341.16 (43.43% of Gen B)
> |                Other                     36.38 ( 0.04% of Gen B)
> |             Gen Born time          99801.28 (100.0% of Nonbo)
> |          Nonbond force          99805.12 (93.87% of Force)
> |          Bond/Angle/Dihedral      567.36 ( 0.53% of Force)
> |          FRC Collect time        3246.43 ( 3.05% of Force)
> |          Other                   2705.11 ( 2.54% of Force)
> |       Force time            106324.02 (96.79% of Runmd)
> |       Shake time               142.10 ( 0.13% of Runmd)
> |       Verlet update time      1312.72 ( 1.20% of Runmd)
> |       CRD distribute time     2024.34 ( 1.84% of Runmd)
> |    Other                     46.02 ( 0.04% of Runmd)
> |    Runmd Time            109849.19 (100.0% of Total)
> | Total time            109849.61 (100.0% of ALL  )
> 
> | Number of list builds   :          0
> 
> | Highest rstack allocated:          0
> | Highest istack allocated:          0
> |           Job began  at 12:25:01.582  on 09/30/2010
> |           Setup done at 12:25:01.995  on 09/30/2010
> |           Run   done at 18:55:51.337  on 10/01/2010
> |     wallclock() was called70000034 times
> 
> **********************************************************
> **********************************************************
> *************************************
> 
> But some jobs do not terminate, and sander keeps running. The last few
> lines of their output files are:
> 
> **********************************************************
> **********************************************************
> *************************************
>    KE Trans =     0.0000   KE Rot =     0.0000   C.O.M. Vel =    0.000000
> 
>  NSTEP =  2000000   TIME(PS) =    7000.000  TEMP(K) =   315.15  PRESS = 0.0
>  Etot   =     -1560.7726  EKtot   =     1575.9914  EPtot      = -3136.7640
>  BOND   =    453.3646  ANGLE   =     1121.0840  DIHED      = 1353.4213
>  1-4 NB =    429.5338  1-4 EEL =     6447.2781  VDWAALS    = -660.5796
>  EELEC  =     -9440.4416  EGB     =     -2840.4245  RESTRAINT  = 0.0000
> ------------------------------------------------------------------------------
> 
> 
>       A V E R A G E S   O V E R 2000000 S T E P S
> 
> 
>  NSTEP =  2000000   TIME(PS) =    7000.000  TEMP(K) =   313.91  PRESS = 0.0
>  Etot   =     -1365.2274  EKtot   =     1569.7995  EPtot      = -2935.0268
>  BOND   =    401.0967  ANGLE   =     1146.3664  DIHED      = 1336.4671
>  1-4 NB =    442.5130  1-4 EEL =     6372.6617  VDWAALS    = -667.6084
>  EELEC  =     -8967.0383  EGB     =     -2999.4850  RESTRAINT  = 0.0000
> ------------------------------------------------------------------------------
> 
> 
>       R M S  F L U C T U A T I O N S
> 
> 
>  NSTEP =  2000000   TIME(PS) =    7000.000  TEMP(K) =     5.35  PRESS = 0.0
>  Etot   =       119.4705  EKtot   =        26.7405  EPtot      = 117.8487
>  BOND   =        16.8868  ANGLE   =        25.1935  DIHED      = 18.0292
>  1-4 NB =        10.0793  1-4 EEL =        55.0170  VDWAALS    = 28.5748
>  EELEC  =       329.2162  EGB     =       208.3778  RESTRAINT  = 0.0000
> ------------------------------------------------------------------------------
> 
> **********************************************************
> **********************************************************
> **************************************
> 
> The output file is stuck there and sander still shows as running. I am not
> able to understand the problem. Should I terminate the job, since the 7 ns
> of simulation is complete? Thank you for the reply.
> 
> --
> Nicholus Bhattacharjee
> PhD Scholar
> Department of Chemistry
> University of Delhi
> Delhi-110007 (INDIA)
> Phone: 9873098743(M)
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Oct 01 2010 - 12:00:04 PDT