Hi Nicholus,
I have seen this problem before when running NEB or REMD on very high
processor counts. A massive all-to-all communication occurs at the very
end of a run, when all the timings are calculated. This can be
excruciatingly slow for large runs (> 512 cores). But I doubt you are
running on that many threads here. How many threads are you running?
Normally you can just kill the job and all is good, i.e. your trajectory
file is complete, as is your restart file. This doesn't help if you have the
job scripted, but at least you can use the output.
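For a scripted job, one possibility is to have the script inspect the output
file before deciding to kill sander. The following is only a minimal sketch
based on the two outputs quoted below: it assumes a normally finished run
contains the "5. TIMINGS" section, while a run stuck in the final timings
step ends after the "R M S F L U C T U A T I O N S" block. The function name
mdout_state and its return labels are hypothetical, not part of AMBER, and
the check may misfire if running averages with RMS fluctuations are also
printed during the run.

```python
import re


def mdout_state(text: str) -> str:
    """Classify sander output text (heuristic inferred from this thread).

    Returns one of three hypothetical labels:
      'finished'           - the TIMINGS section was written
      'md_done_no_timings' - final RMS fluctuations printed, no TIMINGS yet
      'in_progress'        - neither marker found
    """
    # Strip all whitespace so the spaced-out headings match regardless of
    # how many blanks separate the letters in the real file.
    squashed = re.sub(r"\s+", "", text)
    if "5.TIMINGS" in squashed:
        return "finished"
    if "RMSFLUCTUATIONS" in squashed:
        return "md_done_no_timings"
    return "in_progress"
```

A wrapper script could read the output file, call mdout_state on its
contents, and kill the job only once it has reported 'md_done_no_timings'
for longer than some grace period of your choosing.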
You could also try compiling with -DNO_DETAILED_TIMINGS added to the
config.h file. This will turn off the printing of timings, and then things
should run okay. If the problem goes away, this at least narrows down where
it is occurring.
What are the specs of your system? Nodes, interconnect, etc.
All the best
Ross
> -----Original Message-----
> From: nicholus bhattacharjee [mailto:nicholusbhattacharjee.gmail.com]
> Sent: Friday, October 01, 2010 9:42 AM
> To: AMBER Mailing List
> Subject: [AMBER] sander is running...
>
> Dear community,
> I am running some 7 ns simulations of a protein on a
> remote cluster. Some jobs have finished with the usual output file (the
> last few lines of output are shown below):
>
> **********************************************************
> **********************************************************
> ****************************************
> R M S F L U C T U A T I O N S
>
>
> NSTEP = 2000000 TIME(PS) = 7000.000 TEMP(K) = 5.48 PRESS = 0.0
> Etot = 110.4493 EKtot = 27.4115 EPtot = 108.9311
> BOND = 17.1038 ANGLE = 25.4147 DIHED = 18.7004
> 1-4 NB = 10.4789 1-4 EEL = 68.9331 VDWAALS = 22.4115
> EELEC = 376.2388 EGB = 257.4596 RESTRAINT = 0.0000
>
> ------------------------------------------------------------------------------
>
>
>
> --------------------------------------------------------------------------------
> 5. TIMINGS
>
> --------------------------------------------------------------------------------
>
> |>>>>>>>>PROFILE of Average TIMES>>>>>>>>>
> | Calc gb radii 16189.86 (16.22% of Gen B)
> | Communicate gb radii 9742.64 ( 9.76% of Gen B)
> | Calc gb diag 30491.24 (30.55% of Gen B)
> | Calc gb off-diag 43341.16 (43.43% of Gen B)
> | Other 36.38 ( 0.04% of Gen B)
> | Gen Born time 99801.28 (100.0% of Nonbo)
> | Nonbond force 99805.12 (93.87% of Force)
> | Bond/Angle/Dihedral 567.36 ( 0.53% of Force)
> | FRC Collect time 3246.43 ( 3.05% of Force)
> | Other 2705.11 ( 2.54% of Force)
> | Force time 106324.02 (96.79% of Runmd)
> | Shake time 142.10 ( 0.13% of Runmd)
> | Verlet update time 1312.72 ( 1.20% of Runmd)
> | CRD distribute time 2024.34 ( 1.84% of Runmd)
> | Other 46.02 ( 0.04% of Runmd)
> | Runmd Time 109849.19 (100.0% of Total)
> | Total time 109849.61 (100.0% of ALL )
>
> | Number of list builds : 0
>
> | Highest rstack allocated: 0
> | Highest istack allocated: 0
> | Job began at 12:25:01.582 on 09/30/2010
> | Setup done at 12:25:01.995 on 09/30/2010
> | Run done at 18:55:51.337 on 10/01/2010
> | wallclock() was called70000034 times
>
> **********************************************************
> **********************************************************
> *************************************
>
> But some jobs are not terminating and sander is still running. The last
> few lines of their output files are:
>
> **********************************************************
> **********************************************************
> *************************************
> KE Trans = 0.0000 KE Rot = 0.0000 C.O.M. Vel = 0.000000
>
> NSTEP = 2000000 TIME(PS) = 7000.000 TEMP(K) = 315.15 PRESS = 0.0
> Etot = -1560.7726 EKtot = 1575.9914 EPtot = -3136.7640
> BOND = 453.3646 ANGLE = 1121.0840 DIHED = 1353.4213
> 1-4 NB = 429.5338 1-4 EEL = 6447.2781 VDWAALS = -660.5796
> EELEC = -9440.4416 EGB = -2840.4245 RESTRAINT = 0.0000
>
> ------------------------------------------------------------------------------
>
>
> A V E R A G E S O V E R 2000000 S T E P S
>
>
> NSTEP = 2000000 TIME(PS) = 7000.000 TEMP(K) = 313.91 PRESS = 0.0
> Etot = -1365.2274 EKtot = 1569.7995 EPtot = -2935.0268
> BOND = 401.0967 ANGLE = 1146.3664 DIHED = 1336.4671
> 1-4 NB = 442.5130 1-4 EEL = 6372.6617 VDWAALS = -667.6084
> EELEC = -8967.0383 EGB = -2999.4850 RESTRAINT = 0.0000
>
> ------------------------------------------------------------------------------
>
>
> R M S F L U C T U A T I O N S
>
>
> NSTEP = 2000000 TIME(PS) = 7000.000 TEMP(K) = 5.35 PRESS = 0.0
> Etot = 119.4705 EKtot = 26.7405 EPtot = 117.8487
> BOND = 16.8868 ANGLE = 25.1935 DIHED = 18.0292
> 1-4 NB = 10.0793 1-4 EEL = 55.0170 VDWAALS = 28.5748
> EELEC = 329.2162 EGB = 208.3778 RESTRAINT = 0.0000
>
> ------------------------------------------------------------------------------
>
> **********************************************************
> **********************************************************
> **************************************
>
> The output file is stuck there and sander still shows as running. I am not
> able to understand the problem. Should I terminate the job, as 7 ns of
> simulation is complete? Thank you for the reply.
>
> --
> Nicholus Bhattacharjee
> PhD Scholar
> Department of Chemistry
> University of Delhi
> Delhi-110007 (INDIA)
> Phone: 9873098743(M)
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Oct 01 2010 - 12:00:04 PDT