On Thu, 2013-11-14 at 15:35 +0100, Vlad Cojocaru wrote:
> Hi Jason,
>
> I applied the patch you sent me for deciphering the "failed when
> querying netcdf trajectory" error. However, the error file doesn't say
> anything ...
> This error is really weird ... It makes no sense that cpptraj does not
> read the netcdf ... The file is there, and a serial MMPBSA analysis on
> its first frame always works. Besides, even the parallel job sometimes
> runs properly (unfortunately rarely, so the crashes are disturbing) ...
>
> Best wishes
> Vlad
>
> --------- new error message after patching -----------------------
>
> OUTPUT:
>
>
> ERROR:
> None
>
> TrajError:
> /usr/users/vcojoca/apps/cluster_intel/amber/12_tools-13_intel-13.0_impi-4.1.0_patched/bin/cpptraj
> failed when querying oct4_sox2-k57e_cano.cdf
> Error occured on rank 0.
> Exiting. All files have been retained.
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> [8:gwdn064] unexpected disconnect completion event from [0:gwdn141]
> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
> internal ABORT - process 8
> [16:gwdn134] unexpected disconnect completion event from [0:gwdn141]
> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
> internal ABORT - process 16
> [32:gwdn158] unexpected disconnect completion event from [0:gwdn141]
> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
> internal ABORT - process 32
> [64:gwdn072] unexpected disconnect completion event from [0:gwdn141]
> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
> internal ABORT - process 64
So this appears to be an Intel MPI/Intel compiler issue. See this
thread: http://software.intel.com/en-us/forums/topic/329053
The OP in that thread described an issue that was ultimately traced to
a stack overflow. In my experience, the Intel compilers are more
aggressive about using the stack than the GNU compilers; I've seen
segfaults stemming from stack overflows with ifort that don't occur in
the exact same test with gfortran.
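You can check the limit you are currently running under with the
command below; on many Linux systems the default soft limit is only
8192 kB, which a large parallel job can plausibly exhaust (the exact
default on your cluster may differ):

ulimit -s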
One thing worth trying is to increase the stack size limit, so put

ulimit -s unlimited

before you run MMPBSA.py and see if this helps.
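In a batch script that might look roughly like the sketch below. The
launcher line, process count, and topology file name are placeholders;
substitute whatever command you actually use to start the parallel job:

#!/bin/sh
# Raise the per-process stack limit before any MPI ranks are spawned
ulimit -s unlimited
# Hypothetical launch line; replace with your actual invocation
mpirun -np 128 MMPBSA.py.MPI -O -i mmpbsa.in -cp complex.prmtop \
    -y oct4_sox2-k57e_cano.cdf

Also keep in mind that the ulimit has to be in effect in the shell that
actually spawns the MPI processes; setting it interactively on the
login node may not carry over to the compute nodes, depending on how
your MPI launcher propagates resource limits.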
Good luck,
Jason
--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher