Re: [AMBER] MMPBSA errors: "failed with prmtop" and "failed when querying netcdf trajectory"

From: Vlad Cojocaru <vlad.cojocaru.mpi-muenster.mpg.de>
Date: Thu, 14 Nov 2013 17:49:48 +0100

Well, the stack size limit is set to unlimited by default on the several
nodes I tested ... I would think that holds all over the cluster ... But
yes, I can put it in my job script ..

Maybe I can also try compiling with OpenMPI ....
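
For the job script, a minimal sketch of the stack-limit step might look
like the following. The helper name raise_stack_limit is made up for
illustration and is not part of AMBER; the fallback to the hard limit is
an assumption about how a scheduler might cap the value.

```shell
#!/bin/sh
# Sketch only: raise the soft stack limit before launching the analysis.

# Hypothetical helper (not part of AMBER): try "unlimited" first and fall
# back to the hard limit if the environment does not allow that.
raise_stack_limit() {
    ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)" 2>/dev/null
    # Print the soft limit now in effect (in kB, or "unlimited").
    ulimit -s
}

# Call it directly, not inside $(...), so the change applies to this shell.
raise_stack_limit

# The MMPBSA.py launch line from the existing job script would follow here.
```

Whether ranks started on other nodes inherit this limit probably depends
on the MPI launcher and the scheduler, so it may be worth printing
`ulimit -s` from inside the launched job as well.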

Best,
Vlad


On 11/14/2013 04:53 PM, Jason Swails wrote:
> On Thu, 2013-11-14 at 15:35 +0100, Vlad Cojocaru wrote:
>> Hi Jason,
>>
>> I applied the patch you sent me for deciphering the "failed when
>> querying netcdf trajectory" error. However, the error file doesn't say
>> anything ..
>> This error is really weird ... It makes no sense that cpptraj does not
>> read the netcdf ... The file is there and a serial MMPBSA analysis on
>> its first frame always works. Besides, even the parallel job runs
>> sometimes properly (unfortunately rarely so that the crashes are
>> disturbing) ...
>>
>> Best wishes
>> Vlad
>>
>> --------- new error message after patching -----------------------
>>
>> OUTPUT:
>>
>>
>> ERROR:
>> None
>>
>> TrajError:
>> /usr/users/vcojoca/apps/cluster_intel/amber/12_tools-13_intel-13.0_impi-4.1.0_patched/bin/cpptraj
>> failed when querying oct4_sox2-k57e_cano.cdf
>> Error occured on rank 0.
>> Exiting. All files have been retained.
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> [8:gwdn064] unexpected disconnect completion event from [0:gwdn141]
>> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
>> internal ABORT - process 8
>> [16:gwdn134] unexpected disconnect completion event from [0:gwdn141]
>> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
>> internal ABORT - process 16
>> [32:gwdn158] unexpected disconnect completion event from [0:gwdn141]
>> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
>> internal ABORT - process 32
>> [64:gwdn072] unexpected disconnect completion event from [0:gwdn141]
>> Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
>> internal ABORT - process 64
> So this appears to be an Intel MPI/Intel compiler issue. See this
> thread: http://software.intel.com/en-us/forums/topic/329053
>
> The OP here described an issue that was ultimately nailed down to a
> stack overflow. In my experience, the Intel compilers are a bit more
> aggressive about using the stack than the GNU compilers---I've seen
> segfaults stemming from stack overflows using ifort that don't occur on
> the exact same test using gfortran.
>
> One thing that would be worth trying is to increase the size of the
> stack, so put:
>
> ulimit -s unlimited
>
> before you run MMPBSA.py and see if this helps.
>
> Good luck,
> Jason
>

-- 
Dr. Vlad Cojocaru
Max Planck Institute for Molecular Biomedicine
Department of Cell and Developmental Biology
Röntgenstrasse 20, 48149 Münster, Germany
Tel: +49-251-70365-324; Fax: +49-251-70365-399
Email: vlad.cojocaru[at]mpi-muenster.mpg.de
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Nov 14 2013 - 09:00:35 PST