Re: [AMBER] help with TIP4P and mpi pmemd

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 4 Dec 2009 16:26:41 -0500

Okay, could be a blowup, as Dave Case suggests. Set ntpr to 1 and look at
the per-step data. Run it in uniprocessor mode so you can get a reliable
stderr. I suspect it is something about your input, but that includes
prmtop/inpcrd, and I don't have those. It could be an as-yet undiscovered
problem in the darden initialization code (since it happens in both sander
and pmemd), but I think this is a lot less likely than some problem with the
model/run conditions, since tip4p water has been used pretty extensively by
at least two groups I know of. If you want to send all your inputs to me, I
will consider doing a bit of debugging (don't send to list, just to me).
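
With ntpr = 1 every minimization step is printed, so a blowup should show up as a sudden jump in the per-step numbers. Below is a minimal Python sketch for scanning that output. It assumes each record is a header line beginning with NSTEP followed by one line of values (step, energy, RMS, ...); the function names, the RMS threshold and the sample numbers are made up for illustration and are not taken from a real run.

```python
import math


def read_min_steps(lines):
    """Yield (nstep, energy, rms) tuples from minimization output.

    Assumed layout: a header line whose first token is NSTEP, followed by
    one line of values. A field that does not parse as a number (for
    example a run of asterisks) is returned as NaN.
    """
    def num(tok):
        try:
            return float(tok)
        except ValueError:
            return math.nan

    it = iter(lines)
    for line in it:
        if line.split()[:1] == ["NSTEP"]:
            fields = next(it, "").split()
            if len(fields) >= 3 and fields[0].isdigit():
                yield int(fields[0]), num(fields[1]), num(fields[2])


def first_suspect_step(steps, max_rms=1.0e6):
    """Return the first step with a non-finite energy/RMS or an RMS above
    max_rms (an arbitrary, hypothetical threshold), else None."""
    for nstep, energy, rms in steps:
        if not (math.isfinite(energy) and math.isfinite(rms)) or rms > max_rms:
            return nstep
    return None


# Illustrative sample only; these values are invented.
sample = """\
   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -4.1200E+03     1.2500E+01     9.8000E+01     O          17
   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      2       7.3000E+11     5.6000E+09     8.1000E+10     H1         20
""".splitlines()

steps = list(read_min_steps(sample))
suspect = first_suspect_step(steps)
print("first suspect step:", suspect)
```

In practice you would pass the lines of the real output file (minwat.out in the commands quoted below) instead of the sample.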
Regards - Bob
----- Original Message -----
From: "Hashem Taha" <hashemt.gmail.com>
To: "AMBER Mailing List" <amber.ambermd.org>
Sent: Friday, December 04, 2009 4:01 PM
Subject: Re: [AMBER] help with TIP4P and mpi pmemd


> Yes, pmemd 10. And yes, all the bugfixes have been applied. Moreover, this
> problem also affects sander.
>
> On Thu, Dec 3, 2009 at 10:58 PM, Robert Duke <rduke.email.unc.edu> wrote:
>
>> Are we talking pmemd 10? If so, has bugfix 8 been applied?
>> Regards - Bob
>>
>> ----- Original Message -----
>> From: "Hashem Taha" <hashemt.gmail.com>
>> To: "AMBER Mailing List" <amber.ambermd.org>
>> Sent: Thursday, December 03, 2009 9:26 PM
>> Subject: Re: [AMBER] help with TIP4P and mpi pmemd
>>
>>
>>
>>> Hi Bob,
>>>
>>> I have tried this tip4p system before with the same molecule, and it
>>> worked fine (using serial sander, parallel sander and parallel pmemd).
>>> The same exact input files were used in this case. There are no comment
>>> lines in the input file before &cntrl.
>>>
>>> I tried running the same job using a serial version of sander but I
>>> encountered the same problem. I've recompiled sander using gcc with
>>> debugging flags and this is what I get when I run sander in GDB:
>>>
>>> (gdb) run -O -i minwat.in -o minwat.out -p alpha_ara_ome_tip4p.top -c
>>> alpha_ara_ome_tip4p.crd -r minwat.rst -ref alpha_ara_ome_tip4p.crd
>>> Starting program: /home/john/amber10/bin/sander -O -i minwat.in -o
>>> minwat.out -p alpha_ara_ome_tip4p.top -c alpha_ara_ome_tip4p.crd -r
>>> minwat.rst -ref alpha_ara_ome_tip4p.crd
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x00000000004bb74e in nb_adjust_ ()
>>> (gdb) backtrace
>>> #0 0x00000000004bb74e in nb_adjust_ ()
>>> #1 0x00000000004bdd42 in ewald_force_ ()
>>> #2 0x00000000005f8259 in force_ ()
>>> #3 0x0000000000483797 in runmin_ ()
>>> #4 0x00000000004734e3 in sander () at _sander.f:1296
>>> #5 0x0000000000470124 in MAIN__ () at _multisander.f:291
>>> #6 0x0000000000a2c6ae in main ()
>>>
>>> I don't have much experience with gdb, but from the looks of it the error
>>> originates in nb_adjust().
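
For anyone who wants to pull the frames out of a trace like the one above programmatically, here is a small Python sketch. It assumes the usual gdb frame layout, `#N 0xADDR in func () at file:line`, with the `at file:line` part optional; the regular expression and the helper name are illustrative, not part of gdb or AMBER.

```python
import re

# One gdb frame: "#N [0xADDR in ]func (args)[ at file:line]"
FRAME = re.compile(
    r"^#(?P<n>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"
    r"(?P<func>\S+)\s+\(.*?\)"
    r"(?:\s+at\s+(?P<file>[^:\s]+):(?P<line>\d+))?"
)


def parse_backtrace(text):
    """Return a list of dicts, one per frame, innermost frame first."""
    frames = []
    for raw in text.splitlines():
        m = FRAME.match(raw.strip())
        if m:
            frames.append({
                "n": int(m["n"]),
                "func": m["func"],
                "file": m["file"],
                "line": int(m["line"]) if m["line"] else None,
            })
    return frames


# The backtrace quoted in this message.
trace = """\
#0 0x00000000004bb74e in nb_adjust_ ()
#1 0x00000000004bdd42 in ewald_force_ ()
#2 0x00000000005f8259 in force_ ()
#3 0x0000000000483797 in runmin_ ()
#4 0x00000000004734e3 in sander () at _sander.f:1296
#5 0x0000000000470124 in MAIN__ () at _multisander.f:291
#6 0x0000000000a2c6ae in main ()
"""

frames = parse_backtrace(trace)
print("innermost frame:", frames[0]["func"])
# Frames that carry no file:line information in the trace.
print("no source info:", [f["func"] for f in frames if f["file"] is None])
```

The second print lists the frames for which gdb reported no source file and line, which is where line-level detail is missing in this trace.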
>>>
>>> I've tried recompiling sander and pmemd with different MPI libraries
>>> (openmpi and mpich2) and with no MPI, with and without MKL, and using
>>> gfortran and ifort; all of these combinations resulted in a SIGSEGV
>>> error. However, I only added the debug flags to the gfortran,
>>> non-parallel build.
>>>
>>>
>>>
>>> On Thu, Dec 3, 2009 at 3:37 PM, Robert Duke <rduke.email.unc.edu> wrote:
>>>
>>>> Have you done this (tip4p) before? Try your prmtop/inpcrd/mdin with
>>>> single processor sander, then single processor pmemd, and then pmemd
>>>> mpi. I bet you have setup problems, or pmemd build problems, but this
>>>> will sort that out. I will let others expound on setting up an extra
>>>> points simulation if that is the problem. As an aside, why did you
>>>> modify the elec and vdw screening parms for 1-4 interactions, scee and
>>>> scnb? This is, I believe, generally not recommended, but maybe you are
>>>> doing something I don't know about... Also, do you really have two
>>>> comment lines in front of &cntrl? I have never tried that; maybe it is
>>>> inconsequential, but I don't know... (Because there are multiple
>>>> reading passes, with namelist i/o combined with group i/o, I would not
>>>> do anything nonstandard. It may work fine, but namelist read errors can
>>>> be really obscure, especially in parallel, which is one reason to switch
>>>> to a single processor test case if something weird happens.)
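
A quick way to see what sits in front of &cntrl is to list the non-blank lines that precede it. The Python sketch below does only that; it makes no claim about whether extra lines are harmful, and the helper name is hypothetical. The sample text is the input file posted at the bottom of this thread.

```python
def lines_before_cntrl(mdin_text):
    """Return the non-blank lines that precede the first &cntrl line,
    or None if no &cntrl line is found."""
    before = []
    for line in mdin_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("&cntrl"):
            return before
        if stripped:
            before.append(stripped)
    return None


# Input file as posted in the original question.
mdin = """\
Constant Volume Minimization
# Control section
&cntrl
ntwx = 50, ntpr = 1, ntwr = 1,
scnb = 1.0, scee = 1.0, nsnb = 25, dielc = 1, cut = 8.0,
ntb = 1,
maxcyc = 1000, ntmin = 0, dx0 = 0.01, drms = 0.0001,
ntp = 0,
ibelly = 0, ntr = 1,
imin = 1,
&end
"""

before = lines_before_cntrl(mdin)
for text in before:
    print("before &cntrl:", text)
```

For the posted file this reports the title line plus the "# Control section" line; dropping the second one gives the plain title-then-namelist layout for a single processor test.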
>>>> Regards - Bob Duke
>>>> ----- Original Message -----
>>>> From: "Hashem Taha" <hashemt.gmail.com>
>>>> To: <amber.ambermd.org>
>>>> Sent: Thursday, December 03, 2009 5:16 PM
>>>> Subject: [AMBER] help with TIP4P and mpi pmemd
>>>>
>>>>
>>>>> I have a problem with trying to run some jobs using TIP4P water as the
>>>>> solvent. I have tried running the same exact files with TIP3P water, and
>>>>> the calculations started and completed perfectly. However, upon changing
>>>>> from TIP3P to TIP4P, my calculations would stop without reason. The file
>>>>> that I am trying to run is just a water minimization, and it results in
>>>>> the following errors. The input file is also included below. The
>>>>> calculations start, but after a few steps they come to a halt. Any help
>>>>> would be appreciated, and if you require further information please let
>>>>> me know...
>>>>>
>>>>> HT
>>>>>
>>>>> the errors are:
>>>>>
>>>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>>>>
>>>>> Image        PC                Routine  Line     Source
>>>>> pmemd        000000000048265A  Unknown  Unknown  Unknown
>>>>> pmemd        00000000004777C3  Unknown  Unknown  Unknown
>>>>> pmemd        00000000004AA1D5  Unknown  Unknown  Unknown
>>>>> pmemd        00000000004CA1CE  Unknown  Unknown  Unknown
>>>>> pmemd        000000000040744C  Unknown  Unknown  Unknown
>>>>> libc.so.6    0000003F4D81D8B4  Unknown  Unknown  Unknown
>>>>> pmemd        0000000000407359  Unknown  Unknown  Unknown
>>>>> rank 7 in job 55 compute-0-8.local_45343 caused collective abort of all ranks
>>>>> exit status of rank 7: killed by signal 9
>>>>>
>>>>> the input file...
>>>>>
>>>>> Constant Volume Minimization
>>>>> # Control section
>>>>> &cntrl
>>>>> ntwx = 50, ntpr = 1, ntwr = 1,
>>>>> scnb = 1.0, scee = 1.0, nsnb = 25, dielc = 1, cut = 8.0,
>>>>> ntb = 1,
>>>>> maxcyc = 1000, ntmin = 0, dx0 = 0.01, drms = 0.0001,
>>>>> ntp = 0,
>>>>> ibelly = 0, ntr = 1,
>>>>> imin = 1,
>>>>> &end
>>>>> Group Input for restrained atoms
>>>>> 5.0
>>>>> RES 1 2
>>>>> END
>>>>> END
>>>>> _______________________________________________
>>>>> AMBER mailing list
>>>>> AMBER.ambermd.org
>>>>> http://lists.ambermd.org/mailman/listinfo/amber



Received on Fri Dec 04 2009 - 13:30:04 PST