Thanks for your reply. I am using only 2 replicas, with one processor per
replica, since it is easier to get the method working with just 2 replicas
first.
Once it works, I will add more, of course.
I have tried your suggestion: the multisander job works fine with the same
restart and topology files but DIFFERENT inputs (those for a standard molecular
dynamics run). So do you think it could be a problem with the inputs? I am using
the ones that work for the tests:
rem.in.001:
Title Line
&cntrl
imin = 0, nstlim = 100, dt = 0.002,
ntx = 5, tempi = 0.0, temp0 = 325.0,
ntt = 3, tol = 0.000001, gamma_ln = 1.0,
ntc = 2, ntf = 1, ntb = 0,
ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
scee = 1.2, cut = 99.0,
ntr = 0, tautp = 0.1, offset = 0.09,
nscm = 500, igb = 5, irest=1,
ntave = 0, numexchg=5,
&end
rem.in.002:
Title Line
&cntrl
imin = 0, nstlim = 100, dt = 0.002,
ntx = 5, tempi = 0.0, temp0 = 350.0,
ntt = 3, tol = 0.000001, gamma_ln = 1.0,
ntc = 2, ntf = 1, ntb = 0,
ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
scee = 1.2, cut = 99.0,
ntr = 0, tautp = 0.1, offset = 0.09,
nscm = 500, igb = 5, irest=1,
ntave = 0, numexchg=5,
&end
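The two inputs above differ only in temp0. A minimal sketch of generating them from one template, so that the replicas cannot drift apart by accident, might look like this. The helper name write_rem_input, the REM_OUTDIR variable and the generated_inputs directory are assumptions made for illustration; the namelist values are copied from the inputs above.

```shell
#!/bin/bash
# Sketch: write rem.in.001, rem.in.002, ... from one template, varying only temp0.
# write_rem_input, REM_OUTDIR and generated_inputs are hypothetical names,
# not part of AMBER.
outdir="${REM_OUTDIR:-generated_inputs}"
mkdir -p "$outdir"

write_rem_input() {
    local index="$1" temp0="$2" outfile
    outfile=$(printf '%s/rem.in.%03d' "$outdir" "$index")
    # Unquoted heredoc: only ${temp0} is expanded, the rest is written verbatim.
    cat > "$outfile" <<EOF
Title Line
&cntrl
imin = 0, nstlim = 100, dt = 0.002,
ntx = 5, tempi = 0.0, temp0 = ${temp0},
ntt = 3, tol = 0.000001, gamma_ln = 1.0,
ntc = 2, ntf = 1, ntb = 0,
ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
scee = 1.2, cut = 99.0,
ntr = 0, tautp = 0.1, offset = 0.09,
nscm = 500, igb = 5, irest=1,
ntave = 0, numexchg=5,
&end
EOF
}

# Target temperatures, one per replica (the two used in this message).
temps=(325.0 350.0)
for i in "${!temps[@]}"; do
    write_rem_input "$((i + 1))" "${temps[$i]}"
done
```

Running `diff generated_inputs/rem.in.001 generated_inputs/rem.in.002` afterwards should report only the temp0 line.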
As groupfile I use:
#
#
-O -rem 1 -remlog rem.log -i ./rem.in.001 -p ./1ftg_wat.top -c ./md_prod_5.r -o ./rem.out.001 -inf reminfo.001 -r ./rem.r.001
-O -rem 1 -remlog rem.log -i ./rem.in.002 -p ./1ftg_wat.top -c ./md_prod_5.r -o ./rem.out.002 -inf reminfo.002 -r ./rem.r.002
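As far as I understand, each replica's options in the groupfile are meant to sit on a single physical line, which is easy to break when the file is edited by hand or pasted from mail. A minimal sketch that writes the groupfile with a loop is shown here; it uses only the flags and file names from the groupfile above, while the GROUPFILE variable and the groupfile.generated name are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: emit one groupfile line per replica, each on a single physical line.
# GROUPFILE and groupfile.generated are hypothetical names chosen here.
nrep=2
top=./1ftg_wat.top
crd=./md_prod_5.r
groupfile="${GROUPFILE:-groupfile.generated}"

: > "$groupfile"    # create or truncate the output file
for ((i = 1; i <= nrep; i++)); do
    id=$(printf '%03d' "$i")
    # "--" stops printf from treating the leading "-O" as an option.
    printf -- '-O -rem 1 -remlog rem.log -i ./rem.in.%s -p %s -c %s -o ./rem.out.%s -inf reminfo.%s -r ./rem.r.%s\n' \
        "$id" "$top" "$crd" "$id" "$id" "$id" >> "$groupfile"
done
```

The line count of the generated file should then equal the value passed to -ng.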
And the script for executing the calculation is:
#!/bin/bash
# @ class = bsc_ls
# @ job_name = test_parallel
# @ initialdir = .
# @ output = OUTPUT/mpi_%j.out
# @ error = OUTPUT/mpi_%j.err
# @ total_tasks = 2
# @ wall_clock_limit = 00:01:00
export XLFRTEOPTS="namelist=old:xrf_messages=no"
srun /gpfs/apps/AMBER/src/9/exe/sander.MPI -O -ng 2 -groupfile groupfile < /dev/null
As I told you, the restart and topology files work well for a multisander job
with standard molecular dynamics. When I try to run these inputs for Replica
Exchange calculations, it only generates the EMPTY files rem.out.001 and
rem.out.002, and I get this error in the error file:
[0] MPI Abort by user Aborting program !
[0] Aborting program!
[1] MPI Abort by user Aborting program !
[1] Aborting program!
srun: error: s26c2b12: task[0-1]: Exited with exit code 255
The output file gives:
Running multisander version of sander amber9
Total processors = 2
Number of groups = 2
Looping over processors:
WorldRank is the global PE rank
NodeID is the local PE rank in current group
Group = 0
WorldRank = 0
NodeID = 0
Group = 1
WorldRank = 1
NodeID = 0
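Since the symptom here is a set of empty or missing per-replica files, a quick check after each attempt can show which files were actually written. This is only a sketch: report_outputs is a hypothetical helper, and the file names are the ones from the groupfile in this message.

```shell
#!/bin/bash
# Sketch: report whether each expected output file is missing, empty, or non-empty.
# report_outputs is a hypothetical helper, not part of AMBER.
report_outputs() {
    local f size
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "MISSING $f"
        elif [ ! -s "$f" ]; then
            echo "EMPTY $f"
        else
            # tr removes the padding some wc implementations add.
            size=$(wc -c < "$f" | tr -d ' ')
            echo "OK $f ($size bytes)"
        fi
    done
}

report_outputs rem.out.001 rem.out.002 rem.log reminfo.001 reminfo.002
```

For the run described in this message, rem.out.001 and rem.out.002 would be reported as EMPTY.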
Any idea? Something wrong with the inputs?
Rebeca García Fandiño Ph. D.
Parc Cientific de Barcelona
Barcelona Spain
rebeca.mmb.pcb.ub.es
Quoting Carlos Simmerling <carlos.simmerling.gmail.com>:
> the thing to try first is 1 processor per group. this way you
> know that output from shake errors etc will get written to the
> output file, which only the master process for each replica can do.
> this is the same situation in normal MD- if there is a problem with no
> error msg in the output always try to run single processor to test it.
> you should not need anything special in the restart file from sander,
> it can be used directly for remd. it's hard to help more since you haven't
> told us much of anything about how you are doing the calculation.
>
> are you using only 2 replicas?
>
> does the same multisander job work fine if you just turn remd off (but
> otherwise use exactly the same input files)?
>
> On Thu, Feb 28, 2008 at 7:29 AM, <rebeca.mmb.pcb.ub.es> wrote:
>> Hello,
>> I am trying to do Replica Exchange calculations using Amber 9. When I try
>> with the files from the test examples, it works, but when I try with my
>> protein I have problems. Using the usual restart file from a sander
>> calculation directly, I get errors of the type
>>
>> [1] MPI Abort by user Aborting program !
>> [1] Aborting program!
>> [0] MPI Abort by user Aborting program !
>> [0] Aborting program!
>> srun: error: s30c1b04: task[0-1]: Exited with exit code 255
>>
>> However, when I create the restart file from the trajectory file with
>> ptraj, the calculation stops with no errors, but it stops writing at this
>> point (in the rem.out files):
>>
>> ...................
>> trajectory generated by ptraj
>> begin time read from input coords = 0.000 ps
>>
>> Number of triangulated 3-point waters found: 0
>> | Atom division among processors:
>> | 0 2573
>> | Running AMBER/MPI version on 1 nodes
>>
>> | MULTISANDER: 2 groups. 1 processors out of 2 total.
>> ....................
>>
>> It creates the corresponding reminfo and rem.log files, but they are all
>> empty.
>> In the error file I can only see "srun: Force Terminated job".
>>
>> Since the same calculation works with the protein that appears in the test
>> examples, could it be a format problem? Do I need to apply any special
>> treatment to the restart file I use for the calculations?
>>
>> Thank you very much for your help, in advance.
>>
>> Rebeca García Fandiño Ph. D.
>> Parc Cientific de Barcelona
>> Barcelona Spain
>> rebeca.mmb.pcb.ub.es
>>
>> -----------------------------------------------------------------------
>> The AMBER Mail Reflector
>> To post, send mail to amber.scripps.edu
>> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>>
Received on Sun Mar 02 2008 - 06:07:25 PST