Re: [AMBER] problem while running amber in parallel (wall time ?) from Jason Swails on 2010-01-29 (Amber Archive Jan 2010)

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 29 Jan 2010 15:06:28 -0500

Perhaps add a "cd $PBS_O_WORKDIR" near the top of the script to make
sure the calculation runs in the same directory that the job was
submitted from? Without a "cd" statement it is unclear to me where
this calculation would be performed (most likely your home directory,
which is where PBS normally starts a job).
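
For reference, a minimal sketch of that change, assuming a PBS/Torque
scheduler (which sets PBS_O_WORKDIR to the directory qsub was run
from). The echo line is an extra of mine so that the job's .o file
records where the job actually ran:

```tcsh
# Place right after the #PBS directives, before lamboot/mpirun.
# PBS_O_WORKDIR is set by PBS/Torque to the directory where qsub was invoked.
cd $PBS_O_WORKDIR

# Optional (an addition, not in the original script): record the
# directory in the job's standard output file.
echo "running in" `pwd`
```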

Moreover, I'm not sure the way you set your "path" variable is doing
what you want (though admittedly my experience with csh variants is
not extensive). As far as I know, csh and tcsh treat the lowercase
shell variable "path" specially and mirror it into the PATH
environment variable, so "set path = (...)" does work there. Any
other environment variable has to be set with "setenv" rather than
"set", and variable names are case-sensitive. That said, only
directories should be added to the path, not executables such as
mpirun.
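
As a rough sketch of the PATH advice, assuming the LAM installation
you want is the one under /opt/lam/7.1.3 (the one your mpirun line
calls explicitly; substitute whichever installation sander.MPI was
actually built with):

```tcsh
# Prepend the LAM bin directory (a directory, not the mpirun executable).
# /opt/lam/7.1.3/bin is taken from the mpirun line of the script; adjust
# it if a different MPI installation was used to build sander.MPI.
setenv PATH /opt/lam/7.1.3/bin:${PATH}

# Print the full path of the mpirun the shell will now pick up.
which mpirun
```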

On Fri, Jan 29, 2010 at 2:05 PM, imtiaz shafiq <imtiazshafiq.gmail.com> wrote:
> Dear All,
>
> Have a nice day. I was able to run amber successfully in parallel a few
> days ago using this qsub script
>
> #!/bin/tcsh
> # This is file Run__amber
>
> #PBS -l nodes=1:ppn=4
> #PBS -V
> #PBS -N Imtiaz
> #PBS -l walltime=00:59:00
>
> set path = (/home/imtiaz/amber10/src/lam-7.1.3
> /opt/lam_install_intel/bin/mpirun $path .)

It looks like you may be confusing the lam included with amber10 and a
pre-existing installation of lam/mpi already on your cluster. If a
previous installation of lam/mpi exists on your system already, then
it is unnecessary to compile the version included with amber. You
just have to make sure that you use the mpi compiler wrappers
(mpif90/mpicc, etc) included with the mpi you plan to use for your
simulations to compile amber10.
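
One way to check which installation is in use, sketched under the
assumption that the LAM wrapper compilers are on your PATH (-showme is
the LAM wrapper option that prints the underlying compiler command
without running it):

```tcsh
# Where does the shell find the compiler wrapper and the launcher?
# Both should live in the same MPI installation's bin directory.
which mpicc
which mpirun

# Show the real compiler command and library paths behind the wrapper.
mpicc -showme
```

If the two directories differ, sander.MPI was probably built against
one LAM and is being launched with another.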

>
> echo "simulation started at" `date`
>
> /opt/lam/7.1.3/bin/lamboot
>
> /opt/lam/7.1.3/bin/mpirun -ssi rpi lamd N sander.MPI -O -i heat.in -o
> heat.out -p ras-raf_solvated.prmtop -c min.rst -r
> heat.rst -x heat.mdcrd -ref min.rst
>

Perhaps you should have a "/opt/lam/7.1.3/bin/lamhalt" here as well?
It's probably not required, but it shuts down the LAM daemons started
by lamboot and will help to clean up any rogue processes after a
failed MPI run.
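
Put together, the run section of the script might look something like
the sketch below. The commands and file names are copied from your
script; only the lamhalt line and the line continuations are new:

```tcsh
# Start the LAM run-time environment on the allocated node(s).
/opt/lam/7.1.3/bin/lamboot

# Run sander.MPI on all CPUs known to LAM (the "N" argument).
/opt/lam/7.1.3/bin/mpirun -ssi rpi lamd N sander.MPI -O -i heat.in \
    -o heat.out -p ras-raf_solvated.prmtop -c min.rst -r heat.rst \
    -x heat.mdcrd -ref min.rst

# Shut the LAM daemons down again, whether or not the run succeeded.
/opt/lam/7.1.3/bin/lamhalt
```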

> echo "simulation ended at" `date`
>
>
> I am not sure now what happened with the same script when I submit a
> qsub job, job is submitted but with no output and no error, even showq
> does not show any job running. Our cluster admin is saying that it is
> something related to Amber not to the cluster software and hardware

If it were a problem related to amber, something would have been
printed to stderr (which would have been the Imtiaz.e413948 file), and
hopefully an error message would have been written to the mdout file
as well. The fact that no files were created means it's probably
something in your submission script and not amber (though add a cd
command so you know where files are being created).

>
> " ran a job for Imtiaz on cluster1 - came back with a walltime error -
> that's not a system software/hardware problem."
>
> Please suggest something in this regards
>
> Is there some problem in my qsub script? if yes this was working fine
> before as such?
>
> What could be potential problem with wall time?
>
> Here is an example screenshot
>
> [imtiaz.cluster1 amber10]$ qsub -d /home/mis9/amber_cdk2/ 1fin-min -V
> 413948.cluster1
> [imtiaz.cluster1 amber10]$
> [imtiaz.cluster1 amber10]$ more Imtiaz.o413948
> [imtiaz.cluster1 amber10]$ more Imtiaz.e413948
> [imtiaz.cluster1 amber10]$
>
> * both the qsub o and e files related to the job id 413948 are empty

This means nothing was written to either standard output or standard
error (which eliminates any debugging information that could be
gathered from those files).

Good luck!
Jason

-- 
---------------------------------------
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Jan 29 2010 - 12:30:03 PST