Re: [AMBER] problem while running amber in parallel (wall time ?) from Jason Swails on 2010-01-29 (Amber Archive Jan 2010)

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 29 Jan 2010 15:25:54 -0500

If no files are being created and nothing is written to the stdout and
stderr files, I don't see how to diagnose the problem, and it suggests
the cause is unrelated to amber. Your script has a print statement,
"echo "simulation started at" `date`", before any call to an amber
program. If the job had made it even that far into the submission
script, you would see "simulation started at _____________" in the
output file. Since that line does not appear anywhere, the script is
not getting that far, and sander.MPI is never given a chance to fail.
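
As a rough illustration of that check, a stripped-down test job could
contain nothing but the diagnostics. This is only a sketch: the job name
diag_test and the five-minute walltime are placeholders, and "-j oe"
(which should merge stderr into the stdout file) is assumed to be
supported by the PBS/Torque version on your cluster.

```shell
#!/bin/tcsh
# Minimal test job: diagnostics only, no amber or MPI calls.
# diag_test and the short walltime are placeholder values.
#PBS -l nodes=1:ppn=1
#PBS -N diag_test
#PBS -l walltime=00:05:00
#PBS -j oe

# Each line below should show up in the job's output file.
echo "simulation started at" `date`
echo "running on node" `uname -n`
echo "working directory is" `pwd`
echo "PATH is" $PATH
echo "simulation ended at" `date`
```

If even this job leaves an empty output file, the script is not being
executed at all, which points at the queue or the qsub command line
rather than at sander.MPI.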

I hope you figure out how to fix this issue.

Good luck!
Jason

On Fri, Jan 29, 2010 at 3:17 PM, MUHAMMAD IMTIA SHAFIQ
<imtiazshafiq.gmail.com> wrote:
> Dear Jason,
>
> Thanks for your reply,
>
> I was submitting my job like this
>
> qsub -d /home/imtiaz/amber10/ script -V
>
> Here "script" is a file containing the job information I described in my last email. It was working fine until a few days ago. amber10 is my working directory, and I was submitting the job with the above command from within the amber10 directory. Now it is not working, as I mentioned in my last email.
>
> On the cluster, $AMBERHOME is already set to the directory where Amber10 is installed.
>
>
> Regards
> Imtiaz
>
>
>
>
> On 29 Jan 2010, at 20:06, Jason Swails wrote:
>
>> Perhaps add a "cd $PBS_O_WORKDIR" to make sure the calculation is
>> working in the same directory that the calculation was submitted from?
>> It is unclear to me where this calculation would be performed
>> (perhaps in the root directory?) without a "cd" statement.
>>
>> Moreover, the way you set your "path" variable looks wrong to me
>> (though admittedly my experience with csh variants is not
>> extensive). In csh and tcsh the lowercase shell variable "path" is
>> tied to the PATH environment variable, so "set path = (...)" does
>> change the search path; other environment variables need "setenv"
>> rather than "set". That said, only directories should be added to
>> the search path (not the mpirun executable itself).
>>
>> On Fri, Jan 29, 2010 at 2:05 PM, imtiaz shafiq <imtiazshafiq.gmail.com> wrote:
>>> Dear All,
>>>
>>> Have a nice day. I was able to run amber successfully in parallel a
>>> few days ago using this qsub script:
>>>
>>> #!/bin/tcsh
>>> # This is file Run__amber
>>>
>>> #PBS -l nodes=1:ppn=4
>>> #PBS -V
>>> #PBS -N Imtiaz
>>> #PBS -l walltime=00:59:00
>>>
>>> set path = (/home/imtiaz/amber10/src/lam-7.1.3
>>> /opt/lam_install_intel/bin/mpirun $path .)
>>
>> It looks like you may be confusing the lam included with amber10 and a
>> pre-existing installation of lam/mpi already on your cluster. If a
>> previous installation of lam/mpi exists on your system already, then
>> it is unnecessary to compile the version included with amber. You
>> just have to make sure that you use the mpi compiler wrappers
>> (mpif90/mpicc, etc) included with the mpi you plan to use for your
>> simulations to compile amber10.
>>
>>>
>>> echo "simulation started at" `date`
>>>
>>> /opt/lam/7.1.3/bin/lamboot
>>>
>>> /opt/lam/7.1.3/bin/mpirun -ssi rpi lamd N sander.MPI -O -i heat.in -o
>>> heat.out -p ras-raf_solvated.prmtop -c min.rst -r
>>> heat.rst -x heat.mdcrd -ref min.rst
>>>
>>
>> Perhaps you should have a "/opt/lam/7.1.3/bin/lamhalt" here as well?
>> It's probably not required, but it will help to clean up any stray
>> LAM daemons and processes after a failed MPI run.
>>
>>> echo "simulation ended at" `date`
>>>
>>>
>>> I am not sure what has happened. When I now submit the same script
>>> with qsub, the job is accepted but produces no output and no error,
>>> and even showq does not show any job running. Our cluster admin says
>>> that it is related to Amber, not to the cluster software or hardware:
>>
>> If it were a problem related to amber, something would have been
>> printed to stderr (which would have been the Imtiaz.e413948 file), and
>> hopefully an error message would have been written to the mdout file
>> as well. The fact that no files were created means it's probably
>> something in your submission script and not amber (though add a cd
>> command so you know where files are being created).
>>
>>>
>>> " ran a job for Imtiaz on cluster1 - came back with a walltime error -
>>> that's not a system software/hardware problem."
>>>
>>> Please suggest something in this regard.
>>>
>>> Is there some problem in my qsub script? If so, why was it working
>>> fine before as it is?
>>>
>>> What could be the potential problem with wall time?
>>>
>>> Here is an example screenshot
>>>
>>> [imtiaz.cluster1 amber10]$ qsub -d /home/mis9/amber_cdk2/ 1fin-min -V
>>> 413948.cluster1
>>> [imtiaz.cluster1 amber10]$
>>> [imtiaz.cluster1 amber10]$ more Imtiaz.o413948
>>> [imtiaz.cluster1 amber10]$ more Imtiaz.e413948
>>> [imtiaz.cluster1 amber10]$
>>>
>>> * both the qsub o and e files related to the job id 413948 are empty
>>
>> This means nothing was written to either standard output or standard
>> error (which eliminates any debugging information that could be
>> gathered from those files).
>>
>>
>> Good luck!
>> Jason
>> --
>> ---------------------------------------
>> Jason M. Swails
>> Quantum Theory Project,
>> University of Florida
>> Ph.D. Graduate Student
>> 352-392-4032
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber

-- 
---------------------------------------
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032

Received on Fri Jan 29 2010 - 12:30:04 PST