Hi Ross,

The production input file stays the same throughout the entire run, since I want to simulate a full microsecond, and no other jobs are being started on this machine, as I'm the only one with access to it. I'm running the full microsecond continuously using this script:


setenv CUDA_HOME /usr/local/cuda-6.5
setenv LD_LIBRARY_PATH "/usr/local/cuda-6.5/lib64:/software/openmpi.1.8.1/lib:${LD_LIBRARY_PATH}"
setenv PATH "/usr/local/cuda-6.5/bin:${PATH}"

set prv=M

foreach cur (N O P Q R S T U V W X Y Z)

  /software/openmpi.1.8.1/bin/mpirun -v -np 2 pmemd.cuda.MPI -O -i -p ../protein.prmtop -c production.$prv.restrt -o production.$cur.out -r production.$cur.restrt -x production.$cur.mdcrd

  set prv=$cur

end

Hi Mohamed,
My first thought here was temperature throttling, but when you say it always happens at the same point, that hypothesis goes out the window. I've never seen this behavior before and am not even sure how to speculate on what might be causing it. First off, given that you say the performance is halved, are you certain it is not related to your input files in some way? Is there any difference between them: are you suddenly dropping the time step from 2 fs to 1 fs? Is anything else changed, such as the ensemble or the barostat?
My guess is it has to be something related to your simulation settings rather than the machine or GPUs, since it happens when you start the next simulation. The other possibility is that multiple runs are somehow being fired up on the same GPU. E.g. I could envision forgetting to set CUDA_VISIBLE_DEVICES again after the first run on GPU 0 completes, so that the second run that was supposed to go on GPU 0 ends up on GPU 1, where another job is already running. Look for things like this in your scripts and by watching with nvidia-smi etc.
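Something along these lines might help narrow it down. This is only a rough csh sketch, not a tested recipe: it assumes the two Titan Xs show up as devices 0 and 1, and the log file name gpu_watch.csv is made up for illustration. It pins the visible GPUs explicitly, logs utilization, temperature and SM clock once a minute in the background, and lists the processes currently holding each GPU.

```csh
# Assumption: the two GPUs are enumerated as devices 0 and 1.
# Set this explicitly before every run so nothing is inherited by accident.
setenv CUDA_VISIBLE_DEVICES 0,1

# Log per-GPU load, temperature and SM clock every 60 s in the background.
# gpu_watch.csv is a hypothetical file name; some fields may read N/A
# depending on the card and driver.
nvidia-smi --query-gpu=timestamp,index,utilization.gpu,temperature.gpu,clocks.sm --format=csv -l 60 >> gpu_watch.csv &

# When the slowdown appears, check which processes sit on which GPU.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```

If the logged SM clock drops at the point where the third run starts, that would point back at the hardware or driver; if a second set of pmemd.cuda.MPI processes shows up, it would point at the script.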
> Hi all,
> I have two GTX Titan Xs paired with an i7-5930K 3.5 GHz processor in an ASUS Rampage V motherboard with 16 GB of 2133 MHz DDR4 RAM. I'm running a relatively small ~15K atom system and doing normal MD simulation with dt=0.002. My production file setup saves files every 100 ns. I get an average of 275 ns/day on the system, but for some reason the Titan Xs slow down to a mere 100 ns/day after two runs, i.e. 200 ns. This happens exactly after the 2nd run is completed. I have to stop the current job and start it up again to continue onward. The err.log is empty and the temperatures are not an issue, as I have the fans running at ~50%, which keeps both GPUs under 65 °C. Any suggestions?
