Re: [AMBER] Problems with GTX titan-X GPU

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 24 May 2016 07:40:30 -0700

Hi Elvis,

Yep, it all looks good. It doesn't look like there is anything wrong with your card. My suspicion, therefore, is that you have the card in Exclusive Process mode, where only one process is allowed to use the GPU at a time. This is good from a performance standpoint, but I'm guessing you might be running X windows on this card and/or launching other programs that use it while AMBER is running, e.g. VMD. This would result in processes being killed.
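To confirm which compute mode the card is in before changing anything, something along these lines should work. It is only a rough sketch: describe_compute_mode is a hypothetical helper name, and it assumes your driver's nvidia-smi supports the compute_mode field of --query-gpu.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute-mode string, as reported by
# nvidia-smi, into a short human-readable hint.
describe_compute_mode() {
    case "$1" in
        Default)           echo "shared: several processes may use the GPU" ;;
        Exclusive_Process) echo "exclusive: only one process may use the GPU" ;;
        Prohibited)        echo "prohibited: no compute processes allowed" ;;
        *)                 echo "unknown mode: $1" ;;
    esac
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Each output line looks like "0, Default" (GPU index, compute mode).
    nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader |
    while IFS=, read -r idx mode; do
        mode=$(echo "$mode" | tr -d ' ')   # drop the space after the comma
        echo "GPU $idx: $(describe_compute_mode "$mode")"
    done
fi
```

If a GPU reports the exclusive mode and X or VMD is also using that card, that would fit the behaviour you are seeing.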

Try setting it to default mode. As root run:

nvidia-smi -c 0

Then see if the problems go away. If it keeps crashing, we will need to look more closely at what you are simulating and check whether the error message is getting lost in a nohup.out file or something similar. The remaining possibility is that the simulation itself is unstable for some reason. In that case the code would likely crash at some point with a CUDA launch error, which goes to standard error, so you would not see any error in the mdout file.
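To make sure nothing written to standard error gets lost, one option is to send stdout and stderr to separate files and record the exit status. A minimal sketch, assuming a small wrapper script; run_logged and the log file names are hypothetical, not part of AMBER:

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdout and stderr captured in
# separate files, and append the exit status to the stderr log so a
# silent death still leaves a trace.
# Usage: run_logged <log-prefix> <command> [args...]
run_logged() {
    prefix=$1
    shift
    "$@" >"$prefix.stdout" 2>"$prefix.stderr"
    status=$?
    echo "exit status: $status" >>"$prefix.stderr"
    return "$status"
}

# Example call (input and output file names are placeholders):
#   run_logged md1 pmemd.cuda -O -i md.in -p prmtop -c inpcrd -o md.out
```

Put the function and the call in a script and start that script with nohup as before. If the job dies, md1.stderr should then hold any CUDA launch error along with the exit status.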

Let me know if the problem persists.

All the best
Ross

> On May 24, 2016, at 05:20, Elvis Martis <elvis.martis.bcp.edu.in> wrote:
>
> Hello Ross,
> I have finished running the validation suite, and attached the necessary
> log files. Please let me know if there is anything wrong. However, compared
> to the README, all energy values are identical for my card.
> Thanks once again.
>
> Regards
>
> *Elvis Martis* PhD Student, Bombay College of Pharmacy
> Website: http://www.elvismartis.in
> Address: Kalina, Santa Cruz [E]
>
> <https://in.linkedin.com/in/elvisadrianmartis>
>
>
>> Hi Elvis,
>>
>> Please try running the following validation suite:
>>
>> https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test.tar.gz
>>
>> Untar it, edit run_test.x, and set gpu_count to the number of GPUs in
>> your system (likely 1, I am guessing). Change test_count to 10 and
>> run_large_test to true. Then do:
>>
>> nohup ./run_test.x >& run_test.log &
>>
>> It will take around 24 hours to run - you can check if it is still running
>> with nvidia-smi. After it is complete please send the contents of the
>> GPU_0.log, GPU.large_0.log and run_test.log to the list and we should be
>> able to help you debug things and determine if it is a bad GPU or something
>> else.
>>
>> All the best
>> Ross
>>
>>> On May 21, 2016, at 03:19, Elvis Martis <elvis.martis.bcp.edu.in> wrote:
>>>
>>> Hello,
>>> I have a strange problem running my simulations using Amber14 on GTX
>>> Titan-X. Jobs run for several minutes and then die without any error
>>> message, hence I am unable to figure out what the problem is. I keep
>>> monitoring the GPU temperature while a job is running and it never crosses
>>> 82C.
>>> This GPU card was obtained as a gift from NVIDIA as a seeding grant.
>>> Some information on my system.
>>> Amber version- Amber14 and AmberTools15 <all updates applied>
>>> CentOS release- centos-release-6-7.el6.centos.12.3.x86_64
>>> CUDA version- CUDA-7.5 (Driver version 352.79) <see attachments for
>>> complete system log generated using "nvidia-smi -q" nvidia.log and
>>> nvidia-smi.txt>
>>> CPU- Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz <see attached file cpu.log>
>>>
>>> Please let me know if any more information is needed to help me solve this
>>> problem.
>>> Thanks in advance for your help.
>>> Regards
>>>
>>> *Elvis Martis* PhD Student, Bombay College of Pharmacy
>>> Website: http://www.elvismartis.in
>>> Address: Kalina, Santa Cruz [E]
>>>
>>> <https://in.linkedin.com/in/elvisadrianmartis>
>>>
>>> <nvidia-smi.txt><nvidia.log><cpu.log>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>>
>>
>>
> <GPU.large_0.log><GPU_0.log><run_test.log>


Received on Tue May 24 2016 - 08:00:02 PDT