Re: [AMBER] pmemd.cuda segfaults

From: <pavel.banas.upol.cz>
Date: Tue, 11 Mar 2014 15:50:51 +0100 (CET)

Hi,
actually we have not compiled any other programs on that cluster, except for
one dummy program written in C. At the initial phase of the problem, when we
also had problems with the CPU compilations and there were errors related to
the malloc and free functions, I wanted to know whether other programs have a
problem with these C functions as well. So I wrote a dummy program that
allocates some arrays with malloc, fills them, and deallocates the memory.
That program did not have any problems.
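
For what it is worth, a minimal sketch of that kind of malloc/fill/free smoke
test (the array size and fill values below are arbitrary choices for
illustration; this is not the exact program):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t n = 10 * 1000 * 1000;   /* arbitrary: 10 million doubles */
    double *a = malloc(n * sizeof *a);   /* allocate */
    if (a == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }
    for (size_t i = 0; i < n; i++)       /* fill */
        a[i] = (double)i;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)       /* read back so the work is not optimized away */
        sum += a[i];
    printf("sum = %g\n", sum);
    free(a);                             /* deallocate */
    return 0;
}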

have a nice day, Pavel

-- 
Pavel Banáš
pavel.banas.upol.cz
Department of Physical Chemistry, 
Palacky University Olomouc 
Czech Republic 
---------- Original message ----------
From: Ross Walker <ross.rosswalker.co.uk>
To: AMBER Mailing List <amber.ambermd.org>
Date: 8. 3. 2014 19:00:42
Subject: Re: [AMBER] pmemd.cuda segfaults
"Hi Pavel,
Well that's a completely new one for me. And also VERY worrying. I've
never used Core linux (or even heard of it until now), but this makes it
sound like some seriously ghetto Linux distro. Do other programs fail on
it or just AMBER? Either way it would scare me to use something flaky
like that. Is it even on the list of supported distros for CUDA?
I'd be tempted to dump it completely (not sure I'd even trust the older
version) and switch to something more extensively tested. I like CentOS 6,
which is identical to RedHat EL6 but free. It also tends to be the most
supported in my experience.
All the best
Ross
On 3/8/14, 12:39 AM, "pavel.banas.upol.cz" <pavel.banas.upol.cz> wrote:
>Dear all,
>thank you for all your help and suggestions. Finally we were able to solve
>the problem by downgrading the version of the linux core (kernel) on the
>nodes.
>
>With the 3.11.8 core we were getting segfaults on both CPU and GPU, caused
>most likely by memory leaks. After a downgrade to 3.8.13 the problems with
>the CPU code were solved but remained on the GPU, and after a further
>downgrade to 3.2.55 all segfaults disappeared. So now we have a healthy
>compilation even with the intel compilers, and after tests we found that
>all cards are free of hardware errors.
>
>By the way, on the recent linux core all of Ross's tests ended with
>segfaults, while now all of them pass and give consistent energies.
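
A quick way to record exactly which kernel release a node is running, when
correlating segfaults with a kernel version, is the uname(2) call; a minimal
C sketch, with arbitrary output formatting:

#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;
    if (uname(&u) != 0) {      /* query kernel/system identification */
        perror("uname");
        return 1;
    }
    printf("kernel release: %s\n", u.release);  /* e.g. 3.2.55 vs 3.11.8 */
    printf("machine:        %s\n", u.machine);
    return 0;
}

(From the shell, "uname -r" reports the same release string.)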
>thank you very much,
>
>Pavel
>
>
>-- 
>Pavel Banáš
>pavel.banas.upol.cz
>Department of Physical Chemistry,
>Palacky University Olomouc
>Czech Republic 
>
>
>
>---------- Original message ----------
>From: Tru Huynh <tru.pasteur.fr>
>To: AMBER Mailing List <amber.ambermd.org>
>Date: 6. 3. 2014 20:28:16
>Subject: Re: [AMBER] pmemd.cuda segfaults
>
>"On Wed, Mar 05, 2014 at 09:09:15PM +0100, pavel.banas.upol.cz wrote:
>> 
>> Dear all,
>> 
>...
>> 
>> Please, does anybody have the same architecture (GPU
>> SuperWorkstations 7047GR-TPRF with Super X9DRG-QF motherboards)?
>
>we have one of those running CentOS-5 x86_64
>dmidecode 
>Manufacturer: Supermicro
>Product Name: X9DRG-QF
>.
>+------------------------------------------------------+
>| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |
>|-------------------------------+----------------------+----------------------+
>| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>|===============================+======================+======================|
>|   0  GeForce GTX TITAN    Off | 0000:03:00.0     N/A |                  N/A |
>| 31%   43C   N/A    N/A /  N/A |    107MB /  6143MB   |     N/A      Default |
>+-------------------------------+----------------------+----------------------+
>|   1  GeForce GTX TITAN    Off | 0000:04:00.0     N/A |                  N/A |
>| 33%   47C   N/A    N/A /  N/A |     87MB /  6143MB   |     N/A      Default |
>+-------------------------------+----------------------+----------------------+
>|   2  GeForce GTX TITAN    Off | 0000:83:00.0     N/A |                  N/A |
>| 31%   44C   N/A    N/A /  N/A |     87MB /  6143MB   |     N/A      Default |
>+-------------------------------+----------------------+----------------------+
>|   3  GeForce GTX TITAN    Off | 0000:84:00.0     N/A |                  N/A |
>| 34%   50C   N/A    N/A /  N/A |     87MB /  6143MB   |     N/A      Default |
>+-------------------------------+----------------------+----------------------+
>
>We had the 4 initial cards replaced (then one of the second batch),
>since then, no issue.
>
>Cheers,
>
>Tru
>-- 
>Dr Tru Huynh | http://www.pasteur.fr/recherche/unites/Binfs/
>mailto:tru.pasteur.fr | tel/fax +33 1 45 68 87 37/19
>Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber"
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber"
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Mar 11 2014 - 08:00:03 PDT