Hi,
It appears that the problem is only with pmemd.cuda.MPI; is that
right? I have often encountered issues when using openmpi. Can you try
compiling with mpich2 and see if the problem disappears?
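Before rebuilding, it may help to confirm which MPI the launcher on your PATH actually belongs to, since a build against one MPI launched with another's mpirun can misbehave. A rough sketch follows. It assumes that Open MPI's mpirun reports "Open MPI" in its --version output and that MPICH's Hydra launcher reports "HYDRA"; the helper name mpi_flavor is made up for illustration and is not part of Amber or either MPI.

```shell
#!/bin/sh
# Hypothetical helper: classify an MPI launcher from its --version text.
# Assumption: Open MPI prints something like "mpirun (Open MPI) x.y.z" and
# MPICH's Hydra launcher prints "HYDRA build details:". Anything else
# (e.g. an MPD-based mpich2 or a vendor MPI) is reported as "unknown".
mpi_flavor() {
    case "$1" in
        *"Open MPI"*) echo "openmpi" ;;
        *HYDRA*)      echo "mpich" ;;
        *)            echo "unknown" ;;
    esac
}

# Usage (only meaningful on a machine with an MPI launcher installed):
#   mpi_flavor "$(mpirun --version 2>&1)"
```

If this reports "openmpi" after you have switched to an mpich2 environment, the old launcher is probably still first on your PATH.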
-Dan
On Tue, Aug 12, 2014 at 12:00 AM, Pablo Ródenas <pablo.rodenas.bsc.es> wrote:
> Good morning,
>
> after testing the installation, the error still appears in every
> CUDA run of the user's input, but not with the CPU version of
> Amber14 or with previous versions of Amber with CUDA. Any hint about how
> to avoid it, or what else I can check?
>
> Many thanks,
> Pablo.
>
>
> On 08/07/2014 04:54 PM, Pablo Ródenas wrote:
>> Thanks for your answer Jason,
>>
>> I ran the CUDA and CUDA parallel tests and obtained the following
>> results depending on the case, but all failures were due to small
>> differences in the numbers:
>> CUDA (installation was done with gcc/4.6.1, mkl/11.1 and cuda/5.0)
>> 90 passed, 35 failures (expected) and 0 errors
>> all tests done over driver 304.54 and 331.62 with the same results.
>>
>> CUDA PARALLEL (installation was done with gcc/4.6.1, mkl/11.1, cuda/5.0
>> and openmpi/1.7.3)
>> DO_PARALLEL=4 and driver 304.54 --> 50 passed, 37 failures and 0 errors
>> DO_PARALLEL=4 and driver 331.62 --> 47 passed, 40 failures and 0 errors
>> DO_PARALLEL=8 and driver 304.54 --> 23 passed, 61 failures and 10 errors
>> DO_PARALLEL=8 and driver 331.62 --> 22 passed, 62 failures and 9 errors
>>
>> I really appreciate your help.
>>
>> Best regards,
>> Pablo.
>>
>> On 07/28/2014 03:42 PM, Jason Swails wrote:
>>> On Mon, Jul 28, 2014 at 9:04 AM, Pablo Ródenas <pablo.rodenas.bsc.es> wrote:
>>>
>>>> Good afternoon,
>>>>
>>>> Ruben Perez (with Amber license Q2818013A) and we (as the user support
>>>> team) are trying to run pmemd.cuda from Amber14 on our cluster, and we
>>>> always get the error in the subject line with an input that works fine
>>>> with the GPU version of pmemd in Amber12 and with the CPU version of
>>>> pmemd in Amber14.
>>>>
>>>> Our version of Amber14 was first updated and then compiled with
>>>> different options. For example, our latest tests used
>>>> OpenMPI/1.7.3, gcc/4.6.1 and CUDA/5.0, but we got the same error with
>>>> bullxmpi/1.1.11.1 + Intel compilers 14.0.1 and/or CUDA 6. The tests were
>>>> run on 1 node with 1 and/or 2 GPUs, with the same result.
>>>>
>>>> The node specs are 12 CPUs (Intel(R) Xeon(R) CPU E5649 @ 2.53GHz) and 2
>>>> GPUs (Tesla M2090), Driver Version 304.54. We have also tested with
>>>> driver version 331.62 with no luck.
>>>>
>>>> Please, do not hesitate to ask us for more information and thank you
>>>> very much in advance for your help.
>>>>
>>>>
>>> This is a new error that I've never seen before. Have you run the CUDA
>>> tests? Do the tests pass? As a note, a lot of the tests will fail with
>>> either simple roundoff differences or with other differences stemming from
>>> different random number sequences when stochastic thermostats (ntt=2 or
>>> ntt=3) are used. So apart from those (expected) failures, it's important
>>> to make sure that all of the other tests pass to make sure your
>>> installation works.
>>>
>>> HTH,
>>> Jason
>>>
>
> --
> Pablo Ródenas Barquero (pablo.rodenas.bsc.es)
> BSC - Centro Nacional de Supercomputación
> C/ Jordi Girona, 31 WWW: http://www.bsc.es
> 08034 Barcelona, Spain Tel: +34-93-405 42 29
> e-mail: support.bsc.es Fax: +34-93-413 77 21
> -----------------------------------------------
> CNAG - Centre Nacional Anàlisi Genòmica
> C/ Baldiri Reixac, 4 WWW: http://www.cnag.cat
> 08028 Barcelona, Spain Tel: +34-93-403 37 54
> e-mail: cnag_support.bsc.es
> -----------------------------------------------
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
--
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
Received on Tue Aug 12 2014 - 11:00:02 PDT