Re: [AMBER] error of amber16 mpi version using open-mpi

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 18 Oct 2016 09:34:53 -0700

Hi Jacky,

I compile AMBER 16 with cuda 8.0 and mpich 3.1.4 almost every day. It works fine. I've never seen this behavior before, and I built AMBER 16 with centos 7, mpich-3.1.4, gnu 4.8.5 and cuda 8.0.44 a total of 11 times yesterday.

Hangup doesn't look like a compile error to me - more like something sending a hangup signal (SIGHUP) to the nvcc process, e.g. someone doing a kill -HUP <pid>. So this looks like something weird in your environment, such as some kind of interactive timeout limit (or maybe a memory limit). Are you running this on a local machine or on a cluster through some queueing system?
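
To illustrate why "Hangup" points at a signal rather than at the compiler: "Hangup" is the standard description of SIGHUP, and a process killed by it is reported by the shell with an exit status of 128 plus the signal number. Below is a minimal sketch, assuming bash or dash, with `sleep` standing in for the nvcc process (the stand-in and the variable names are illustrative only and are not part of the AMBER build).

```shell
# Stand-in for a long-running compile step such as an nvcc invocation.
sleep 30 &
pid=$!

# Deliver SIGHUP, as a closed terminal or a session timeout might.
kill -HUP "$pid"

# In bash and dash, wait reports death by signal as 128 + signal number;
# SIGHUP is signal 1.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

If the build is started from a remote login that may drop or time out, running it under nohup or inside screen/tmux is one common way to keep a hangup from reaching it. Whether that applies here depends on how the build is being launched.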

All the best
Ross

> On Oct 18, 2016, at 9:03 AM, jacky zhao <jackyzhao010.gmail.com> wrote:
>
> Sorry to bother you again.
>
> MPICH3.1.4 works for the cpu-mpi version of amber16. However, the
> cuda-mpi version of amber16 fails to compile. The error messages
> are listed below:
>
>
> nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are
> deprecated, and may be removed in a future release (Use
> -Wno-deprecated-gpu-targets to suppress warning).
>
> make[5]: *** [kCalculateGBNonbondEnergy1.o] Hangup
>
> make[1]: make: *** [cuda_parallel] Hangup*** [install] Hangup
>
>
> make[4]: *** [cuda_dpfp_libs] Hangup
>
> make[3]: *** [cuda_parallel_DPFP] Hangup
>
> make[2]: *** [cuda_parallel] Hangup
>
>
> 2016-10-18 22:47 GMT+08:00 Daniel Roe <daniel.r.roe.gmail.com>:
>
>> Hi,
>>
>> On Tue, Oct 18, 2016 at 10:24 AM, jacky zhao <jackyzhao010.gmail.com>
>> wrote:
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_char'
>>
>> These kinds of errors during the link step usually happen when you
>> change compiler types/versions and old libraries are left over. You may
>> want to try running a 'make uninstall' and then re-running configure.
>>
>> -Dan
>>
>> PS - The patch will be ready to go soon.
>>
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `MPI_Comm_f2c'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_int'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_comm_null'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_op_max'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_op_lor'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_op_land'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_unsigned'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_double'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_unsigned_long'
>>>
>>> /home/user/amber16/lib/libfftw3_mpi.so: undefined reference to
>>> `ompi_mpi_op_sum'
>>>
>>> collect2: error: ld returned 1 exit status
>>>
>>> make[2]: *** [/home/user/amber16/bin/mdgx.MPI] Error 1
>>>
>>> make[2]: Leaving directory `/home/user/amber16/AmberTools/src/mdgx'
>>>
>>> make[1]: *** [parallel] Error 2
>>>
>>> make[1]: Leaving directory `/home/user/amber16/AmberTools/src'
>>>
>>> make: *** [install] Error 2
>>>
>>>
>>> 2016-10-18 22:02 GMT+08:00 jacky zhao <jackyzhao010.gmail.com>:
>>>
>>>> MPICH3.2:
>>>>
>>>>
>>>> mpicxx -Wall -Wno-unused-function -c -I/home/user/amber16/include -O3
>>>> -mtune=native -fPIC -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ
>>>> -DHASGZ -DHASBZ2 -D__PLUMED_HAS_DLOPEN -DMPI
>>>> -I/home/user/amber16/include -DUSE_SANDERLIB -o FileIO_Gzip.o
>>>> FileIO_Gzip.cpp
>>>>
>>>> mpicxx -Wall -Wno-unused-function -c -I/home/user/amber16/include -O3
>>>> -mtune=native -fPIC -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DBINTRAJ
>>>> -DHASGZ -DHASBZ2 -D__PLUMED_HAS_DLOPEN -DMPI
>>>> -I/home/user/amber16/include -DUSE_SANDERLIB -o FileIO_Mpi.o
>>>> FileIO_Mpi.cpp
>>>>
>>>> FileIO_Mpi.cpp: In member function 'virtual int FileIO_Mpi::Seek(off_t)':
>>>>
>>>> FileIO_Mpi.cpp:33:28: error: 'SEEK_SET' was not declared in this scope
>>>>
>>>> if (pfile_.Fseek(offset, SEEK_SET)) return 1;
>>>>
>>>> ^
>>>>
>>>> FileIO_Mpi.cpp: In member function 'virtual int FileIO_Mpi::Rewind()':
>>>>
>>>> FileIO_Mpi.cpp:39:24: error: 'SEEK_SET' was not declared in this scope
>>>>
>>>> if (pfile_.Fseek(0L, SEEK_SET)) return 1;
>>>>
>>>> ^
>>>>
>>>> make[4]: *** [FileIO_Mpi.o] Error 1
>>>>
>>>> make[4]: Leaving directory `/home/user/amber16/AmberTools/src/cpptraj/src'
>>>>
>>>> make[3]: *** [parallel] Error 2
>>>>
>>>> make[3]: Leaving directory `/home/user/amber16/AmberTools/src/cpptraj'
>>>>
>>>>
>>>> 2016-10-18 22:01 GMT+08:00 jacky zhao <jackyzhao010.gmail.com>:
>>>>
>>>>> Thank you for this good news.
>>>>> Case, I used MPICH3.2 to compile the cpu-mpi version of amber16, but
>>>>> the same error occurs. Following your suggestion, I am now using
>>>>> MPICH3.1.4 to recompile it.
>>>>>
>>>>>
>>>>> 2016-10-18 21:22 GMT+08:00 Daniel Roe <daniel.r.roe.gmail.com>:
>>>>>
>>>>>> In addition to Dave's suggestions, I am currently working on a patch
>>>>>> to AmberTools that addresses this. Should be out today or tomorrow.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Tue, Oct 18, 2016 at 9:08 AM, David A Case <david.case.rutgers.edu>
>>>>>> wrote:
>>>>>>> On Tue, Oct 18, 2016, jacky zhao wrote:
>>>>>>>
>>>>>>>> The open-mpi version is 2.0.1, and the open-mpi example test passes.
>>>>>>>> I have used open-mpi to compile the cuda and cuda.mpi versions of
>>>>>>>> amber16 without any problems, but an error was encountered in the
>>>>>>>> cpu-mpi version of amber16. The error messages are listed below:
>>>>>>>>
>>>>>>>>
>>>>>>>> if (pfile_.Fseek(offset, SEEK_SET)) return 1;
>>>>>>>>
>>>>>>>> FileIO_Mpi.cpp: In member function 'virtual int FileIO_Mpi::Rewind()':
>>>>>>>> FileIO_Mpi.cpp:39:24: error: 'SEEK_SET' was not declared in this scope
>>>>>>>> if (pfile_.Fseek(0L, SEEK_SET)) return 1;
>>>>>>>> ^
>>>>>>>
>>>>>>> Option 1: use an older version of OpenMPI.
>>>>>>> Option 2: use the patch suggested below, or get the github version of
>>>>>>> OpenMPI.
>>>>>>> Option 3: use a different MPI, say mpich2.
>>>>>>>
>>>>>>> See these previous posts:
>>>>>>>
>>>>>>> http://archive.ambermd.org/201610/0109.html
>>>>>>>
>>>>>>> ...dac
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> AMBER mailing list
>>>>>>> AMBER.ambermd.org
>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -------------------------
>>>>>> Daniel R. Roe
>>>>>> Laboratory of Computational Biology
>>>>>> National Institutes of Health, NHLBI
>>>>>> 5635 Fishers Ln, Rm T900
>>>>>> Rockville MD, 20852
>>>>>> https://www.lobos.nih.gov/lcb
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lei Zhao, Ph.D.
>>>>> International Joint Cancer Institute of the Second Military Medical
>>>>> University
>>>>> National Engineering Research Center for Antibody Medicine
>>>>> New Library Building 11th floor,800 Xiang Yin Road
>>>>> Shanghai 200433
>>>>> P.R.China
>>>>>


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Oct 18 2016 - 10:00:02 PDT