Re: [AMBER] suggestion of a patch for building MPI version of MMPBSA.py

From: Yoshitaka Moriwaki <virgospica93.gmail.com>
Date: Sat, 6 Feb 2021 19:18:28 +0900

Dear Prof. D.A. Case,

Thank you for your comment. I looked again at the parallelization of MMPBSA.py.MPI.

In short, it turns out that the previously suggested workaround for installing MMPBSA.py.MPI does not actually parallelize the calculation, and I found a much simpler solution.

First, to verify this, I deleted all existing amber20 directories and rebuilt Amber20 and AmberTools20 from source, using gcc 9.3.0 and openmpi 4.0.5.

```
tar jxvf AmberTools20.tar.bz2 ; tar jxvf Amber20.tar.bz2
mv amber20_src amber20
cd amber20/
./update_amber --update
./configure -noX11 gnu
test -f /path/to/amber20/amber.sh && source /path/to/amber20/amber.sh
make -j16 install
make clean ; ./configure -noX11 -mpi gnu ; make -j16 install
```
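
As a quick sanity check before the MPI build (the paths are site-specific, so this is only a sketch):

```
which gcc mpicc mpirun    # should all resolve to the gcc 9.3.0 / openmpi 4.0.5 installation
mpicc --showme            # Open MPI: prints the underlying compiler command used by the wrapper
echo $AMBERHOME           # should point to /path/to/amber20 after sourcing amber.sh
```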

Then, as you suggested, I ran the following commands.

```
cd $AMBERHOME/bin
cp MMPBSA.py MMPBSA.py.MPI
```

At this point, however, MMPBSA.py.MPI could not even display its help message:

```
$ MMPBSA.py.MPI --help
Traceback (most recent call last):
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 53, in <module>
     from mpi4py import MPI
ModuleNotFoundError: No module named 'mpi4py'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 55, in <module>
     raise MMPBSA_Error('Could not import mpi4py package! Use serial version '
MMPBSA_mods.exceptions.MMPBSA_Error: Could not import mpi4py package! Use serial version or install mpi4py.
```
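
Note that the MMPBSA.py scripts run under Amber's bundled (miniconda) Python, so mpi4py has to be installed into that interpreter rather than into the system one. The shebang line shows which interpreter is used; for example:

```
head -1 $AMBERHOME/bin/MMPBSA.py.MPI
# the shebang should point at Amber's bundled Python, e.g. .../amber20/miniconda/bin/python
```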

I then installed mpi4py-3.0.3 with `amber.conda install mpi4py`, as reported in http://archive.ambermd.org/202012/0154.html . After this installation the help message was shown correctly. However, the error shown below sometimes occurred, and not reproducibly:

```
Loading amber20
   Loading requirement: cuda/11.1.105 openmpi/4.0.5_gcc9.3.0
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
     app.file_setup()
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
     self.remove(0)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
     utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
     for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
     app.file_setup()
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
     self.remove(0)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
     utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
     for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
     app.file_setup()
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
     self.remove(0)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
     utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
     for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
     app.file_setup()
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
     self.remove(0)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
     utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
     for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
     app.file_setup()
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
     self.remove(0)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
     utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
     for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
   File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
     app.file_setup()
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
     self.remove(0)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
     utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
   File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
     for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
   Process name: [[64671,1],9]
   Exit code: 1
--------------------------------------------------------------------------
```

For some reason, this workaround sometimes seemed to work and MMPBSA.py.MPI started the calculation. However, it did not actually parallelize anything: MMPBSA.py.MPI was effectively running on a single core, as the intermediate files below show (only the rank-0 files, suffixed .0, are produced).

```
-rw-rw-r-- 1 moriwaki moriwaki 10337450 Feb 6 18:08 _MMPBSA_complex_gb.mdout.0
-rw-rw-r-- 1 moriwaki moriwaki 338396 Feb 6 18:08 _MMPBSA_restrt.0
-rw-rw-r-- 1 moriwaki moriwaki 10833081 Feb 6 17:17 _MMPBSA_ligand.mdcrd
-rw-rw-r-- 1 moriwaki moriwaki 10833081 Feb 6 17:17 _MMPBSA_ligand.mdcrd.0
-rw-rw-r-- 1 moriwaki moriwaki 120429 Feb 6 17:17 _MMPBSA_ligand.pdb
-rw-rw-r-- 1 moriwaki moriwaki 14892 Feb 6 17:17 _MMPBSA_ligand_traj_cpptraj.out
-rw-rw-r-- 1 moriwaki moriwaki 54341 Feb 6 17:17 _MMPBSA_dummyligand.inpcrd
-rw-rw-r-- 1 moriwaki moriwaki 91810581 Feb 6 17:17 _MMPBSA_receptor.mdcrd.0
-rw-rw-r-- 1 moriwaki moriwaki 1020177 Feb 6 17:17 _MMPBSA_receptor.pdb
-rw-rw-r-- 1 moriwaki moriwaki 6563 Feb 6 17:17 _MMPBSA_receptor_traj_cpptraj.out
-rw-rw-r-- 1 moriwaki moriwaki 459783 Feb 6 17:17 _MMPBSA_dummyreceptor.inpcrd
-rw-rw-r-- 1 moriwaki moriwaki 102643281 Feb 6 17:17 _MMPBSA_complex.mdcrd.0
-rw-rw-r-- 1 moriwaki moriwaki 1140599 Feb 6 17:17 _MMPBSA_complex.pdb
-rw-rw-r-- 1 moriwaki moriwaki 7136 Feb 6 17:17 _MMPBSA_normal_traj_cpptraj.out
-rw-rw-r-- 1 moriwaki moriwaki 514022 Feb 6 17:17 _MMPBSA_dummycomplex.inpcrd
-rw-rw-r-- 1 moriwaki moriwaki 285 Feb 6 17:17 _MMPBSA_gb_decomp_com.mdin
-rw-rw-r-- 1 moriwaki moriwaki 238 Feb 6 17:17 _MMPBSA_gb_decomp_lig.mdin
-rw-rw-r-- 1 moriwaki moriwaki 240 Feb 6 17:17 _MMPBSA_gb_decomp_rec.mdin
```

This calculation required the same amount of time as using the serial version of MMPBSA.py.
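
I suspect the conda-installed mpi4py is linked against conda's own MPI libraries rather than the system openmpi 4.0.5, so every process launched by the system mpirun reports rank 0 of size 1. A quick way to check this is something like:

```
mpirun -np 4 amber.python -c \
    "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), c.Get_size())"
# With mpi4py correctly linked to the system Open MPI this prints ranks 0-3 with size 4;
# if every process prints '0 1', mpi4py is using a different MPI and nothing is parallelized.
```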

Therefore, I suggest a much simpler solution to parallelize MMPBSA.py properly: use `amber.pip install mpi4py` instead of `amber.conda install mpi4py`. After installing mpi4py with pip and copying MMPBSA.py to MMPBSA.py.MPI, the MMPBSA calculations were correctly parallelized and the run time dropped accordingly.
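
In summary, the working sequence is simply the following (removing the conda-installed mpi4py first is only my precaution so that the two copies do not shadow each other; it may not be strictly necessary):

```
amber.conda remove mpi4py    # optional: drop the conda build installed earlier
amber.pip install mpi4py     # builds mpi4py against the MPI compilers found in PATH
cd $AMBERHOME/bin
cp MMPBSA.py MMPBSA.py.MPI
```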

I hope this solution will help you create a patch for the users of MMPBSA.py.

Best regards,
Yoshitaka Moriwaki.

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber