Dear Prof. D.A. Case,
Thank you for your comment. I looked again at the parallelization of MMPBSA.py.MPI.
In conclusion, it turns out that the workaround for installing MMPBSA.py.MPI does not actually parallelize the calculation, and I found a much simpler solution.
First, to verify this, I deleted all of the amber20 directories and then rebuilt and reinstalled Amber20 and AmberTools20 from source, using gcc 9.3.0 and OpenMPI 4.0.5.
```
# unpack the sources and apply the latest patches
tar jxvf AmberTools20.tar.bz2 ; tar jxvf Amber20.tar.bz2
mv amber20_src amber20
cd amber20/
./update_amber --update
# serial build
./configure -noX11 gnu
test -f /path/to/amber20/amber.sh && source /path/to/amber20/amber.sh   # set AMBERHOME and PATH
make -j16 install
# MPI (parallel) build
make clean ; ./configure -noX11 -mpi gnu ; make -j16 install
```
And as you suggested, I ran the following commands.
```
cd $AMBERHOME/bin
cp MMPBSA.py MMPBSA.py.MPI
```
I then tried to display the help message for MMPBSA.py.MPI, but at this point it did not work.
```
$ MMPBSA.py.MPI --help
Traceback (most recent call last):
  File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 53, in <module>
    from mpi4py import MPI
ModuleNotFoundError: No module named 'mpi4py'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 55, in <module>
    raise MMPBSA_Error('Could not import mpi4py package! Use serial version '
MMPBSA_mods.exceptions.MMPBSA_Error: Could not import mpi4py package! Use serial version or install mpi4py.
```
I then installed mpi4py 3.0.3 by typing `amber.conda install mpi4py`, as reported in
http://archive.ambermd.org/202012/0154.html . After this installation the help message was displayed correctly. However, the error shown below sometimes occurred, without any reproducibility; a quick way to check what that command actually installed is sketched after the log.
```
Loading amber20
Loading requirement: cuda/11.1.105 openmpi/4.0.5_gcc9.3.0
File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
app.file_setup()
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
self.remove(0)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
app.file_setup()
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
self.remove(0)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
app.file_setup()
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
self.remove(0)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
app.file_setup()
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
self.remove(0)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
app.file_setup()
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
self.remove(0)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
File "/usr/local/package/amber20/bin/MMPBSA.py.MPI", line 99, in <module>
app.file_setup()
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 130, in file_setup
self.remove(0)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/main.py", line 871, in remove
utils.remove(flag, mpi_size=self.mpi_size, fnpre=self.pre)
File "/usr/local/package/amber20/lib/python3.8/site-packages/MMPBSA_mods/utils.py", line 112, in remove
for fil in tempfiles: os.remove(fil)
FileNotFoundError: [Errno 2] No such file or directory: '_MMPBSA_gb_decomp_com.mdin'
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[64671,1],9]
Exit code: 1
--------------------------------------------------------------------------
```
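For reference, a quick way to see exactly which mpi4py package and MPI backend `amber.conda install mpi4py` pulled in is sketched below. This is my own check, not part of MMPBSA.py; it assumes the `amber.conda` and `amber.python` wrappers in $AMBERHOME/bin that come with AmberTools' bundled Miniconda.
```
# List mpi4py and any MPI runtime it pulled into Amber's bundled Miniconda
amber.conda list | grep -i -E 'mpi4py|mpich|openmpi'

# Show which MPI compiler mpi4py was built against
amber.python -c "import mpi4py; print(mpi4py.get_config())"
```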
For some reason this workaround sometimes seemed to work and MMPBSA.py.MPI started the calculation (the non-reproducible FileNotFoundError above looks like a race in which several processes try to remove the same temporary files). However, I found that the workaround did not contribute to the parallelization of MMPBSA.py. In other words, MMPBSA.py.MPI was running on a single core: the intermediate files are numbered per MPI thread, and only files with the ".0" suffix were produced.
```
-rw-rw-r-- 1 moriwaki moriwaki 10337450 Feb 6 18:08 _MMPBSA_complex_gb.mdout.0
-rw-rw-r-- 1 moriwaki moriwaki 338396 Feb 6 18:08 _MMPBSA_restrt.0
-rw-rw-r-- 1 moriwaki moriwaki 10833081 Feb 6 17:17 _MMPBSA_ligand.mdcrd
-rw-rw-r-- 1 moriwaki moriwaki 10833081 Feb 6 17:17 _MMPBSA_ligand.mdcrd.0
-rw-rw-r-- 1 moriwaki moriwaki 120429 Feb 6 17:17 _MMPBSA_ligand.pdb
-rw-rw-r-- 1 moriwaki moriwaki 14892 Feb 6 17:17 _MMPBSA_ligand_traj_cpptraj.out
-rw-rw-r-- 1 moriwaki moriwaki 54341 Feb 6 17:17 _MMPBSA_dummyligand.inpcrd
-rw-rw-r-- 1 moriwaki moriwaki 91810581 Feb 6 17:17 _MMPBSA_receptor.mdcrd.0
-rw-rw-r-- 1 moriwaki moriwaki 1020177 Feb 6 17:17 _MMPBSA_receptor.pdb
-rw-rw-r-- 1 moriwaki moriwaki 6563 Feb 6 17:17 _MMPBSA_receptor_traj_cpptraj.out
-rw-rw-r-- 1 moriwaki moriwaki 459783 Feb 6 17:17 _MMPBSA_dummyreceptor.inpcrd
-rw-rw-r-- 1 moriwaki moriwaki 102643281 Feb 6 17:17 _MMPBSA_complex.mdcrd.0
-rw-rw-r-- 1 moriwaki moriwaki 1140599 Feb 6 17:17 _MMPBSA_complex.pdb
-rw-rw-r-- 1 moriwaki moriwaki 7136 Feb 6 17:17 _MMPBSA_normal_traj_cpptraj.out
-rw-rw-r-- 1 moriwaki moriwaki 514022 Feb 6 17:17 _MMPBSA_dummycomplex.inpcrd
-rw-rw-r-- 1 moriwaki moriwaki 285 Feb 6 17:17 _MMPBSA_gb_decomp_com.mdin
-rw-rw-r-- 1 moriwaki moriwaki 238 Feb 6 17:17 _MMPBSA_gb_decomp_lig.mdin
-rw-rw-r-- 1 moriwaki moriwaki 240 Feb 6 17:17 _MMPBSA_gb_decomp_rec.mdin
```
This calculation took the same amount of time as the serial version of MMPBSA.py.
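One way to confirm that mpi4py itself is the problem is to ask it directly how many ranks it sees under the system `mpirun`. The following is only a sketch of such a check (again assuming the `amber.python` wrapper); if every process prints rank 0 of size 1, mpi4py was built against a different MPI than the `mpirun` used to launch it, which would explain both the serial behaviour and the sporadic clean-up errors above.
```
# Launch 4 copies of Amber's Python and print each process's rank and the world size.
# A correctly linked mpi4py prints "0 4", "1 4", "2 4", "3 4";
# if every line reads "0 1", mpi4py is not using the MPI behind this mpirun.
mpirun -np 4 amber.python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), c.Get_size())"
```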
Therefore, I suggest a much simpler solution to parallelize MMPBSA.py properly: use `amber.pip install mpi4py` instead of `amber.conda install mpi4py`. After installing mpi4py with pip and copying MMPBSA.py to MMPBSA.py.MPI, the MM-PBSA calculations were correctly parallelized and the calculation time was reduced accordingly; the full sequence of commands is sketched below.
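My guess is that the pip build compiles mpi4py against the OpenMPI found on the PATH, whereas the conda package brings along its own MPI that does not cooperate with the system `mpirun`. Put together, the working sequence looks roughly like this (a sketch; removing the conda copy first is my own precaution, and the input/topology file names in the last command are placeholders):
```
# (my addition: remove any conda-installed copy first so pip does not treat it as already satisfied)
amber.conda remove mpi4py
# build mpi4py against the MPI compilers on the PATH (OpenMPI 4.0.5 here)
amber.pip install mpi4py

# create the MPI entry point
cd $AMBERHOME/bin
cp MMPBSA.py MMPBSA.py.MPI

# run in parallel, e.g. on 16 cores (file names are placeholders)
mpirun -np 16 MMPBSA.py.MPI -O -i mmpbsa.in -o FINAL_RESULTS_MMPBSA.dat \
    -sp solvated_complex.prmtop -cp complex.prmtop -rp receptor.prmtop \
    -lp ligand.prmtop -y prod.mdcrd
```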
I hope this solution will help you create a patch for the users of MMPBSA.py.
Best regards,
Yoshitaka Moriwaki.