PyTorch on Summit

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. This page outlines how to run PyTorch on various OLCF systems.

Summit

There are several ways to use PyTorch on Summit.

Note

PyTorch is also available through the open-ce module (See IBM Watson Machine Learning CE -> Open CE for more information); however, the open-ce module provides a more complex environment that may not be needed.

Provided Conda Environment

This is the easiest way to use PyTorch on Summit. Simply load the pytorch module:

module load pytorch

This module activates a pre-made conda environment. The limitation of this method is that you will not be able to install extra packages in this environment.

Note

You will also need to load the following modules, because they were used when building PyTorch for the environment.

module load DefApps-2023
module load gcc/11.2.0
module load cuda/11.7.1
module load magma/2.7.2-cuda117
module load openblas/0.3.17-omp

Warning

You can change these modules if you need to, but PyTorch may not work if you do. If you need different modules and they do not work with the provided environment, you will have to Build from Source.
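
As a quick way to confirm that the modules listed above are loaded, the sketch below compares them against a colon-separated list of loaded modules. It assumes the module system exports that list in the LOADEDMODULES environment variable; the function name missing_modules is hypothetical and only used for illustration.

```shell
# Print every module PyTorch was built with that is absent from a
# colon-separated list of loaded modules (assumed format of $LOADEDMODULES).
missing_modules() {
    loaded=":$1:"
    for m in DefApps-2023 gcc/11.2.0 cuda/11.7.1 magma/2.7.2-cuda117 openblas/0.3.17-omp; do
        case "$loaded" in
            *":$m:"*) ;;          # module is loaded, nothing to report
            *) echo "$m" ;;       # module is missing
        esac
    done
}

# Assumption: LOADEDMODULES is set by the module system; if unset, all are reported.
missing_modules "${LOADEDMODULES:-}"
```

If nothing is printed, all five modules are loaded. Any name that is printed can be passed to module load.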

Install PyTorch with Pre-built Wheel

This option is fairly easy and provides the most flexibility. Pre-built wheel packages are available in /sw/summit/pytorch/wheel_dist. To install one, create a conda environment and use pip to install the package that matches your Python version:

Note

The wheel packages use the following naming convention: torch-<pytorch version>+<git-commit>-<python version>-<python version>-linux_ppc64le.whl (e.g. torch-2.3.0a0+giteba28a6-cp311-cp311-linux_ppc64le.whl is for python=3.11).
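
To illustrate how a wheel name maps to the python=x.yy value used when creating the environment, here is a minimal sketch that reads the version from the cpXY tag. The helper name wheel_python_version is hypothetical, and the sketch assumes a single-digit major version and no dashes inside the PyTorch version string.

```shell
# Print the Python version (e.g. 3.11) encoded in a wheel filename's cpXY tag.
wheel_python_version() {
    # The third dash-separated field of the filename is the Python tag, e.g. cp311.
    tag=$(basename "$1" | cut -d- -f3)
    digits=${tag#cp}
    # Assumption: the first digit is the major version, the rest is the minor version.
    major=$(printf '%s' "$digits" | cut -c1)
    minor=$(printf '%s' "$digits" | cut -c2-)
    echo "$major.$minor"
}

wheel_python_version torch-2.3.0a0+giteba28a6-cp311-cp311-linux_ppc64le.whl   # prints 3.11
```

The printed value is what would be substituted for x.yy in conda create.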

module load miniforge3/23.11.0
conda create -p <env_path> python=x.yy
source activate <env_path>
pip install <wheel package>

This should install PyTorch in your environment. You can now install whatever packages you need on top of this.

Note

You will also need to load the following modules, because they were used when building the wheel packages.

module load DefApps-2023
module load gcc/11.2.0
module load cuda/11.7.1
module load magma/2.7.2-cuda117
module load openblas/0.3.17-omp

Warning

You can change these modules if you need to, but PyTorch may not work if you do. If you need different modules and they do not work with the provided packages, you will have to Build from Source.

Build from Source

Below is the documented process that we used for our builds. This documentation is provided for your information in case you would like to build PyTorch yourself; however, there is no guarantee that you will be able to build an alternative version of PyTorch.

First, load the necessary modules (adjust these as needed):

module load miniforge3/23.11.0
module load DefApps-2023
module load gcc/11.2.0
module load cuda/11.7.1
module load magma/2.7.2-cuda117
module load openblas/0.3.17-omp

Create a conda environment and install dependencies:

conda create -p <env_path> python=x.yy
source activate <env_path>
conda install cmake ninja pyyaml typing_extensions numpy

Finally, clone and build PyTorch:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
python3 setup.py install
python3 setup.py bdist_wheel # use this command to create a wheel package in pytorch/dist

PyTorch should now be installed in the conda environment that you created.
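
One way to sanity-check the result is sketched below. It assumes python3 resolves to the interpreter of the conda environment you created; the function name report_torch_build is hypothetical. Importing torch from inside the cloned pytorch directory can pick up the source tree instead of the installed package, so the sketch changes to a temporary directory first.

```shell
# Report the installed PyTorch version, the CUDA version it was built with,
# and whether a GPU is visible. Prints a fallback message if the import fails.
report_torch_build() {
    (
        # Leave the source tree so Python imports the installed package.
        cd "${TMPDIR:-/tmp}" && python3 -c '
import torch
print("version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("GPU visible:", torch.cuda.is_available())
'
    ) || echo "torch could not be imported in this environment"
}

report_torch_build
```

With the modules used in this section, the reported CUDA version would be expected to correspond to the loaded cuda module.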

Additional Resources