Update and Upgrade Ubuntu System
sudo apt update
sudo apt upgrade
Verify CUDA-Capable GPU
lspci | grep -i nvidia
It should return something like GeForce GTX 970M. Then go to the NVIDIA cuda-gpus website to look up your card's compute capability, e.g. GeForce GTX 970M -> 5.2. Note this number; it is needed when configuring the TensorFlow build later.
Install Dependencies
sudo apt-get install build-essential
sudo apt-get install cmake git unzip zip
sudo apt-get install python-dev python3-dev python-pip python3-pip
Install Linux Kernel Headers
uname -r
This command prints the current Linux kernel version, e.g. 4.15.0-36-generic. Then install the matching kernel headers:
sudo apt install linux-headers-$(uname -r)
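As a quick optional check that the headers for the running kernel are actually present:
# should report: Status: install ok installed
dpkg -s linux-headers-$(uname -r) | grep Status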
Install NVIDIA CUDA 10.0
Remove any previous CUDA installation:
sudo apt purge 'nvidia*'
sudo apt autoremove
sudo apt autoclean
sudo rm -rf /usr/local/cuda*
Install CUDA:
# For Ubuntu 16.04
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
# For Ubuntu 18.04
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
Then update the package lists and install CUDA 10.0 along with the matching drivers:
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-0 cuda-drivers
Reboot System
Reboot so the newly installed NVIDIA drivers are loaded.
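A reboot can be triggered from the terminal:
sudo reboot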
Modify .bashrc and Check Installation
Update the paths in .bashrc:
echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
Then:
source ~/.bashrc
sudo ldconfig
nvidia-smi
It should print something like the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   28C    P0    22W /  N/A |    201MiB /  3022MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1447      G   /usr/lib/xorg/Xorg                            89MiB |
|    0      1639      G   /usr/bin/gnome-shell                         107MiB |
+-----------------------------------------------------------------------------+
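Since .bashrc now puts the CUDA 10.0 binaries on the PATH, you can also confirm the toolkit version with nvcc; it should report release 10.0:
nvcc --version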
Install cuDNN 7.3.1
NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks. Go to the NVIDIA cuDNN website; after logging in and accepting the license agreement, download "cuDNN v7.3.1 Library for Linux" for CUDA 10.0.
Then go to the download folder and run the following in a terminal:
tar -xf cudnn-10.0-linux-x64-v7.3.1.20.tgz
sudo cp -R cuda/include/* /usr/local/cuda-10.0/include
sudo cp -R cuda/lib64/* /usr/local/cuda-10.0/lib64
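As an optional sanity check, the installed cuDNN version can be read from the version macros in the copied header; for cuDNN 7.3.1 it should print major 7, minor 3, patchlevel 1:
grep CUDNN_MAJOR -A 2 /usr/local/cuda-10.0/include/cudnn.h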
Install NCCL 2.3.7
NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. Go to the NVIDIA NCCL website and complete the short survey to download NCCL v2.3.7; for CUDA 10.0, choose "NCCL 2.3.7, O/S agnostic and CUDA 10.0".
Then go to the download folder and run the following in a terminal:
tar -xf nccl_2.3.7-1+cuda10.0_x86_64.txz
cd nccl_2.3.7-1+cuda10.0_x86_64
sudo cp -R * /usr/local/cuda-10.0/targets/x86_64-linux/
sudo ldconfig
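Likewise, assuming the headers were copied to the path above, the NCCL version macros should show 2.3.7:
grep NCCL_MAJOR -A 2 /usr/local/cuda-10.0/targets/x86_64-linux/include/nccl.h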
Install Python Dependencies
pip install -U --user pip six numpy wheel mock
pip3 install -U --user pip six numpy wheel mock
pip install -U --user keras_applications==1.0.5 --no-deps
pip3 install -U --user keras_applications==1.0.5 --no-deps
pip install -U --user keras_preprocessing==1.0.3 --no-deps
pip3 install -U --user keras_preprocessing==1.0.3 --no-deps
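A quick optional import test confirms the packages are visible to Python 3:
python3 -c "import six, numpy, wheel, mock, keras_applications, keras_preprocessing; print('deps ok')"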
Install TensorFlow from Wheel
Download wheel file
If you have exactly the same configuration as I do, you can download the prebuilt wheel directly from here and install TensorFlow with pip (skip the build steps below and jump to the next section).
Build your own wheel
Download Bazel
cd ~/Downloads
wget https://github.com/bazelbuild/bazel/releases/download/0.17.2/bazel-0.17.2-installer-linux-x86_64.sh
chmod +x bazel-0.17.2-installer-linux-x86_64.sh
./bazel-0.17.2-installer-linux-x86_64.sh --user
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc
Then reload the environment variables:
source ~/.bashrc
sudo ldconfig
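Verify that Bazel is on the PATH and reports the expected version (0.17.2):
bazel version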
Download TensorFlow Source and Configure
cd ~/Downloads
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12
./configure
The configure script then asks a series of questions:
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Press Enter twice (accepting the defaults), then:
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: Y
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with ROCm support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/local/cuda-10.0
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 2.3.7
Now enter the compute capability we noted in step 1, e.g. 5.2:
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.2] 5.2
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Configuration finished
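The answers are written to .tf_configure.bazelrc in the source tree; to review the CUDA-related settings (or redo them by re-running ./configure):
grep -i cuda .tf_configure.bazelrc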
Build TensorFlow with Bazel
This step takes a long time (3-4 hours on average).
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Optional flags (a combined example follows this list):
add --config=mkl if you want Intel MKL support on newer Intel CPUs, for faster training on the CPU
add --config=monolithic if you want a static monolithic build (try this if the build fails)
add --local_resources 2048,.5,1.0 if your PC has low RAM and you hit a segmentation fault or related errors
If you get an error like "Segmentation Fault", simply retrying the build usually works.
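For example, on a machine with little RAM you might combine the flags like this (a sketch; adjust --local_resources to your hardware):
bazel build --config=opt --config=cuda --config=monolithic --local_resources 2048,.5,1.0 //tensorflow/tools/pip_package:build_pip_package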
The bazel build command produces a script named build_pip_package. Running this script as follows builds a .whl file inside the tensorflow_pkg directory:
bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg
The .whl file can then be found in the tensorflow_pkg directory.
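You can list it to confirm; the exact file name depends on your Python version:
# e.g. tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl
ls tensorflow_pkg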
With Virtualenv
Working in a virtual environment is usually a good idea, since it separates the dependencies of different projects.
sudo apt install virtualenv
Then in the desired directory, create a virtual environment:
virtualenv tf-gpu -p /usr/bin/python3
source tf-gpu/bin/activate
# run from the directory containing the wheel (e.g. tensorflow_pkg)
pip3 install tensorflow-*.whl
Verify Tensorflow installation
In the virtualenv (tf-gpu):
python3
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Or you can list the GPU devices TensorFlow can see:
from tensorflow.python.client import device_lib

# list all local devices and keep only the GPU names
def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

get_available_gpus()  # e.g. ['/device:GPU:0']
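Alternatively, a one-liner from the shell; tf.test.is_gpu_available() exists in TensorFlow 1.12 and should print True on a working install:
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"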