Prerequisite
Install Ubuntu 16.04:
- Download a recent Ubuntu Desktop ISO (Version 16.04);
- Prepare a USB Disk with storage larger than 4G;
- Install Etcher to burn the Ubuntu iso to USB disk;
- Boot from USB;
- Follow the instructions.
Install Nvidia Driver
Login to tty1
by pressing ctrl-alt-F1
:
sudo -s
# turn down GUI
service lightdm stop
# remove nouveau
modprobe -r nouveau
Install Nvidia Driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# by far the latest version is nvidia-390
sudo apt install nvidia-390
sudo apt install mesa-common-dev
sudo apt install freeglut3-dev
sudo reboot
Remember to do as instructed: turn off the Secure Boot
from BIOS
.
Install CUDA + cuDNN
CUDA
Go to NVIDIA official website and download the 8.0 GA2 runfile(local)
which was released Feb 2017.
chmod u+x cuda_8.0.61_375.26_linux.run
sudo sh ./cuda_8.0.27_linux.run --tmpdir=/tmp --override
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 361.77?
(y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/programmer ]:
Then we get the success summary:
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/yzy, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_9978.log
Then download the patch and install it:
chmod u+x ./cuda_8.0.61.2_linux.run
sh ./cuda_8.0.61.2_linux.run
Then we get the success feedback:
Logging to /tmp/cuda_patch_12273.log
Welcome to the CUDA Patcher.
Detected pager as 'less'.
Do you accept the previously read EULA?
accept/decline/quit: accept
Enter CUDA Toolkit installation directory
[ default is /usr/local/cuda-8.0 ]:
Installation complete!
Installation directory: /usr/local/cuda-8.0
Try to test CUDA
with nvidia-smi
:
Sun Apr 8 17:18:33 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970M Off | 00000000:01:00.0 Off | N/A |
| N/A 51C P8 9W / N/A | 349MiB / 3022MiB | 15% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1067 G /usr/lib/xorg/Xorg 192MiB |
| 0 1767 G compiz 87MiB |
| 0 5697 G ...-token=6094D1A200DEBFC4C181ACFC1F4AAFE9 63MiB |
+-----------------------------------------------------------------------------+
Don’t forget to include the following lines into ~/.zshrc
:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
Test 1
Then let’s test with an example of CUDA:
cd NVIDIA_CUDA-8.0_Samples/1_Utilities
make
If we got:
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery deviceQuery.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
and then with ./deviceQuery
, we got:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 970M"
CUDA Driver Version / Runtime Version 9.1 / 8.0
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 3022 MBytes (3169058816 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1038 MHz (1.04 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 970M
Result = PASS
Test 2
Let’s do another test nobody
:
cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody/
make
If we got:
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -ftz=true -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o bodysystemcuda.o -c bodysystemcuda.cu
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -ftz=true -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o nbody.o -c nbody.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -ftz=true -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o render_particles.o -c render_particles.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o nbody bodysystemcuda.o nbody.o render_particles.o -L/usr/lib/"nvidia-367" -lGL -lGLU -lX11 -lglut
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp nbody ../../bin/x86_64/linux/release
and with ./nbody -benchmark -numbodies=256000 -device=0
, we got:
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 970M
> Compute 5.2 CUDA device: [GeForce GTX 970M]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 7884.710 ms
= 83.118 billion interactions per second
= 1662.357 single-precision GFLOP/s at 20 flops per interaction
Then we are all set with CUDA
.
For now, we are done with CUDA
. :)
cuDNN
It’s easier to download cuDNN
from here, though we might also need to join as a member. Since we downloaded CUDA 8.0
, here we are going to take the cuDNN v7.1.2
(Mar 21, 2018).
tar -zxvf cudnn-8.0-linux-x64-v7.1.tgz
And we get:
cuda/include/cudnn.h
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.5
cuda/lib64/libcudnn.so.5.0.5
cuda/lib64/libcudnn_static.a
then copy them to specific places:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
Tensorflow
Install Dependencies
sudo apt install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn unzip wget pkg-config zip g++ zlib1g-dev lib
sudo pip install -U pip
Installation
-
Follow the instructions on the Anaconda download site and Anaconda Documentation to download and install Anaconda.
-
Create a conda environment named
tensorflow
to run a version of Python by invoking the following command:
$ conda create -n tensorflow pip python=2.7 # or python=3.3, etc.
- Activate the conda environment by issuing the following command:
$ source activate tensorflow
(tensorflow)$ # Your prompt should change
- Issue a command of the following format to install TensorFlow inside your conda environment:
(tensorflow)$ pip install --ignore-installed --upgrade <tfBinaryURL>
where tfBinaryURL is the URL of the TensorFlow Python package. For example, the following command installs the GPU-supported version of TensorFlow for Python 3.6:
(tensorflow)$ pip install --ignore-installed --upgrade \
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0-cp36-cp36m-linux_x86_64.whl
To install more requirements, do:
(tensorflow)$ pip install -r example-requirements.txt
More commands with conda environment can be found here
After installing TensorFlow, validate the installation.
Install Jupyter Notebook
To install Jupyter notebooks in a conda environment:
$ conda install jupyter notebook
To start a notebook server, then enter
$ jupyter notebook --browser="chrome"
By default, the notebook server runs at http://localhost:8888
.
You should consider installing Notebook Conda to help manage your environments. Run the following command:
conda install nb_conda
Then if you run the notebook server from a conda environment, you’ll also have access to the “Conda” tab, where you can manage your environments from within Jupyter. You can create new environments, install packages, update packages, export environments and more.
Finally, we can shut down the entier server after having all of our edited notebooks saved with
$ ctrl + C (* 2 times)
Choose [yes] with the License problem.