Prerequisite

Install Ubuntu 16.04:

Download a recent Ubuntu Desktop ISO (version 16.04);
Prepare a USB drive with more than 4 GB of storage;
Install Etcher and use it to write the Ubuntu ISO to the USB drive;
Boot from the USB drive;
Follow the installer's instructions.

Install Nvidia Driver

sudo -s
# stop the display manager (GUI)
service lightdm stop
# unload the nouveau kernel module
modprobe -r nouveau
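Before installing the proprietary driver, it can help to confirm that nouveau is really unloaded. The following is a minimal sketch and not part of the original steps: `is_module_loaded` is a hypothetical helper name, and it assumes the usual `lsmod` layout with the module name in the first column.

```shell
# Hypothetical helper: succeeds if a kernel module appears in an lsmod listing.
# An optional second argument supplies a saved listing instead of calling lsmod.
is_module_loaded() {
    name=$1
    if [ "$#" -ge 2 ]; then
        listing=$2
    else
        listing=$(lsmod 2>/dev/null || true)
    fi
    # lsmod prints the module name in the first column.
    printf '%s\n' "$listing" | awk -v m="$name" '$1 == m { found = 1 } END { exit !found }'
}

if is_module_loaded nouveau; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```

Matching on the first column only avoids a false positive when nouveau merely appears in another module's "Used by" list.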

Then install the driver from the graphics-drivers PPA:

sudo add-apt-repository  ppa:graphics-drivers/ppa
sudo apt update
# at the time of writing, the latest version is nvidia-390
sudo apt install nvidia-390
sudo apt install mesa-common-dev
sudo apt install freeglut3-dev
sudo reboot

Remember to turn off Secure Boot in the BIOS when instructed; otherwise the unsigned NVIDIA kernel module may fail to load.

Install CUDA + cuDNN

CUDA

Go to the official NVIDIA website and download the CUDA 8.0 GA2 runfile (local), which was released in Feb 2017.
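Before running the installer, it may be worth checking that the download is intact. The sketch below assumes you have an MD5 checksum for the runfile from the download page; `verify_md5` is a hypothetical helper, and the checksum in the usage comment is a placeholder, not a real value.

```shell
# Hypothetical helper: succeeds if the MD5 digest of FILE equals EXPECTED.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the checksum from the download page):
# verify_md5 cuda_8.0.61_375.26_linux.run "<md5 from the download page>" && echo "checksum OK"
```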

chmod u+x cuda_8.0.61_375.26_linux.run
sudo sh ./cuda_8.0.61_375.26_linux.run --tmpdir=/tmp --override

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 361.77?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/programmer ]:

Then we get the success summary:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /home/yzy, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-8.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_9978.log
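The warning in the summary says CUDA 8.0 needs a driver of version 361.00 or later. A small sketch for comparing two version strings follows; `version_ge` is a hypothetical helper, and it assumes `sort -V` (version sort, as in GNU coreutils) is available.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
version_ge() {
    # Version-sort both strings; if the lowest one is $2, then $1 >= $2.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# 390.48 is the driver version reported on this machine;
# 361.00 is the minimum named in the installer warning.
if version_ge 390.48 361.00; then
    echo "driver is new enough for CUDA 8.0"
else
    echo "driver is too old for CUDA 8.0"
fi
```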

Then download the patch and install it:

chmod u+x ./cuda_8.0.61.2_linux.run
sh ./cuda_8.0.61.2_linux.run

Then we get the success feedback:

Logging to /tmp/cuda_patch_12273.log
Welcome to the CUDA Patcher.
Detected pager as 'less'.
Do you accept the previously read EULA?
accept/decline/quit: accept

Enter CUDA Toolkit installation directory
 [ default is /usr/local/cuda-8.0 ]: 

Installation complete!
Installation directory: /usr/local/cuda-8.0

Check that the driver sees the GPU with nvidia-smi:

Sun Apr  8 17:18:33 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     9W /  N/A |    349MiB /  3022MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1067      G   /usr/lib/xorg/Xorg                           192MiB |
|    0      1767      G   compiz                                        87MiB |
|    0      5697      G   ...-token=6094D1A200DEBFC4C181ACFC1F4AAFE9    63MiB |
+-----------------------------------------------------------------------------+
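If you want the driver version in a script rather than reading it off the table, one option is to extract it from the banner line. This is only a sketch: `driver_version_from_smi` is a hypothetical helper, and it assumes the banner keeps the "Driver Version: X" wording shown above.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the driver version.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only call nvidia-smi when it is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | driver_version_from_smi || true
fi
```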

Don’t forget to add the following lines to ~/.zshrc (or ~/.bashrc if you use bash):

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
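The `${PATH:+:${PATH}}` form in the export lines appends ":" plus the old value only when the variable is already set and non-empty, which avoids a stray leading or trailing colon. The sketch below illustrates the idiom with two hypothetical helpers, `prepend_dir` and `path_contains`; neither is part of CUDA.

```shell
# Hypothetical helper: print DIR prepended to CURRENT,
# adding ":" only when CURRENT is non-empty.
prepend_dir() {
    dir=$1
    current=$2
    printf '%s\n' "$dir${current:+:$current}"
}

# Hypothetical helper: succeeds if DIR is one of the colon-separated entries of LIST.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

prepend_dir /usr/local/cuda/lib64 ""             # prints /usr/local/cuda/lib64
prepend_dir /usr/local/cuda/bin "/usr/bin:/bin"  # prints /usr/local/cuda/bin:/usr/bin:/bin
```

After opening a new shell, `path_contains /usr/local/cuda/bin "$PATH" && echo ok` is a quick way to confirm the rc file was picked up.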

Test 1

Then let’s test with one of the CUDA samples, deviceQuery:

cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
make

If the build succeeds, the output looks like this:

"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery deviceQuery.o 
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release

Then run ./deviceQuery, which prints:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 970M"
  CUDA Driver Version / Runtime Version          9.1 / 8.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 3022 MBytes (3169058816 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1038 MHz (1.04 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 970M
Result = PASS
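The two lines of this output that matter most are the final "Result = PASS" and the compute capability (5.2 here, which corresponds to the sm_52 target in the build log). A sketch for pulling both out of the output follows; `sample_passed` and `compute_capability` are hypothetical helpers that assume the exact wording shown above.

```shell
# Hypothetical helper: read sample output on stdin, succeed if it reports PASS.
sample_passed() {
    grep -q '^Result = PASS'
}

# Hypothetical helper: read deviceQuery output on stdin, print the compute capability.
compute_capability() {
    sed -n 's/.*CUDA Capability Major\/Minor version number: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage, from the directory that contains the built binary:
# ./deviceQuery | sample_passed && echo "deviceQuery passed"
# ./deviceQuery | compute_capability
```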

Test 2

Let’s run another test, nbody:

cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody/
make

If the build succeeds, the output looks like this:

"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -ftz=true -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o bodysystemcuda.o -c bodysystemcuda.cu
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -ftz=true -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o nbody.o -c nbody.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -ftz=true -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o render_particles.o -c render_particles.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o nbody bodysystemcuda.o nbody.o render_particles.o  -L/usr/lib/"nvidia-367" -lGL -lGLU -lX11 -lglut
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp nbody ../../bin/x86_64/linux/release

Then run ./nbody -benchmark -numbodies=256000 -device=0, which prints:

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance) 
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation) 
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 970M
> Compute 5.2 CUDA device: [GeForce GTX 970M]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 7884.710 ms
= 83.118 billion interactions per second
= 1662.357 single-precision GFLOP/s at 20 flops per interaction
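The reported rates are consistent with an all-pairs count: N × N interactions per iteration, divided by the elapsed time, then multiplied by the 20 flops per interaction that the log states. The sketch below redoes that arithmetic. `nbody_rates` is a hypothetical helper, and the all-pairs formula is an assumption inferred from the numbers, not something the log spells out.

```shell
# Hypothetical helper: nbody_rates BODIES ITERATIONS TOTAL_MS FLOPS_PER_INTERACTION
# Prints "<billion interactions/s> <GFLOP/s>".
nbody_rates() {
    awk -v n="$1" -v iters="$2" -v ms="$3" -v flops="$4" 'BEGIN {
        ips = n * n * iters / (ms / 1000) / 1e9   # billions of interactions per second
        printf "%.3f %.3f\n", ips, ips * flops     # second value is GFLOP/s
    }'
}

nbody_rates 256000 10 7884.710 20   # prints 83.118 1662.357
```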

With that, we are all set with CUDA. :)

cuDNN

Download cuDNN from the NVIDIA Developer website; you may need to register as a member first. Since we installed CUDA 8.0, we take cuDNN v7.1.2 (Mar 21, 2018) for CUDA 8.0.

tar -zxvf cudnn-8.0-linux-x64-v7.1.tgz

And we get:

  cuda/include/cudnn.h
  cuda/lib64/libcudnn.so
  cuda/lib64/libcudnn.so.5
  cuda/lib64/libcudnn.so.5.0.5
  cuda/lib64/libcudnn_static.a

Then copy the files into the CUDA installation directory:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
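To confirm which cuDNN version ended up in the CUDA directory, you can read the version macros from the header. This sketch assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`, as the cuDNN 7 `cudnn.h` does; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from the version macros in a cuDNN header.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1   # no version macros found
            print major "." minor "." patch
        }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version "$header" || echo "no version macros found in $header"
fi
```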

TensorFlow

Install Dependencies

sudo apt install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn unzip wget pkg-config zip g++ zlib1g-dev lib

sudo pip install -U pip

Installation

Follow the instructions on the Anaconda download site and Anaconda Documentation to download and install Anaconda.
Create a conda environment named tensorflow to run a version of Python by invoking the following command:

$ conda create -n tensorflow pip python=2.7 # or python=3.6, etc.

Activate the conda environment by issuing the following command:

$ source activate tensorflow
 (tensorflow)$  # Your prompt should change 

Issue a command of the following format to install TensorFlow inside your conda environment:

(tensorflow)$ pip install --ignore-installed --upgrade <tfBinaryURL>

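where tfBinaryURL is the URL of the TensorFlow Python package. For example, the following command installs the GPU-enabled version of TensorFlow 1.7.0 for Python 3.6 (the conda environment must then use python=3.6, because the cp36 tag in the wheel name has to match the interpreter):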

 (tensorflow)$ pip install --ignore-installed --upgrade \
 https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0-cp36-cp36m-linux_x86_64.whl
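A wheel only installs on an interpreter that matches its Python tag, so it can be useful to read the tag off the URL before running pip. The sketch below assumes the common wheel naming scheme name-version-pythontag-abitag-platform.whl without an optional build tag; `wheel_python_tag` is a hypothetical helper.

```shell
# Hypothetical helper: print the Python tag (for example cp36) from a wheel file name or URL.
wheel_python_tag() {
    # Strip the directory part and the .whl suffix, then take the third "-"-separated field.
    basename "$1" .whl | awk -F- '{ print $3 }'
}

wheel_python_tag https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0-cp36-cp36m-linux_x86_64.whl   # prints cp36

# To compare with the interpreter in the active environment:
# python -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
```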

To install more requirements, do:

(tensorflow)$ pip install -r example-requirements.txt

More conda environment commands can be found in the conda documentation.

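After installing TensorFlow, validate the installation, for example by running python -c "import tensorflow as tf; print(tf.__version__)" inside the activated environment.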

Install Jupyter Notebook

To install Jupyter notebooks in a conda environment:

$ conda install jupyter notebook

To start a notebook server, enter:

$ jupyter notebook --browser="chrome"

By default, the notebook server runs at http://localhost:8888.

You should consider installing Notebook Conda to help manage your environments. Run the following command:

conda install nb_conda

Then if you run the notebook server from a conda environment, you’ll also have access to the “Conda” tab, where you can manage your environments from within Jupyter. You can create new environments, install packages, update packages, export environments and more.

Finally, once all edited notebooks are saved, we can shut down the entire server by pressing Ctrl+C twice in the terminal where it is running:

$ ctrl + C (* 2 times)

If a license prompt appears, choose [yes].
