Sunday, September 12, 2021

Installing TensorRT 7.2.1, cuDNN 8.0.4 and Cuda 11.0 update 1 in Ubuntu 18.04 (x86_64)

  1. Make sure you have an up-to-date Nvidia display driver. You need driver version 450.x or newer for Cuda 11 to work (I'm using 470.x). If it's not up-to-date yet, you can update it from the standard Ubuntu repository or via the Nvidia Ubuntu PPA, as explained over at: https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-18-04-bionic-beaver-linux. In my case, I'm using the standard Ubuntu repository without any problem. If your machine uses UEFI secure boot and the Nvidia PPA driver, make sure to follow the driver installation steps carefully, especially the step that asks for a password: remember the password you enter, because you will be asked for the same password when you restart your machine to complete the driver installation.
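    For example, you can check which driver version is currently active with nvidia-smi and, if needed, install a newer one from the repository (the package name nvidia-driver-470 below is only an example; pick whichever driver version your repository actually offers):
    nvidia-smi                              # the installed driver version is shown at the top of the output
    sudo apt-get install nvidia-driver-470  # example package name; use the version offered by your repository
    sudo reboot                             # reboot so the new driver is loaded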
  2. Follow the Cuda 11 Update 1 installation guide at: https://docs.nvidia.com/cuda/archive/11.0/cuda-installation-guide-linux/index.html. Note: you can install a specific version of Cuda in Ubuntu 18.04 as follows (using Cuda 11.1 as an example here):
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
    sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.0-455.23.05-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.0-455.23.05-1_amd64.deb
    sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda-11-1
    
    By using the "cuda-11-1" package name in the last command, that specific version of Cuda will be installed instead of the newest version.
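    Once the installation finishes, a quick sanity check (assuming the default install location /usr/local/cuda-11.1) looks roughly like this:
    /usr/local/cuda-11.1/bin/nvcc --version                 # should report release 11.1
    ls /usr/local/cuda-11.1/lib64                           # the Cuda libraries should be listed here
    export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}   # add nvcc to PATH, per the post-installation actions in the guide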
  3. Follow the cuDNN 8.0.4 installation guide at: https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-804/install-guide/index.html#installlinux-deb
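    For reference, the Debian-package route in that guide boils down to installing the runtime, developer and samples packages with dpkg. A rough sketch (the .deb file names below are assumptions; use the ones you actually downloaded for cuDNN 8.0.4 / Cuda 11.0):
    # install the downloaded cuDNN .deb packages (file names are examples and may differ)
    sudo dpkg -i libcudnn8_8.0.4.30-1+cuda11.0_amd64.deb
    sudo dpkg -i libcudnn8-dev_8.0.4.30-1+cuda11.0_amd64.deb
    sudo dpkg -i libcudnn8-samples_8.0.4.30-1+cuda11.0_amd64.deb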
  4. Follow the TensorRT 7.2.1 installation guide at https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-debian up to the "sudo apt-get update" step (i.e. stop before the "sudo apt-get install tensorrt" step). Then, disable the Nvidia compute deb sources (the ones pointing to https://developer.download.nvidia.com/compute/*) in your Ubuntu sources list file(s), except the ones in the TensorRT deb sources file. The TensorRT deb sources file is usually named /etc/apt/sources.list.d/nv-tensorrt-cudaXX.x*.list. You can disable the "nvidia compute" deb sources by commenting out the line(s), i.e. adding a "#" (without the quotes) at the beginning of each line in the deb sources files (/etc/apt/sources.list.d/cuda.list, etc.) that points to a https://developer.download.nvidia.com/compute/* URL; see the sketch after the excerpt below. Then, install tensorrt along with its components normally ("sudo apt-get install tensorrt", etc.). Once the tensorrt installation is finished, re-enable the Nvidia compute deb sources, i.e. delete the comment characters (#) that you added at the beginning of those lines. This TensorRT 7.2.1 installation fix is explained over at: https://github.com/NVIDIA/TensorRT/issues/792. This is the excerpt:
    For those who tried this approach, and yet the problem didn't get solved, it seems like there are more than one place storing nvidia deb-src links (https://developer.download.nvidia.com/compute/*) and these links overshadowed the actual deb link of dependencies corresponding with your tensorrt version.
    Just comment out these links in every possible place inside /etc/apt directory at your system (for instance: /etc/apt/sources.list , /etc/apt/sources.list.d/cuda.list , /etc/apt/sources.list.d/cuda_learn.list , /etc/apt/sources.list.d/nvidia-ml.list (except your nv-tensorrt deb-src link)) before running "apt install tensorrt" then everything works like a charm (uncomment these links after installation completes).
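    As a rough sketch of the disable/install/re-enable dance (the file names /etc/apt/sources.list.d/cuda.list and /etc/apt/sources.list.d/nvidia-ml.list are just examples; adjust them to whatever files under /etc/apt contain the Nvidia compute URLs on your system):
    # comment out the Nvidia compute deb sources (file names are examples)
    sudo sed -i 's|^deb https://developer.download.nvidia.com/compute|# &|' /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list
    sudo apt-get update
    # install TensorRT while only the nv-tensorrt deb source is active
    sudo apt-get install tensorrt
    # re-enable the Nvidia compute deb sources afterwards
    sudo sed -i 's|^# deb https://developer.download.nvidia.com/compute|deb https://developer.download.nvidia.com/compute|' /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list
    sudo apt-get update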

Bonus Fix 1: Python onnx library error

You might encounter the onnx data type error shown below even after tensorrt is successfully installed in Python:

TypeError: 1.0 has type numpy.float32, but expected one of: int, long, float

The error is most likely caused by an old protobuf version. You need to update protobuf with pip3 via this command: pip3 install -U protobuf. For more details, see: https://github.com/onnx/onnx/issues/2534
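
In other words, assuming pip3 points at the Python installation you use for onnx, something along these lines should do it:

pip3 show protobuf        # check which protobuf version is currently installed
pip3 install -U protobuf  # upgrade protobuf to the latest release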


Bonus Fix 2: Cuda 11.0 missing libcusolver.so.10

You might encounter this error in a tensorflow 2.4+ application:

Could not load dynamic library 'libcusolver.so.10'

It says libcusolver.so.10 is missing. Well, that library actually does not exist in Cuda 11, and the solution is to create a symbolic link named libcusolver.so.10 that points to libcusolver.so.11 in your Cuda 11 installation. It should work, as explained over at: https://github.com/tensorflow/tensorflow/issues/45263. Where you can find libcusolver.so.11 depends on the Cuda 11 version installed on your machine. In my installation, libcusolver.so.11 is located at /usr/local/cuda-11.4/lib64; that directory is also referred to indirectly by the /usr/local/cuda and /usr/local/cuda-11 symbolic links. It's a bit hairy, but you need to create the libcusolver.so.10 symbolic link at /usr/local/cuda-11.4/lib64 so that it points to libcusolver.so.11.
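
A minimal sketch of the fix, assuming libcusolver.so.11 lives in /usr/local/cuda-11.4/lib64 as in my case (adjust the path to match your installation):

cd /usr/local/cuda-11.4/lib64
sudo ln -s libcusolver.so.11 libcusolver.so.10   # libcusolver.so.10 now points to libcusolver.so.11
sudo ldconfig                                    # refresh the linker cache (optional if the directory is already on LD_LIBRARY_PATH)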

