Tuesday, January 30, 2018

Ubuntu NVIDIA CUDA Installation Notes

I had CUDA 9.1 installed and working with TF 1.4 compiled from source; then along came TF 1.5, and I needed CUDA 9.0!

There are two sources of NVIDIA's CUDA installation instructions for TF:

1) The downloadable PDFs, which are not exactly complete: http://developer.download.nvidia.com/compute/cuda/9.0/Prod/docs/sidebar/CUDA_Quick_Start_Guide.pdf

2) The online CUDA installation guide, http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html, which NVIDIA's help posts point to when telling people to read the manual. This isn't complete either.

In my experience, reinstalling the kernel drivers across versions (CUDA 8 -> 9.1 -> 9.0) never worked. Installing one version over another is probably not a configuration NVIDIA tests.

Reinstall the OS and try again.

The runfile downloads are more stable: you can keep them around for reproducibility and production use, and you don't have to deal with links that change. The deb files add URLs to /etc/apt/sources.list.d, which can be hard to track down and are prone to errors on NVIDIA's side. I had a 9.0 deb file point me to a 9.1 installation because it referenced the current URL for CUDA installation rather than an archive URL.
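If a deb install has left NVIDIA repository entries behind, a small helper like the sketch below can find them. It assumes the repository URLs contain developer.download.nvidia.com, the same host as the PDF link above; find_cuda_sources is a hypothetical name, not an NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list apt source entries that point at NVIDIA's download host.
# Assumption: CUDA deb packages register repos under developer.download.nvidia.com.
find_cuda_sources() {
  # Directory to search; defaults to the standard apt sources directory.
  local dir="${1:-/etc/apt/sources.list.d}"
  # -r: recurse, -n: show line numbers, -F: fixed string (dots are literal).
  # Prints file:line:entry for each match and returns 1 if nothing matches.
  grep -rnF 'developer.download.nvidia.com' "$dir"
}

# Usage: find_cuda_sources
```

Any file it reports can then be inspected or removed before trying a different CUDA version.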

Disable nouveau. To test whether nouveau is loaded, run: lsmod | grep nouveau. If you see any output, you have to disable the nouveau kernel driver. Ironically, nouveau is itself an open-source driver for NVIDIA cards.
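The lsmod check can be wrapped in a script-friendly test. This is a sketch: lsmod formats the contents of /proc/modules, so the helper reads that file and matches only the module-name column, which avoids false hits on lines that merely list nouveau as a dependency. The name nouveau_loaded and the optional file argument (for testing) are my own assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the nouveau module is currently loaded.
# lsmod formats /proc/modules, so read that file directly; the path can be
# overridden with the first argument (assumption, for testing on a copy).
nouveau_loaded() {
  local modules="${1:-/proc/modules}"
  # The module name is the first field; match it exactly so dependency
  # columns such as "1 nouveau," are not counted.
  if awk '$1 == "nouveau" { found = 1 } END { exit !found }' "$modules"; then
    echo "nouveau is loaded: disable it before installing the NVIDIA driver"
    return 0
  fi
  echo "nouveau is not loaded"
  return 1
}
```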

dc@gpu1:~$ lsmod | grep nouveau
nouveau              1495040  0
mxm_wmi                16384  1 nouveau
wmi                    20480  2 mxm_wmi,nouveau
video                  40960  1 nouveau
i2c_algo_bit           16384  1 nouveau
ttm                    98304  1 nouveau
drm_kms_helper        155648  1 nouveau
drm                   364544  3 ttm,drm_kms_helper,nouveau


After creating the nouveau blacklist config file, run sudo update-initramfs -u. This regenerates the initramfs (initial RAM disk) for the current kernel so that nouveau stays disabled at boot.
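Here is a minimal sketch of the blacklist step. The two lines written follow NVIDIA's Linux installation guide (blacklist nouveau, and options nouveau modeset=0 in case it loads anyway); the function name and the optional path argument for testing on a temp file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the nouveau blacklist file.
# The target path can be overridden with the first argument (e.g. a temp file).
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  # Block the module from loading, and disable its kernel mode setting
  # in case it gets loaded anyway.
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Usage, from a root shell (sudo -i), then rebuild the initramfs:
#   write_nouveau_blacklist && update-initramfs -u
```

After a reboot, lsmod | grep nouveau should print nothing.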

There are two conflicting mechanisms that decide whether you boot into runlevel 3 (text mode): the lightdm display manager and the kernel and init settings. Turn off lightdm on reboot and change /etc/init/rc-sysinit.conf.
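The rc-sysinit.conf change can be scripted roughly as follows. This is a sketch for Upstart-based Ubuntu, and it assumes the file sets the default with a line of the form env DEFAULT_RUNLEVEL=2; set_default_runlevel is a hypothetical name, and the optional path argument exists only so it can be tried on a copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Upstart-based Ubuntu (/etc/init/rc-sysinit.conf).
# Assumption: the file contains a line like "env DEFAULT_RUNLEVEL=2".
set_default_runlevel() {
  local level="${1:-3}"
  local conf="${2:-/etc/init/rc-sysinit.conf}"
  # Keep a backup before editing in place.
  cp "$conf" "$conf.bak"
  # Replace only the DEFAULT_RUNLEVEL assignment, leaving other lines untouched.
  sed -i "s/^env DEFAULT_RUNLEVEL=.*/env DEFAULT_RUNLEVEL=${level}/" "$conf"
}

# Usage, from a root shell (sudo -i):
#   set_default_runlevel 3
```

For the lightdm half on Upstart, one common approach is an override file containing the manual stanza, e.g. echo manual | sudo tee /etc/init/lightdm.override; check this against your release before relying on it.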

Edit the GRUB config file /etc/default/grub and change GRUB_CMDLINE_LINUX_DEFAULT from "quiet splash" (the splash screen) to "text". Then run sudo update-grub to regenerate the GRUB config files.
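The GRUB edit can be done with a single sed substitution. A minimal sketch, assuming the stock Ubuntu layout where GRUB_CMDLINE_LINUX_DEFAULT sits on its own line; the function name and the optional path argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: boot to a text console by changing the default kernel
# command line in /etc/default/grub (path overridable for testing).
set_grub_text_mode() {
  local grub_file="${1:-/etc/default/grub}"
  # Keep a backup before editing in place.
  cp "$grub_file" "$grub_file.bak"
  # Replace the whole value (Ubuntu's default is "quiet splash") with "text".
  # GRUB_CMDLINE_LINUX (without _DEFAULT) is left alone.
  sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="text"/' "$grub_file"
}

# Usage, from a root shell (sudo -i):
#   set_grub_text_mode && update-grub
```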

To check the current runlevel, run: runlevel

For Ubuntu 16.04 LTS (systemd):
sudo systemctl enable multi-user.target
sudo systemctl set-default multi-user.target
After a reboot, if this worked, runlevel should report:
dc@gpu1:~$ runlevel
N 3
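To check both settings at once after rebooting, something like this sketch could be used. systemctl get-default and runlevel are the real commands; the function name check_text_boot and its optional arguments (which let it run without a live systemd) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: after rebooting, confirm the machine came up in text mode.
# Arguments are optional overrides (for testing); by default it asks systemd for
# the default target and reads the current runlevel from "runlevel".
check_text_boot() {
  local target="${1:-$(systemctl get-default)}"
  local level_out="${2:-$(runlevel)}"
  # runlevel prints "<previous> <current>", e.g. "N 3"; keep the last field.
  local current="${level_out##* }"
  if [ "$target" = "multi-user.target" ] && [ "$current" = "3" ]; then
    echo "text mode: default=$target, runlevel=$current"
    return 0
  fi
  echo "not text mode: default=$target, runlevel=$current" >&2
  return 1
}

# Usage: check_text_boot
```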
