Updating NVIDIA VIBs on ESXi 6.7
While updating the NVIDIA VIBs from 430.46 / 431.79 to 440.87 / 443.05 on ESXi 6.7, I ran into an error message, and a required step, that were not covered in the VMware documentation: https://kb.vmware.com/s/article/2033434
The error message was:
vmkload_mod: Can not remove module nvidia: module symbols in use
The solution I found to bypass the error was to stop the nvidia-init service by running the command:
/etc/init.d/nvidia-init stop
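The workaround can also be written as a small shell function: try to unload the module, and only stop nvidia-init and retry if the unload fails with the "module symbols in use" message. This is a minimal sketch, not a tested procedure. The function name unload_nvidia and the VMKLOAD_MOD / NVIDIA_INIT override variables are hypothetical names introduced for illustration, and the match on the error text assumes the message appears exactly as quoted above.

```shell
#!/bin/sh
# Sketch only: unload the nvidia module, stopping nvidia-init first if the
# module is reported as in use. The variable names below are hypothetical;
# they default to the commands used in this post.
VMKLOAD_MOD="${VMKLOAD_MOD:-vmkload_mod}"
NVIDIA_INIT="${NVIDIA_INIT:-/etc/init.d/nvidia-init}"

unload_nvidia() {
    # First attempt; capture stdout and stderr so the message can be inspected.
    if out=$("$VMKLOAD_MOD" -u nvidia 2>&1); then
        echo "nvidia module unloaded"
        return 0
    fi

    case "$out" in
        *"module symbols in use"*)
            # The error from this post: stop the service, then retry once.
            echo "module in use, stopping nvidia-init and retrying"
            "$NVIDIA_INIT" stop || return 1
            "$VMKLOAD_MOD" -u nvidia
            ;;
        *)
            # Some other failure: show it and give up.
            echo "$out" >&2
            return 1
            ;;
    esac
}
```

After defining the function in an SSH session on the host, calling unload_nvidia runs the unload and returns a non-zero status if the module still cannot be removed.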
The full process I ran through to update the VIBs on my hosts
Put the first host into Maintenance Mode to start rolling through the cluster.
Use manual DRS to move the VMs, or migrate them manually if needed. The process is documented here.
Once the host has entered Maintenance Mode, SSH into the ESXi CLI on that host.
When I ran through this, the xorg service was already stopped, but I ran the command anyway to make sure:
/etc/init.d/xorg stop
Stop the nvidia-init service by running the command:
/etc/init.d/nvidia-init stop
Remove the NVIDIA VMkernel driver by running the command:
vmkload_mod -u nvidia
Identify the NVIDIA VIB name by running this command:
esxcli software vib list | grep NVIDIA
Remove the VIB by running the command:
esxcli software vib remove -n nameofNVIDIAVIB
Install the new VIB:
esxcli software vib install -v /path_to_vib/nvidia_vib
Start the nvidia-init service by running the command:
/etc/init.d/nvidia-init start
Confirm that the driver is updated, running, and seeing the GPUs by running the command:
nvidia-smi
The output will show the driver version and the number of cards.
Repeat with the rest of your hosts.
Update your images with new drivers, deploy, test, and you are good to go!
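For repeating this across several hosts, the per-host steps above can be collected into shell functions. This is a rough sketch under a few assumptions: the host is already in Maintenance Mode, the function names nvidia_vib_names and update_nvidia_vib are hypothetical, the VIB path is passed in as an argument, and the VIB name is the first column of the esxcli software vib list output. It removes every VIB whose row mentions NVIDIA, so check that list by hand first if the host carries other NVIDIA packages.

```shell
#!/bin/sh
# Sketch only: the per-host steps from this post as shell functions.
# Function names and the argument are hypothetical; run this on a host
# that is already in Maintenance Mode.

# Read `esxcli software vib list` output on stdin and print the Name
# column (first field) of every row that mentions NVIDIA.
nvidia_vib_names() {
    grep NVIDIA | while read -r name _rest; do
        echo "$name"
    done
}

update_nvidia_vib() {
    new_vib="${1:-}"
    if [ -z "$new_vib" ]; then
        echo "usage: update_nvidia_vib /path_to_vib/nvidia_vib" >&2
        return 2
    fi

    # Either service may already be stopped, so the exit status of the
    # init scripts is not checked here.
    /etc/init.d/xorg stop
    /etc/init.d/nvidia-init stop

    # Unload the VMkernel driver; stop here if it is still in use.
    vmkload_mod -u nvidia || return 1

    # Remove every installed VIB whose row mentions NVIDIA.
    for vib in $(esxcli software vib list | nvidia_vib_names); do
        esxcli software vib remove -n "$vib" || return 1
    done

    # Install the new VIB and bring the service back up.
    esxcli software vib install -v "$new_vib" || return 1
    /etc/init.d/nvidia-init start || return 1

    # Prints the driver version and the detected cards.
    nvidia-smi
}
```

With the functions defined in the SSH session, a call such as update_nvidia_vib /path_to_vib/nvidia_vib runs the sequence and stops at the first step that fails, leaving the host in Maintenance Mode for manual follow-up.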