CUDA Toolkit

How to use and update the CUDA Toolkit

What is CUDA?

CUDA is a computing platform and programming architecture developed by NVIDIA. It enables the use of NVIDIA GPUs for tasks beyond traditional graphics rendering, a practice known as General-Purpose Computing on Graphics Processing Units (GPGPU).

CUDA can significantly accelerate efforts such as AI training and scientific simulations by running many processes in parallel.

What is the CUDA Toolkit?

The CUDA Toolkit is an SDK provided by NVIDIA that contains tools, libraries, and resources to develop applications for CUDA-enabled GPUs. It includes these key components:

  • NVCC: the compiler driver that compiles CUDA code (usually written in C/C++) into an executable that can run on an NVIDIA GPU
  • cuda-gdb: a debugger for CUDA applications that integrates GPU and CPU debugging within the same environment
  • Profiling and optimization tools: including the CUDA Profiling Tools Interface (CUPTI) and NSight Compute
  • GPU-accelerated libraries: libraries to speed up tasks such as math, quantum computing, data processing, graphics processing, deep learning

Set environment variables

FluidStack’s NVIDIA GPU instances are pre-installed with CUDA and the CUDA Toolkit. However, two environment variables must be configured before you can use the Toolkit globally on your FluidStack instance.

To configure these variables, open an SSH session to your instance. Copy and paste the following three commands into the session:

$echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
>echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
>source ~/.bashrc

To confirm that the environment variables are set up correctly, enter the command:

$nvcc --help

The expected output is the helpfile for the NVCC compiler.

Find the current CUDA Toolkit version

Set the CUDA Toolkit environment variables as described above if you have not already done so.

Then enter the following command in the same SSH session:

$nvcc --version

If the CUDA Toolkit is installed and the environment variables are correctly configured, the output will show a release number, such as 12.6. This number indicates the installed CUDA Toolkit version.

The CUDA Samples repository

NVIDIA provides a repository of sample applications that demonstrates various CUDA features and best practices. This repository is available at: https://github.com/nvidia/cuda-samples. To try it out, as well as to test that CUDA is properly installed on your instance, follow the steps below.

Important

Enter all commands shown below into an SSH session to your instance.

1

Clone the CUDA Samples repository

Begin by cloning the repository to download all available samples to your instance. Run the following command:

$git clone https://github.com/nvidia/cuda-samples
2

Move to the deviceQuery directory

Move to the deviceQuery sample directory on the cloned repository:

$cd cuda-samples/Samples/1_Utilities/deviceQuery
3

Run make

The make command compiles the CUDA code in the sample directory, generating an executable program that you can run. Enter the command:

$make

The expected output is the creation of an executable named deviceQuery.

4

Run the deviceQuery application

Run the compiled executable for the deviceQuery sample:

$./deviceQuery

If successful, you should see detailed output about each your instance’s GPU(s), along with a final line that ends with RESULT = PASS.

For any other samples in the CUDA Samples repository that you want to use, follow the same procedure. Move to the directory of the sample, run make, then run the compiled executable.

How to update the CUDA Toolkit to the latest version

Warning

Updating drivers can sometimes produce unexpected results. Back up any important data on your instance before proceeding!

1

Find the currently installed version number

Follow the instructions to find the current version installed on your instance.

2

Find the latest CUDA Toolkit version number

Go to NVIDIA’s update page and confirm that the version number is higher than the version that’s currently installed on your instance.

3

Select distribution version and installer type

On NVIDIA’s update page, select the distribution (operating system) version for your instance. For Installer Type, choose the deb (network) option.

Tip

Not sure what distribution version your instance is running? To find out, enter the command lsb_release -a on your instance.

4

Run the installer commands

On NVIDIA’s update page, copy the commands under the CUDA Toolkit Installer instructions. Paste them into the SSH session to your instance.

5

Upgrade packages

Once the final command you copied has finished running, enter the following command to your instance:

$sudo apt upgrade

When prompted if you want to continue, press Enter to accept the default value of Y.

If during the upgrade you are presented with a dialog titled “Pending kernel upgrade,” press Enter. If this is followed by a dialog titled “Daemons using outdated libraries,” again press Enter.

Once the upgrade has finished, run the following command to confirm that your CUDA Toolkit version has been updated to the latest version:

$nvcc --version
6

Update the NVIDIA driver

Next, run the following command to update the NVIDIA driver:

$sudo apt-get install -y nvidia-open

You might see the same “Pendel kernel upgrade” and “Daemons using outdated libraries” dialogs during this step. Press Enter on both dialogs.

7

Reboot

To complete the NVIDIA driver update, reboot the instance:

$sudo reboot

Rebooting your instance closes your current SSH session, so you’ll need to reconnect after the reboot. If you try to reconnect too soon, you might receive a “Connection refused” error. Wait for the instance to finish starting up and try again.

Tip

If your instance fails to restart or SSH fails to reconnect after waiting for more than five minutes, try stopping and starting the instance.

8

Confirm the driver version

Once you have logged back into your instance, run the following command:

$nvidia-smi

The CUDA version shown at the top of the output from this command should match the version of the CUDA Toolkit that you installed.

9

(Optional) Test devices with the CUDA Samples repository

While this step is optional, it can help verify that the update was successful.

Follow the steps under The CUDA Samples repository to confirm that CUDA is functioning on all your instance’s GPUs.