The Complete Guide to Fixing CUDA Installation Issues

12 min read
By Jaber Jaber

If you're getting started with CUDA development, you've likely encountered frustrating installation problems. I've analyzed hundreds of developer experiences and compiled this comprehensive guide to help you overcome the most common CUDA installation challenges.

Every CUDA developer has been there: you install the toolkit, everything looks fine, then nvcc --version returns "command not found." Or worse, everything works on your machine but breaks on someone else's GPU. After helping dozens of teams debug these issues, I've seen the same problems over and over.

This guide covers every major CUDA installation issue and exactly how to fix them. No more trial and error.

Table of Contents

Common Problems:

  • PATH and Environment Variable Issues
  • Driver and Toolkit Version Mismatches
  • WSL2-Specific Problems
  • Secure Boot and UEFI Issues
  • Broken Package Dependencies
  • cuDNN Installation and Compatibility
  • PyTorch CUDA Not Available
  • Windows-Specific Issues
  • Incomplete Installations

Best Practices for Prevention


1. PATH and Environment Variable Issues

Problem: `nvcc --version` returns "command not found"

This is the most common issue developers face after installing CUDA. The toolkit is installed, but the system can't find the CUDA compiler.

Root Cause: CUDA binaries are not in your system's PATH environment variable.

┌──────────────────────────────────────────────────────────────┐
│  Diagnosis Flow                                              │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  $ nvcc --version                                            │
│  → "command not found"                                       │
│                                                              │
│  $ which nvcc                                                │
│  → (empty)                                                   │
│                                                              │
│  $ ls /usr/local/cuda/bin/nvcc                               │
│  → File exists! ✓                                            │
│                                                              │
│  Conclusion: CUDA is installed, PATH is wrong                │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Solution for Linux/Ubuntu:

bash
# Add these lines to your ~/.bashrc file
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# For specific CUDA versions (e.g., CUDA 11.8):
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

# Apply the changes
source ~/.bashrc

Solution for Windows:

  1. Open System Properties → Advanced → Environment Variables
  2. Add to PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
  3. Create new variable CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
  4. Restart your terminal or IDE

Verification:

bash
nvcc --version
which nvcc  # Linux/Mac only
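
Beyond nvcc --version, a minimal compile-and-run smoke test confirms the whole toolchain end to end. This is a sketch for Linux; it assumes a working driver and writes a throwaway test.cu in the current directory:

bash
# Create a tiny CUDA program and run it on the GPU
cat > test.cu <<'EOF'
#include <cstdio>

__global__ void hello() {
    // Device-side printf runs on the GPU
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();         // launch one block of 4 threads
    cudaDeviceSynchronize();   // wait for the kernel and flush device printf
    return 0;
}
EOF
nvcc test.cu -o cuda_test && ./cuda_test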

2. Driver and Toolkit Version Mismatches

Problem: "CUDA driver version is insufficient" or version mismatch errors

Root Cause: Your NVIDIA driver doesn't support the CUDA toolkit version you installed, or there's a mismatch between different CUDA components.

Understanding CUDA Versions:

  • nvidia-smi shows the maximum CUDA version your driver supports
  • nvcc --version shows your installed toolkit version
  • These can be different, and that's usually fine (driver supports older toolkits)

┌──────────────────────────────────────────────────────────────┐
│  CUDA Compatibility Matrix                                   │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  CUDA Toolkit    →    Minimum Driver Version                │
│  ────────────────────────────────────────────────            │
│  CUDA 12.x              ≥ 525.60.13 (Linux)                  │
│                         ≥ 527.41 (Windows)                   │
│                                                              │
│  CUDA 11.x              ≥ 450.80.02 (Linux)                  │
│                         ≥ 452.39 (Windows)                   │
│                                                              │
│  CUDA 10.x              ≥ 410.48 (Linux)                     │
│                         ≥ 411.31 (Windows)                   │
│                                                              │
│  Rule: Driver version ≥ Toolkit requirement                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
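
To see both numbers side by side, a quick check like this helps (Linux; it assumes nvcc is already on your PATH):

bash
# Highest CUDA version the installed driver supports
nvidia-smi | grep "CUDA Version"

# CUDA version of the installed toolkit
nvcc --version | grep release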

Solution 1: Upgrade Your Driver

bash
# Ubuntu/Debian
sudo ubuntu-drivers devices  # Check available drivers
sudo ubuntu-drivers autoinstall  # Install recommended driver
# OR
sudo apt install nvidia-driver-535  # Install specific version

# Verify
nvidia-smi

Solution 2: Install Compatible CUDA Version

If you can't upgrade your driver, install an older CUDA toolkit that matches your driver version. Check the CUDA compatibility matrix.
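
For example, on Ubuntu with NVIDIA's apt repository configured, you can pin an older toolkit instead of the latest one (the package name below is illustrative; list what your repository actually provides first):

bash
# See which toolkit versions the repository offers
apt-cache search cuda-toolkit

# Install an older toolkit that your driver supports (e.g., CUDA 11.8)
sudo apt-get install cuda-toolkit-11-8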

Solution 3: Fix NVML Driver/Library Mismatch

This error often appears after driver updates:

bash
# The simplest solution: reboot your system
sudo reboot

# If reboot doesn't work, reload the driver
sudo rmmod nvidia_uvm
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
sudo modprobe nvidia

3. WSL2-Specific Problems

Problem: CUDA not working in WSL2 on Windows

WSL2 has unique requirements for CUDA support that differ from native Linux installations.

┌──────────────────────────────────────────────────────────────┐
│  WSL2 CUDA Architecture (Critical Rules)                     │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Windows Host                                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  NVIDIA Driver (WSL-enabled)  ← Install HERE            │  │
│  │  ├─ Manages GPU hardware                                │  │
│  │  └─ Exposes /dev/dxg to WSL2                            │  │
│  └────────────────────────────────────────────────────────┘  │
│         │                                                    │
│         ↓ (passthrough)                                      │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  WSL2 Ubuntu                                            │  │
│  │  ├─ CUDA Toolkit ONLY  ← Install HERE                   │  │
│  │  ├─ NO driver installation!                             │  │
│  │  └─ Uses Windows driver via /dev/dxg                    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Common mistake: Installing NVIDIA driver in WSL2            │
│  Result: Conflicts and failures                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Critical Rules for WSL2:

  1. DO NOT install NVIDIA drivers inside WSL2 - they come from Windows
  2. Install only the CUDA Toolkit in WSL2, not the driver
  3. Windows must have the WSL2-compatible NVIDIA driver

Solution:

Step 1: Install the Correct Windows Driver

Download the CUDA-enabled driver for WSL from NVIDIA's WSL page.

Step 2: Verify WSL2 Can See GPU

bash
# In WSL2 terminal
nvidia-smi

If this works, the Windows driver is configured correctly.

Step 3: Install CUDA Toolkit in WSL2

bash
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1  # No driver installation!

Step 4: Set Environment Variables

bash
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

Common WSL2 Issues:

Issue: "GPU not detected in WSL2"

  • Ensure Windows has the WSL2-enabled driver
  • Check WSL kernel version: wsl cat /proc/version (need ≥ 5.10.43.3)
  • Update WSL: wsl --update in PowerShell

Issue: Incompatible CUDA versions

  • Use only CUDA 11.0+ in WSL2
  • Don't mix CUDA installations from different methods
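
A quick way to run the basic WSL2 checks in one pass (a sketch, assuming a standard WSL2 Ubuntu setup):

bash
# Kernel version (needs to be >= 5.10.43.3 for GPU support)
uname -r

# The Windows driver exposes the GPU to WSL2 through this device
ls -l /dev/dxg

# If the passthrough works, this lists your GPU
nvidia-smi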

4. Secure Boot and UEFI Issues

Problem: "Required key not available" or drivers not loading with Secure Boot

Root Cause: UEFI Secure Boot prevents loading unsigned kernel modules, including NVIDIA drivers.

Solution 1: Disable Secure Boot (Easiest)

  1. Reboot and enter BIOS/UEFI (usually DEL, F2, or F12 during startup)
  2. Navigate to Security or Boot settings
  3. Disable Secure Boot
  4. Save and exit
  5. Reboot into Linux

Solution 2: Sign NVIDIA Modules (Keep Secure Boot Enabled)

bash
# Generate MOK (Machine Owner Key)
openssl req -new -x509 -newkey rsa:2048 -keyout MOK.priv -outform DER -out MOK.der -nodes -days 36500 -subj "/CN=Custom MOK/"

# Enroll the key
sudo mokutil --import MOK.der
# You'll be prompted to create a password

# Reboot - you'll see MOK Manager
# Select "Enroll MOK" → "Continue" → "Yes"
# Enter the password you created
# Reboot again

# Now sign the NVIDIA modules
sudo /usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 ./MOK.priv ./MOK.der $(modinfo -n nvidia)

Solution 3: Use Built-in DKMS Signing (Ubuntu 18.04+)

bash
# During NVIDIA driver installation, you'll be prompted to create a password
# After reboot, MOK Manager appears automatically
# Select "Enroll MOK" and enter your password

Verification:

bash
# Check if driver is loaded
lsmod | grep nvidia

# If loaded, secure boot is working with signed modules
mokutil --sb-state  # Should show "SecureBoot enabled"
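
To double-check that the loaded module is actually signed with your key, inspect its signature fields (a sketch; field names can vary slightly between kernel versions):

bash
# Signed modules report signer, signature key, and hash algorithm
modinfo nvidia | grep -iE 'signer|sig_key|sig_hashalgo'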

5. Broken Package Dependencies

Problem: "Unable to correct problems, you have held broken packages" or apt-get failures

This typically happens when CUDA installations conflict with existing packages or incomplete previous installations.

Diagnosis:

bash
sudo apt-get install -f  # Attempt to fix
sudo dpkg --configure -a  # Configure unconfigured packages
apt-cache policy cuda  # Check available versions

Solution 1: Clean Slate Approach

bash
# Remove all CUDA and NVIDIA packages
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get autoremove
sudo apt-get autoclean

# Remove CUDA repository configurations
sudo rm /etc/apt/sources.list.d/cuda*
sudo rm /etc/apt/sources.list.d/nvidia*

# Clean apt cache
sudo apt-get clean

# Update package lists
sudo apt-get update

# Now reinstall from scratch

Solution 2: Fix Specific Dependency Conflicts

bash
# If apt tells you a specific package is problematic
sudo dpkg --purge --force-all package-name

# Use aptitude for better dependency resolution
sudo apt-get install aptitude
sudo aptitude install cuda

# Aptitude will offer solutions - accept the one that installs/updates packages

Solution 3: Handle "File Already Exists" Conflicts

bash
# If error says "trying to overwrite '/path/to/file'"
# Find which package owns the conflicting file
dpkg -S /path/to/file

# Remove that package first
sudo apt-get remove conflicting-package

# Then continue with installation

For Persistent Issues:

bash
# Use local installer instead of apt
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run --toolkit --silent

6. cuDNN Installation and Compatibility

Problem: cuDNN version mismatch or "cannot find cuDNN"

Root Cause: Deep learning frameworks require specific cuDNN versions that must match your CUDA version.

┌──────────────────────────────────────────────────────────────┐
│  CUDA ↔ cuDNN Compatibility Matrix                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  CUDA Version    →    Compatible cuDNN Versions             │
│  ────────────────────────────────────────────────────────    │
│  CUDA 12.4              cuDNN 9.x                            │
│  CUDA 12.1              cuDNN 8.9+                           │
│  CUDA 11.8              cuDNN 8.6 - 8.9                      │
│  CUDA 11.x              cuDNN 8.x                            │
│                                                              │
│  Framework requirements:                                     │
│  • PyTorch 2.0+:     cuDNN 8.5+                              │
│  • TensorFlow 2.13+: cuDNN 8.6+                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Solution 1: Install Compatible cuDNN via apt (Ubuntu)

bash
# Check your CUDA version
nvcc --version

# For CUDA 11.8
sudo apt-get install libcudnn8=8.9.7.*-1+cuda11.8
sudo apt-get install libcudnn8-dev=8.9.7.*-1+cuda11.8

# For CUDA 12.x
sudo apt-get install libcudnn9-cuda-12

Solution 2: Manual Installation

  1. Download cuDNN from NVIDIA's cuDNN page (requires free account)
  2. Choose the version matching your CUDA toolkit
  3. Extract and copy files:
bash
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive

# Copy files to CUDA installation
sudo cp include/cudnn*.h /usr/local/cuda/include
sudo cp lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

Verification:

bash
# Check cuDNN version
cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

# Or
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
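
You can also check which cuDNN version your framework actually loads at runtime. Assuming PyTorch is installed, a one-liner is enough:

bash
# Prints the cuDNN version PyTorch was built against (e.g., 8902 for 8.9.2)
python -c "import torch; print(torch.backends.cudnn.version())"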

Solution 3: Multiple cuDNN Versions with Conda

bash
# Create isolated environment
conda create -n myenv python=3.10
conda activate myenv

# Install specific CUDA and cuDNN
conda install -c conda-forge cudatoolkit=11.8 cudnn=8.9.2

Common Error: "Loaded runtime CuDNN library X but source was compiled with Y"

This means your Python package expects a different cuDNN version.

Fix:

bash
# Option 1: Reinstall the package with matching cuDNN
pip uninstall tensorflow-gpu  # or pytorch
pip install tensorflow-gpu==2.15.0  # Version compatible with your cuDNN

# Option 2: Install the expected cuDNN version
sudo apt-get install libcudnn8=X.X.X.*-1+cudaY.Y  # Match the expected version

7. PyTorch CUDA Not Available

Problem: `torch.cuda.is_available()` returns `False`

This is one of the most frustrating issues for deep learning developers.

Diagnosis Steps:

python
import torch
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)  # CUDA version PyTorch was built with

# Try to create a CUDA tensor (provides better error messages)
try:
    torch.zeros(1).cuda()
except Exception as e:
    print(f"Error: {e}")
┌──────────────────────────────────────────────────────────────┐
│  PyTorch CUDA Troubleshooting Decision Tree                  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  torch.cuda.is_available() == False                          │
│         │                                                    │
│         ├─→ pip list | grep torch shows "+cpu"?              │
│         │   └─→ YES: Installed CPU-only version              │
│         │       FIX: Reinstall with CUDA support              │
│         │                                                    │
│         ├─→ nvidia-smi fails?                                │
│         │   └─→ YES: Driver problem                          │
│         │       FIX: Install/update NVIDIA driver             │
│         │                                                    │
│         ├─→ torch.version.cuda > nvidia-smi CUDA?            │
│         │   └─→ YES: Driver doesn't support PyTorch CUDA     │
│         │       FIX: Upgrade driver or use older PyTorch      │
│         │                                                    │
│         └─→ CUDA_VISIBLE_DEVICES set incorrectly?            │
│             └─→ YES: GPU hidden from PyTorch                 │
│                 FIX: Unset or correct the variable           │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Common Causes and Solutions:

Cause 1: Installed CPU-only PyTorch

This is the most common cause. On Windows, and with some package indexes, a plain pip install torch pulls the CPU-only build.
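
A quick way to confirm whether the installed wheel is the CPU-only build (a "+cpu" suffix and a CUDA version of None give it away):

bash
pip list | grep torch   # e.g., "torch  2.1.0+cpu" means CPU-only
python -c "import torch; print(torch.__version__, torch.version.cuda)"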

Solution:

bash
# Uninstall current PyTorch
pip uninstall torch torchvision torchaudio

# Install with CUDA support (check pytorch.org for your CUDA version)
# For CUDA 11.8:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Cause 2: PyTorch CUDA version doesn't match driver

PyTorch bundles its own CUDA runtime, but your driver must support it.

Solution:

bash
# Check driver version
nvidia-smi  # Look at "Driver Version" and "CUDA Version"

# Install PyTorch version compatible with your driver
# If driver supports CUDA 11.8, install PyTorch with CUDA 11.8
# If driver supports CUDA 12.1+, you can use any lower CUDA version

Cause 3: Wrong compute capability

Older GPUs might not be supported by recent PyTorch builds.

Check GPU compute capability:

bash
# Method 1
nvidia-smi --query-gpu=compute_cap --format=csv

# Method 2 (if you can import torch)
python -c "import torch; print(torch.cuda.get_device_capability())"

Solution:

  • Compute capability < 3.5: Not supported by modern PyTorch
  • Compute capability 3.5 – 3.7: May need older PyTorch versions
  • Compute capability ≥ 5.0: Supported by current prebuilt wheels
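
To see which compute capabilities your installed PyTorch wheel was built for (this assumes torch imports; it returns an empty list when CUDA is unavailable):

bash
# Lists architectures such as ['sm_50', 'sm_60', ..., 'sm_90']
python -c "import torch; print(torch.cuda.get_arch_list())"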

If your GPU is too old, consider:

bash
# Install older PyTorch version (e.g., 1.7.1 for older GPUs)
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Cause 4: Environment/Conda issues

Solution:

bash
# Create fresh environment
conda create -n pytorch_env python=3.10
conda activate pytorch_env

# Install PyTorch through conda (often more reliable)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Cause 5: CUDA_VISIBLE_DEVICES is set incorrectly

bash
# Check if the variable is hiding your GPU
echo $CUDA_VISIBLE_DEVICES

# If it's set to empty or wrong value, unset it
unset CUDA_VISIBLE_DEVICES

# Or set it correctly (0 for first GPU)
export CUDA_VISIBLE_DEVICES=0

Quick Troubleshooting Script:

python
# Run this diagnostic script
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Number of GPUs: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Compute Capability: {torch.cuda.get_device_capability(0)}")

8. Windows-Specific Issues

Problem: "Cannot find compiler 'cl.exe' in PATH"

Root Cause: CUDA on Windows requires Microsoft Visual Studio's C++ compiler (cl.exe), which is not included in the CUDA toolkit.

┌──────────────────────────────────────────────────────────────┐
│  Windows CUDA Build Requirements                             │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  CUDA Toolkit alone is NOT enough on Windows!               │
│                                                              │
│  Required components:                                        │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ 1. CUDA Toolkit (nvcc, libraries)                      │  │
│  │ 2. Visual Studio with C++ tools (cl.exe)               │  │
│  │ 3. Windows SDK                                         │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  CUDA + VS Compatibility:                                    │
│  • CUDA 12.5+:  VS 2019 16.11 - VS 2022 17.10                │
│  • CUDA 12.0-12.4: VS 2019 16.11 - VS 2022 17.7              │
│  • CUDA 11.8:   VS 2019 16.11 - VS 2022 17.0                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Solution:

Step 1: Install Visual Studio

  1. Download Visual Studio Community (free)
  2. During installation, select "Desktop development with C++"
  3. Ensure these components are checked:
  • MSVC v143 C++ build tools (or latest)
  • Windows SDK
  • C++ CMake tools

Step 2: Verify Installation

cmd
# Open "Developer Command Prompt for VS"
where cl.exe
cl.exe

Step 3: Add to PATH (if needed)

If cl.exe is not found in normal Command Prompt:

  1. Find cl.exe location (usually):
   C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.XX.XXXXX\bin\Hostx64\x64
  2. Add to System PATH or use Visual Studio's Developer Command Prompt

Step 4: Verify CUDA Can Find Compiler

cmd
nvcc --version
nvcc test.cu  # Try compiling a simple CUDA file

Common VS Version Compatibility Issues:

If you have VS 2022 17.10+ with CUDA 12.4 or earlier:

cmd
# Option 1: Upgrade CUDA to 12.5+
# Option 2: Add this flag when compiling
nvcc -allow-unsupported-compiler your_file.cu

# Option 3: Downgrade VS to 17.9
# Option 4: Pass the flag through CMake at configure time
cmake -DCMAKE_CUDA_FLAGS="-allow-unsupported-compiler" ..

Problem: GeForce Experience conflicts (Windows)

Solution:

When installing CUDA on Windows, use Custom Installation and deselect:

  • GeForce Experience (if already installed)
  • Display Driver (if you already have a working driver)

Install only:

  • CUDA Toolkit
  • Development tools
  • Documentation

9. Incomplete Installations

Problem: "Incomplete installation" or "Driver not selected" warnings

Symptoms:

  • Installation completes but with warnings
  • nvidia-smi works but nvcc doesn't
  • Some CUDA samples fail to compile

Solution 1: Toolkit-Only Installation (When Driver Already Works)

bash
# Use the runfile installer with toolkit-only flag
sudo sh cuda_12.1.0_530.30.02_linux.run --toolkit --silent

# Set environment variables
export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH

Solution 2: Full Reinstallation

bash
# Complete cleanup
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
sudo apt-get autoremove

# Remove any manual installations
sudo rm -rf /usr/local/cuda*

# Reboot
sudo reboot

# Fresh installation using recommended method
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1 nvidia-driver-535

Solution 3: Fix Missing Components

If only specific components are missing:

bash
# Install missing samples
sudo apt-get install cuda-samples-12-1

# Install missing libraries
sudo apt-get install cuda-libraries-12-1

# Install development files
sudo apt-get install cuda-libraries-dev-12-1

10. Best Practices

Prevention is Better Than Cure

┌──────────────────────────────────────────────────────────────┐
│  The Right Way to Install CUDA                               │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Step 1: Plan                                                │
│  • Check GPU compute capability                              │
│  • Verify OS and kernel version                              │
│  • Research framework requirements                           │
│  • Pick compatible versions                                  │
│                                                              │
│  Step 2: Clean                                               │
│  • Remove old installations completely                       │
│  • Clean package manager cache                               │
│  • Reboot if needed                                          │
│                                                              │
│  Step 3: Install (in order)                                  │
│  1. NVIDIA Driver                                            │
│  2. CUDA Toolkit                                             │
│  3. cuDNN                                                    │
│  4. Framework (PyTorch/TensorFlow)                           │
│                                                              │
│  Step 4: Verify                                              │
│  • Test nvidia-smi                                           │
│  • Test nvcc --version                                       │
│  • Compile and run deviceQuery                               │
│  • Test framework GPU access                                 │
│                                                              │
│  Step 5: Document                                            │
│  • Save your working environment                             │
│  • Record all version numbers                                │
│  • Keep notes for future reference                           │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Before Installing:

  1. Check Compatibility
  • GPU compute capability
  • OS version and kernel
  • Driver requirements
  • Framework requirements (PyTorch/TensorFlow versions)
  2. Clean Existing Installations
bash
   sudo apt-get remove --purge '^nvidia-.*' '^cuda-.*'
   sudo apt-get autoremove
  3. Use Official Sources
  • Download from nvidia.com, not third-party repos
  • Verify checksums
  • Follow the official installation guide for your OS
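
A short pre-flight check on Linux captures most of this in one pass (a sketch; adjust for your distro):

bash
# GPU model
lspci | grep -i nvidia

# OS and kernel version
lsb_release -d && uname -r

# Existing NVIDIA/CUDA packages that may need cleanup first
dpkg -l | grep -E 'nvidia|cuda'

# Host compiler version (nvcc supports a specific gcc range)
gcc --version | head -1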

During Installation:

  1. Install in the Right Order
  • Driver first (or let CUDA installer handle it)
  • CUDA Toolkit second
  • cuDNN third
  • Framework (PyTorch/TensorFlow) last
  2. Use Local Installers for Stability
  • Network installers can fail or pull wrong versions
  • Local runfiles give you more control
  3. Note Your Versions
bash
   # Save this info for troubleshooting
   nvidia-smi
   nvcc --version
   cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

After Installation:

  1. Verify Everything
bash
   # Driver
   nvidia-smi

   # Toolkit
   nvcc --version

   # Test compilation with the deviceQuery sample
   # (since CUDA 11.6 the samples are a separate download: https://github.com/NVIDIA/cuda-samples)
   cd /usr/local/cuda/samples/1_Utilities/deviceQuery
   sudo make
   ./deviceQuery
  2. Create Environment Backup
bash
   # Save your working environment
   conda env export > working_cuda_env.yml

   # Or with pip
   pip freeze > requirements.txt
  3. Document Your Setup

Keep a record of:

  • GPU model
  • Driver version
  • CUDA version
  • cuDNN version
  • Framework versions
  • OS and kernel version
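
A small script can dump all of this into a single file for later reference (a sketch; adjust paths and commands to your setup):

bash
# Capture the full environment in one report
{
  echo "== GPU / Driver ==";  nvidia-smi
  echo "== CUDA Toolkit ==";  nvcc --version
  echo "== cuDNN ==";         grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn_version.h 2>/dev/null
  echo "== Python stack ==";  pip freeze | grep -iE 'torch|tensorflow'
  echo "== OS / Kernel ==";   lsb_release -d; uname -r
} > cuda_setup_report.txt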

For Beginners:

bash
# Ubuntu: Use apt packages (easiest)
sudo apt-get update
sudo apt-get install nvidia-driver-535
sudo apt-get install cuda-toolkit-12-1

For Advanced Users:

bash
# Use runfile for maximum control
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run

For Deep Learning:

bash
# Use Docker containers (most reliable)
docker pull nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04

# Or use Conda for isolated environments
conda create -n ml_env python=3.10
conda activate ml_env
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
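
To confirm that a container can actually see the GPU (this assumes the NVIDIA Container Toolkit is installed on the host), run nvidia-smi inside the same image:

bash
docker run --rm --gpus all nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 nvidia-smi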

When Things Go Wrong:

  1. Don't Panic-Install
  • Installing multiple versions won't fix issues
  • Clean up first, then reinstall
  2. Read Error Messages Carefully
  • Most errors tell you exactly what's wrong
  • Google the specific error message
  3. Check Logs
bash
   # CUDA installer logs
   cat /var/log/cuda-installer.log

   # System logs
   dmesg | grep -i nvidia
   journalctl -xe | grep -i nvidia
  4. Ask for Help Properly

When posting on forums, include:

  • OS and version
  • GPU model
  • Output of nvidia-smi
  • Output of nvcc --version
  • Complete error messages
  • What you've already tried

Quick Troubleshooting Checklist

Use this checklist when CUDA isn't working:

┌──────────────────────────────────────────────────────────────┐
│  CUDA Troubleshooting Checklist                              │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  □ nvidia-smi shows GPU?                                     │
│    → If no: Driver issue                                     │
│                                                              │
│  □ nvcc --version works?                                     │
│    → If no: PATH issue                                       │
│                                                              │
│  □ CUDA version compatible with driver?                      │
│    → Check compatibility matrix                              │
│                                                              │
│  □ cuDNN installed and compatible?                           │
│    → Check version matching                                  │
│                                                              │
│  □ PyTorch/TensorFlow sees GPU?                              │
│    → Check framework installation                            │
│                                                              │
│  □ Secure Boot disabled (or modules signed)?                 │
│    → Check boot settings                                     │
│                                                              │
│  □ All environment variables set?                            │
│    → Check PATH, LD_LIBRARY_PATH, CUDA_HOME                  │
│                                                              │
│  □ On WSL2?                                                  │
│    → Follow WSL2-specific guide                              │
│                                                              │
│  □ On Windows?                                               │
│    → Visual Studio installed?                                │
│                                                              │
│  □ Package dependencies clean?                               │
│    → Run apt --fix-broken install                            │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Conclusion

CUDA installation can be tricky, but most issues fall into these common categories. The key is:

  1. Identify the specific problem using error messages
  2. Understand the root cause (version mismatch, missing dependencies, etc.)
  3. Apply the targeted solution rather than reinstalling everything
  4. Verify the fix with proper testing

Remember: A working CUDA installation requires:

  • Compatible GPU hardware
  • Proper NVIDIA driver
  • CUDA Toolkit
  • Correct environment variables
  • (For deep learning) Compatible cuDNN
  • (For deep learning) Framework installed with CUDA support

If you're still stuck after trying these solutions, the NVIDIA Developer Forums and Stack Overflow are excellent resources.

Want to skip these headaches? Try RightNow AI - our CUDA development environment comes with built-in profiling, debugging, and optimization tools. We handle all the installation complexity so you can focus on writing fast kernels.

Download RightNow AI


Last updated: October 2025

CUDA · Installation · GPU Setup · Developer Tools · Troubleshooting