[ML] CUDA Introduction

Posted by Rico's Nerd Cluster on January 11, 2026

What is CUDA

CUDA is fundamentally the C++ language with extensions: kernels, __global__, __device__, etc. are defined in the C++ space. If you want to call a CUDA launcher from C code, just expose it with standard C linkage:

extern "C" void launch_kernel(...);
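A minimal sketch of what this pattern might look like in a .cu file. The kernel and wrapper names (scale_kernel, launch_kernel) and parameters are hypothetical, chosen only to illustrate the linkage:

```cuda
#include <cuda_runtime.h>

// Device kernel: compiled by nvcc's device toolchain, lives in C++ space.
__global__ void scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// C-linkage wrapper: the symbol is not name-mangled, so plain C code
// can link against it without seeing any C++/CUDA details.
extern "C" void launch_kernel(float* d_data, float factor, int n) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_data, factor, n);
}
```

A C translation unit can then declare `extern void launch_kernel(float*, float, int);` and call it directly.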

So this means

  • You can use host-side containers like std::vector on the host, then pass their .data() pointer to cudaMemcpy (after allocating device memory with cudaMalloc) or to kernels that accept host-accessible memory. But you cannot use std::vector from device code.
  • For device-side containers, use libraries designed for CUDA, e.g. thrust::device_vector or CUB, or manage raw device pointers yourself.
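The host-container workflow above can be sketched end to end. This is an illustrative example, not a definitive implementation; error checking is omitted and the kernel (add_one) is made up:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Trivial kernel operating on a raw device pointer.
__global__ void add_one(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    std::vector<float> host(1024, 0.0f);          // host-side container
    float* dev = nullptr;
    size_t bytes = host.size() * sizeof(float);
    int n = (int)host.size();

    cudaMalloc(&dev, bytes);                       // allocate device memory
    cudaMemcpy(dev, host.data(), bytes,            // host -> device, via .data()
               cudaMemcpyHostToDevice);
    add_one<<<(n + 255) / 256, 256>>>(dev, n);     // kernel sees a raw pointer
    cudaMemcpy(host.data(), dev, bytes,            // device -> host
               cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

Note that std::vector only ever lives on the host; the device side sees nothing but the raw float*.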

nvcc, CUDA's compiler driver, is not a single compiler. It splits a .cu file into:

  • host code (compiled by your host compiler, like g++)
  • device code (compiled by NVIDIA’s device toolchain into PTX, then SASS)
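On a machine with the CUDA toolkit installed, this split can be observed directly (file names here are hypothetical):

```shell
# Compile normally: nvcc drives both host and device compilation.
nvcc -o app main.cu

# Emit the intermediate PTX (the virtual device ISA) on its own.
nvcc -ptx main.cu -o main.ptx

# Print the sub-commands nvcc would hand to the host compiler (e.g. g++)
# and to the device-side tools (cicc, ptxas, fatbinary), without running them.
nvcc --dryrun main.cu
```

The --dryrun output makes the driver nature explicit: nvcc itself mostly orchestrates other tools.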

ATen is the C++ tensor library that PyTorch and libtorch use to manipulate tensors. Autograd is built on top of ATen.

  • It provides at::Tensor, core tensor operations, CPU/CUDA device handling, and the backend dispatch mechanism
  • #include <ATen/ATen.h>
  • In PyTorch/ATen extensions you usually work with at::Tensor on the host and pass raw pointers (tensor.data_ptr<T>()) into CUDA kernels; ATen handles CPU/CUDA dispatch and memory details.
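A sketch of that host/kernel split in an extension, assuming a float CUDA tensor as input. The function and kernel names (relu_cuda, relu_kernel) are hypothetical, and stream handling and dtype/device checks are omitted for brevity:

```cpp
#include <ATen/ATen.h>
#include <cuda_runtime.h>

// Plain CUDA kernel: knows nothing about ATen, only raw pointers.
__global__ void relu_kernel(const float* in, float* out, int64_t n) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

// Host-side entry point: works with at::Tensor, then hands raw
// device pointers (data_ptr<T>()) to the kernel.
at::Tensor relu_cuda(const at::Tensor& input) {
    auto x = input.contiguous();       // ensure a dense layout
    auto out = at::empty_like(x);      // ATen allocates on the same device
    int64_t n = x.numel();
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    relu_kernel<<<blocks, threads>>>(
        x.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
```

The division of labor mirrors the bullet above: ATen owns allocation, device placement, and dispatch; the kernel only ever sees raw memory.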