From CUDA basics to deploying with TensorRT-LLM, NeMo, and NVIDIA NIM microservices.
Understand GPU threads, memory hierarchy, and CUDA kernels.
Optimise LLM inference with TensorRT-LLM for production throughput.
Fine-tune and customise foundation models with the NeMo toolkit.
Serve models at scale with NVIDIA Triton Inference Server.
Deploy optimised AI microservices with NVIDIA NIM containers.
Provision and manage cloud-based DGX infrastructure for large-scale training.