AI & GPU Scheduling
Key Takeaways for AI & GPU Scheduling
- Device Plugins for GPUs: Standard Kubernetes doesn't recognize GPUs; a vendor-specific Device Plugin (e.g., NVIDIA's) is required to expose them as schedulable resources.
- GPU Request Syntax: Once the device plugin is installed, Pods can request GPUs using resource limits such as nvidia.com/gpu: 1.
- Multi-Instance GPU (MIG): MIG enables the partitioning of a single physical GPU into multiple smaller, isolated virtual GPUs, improving utilization for smaller AI workloads.
- AI Framework Integration: Kubernetes serves as a robust platform for AI/ML workloads, supporting frameworks like Kubeflow and Ray for distributed training and inference.
With the rise of Large Language Models (LLMs), Kubernetes is increasingly used as an AI training and inference platform. However, standard Kubernetes only knows about CPU and RAM. To use GPUs, we need Device Plugins.
1. Requesting Hardware
Figure: an AI workload Pod (running PyTorch / an LLM) requesting a physical GPU (NVIDIA H100) from the underlying node.
Kubernetes uses Device Plugins to expose specialized hardware like GPUs to containers. The scheduler ensures only nodes with available GPUs are selected.
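Once a device plugin is advertising GPUs, they appear on the Node object as an extended resource, which is what the scheduler matches against. A rough, abridged sketch of what that might look like (CPU, memory, and GPU counts are illustrative, not from a real cluster):

```yaml
# Abridged status of a GPU node once the device plugin is running (illustrative values).
status:
  capacity:
    cpu: "96"
    memory: 2000Gi
    nvidia.com/gpu: "8"       # advertised by the NVIDIA device plugin
  allocatable:
    cpu: "95"
    memory: 1950Gi
    nvidia.com/gpu: "8"       # what the scheduler can hand out to Pods
```

You can inspect these fields on a real node with kubectl describe node or kubectl get node <name> -o yaml.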
2. NVIDIA Device Plugin
To enable GPUs, you must install the vendor's device plugin. This plugin:
- Discovers GPUs on the node.
- Reports them to the Kubelet.
- Mounts the correct drivers into the container when scheduled.
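In practice the plugin ships as a DaemonSet, so one copy runs on every GPU node. The sketch below is a simplified, illustrative version (the name, namespace, labels, and image tag are assumptions); in a real cluster you would deploy NVIDIA's published manifest or Helm chart rather than hand-write this.

```yaml
# Simplified, illustrative sketch of a device-plugin DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin          # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nvidia-device-plugin
  template:
    metadata:
      labels:
        app: nvidia-device-plugin
    spec:
      containers:
        - name: nvidia-device-plugin
          image: nvcr.io/nvidia/k8s-device-plugin:v0.14.0  # illustrative tag; check NVIDIA's docs
          volumeMounts:
            - name: device-plugin
              # Directory where the Kubelet watches for device-plugin gRPC sockets
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
```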
YAML Request
```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # request 1 full GPU
```
GPUs can only be specified under limits (Kubernetes uses the limit as the request), and only as whole units; sharing a card between Pods requires MIG (section 3 below).
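Putting that stanza in context, a minimal Pod manifest might look like the following sketch (the Pod name, image, and command are illustrative, not a reference workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference                        # hypothetical Pod name
spec:
  restartPolicy: Never
  containers:
    - name: model-server
      image: pytorch/pytorch:latest          # illustrative image; pin a CUDA-enabled tag in practice
      command: ["python", "serve.py"]        # hypothetical entrypoint
      resources:
        limits:
          nvidia.com/gpu: 1                  # one whole physical GPU
```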
3. Multi-Instance GPU (MIG)
Modern GPUs (like the A100/H100) are massive. Running a small model on a whole H100 is wasteful.
- MIG allows you to partition 1 physical GPU into up to 7 smaller "virtual" GPUs.
- Kubernetes can then schedule 7 different pods on 1 physical card.
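As a hedged sketch: with MIG enabled and the device plugin running in its "mixed" strategy, each MIG profile shows up as its own resource name, and a Pod requests a slice instead of a whole card. The profile below (nvidia.com/mig-1g.5gb, a 1/7th A100 slice) is an example; the names you actually see depend on your GPU model and MIG configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-model                          # hypothetical Pod name
spec:
  restartPolicy: Never
  containers:
    - name: model-server
      image: pytorch/pytorch:latest          # illustrative image
      resources:
        limits:
          # One MIG slice (1 compute slice, 5 GB memory) instead of a full GPU.
          # Resource name assumes the device plugin's "mixed" MIG strategy on an A100.
          nvidia.com/mig-1g.5gb: 1
```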
4. AI Frameworks
- Kubeflow: A platform for the entire ML lifecycle (Notebooks, Pipelines, Training).
- Ray on K8s: A popular framework for distributed Python/AI applications.
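As one concrete (and hedged) example of the Kubeflow side, the Training Operator's PyTorchJob resource describes a distributed training run declaratively; each replica is an ordinary Pod template, so GPU requests work exactly as shown above. The names, image, and script here are illustrative, and field names should be verified against the operator version you install.

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llm-finetune                         # hypothetical job name
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch                  # the Training Operator expects this container name
              image: pytorch/pytorch:latest          # illustrative image
              command: ["python", "train.py"]        # hypothetical training script
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 2                            # two additional GPU workers
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:latest
              command: ["python", "train.py"]
              resources:
                limits:
                  nvidia.com/gpu: 1
```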