Maximize AMD Instinct™ GPU Utilization: ClearML’s Native Fractional GPU Support for AMD Instinct™ GPUs Enables Concurrent Training, Fine-Tuning, and Inference for Multi-Workload Efficiency
GPU underutilization costs enterprises millions annually, with expensive accelerators frequently running single workloads at a fraction of their capacity. According to ClearML’s 2025-2026 State of AI Infrastructure at Scale report, almost half (49.2%) of IT leaders at F1000 companies identified maximizing GPU efficiency across existing hardware, including shared compute and fractional GPUs, as their top priority for expanding AI infrastructure over the next 12-18 months. GPU partitioning addresses this by allowing a single physical GPU to be divided into multiple isolated “virtual GPUs,” each capable of running workloads independently. For tasks that don’t need a full device, this approach can increase overall utilization dramatically, enabling up to 8× as many concurrent workloads on the same hardware.
AMD Instinct GPU Partitioning
AMD Instinct MI300 Series GPUs, based on the CDNA 3 GPU architecture, introduce new levels of flexibility in GPU resource allocation through compute and memory partitioning. At its core, the AMD Instinct MI300X GPU is built around modular chiplets called XCDs (Accelerator Complex Dies) and IODs (I/O Dies), arranged in a 3D-stacked design that combines up to eight XCDs and eight HBM memory stacks per GPU.
This chiplet-based architecture enables two complementary partitioning dimensions: compute partitioning (SPX, DPX, QPX and CPX) and memory-locality partitioning (NPS1, NPS2 and NPS4).
How Compute Partitioning Enables Concurrent Workloads
By default, each GPU is presented as a single logical device (SPX mode – Single Partition eXecution), allowing applications to use the full compute and memory resources as one unit.
In CPX (Compute Partition eXecution) mode, a GPU can be subdivided into 8 compute partitions, one per XCD on an eight-XCD device. Each partition can run workloads concurrently and independently and is exposed to software as a separate virtual GPU. This logical isolation helps reduce contention between workloads.
Two additional modes are supported. DPX (Dual Partition eXecution) splits the GPU into 2 larger partitions, each allocated with more XCDs, while QPX (Quad Partition eXecution) divides it into 4 mid-sized partitions that balance concurrency with per-partition performance. Together, CPX, QPX, and DPX let operators choose between maximum parallelism, balanced slicing, or fewer high-performance partitions depending on workload needs.
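The relationship between the four compute modes and the resulting partition counts can be sketched in a few lines of shell. This is only an illustration, assuming the eight-XCD layout described above; the function names partitions_per_gpu and xcds_per_partition are hypothetical helpers, not part of any AMD tool.

```shell
# Illustrative only: map a compute-partition mode to the number of
# logical GPUs it exposes per physical device.
XCDS_PER_GPU=8   # assumption: eight XCDs per GPU, as on the MI300X

partitions_per_gpu() {
  case "$1" in
    SPX) echo 1 ;;
    DPX) echo 2 ;;
    QPX) echo 4 ;;
    CPX) echo 8 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# XCDs backing each partition, derived from the per-GPU XCD count.
xcds_per_partition() {
  parts=$(partitions_per_gpu "$1") || return 1
  echo $(( XCDS_PER_GPU / parts ))
}

for mode in SPX DPX QPX CPX; do
  printf '%s: %d partition(s), %d XCD(s) each\n' \
    "$mode" "$(partitions_per_gpu "$mode")" "$(xcds_per_partition "$mode")"
done
```

Multiplying the per-GPU count by the number of GPUs in a node gives the number of logical devices a scheduler would see in a given mode.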

Memory Partitioning for Better Locality and Throughput
AMD Instinct GPUs also support multiple memory-topology modes.
In NPS4 mode, the HBM memory stacks are divided into four NUMA domains. Each domain is physically close to two XCDs, ensuring that processes running on those XCDs primarily access nearby HBM stacks for lower latency and higher bandwidth. This improves memory locality and overall throughput in dense or multi-workload environments.
In contrast, NPS1 mode treats the entire HBM pool as a single, uniform memory space, which simplifies programming but sacrifices some of the locality benefits available in NPS4.
These capabilities mean that infrastructure teams can tailor the hardware to the actual workload demands – whether that’s a massive monolithic training task consuming the full GPU, or multiple smaller inference or fine-tuning jobs each taking a slice. The result is better resource utilization, higher density of workloads per card, and an easier path toward shared AI clusters. Check out the official AMD documentation on partitioning for more information.

Putting ClearML to the Test with AMD’s GPU Partitioning
To validate ClearML’s integration with the AMD Instinct™ GPU partitioning capabilities, we tested on an 8× MI300X node configured with Ubuntu 22.04 and ROCm 7.0.1. After installing the AMD Container Toolkit to enable containerized GPU access, we configured the GPUs for both compute and memory partitioning using AMD’s management utility:
# Enable compute partitioning
sudo amd-smi set --gpu all --compute-partition CPX
# Enable memory-locality partitioning
sudo amd-smi set --memory-partition NPS4

This configuration created 8 independent compute partitions per GPU, for a total of 64 GPU partitions across the 8-GPU node.
Once the system was configured, we launched ClearML Agent in dynamic GPU mode:
clearml-agent daemon --docker --gpus all --dynamic-gpus --queue amd_gpu_partition_1x=1 --queue amd_gpu_partition_2x=2
In this configuration, ClearML Agent automatically manages GPU slices based on queue definitions and real-time resource availability. Workloads submitted to different queues are allocated the corresponding number of GPU partitions, enabling multiple workloads to run concurrently on the same MI300X while maintaining optimal locality and throughput.
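As a rough way to reason about how queue definitions translate into partition demand, the following sketch sums the partitions requested by a set of queued jobs and compares the total with what the node exposes. It is not ClearML’s scheduling logic. The function partitions_needed and the job counts are hypothetical; only the per-queue sizes (1 and 2) and the 64-partition total come from the setup described above.

```shell
# Illustrative capacity check, not ClearML's scheduler.
TOTAL_PARTITIONS=64   # 8 GPUs x 8 CPX partitions, as configured above

# Each argument is "<partitions per job>:<number of queued jobs>".
partitions_needed() {
  total=0
  for spec in "$@"; do
    per_job=${spec%%:*}   # text before the colon
    count=${spec##*:}     # text after the colon
    total=$(( total + per_job * count ))
  done
  echo "$total"
}

# Hypothetical load: 20 jobs on amd_gpu_partition_1x, 10 on amd_gpu_partition_2x.
needed=$(partitions_needed 1:20 2:10)
echo "requested $needed of $TOTAL_PARTITIONS partitions"
if [ "$needed" -le "$TOTAL_PARTITIONS" ]; then
  echo "all jobs can run concurrently"
else
  echo "some jobs will wait in their queue"
fi
```

With this hypothetical load the jobs request 40 partitions, so all of them fit on the node at once. Raising either job count past the 64-partition budget would leave the excess jobs queued until partitions free up.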
To verify the configuration, we deployed a ClearML Jupyter instance on a single-partition queue. Inside the container, amd-smi correctly reported the GPU running in CPX and NPS4 modes, with visibility limited to a single partition. This confirmed that ClearML successfully scheduled the workload to an isolated GPU slice, leveraging partitioning features of AMD Instinct™ GPUs.
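A check along these lines could be scripted. The sketch below is an assumption-laden illustration rather than the procedure used in the test: it assumes that `amd-smi list` prints one `GPU: <index>` header line per visible logical device, which should be confirmed against the installed ROCm version. The helper count_visible_gpus is a hypothetical name.

```shell
# Illustrative check, assuming `amd-smi list` prints one "GPU: <index>"
# header line per visible logical device (verify on your ROCm version).
count_visible_gpus() {
  # Reads `amd-smi list` output on stdin and counts the device headers.
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c '^GPU: [0-9]' || true
}

# Usage inside the container (a single-partition queue should expose one device):
#   visible=$(amd-smi list | count_visible_gpus)
#   [ "$visible" -eq 1 ] && echo "isolated to a single partition"
```

Counting devices only confirms visibility. The partition modes themselves still need to be read from the amd-smi output, as described above.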

ClearML on AMD: Turning Hardware Flexibility into Operational Advantage
Partitioning of AMD Instinct™ GPUs unlocks significant hardware flexibility, but realizing its value requires intelligent orchestration, workload management, and infrastructure control. ClearML’s enterprise-grade AI infrastructure management platform makes AMD’s partitioning capabilities immediately operational, transforming raw GPU slices into a managed, multi-tenant compute environment with cloud-like accessibility.
Dynamic GPU Allocation at the Infrastructure Layer
ClearML’s Infrastructure Control Plane delivers seamless management of AMD’s partitioned GPUs through dynamic GPU partition allocation and silicon-agnostic workload scheduling. Users only need to configure their AMD GPU to the desired partitioning mode and run an agent with visibility into the desired number of GPU partitions. Agents can even be associated with multiple partitions to run a workload across them concurrently. Users then enqueue workloads onto the appropriate queue through ClearML’s user interface, and the ClearML agent reproduces the complete environment, ensures that the workload runs only on the assigned partitions, and prevents over-allocation. The platform enables granular quota management with per-tenant billing and isolated multi-tenancy across heterogeneous environments. Teams can mix full GPUs for large-scale training with fractional partitions for inference and fine-tuning. All workloads are scheduled intelligently based on real-time availability and queue priority. This works across on-premises, air-gapped, or cloud infrastructure, with support for hybrid multi-cluster orchestration and cloud bursting.
AI Development Without Infrastructure Complexity
The Infrastructure Control Plane from ClearML abstracts the complexity of partitioned GPU management, allowing data scientists and ML engineers to focus on model development rather than resource allocation. Through a cloud-native interface, teams deploy workloads to AMD GPU partitions with full experiment tracking, data versioning, and automated audit trails for compliance. The platform maintains data sovereignty through federated data management while simplifying Kubernetes complexity. Teams can even deploy Slurm-based jobs directly on Kubernetes infrastructure without refactoring existing workflows.
Production-Ready Model Deployment on AMD
The GenAI App Engine accelerates time-to-value by enabling self-service deployment of models onto AMD GPU partitions with a single click. Full automation of networking, authentication, and security configuration means teams can expose models through authenticated endpoints with enforced RBAC permissions. Centralized credential management and granular resource access controls ensure governance at scale, while real-time performance dashboards provide production monitoring across all partitioned workloads.
The Result: Maximize ROI from AMD Instinct GPUs
By pairing the hardware flexibility of AMD Instinct™ GPUs with ClearML’s orchestration and management capabilities, organizations can achieve higher GPU utilization rates, help reduce time-to-deployment for new workloads, and gain centralized visibility and control across their entire AI infrastructure. All of this is delivered through a single, vendor-agnostic platform that scales from lab to production.