Maximizing GPU Utilization with ClearML’s Dynamic Fractional GPUs: Unleashing the Full Power of Your AI Infrastructure

July 1, 2025

By Erez Schnaider, Technical Product Marketing Manager, ClearML

Dynamic Fractional GPUs by ClearML: More Throughput, Less Waste

In the world of AI, GPUs have become the undisputed workhorses of innovation. From training deep learning models to accelerating agentic workflows, digital twins, and scientific simulations, these powerful accelerators are indispensable. However, their immense computational power comes at a significant cost. Organizations worldwide are grappling with a pervasive challenge: underutilization of their valuable GPU resources, which directly translates into inflated operational costs and stifled AI progress.

It’s a common scenario. A data science team kicks off a training run, consuming an entire GPU for a task that might only truly leverage a fraction of its capacity. Meanwhile, other AI builder teams or individual researchers wait in line, their own projects stalled, even as expensive hardware sits partially idle. This inefficiency isn’t just an inconvenience for self-serve users; it’s a drain on budgets, a bottleneck to innovation, and a major headache for IT and AI/ML teams striving for optimized infrastructure with effortless management, control, and access.

At ClearML, we hear these pain points from teams in the field every day; they come up in almost every conversation we have with customers. Our mission is to empower organizations to build, deploy, and scale their AI initiatives with unparalleled ease and efficiency. That’s why we’re excited to highlight a cornerstone of our AI Infrastructure Control Plane: ClearML’s Dynamic Fractional GPU capabilities. This feature is designed to tackle GPU underutilization head-on: it splits each GPU into fractions so that multiple workloads can intelligently share it, squeezing the most out of every GPU at any scale and in any environment.

The ClearML Solution: Dynamic Fractional GPUs

Imagine a world where your expensive GPU assets are no longer monopolized by a single task; instead, each GPU is dynamically allocated, in real time and cost-effectively, to precisely meet the needs of as many workloads as it can handle. This is the promise of ClearML’s dynamic fractional GPUs.

Traditionally, achieving efficient GPU sharing has been fraught with complexities. Approaches like statically partitioning GPUs or pre-configuring containers with strict memory limits often lead to either under-provisioning (where a task is starved of resources) or over-provisioning (where resources sit idle). Changing those limits typically requires taking machines offline for reconfiguration. Both outcomes hurt efficiency and cost-effectiveness.

ClearML’s dynamic fractional GPUs provide an on-the-fly approach to GPU slicing. They eliminate the need to set up container memory visibility or to pre-configure MIG profiles. Instead, when configuring the ClearML Agent, you specify which GPU fractions are available. Unlike alternative solutions, this approach supports both MIG and non-MIG GPUs. ClearML takes care of the rest, allocating only the required GPU fraction to each workload.

On a MIG-enabled GPU, ClearML reconfigures the MIG profile (without evicting all running workloads) and runs the task on the appropriate fraction. On a non-MIG GPU, it limits the memory available to the container and time-slices the GPU’s processing power among all workloads running on it. Because ClearML enforces these memory limits, even on a non-MIG GPU a container is never exposed to more than its allocated fraction.
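To make the two enforcement paths concrete, here is a minimal, illustrative Python sketch. It is not ClearML code and does not call any ClearML API. The queue names in `QUEUE_FRACTIONS`, the `place` helper, and the `Placement` type are all hypothetical; the MIG profile list assumes an 80 GB A100-class card, and rounding a request up to the smallest covering profile is an assumption made for illustration only.

```python
from dataclasses import dataclass

# HYPOTHETICAL queue names, each mapped to the GPU fraction it represents.
# Real queue names and agent settings are whatever you configure in ClearML.
QUEUE_FRACTIONS = {
    "gpu_1_8": 0.125,
    "gpu_quarter": 0.25,
    "gpu_half": 0.5,
    "gpu_full": 1.0,
}

# ASSUMED MIG profiles for an 80 GB A100-class card, smallest first:
# (profile name, share of the card's memory that the profile provides).
MIG_PROFILES = [
    ("1g.10gb", 0.125),
    ("2g.20gb", 0.25),
    ("3g.40gb", 0.5),
    ("7g.80gb", 1.0),
]


@dataclass
class Placement:
    mode: str         # "mig" or "time-slice"
    detail: str       # MIG profile name, or a description of the memory cap
    memory_gb: float  # GPU memory the container is allowed to use


def place(queue: str, gpu_memory_gb: float, mig_capable: bool) -> Placement:
    """Illustrative only: decide how a job from `queue` would get its fraction."""
    fraction = QUEUE_FRACTIONS[queue]
    if mig_capable:
        # Round up to the smallest MIG profile that covers the requested fraction.
        for name, share in MIG_PROFILES:
            if share >= fraction:
                return Placement("mig", name, share * gpu_memory_gb)
        raise ValueError(f"no MIG profile covers a {fraction} fraction")
    # Non-MIG: cap the container's visible memory at its fraction; compute is
    # shared with the other jobs on the same GPU by time-slicing.
    cap_gb = fraction * gpu_memory_gb
    return Placement("time-slice", f"memory cap {cap_gb:g} GB", cap_gb)
```

For example, `place("gpu_quarter", 80, True)` selects the `2g.20gb` profile, while `place("gpu_half", 24, False)` yields a 12 GB memory cap with time-sliced compute on a 24 GB non-MIG card.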

Figure 1: ClearML Fractional GPUs

How It Works: Effortless Efficiency at Your Fingertips

The simplicity of ClearML’s dynamic fractional GPUs is one of its most compelling aspects. Here’s a closer look at the “how”:

  1. Queue-Based Allocation: Queues are the interface through which users send jobs for execution; each queue represents the GPU fraction allocated to the jobs sent to it.
  2. MIG-enabled GPUs: On MIG-enabled GPUs (such as the A100 and H100), ClearML reconfigures the MIG profile (without stopping all running workloads) and runs the task on the requested fraction. It also handles bin-packing so that GPUs are filled whenever possible.
  3. Non-MIG GPUs: ClearML limits the available memory for the container and time-slices the GPU processing power between all workloads running on the GPU.
  4. Runs on any infrastructure: Whether running on bare metal, inside a VM, or inside a Kubernetes cluster, ClearML can fractionalize the GPUs it manages.
  5. Built-in Scheduling: ClearML’s fractional GPUs are integrated with its scheduler, which counts GPU fractions toward user quotas and over-quota allocations.
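The bin-packing and quota accounting described in steps 2 and 5 can be illustrated with a classic heuristic. The sketch below uses first-fit decreasing with a hard per-user quota measured in GPU fractions. It is a generic textbook technique chosen for illustration, not ClearML’s actual scheduler; the `schedule` function and its inputs are hypothetical, and over-quota borrowing is deliberately left out.

```python
from collections import defaultdict

EPS = 1e-9  # tolerance for floating-point fraction arithmetic


def schedule(requests, num_gpus, quotas):
    """Place fractional-GPU jobs with first-fit decreasing, under per-user quotas.

    requests: list of (job_id, user, fraction) tuples, fraction in (0, 1].
    quotas:   dict mapping user -> maximum total GPU fraction they may hold.
    Returns (placed, pending): placed maps job_id -> GPU index.
    """
    free = [1.0] * num_gpus     # unallocated fraction left on each GPU
    usage = defaultdict(float)  # GPU fractions currently held per user
    placed, pending = {}, []
    # Largest requests first: the first-fit-decreasing heuristic.
    for job_id, user, fraction in sorted(requests, key=lambda r: r[2], reverse=True):
        if usage[user] + fraction > quotas[user] + EPS:
            pending.append(job_id)  # would exceed the user's quota
            continue
        # First GPU that still has enough free capacity, if any.
        gpu = next((g for g, cap in enumerate(free) if fraction <= cap + EPS), None)
        if gpu is None:
            pending.append(job_id)  # no GPU has room right now
            continue
        free[gpu] -= fraction
        usage[user] += fraction
        placed[job_id] = gpu
    return placed, pending
```

With two GPUs, quotas of 1.0 for alice and carol and 0.5 for bob, and jobs a (alice, 0.5), b (bob, 0.25), c (alice, 0.5), d (bob, 0.25), and e (carol, 0.75), this sketch packs e and b onto GPU 0, a and c onto GPU 1, and leaves d pending because both GPUs are full. Sorting largest-first tends to leave small gaps that smaller jobs can fill later.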

The Tangible Outcomes: What ClearML Delivers

The benefits of implementing ClearML’s dynamic fractional GPUs are both immediate and substantial:

  • Enhanced GPU Utilization: This is the primary and most impactful outcome. By allowing multiple workloads to share a single GPU, you dramatically increase the active processing time of your accelerators. No longer will GPUs sit idle or underutilized because a single, less-demanding task has claimed the entire resource. This means more compute cycles for your AI initiatives, accelerating research, development, and deployment.
  • Reduced Wait Times: The “first-come, first-served” bottleneck is a thing of the past. With dynamic fractional GPUs, multiple tasks can run concurrently, significantly reducing the waiting times for your data scientists and engineers. This fosters a more agile and productive environment, allowing teams to iterate faster and bring models to production sooner.
  • Significant Cost Savings in AI Operations: This is where the rubber meets the road for IT and finance departments. By optimizing GPU utilization, you effectively get more computational power out of your existing hardware. This translates directly into:
    • Delayed Hardware Purchases: You can postpone or even avoid the need to purchase additional expensive GPUs, as your current infrastructure is operating at peak efficiency.
    • Optimized Cloud Spend: For those leveraging cloud-based GPU instances, fractional GPUs mean you can get more out of your cloud resources. This can lead to substantial reductions in your monthly cloud bills.
    • Lower Energy Consumption: More efficient use of hardware also means less wasted energy, contributing to both environmental sustainability and reduced operational expenditures.
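The hardware savings above come down to simple arithmetic. The sketch below compares giving each job a whole GPU with packing jobs by the fraction they actually need. The `compare_allocation` helper and the example job sizes are hypothetical, and the packed figure is a best-case lower bound: real packing can need more GPUs because of fixed MIG profile sizes or fragmentation. The ratios measure allocated capacity, not measured compute utilization.

```python
import math


def compare_allocation(fractions):
    """Compare dedicated-GPU allocation with packing fractional jobs together.

    fractions: the GPU fraction each job actually needs (hypothetical inputs).
    Returns (dedicated_gpus, packed_gpus_lower_bound,
             dedicated_allocation_ratio, packed_allocation_ratio).
    """
    demand = sum(fractions)            # total GPU capacity the jobs need
    dedicated = len(fractions)         # one whole GPU per job
    packed = math.ceil(demand - 1e-9)  # best case if the jobs pack perfectly
    return dedicated, packed, demand / dedicated, demand / packed
```

For instance, eight hypothetical jobs that each need a quarter of a GPU occupy eight GPUs at 25% allocation when each gets a dedicated card, but could fit on as few as two fully allocated GPUs when packed: `compare_allocation([0.25] * 8)` returns `(8, 2, 0.25, 1.0)`.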

Realizing the Promise with ClearML

The journey to maximizing GPU utilization no longer needs to be a daunting one. ClearML’s Infrastructure Control Plane, with its powerful dynamic fractional GPU capabilities, offers a clear path to enhanced efficiency, significant cost savings, and accelerated AI innovation. It’s about making your valuable GPU resources work harder and smarter for you, ensuring that every dollar invested in compute power delivers maximum impact.

Stop letting your GPUs sit idle. Start unlocking their full potential. To learn more about how ClearML’s dynamic fractional GPUs can transform your AI operations, visit our documentation: https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus/#dynamic-gpu-fractions or request a demo to see it in action!
