By Adam Wolf and Damian Erangey
This blog covers how ClearML’s compute governance layer (resource pools, profiles, and policies) gives every team fair, prioritized access to shared infrastructure without leaving hardware idle. It accompanies our Enterprise AI Infrastructure Security YouTube series. Watch the corresponding video below.
The Problem Compute Governance Solves
At this point in the series, your users are authenticated, your configuration is governed by vaults, and your automated workloads have service account identities. The security perimeter is solid. But security alone doesn’t answer a different, very practical question: when the GenAI team, the Dev team, and the Research team all submit jobs to the same GPU cluster, who goes first? How do you stop one team’s fine-tuning run from monopolizing the cluster? How do you guarantee that production inference always has capacity, regardless of what training is happening?
ClearML governs compute through three constructs that work together: resource pools, resource profiles, and resource policies. Understanding how they interact is the key to building a compute layer that’s both fair and responsive to changing business priorities.
The Three Constructs

Resource Pools
A resource pool represents your physical infrastructure: a GPU cluster, bare metal servers, or a cloud autoscaler, for example. Each pool has a defined number of available resources, and the policy manager ensures workload assignment never exceeds what’s available. Administrators also control execution priority within a pool, determining which profile’s jobs get resources first when multiple profiles compete for the same hardware.
Resource Profiles
A resource profile defines the resource consumption requirements of a job (how many GPUs a single task needs). Profiles are the interface through which administrators give users access to hardware. A team might see a half-GPU profile for lightweight experiments, a 4xGPU profile for standard training, and an 8xGPU profile for LLM fine-tuning. Each profile can be connected to multiple resource pools with priority ordering, so jobs try your on-prem cluster first and only burst to cloud if local capacity is full. Pool priority is set on the profile: the first pool that can satisfy the request is where the job runs.
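The routing rule described above (try pools in priority order and take the first one that fits) can be sketched in a few lines of plain Python. This is a toy model, not ClearML's API: the `Pool` and `Profile` classes, the `route` function, and the pool names and capacities are all hypothetical and exist only to illustrate the behavior.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical stand-in for a resource pool with a fixed GPU capacity."""
    name: str
    total_gpus: float
    used_gpus: float = 0.0

    def free_gpus(self) -> float:
        return self.total_gpus - self.used_gpus


@dataclass
class Profile:
    """Hypothetical stand-in for a resource profile."""
    name: str
    gpus_per_task: float
    pools: list  # ordered by priority, highest first


def route(profile: Profile):
    """Return the first pool, in priority order, that can fit one task."""
    for pool in profile.pools:
        if pool.free_gpus() >= profile.gpus_per_task:
            pool.used_gpus += profile.gpus_per_task
            return pool
    return None  # no pool has room, so the job stays queued


on_prem = Pool("on-prem", total_gpus=8)
cloud = Pool("cloud-autoscaler", total_gpus=16)
four_gpu = Profile("4xGPU", gpus_per_task=4, pools=[on_prem, cloud])

# Three 4-GPU jobs: two fill the on-prem pool, the third bursts to cloud.
pool_placements = [route(four_gpu).name for _ in range(3)]
print(pool_placements)
```

Because the on-prem pool is listed first, cloud capacity is only touched once the local cluster is full, which is the cost-control behavior the profile's pool ordering is meant to give you.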
Profiles aren’t limited to whole GPUs. ClearML supports fractional GPU allocation (a 0.5 profile, a 0.25 profile, and so forth) so you can slice hardware more efficiently and give teams access at a granularity that matches their actual workloads. Dynamic GPU slicing is available under the Enterprise plan and works on bare metal, VMs, and Kubernetes, across both MIG and non-MIG devices. Fractions are allocated per task at the queue level, with memory limits enforced at the driver level so containers sharing a GPU are fully isolated from each other.
Quotas, Reservations, and Priority

Two terms are worth clarifying carefully, because they don’t mean what you might expect.
A limit (what the UI calls a quota) is a ceiling – the maximum number of resources a group can consume concurrently. If a team has a limit of 10 GPUs, their 11th job waits in the queue even if other GPUs on the cluster are sitting idle. The limit applies regardless of wider availability.
A reservation is about priority, not dedicated idle capacity. A reservation of 4 GPUs doesn’t mean 4 GPUs are being held back. It means that when GPUs become available, members of this group get higher scheduling priority over groups without a reservation. Their work goes to the front of the line.
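To make the distinction concrete, here is a minimal sketch that treats a limit as an admission check and a reservation as an ordering rule. The `GroupPolicy` class, the group names, and the numbers are hypothetical, and the real scheduler is certainly more involved; the point is only that the two settings answer different questions.

```python
from dataclasses import dataclass


@dataclass
class GroupPolicy:
    """Hypothetical model of one group's limit and reservation."""
    name: str
    limit: int        # ceiling on concurrently used GPUs
    reservation: int  # grants scheduling priority; no hardware is held back
    in_use: int = 0


def can_start(policy: GroupPolicy, gpus_requested: int) -> bool:
    """A job may start only if the group stays at or under its limit."""
    return policy.in_use + gpus_requested <= policy.limit


def dispatch_order(waiting: list) -> list:
    """Groups holding a reservation are served before groups without one."""
    return sorted(waiting, key=lambda p: p.reservation == 0)


research = GroupPolicy("research", limit=10, reservation=0, in_use=10)
genai = GroupPolicy("genai", limit=16, reservation=4, in_use=2)
dev = GroupPolicy("dev", limit=8, reservation=0, in_use=0)

# Research is at its ceiling, so one more GPU must wait,
# no matter how many GPUs are idle elsewhere on the cluster.
research_blocked = not can_start(research, 1)

# When GPUs free up, the group with a reservation goes to the front.
order = [p.name for p in dispatch_order([dev, genai])]
print(research_blocked, order)
```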
Priority operates at two levels beyond reservations:
- Policy priority: within a resource profile, multiple policies can compete for the same hardware. You rank them: the GenAI team at priority one, the Dev team at priority two. When both have jobs waiting, GenAI’s jobs are submitted to the pool first.
- Pool priority: within a profile, you set the order of pools. Jobs route to the first pool that can satisfy the request. On-prem first, cloud only when on-prem is full. Cost stays controlled by default.
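Policy priority can be pictured as a priority queue keyed on the policy's rank, with arrival order as the tie-breaker. The sketch below uses Python's standard `heapq` module; the rank table, team names, and job names are hypothetical, and this is an illustration of the ordering rather than ClearML's scheduler.

```python
import heapq
from itertools import count

# Hypothetical policy ranks within one resource profile; 1 is highest.
policy_rank = {"genai": 1, "dev": 2}

arrival = count()  # tie-breaker so equal-rank jobs keep submission order
pending = []


def submit(team: str, job: str) -> None:
    """Queue a job under its team's policy rank."""
    heapq.heappush(pending, (policy_rank[team], next(arrival), job))


def next_job() -> str:
    """Pop the waiting job with the best rank, earliest arrival first."""
    return heapq.heappop(pending)[2]


submit("dev", "dev-unit-tests")
submit("genai", "genai-finetune")
submit("dev", "dev-benchmark")
submit("genai", "genai-eval")

dispatch = [next_job() for _ in range(4)]
print(dispatch)
```

Even though a Dev job was submitted first, both GenAI jobs are handed to the pool ahead of it, and jobs within the same policy keep their submission order.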
When a higher-priority job arrives and resources are tight, over-quota jobs from lower-priority groups can be preempted. But preemption is graceful: users can register an abort callback, a Python or bash script that runs when the job is being stopped, giving it time to save a checkpoint or log its progress. When the task is automatically rescheduled, it picks up from that saved state. The work is deferred, not lost.
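The save-then-resume pattern behind graceful preemption looks roughly like the sketch below. It is plain Python with no ClearML calls: `on_abort` plays the role of the abort callback, the `preempt_at` argument simulates the scheduler stopping the job, and the JSON checkpoint file is a hypothetical stand-in for whatever state your training code saves. How the callback is registered with ClearML is deliberately left out, so check the SDK documentation for the exact hook.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location for this demo.
CHECKPOINT = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
state = {"step": 0}


def on_abort() -> None:
    """Save progress; meant to run when the scheduler stops the job."""
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)


def load_state() -> dict:
    """Resume from the checkpoint if a previous run was preempted."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0}


def train(total_steps: int, preempt_at=None) -> str:
    """Toy training loop; preempt_at simulates a preemption signal."""
    global state
    state = load_state()
    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            on_abort()
            return "preempted"
        state["step"] += 1
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)  # finished cleanly, checkpoint no longer needed
    return "completed"


first_run = train(10, preempt_at=4)    # stopped after 4 steps
resumed_from = load_state()["step"]    # what the rescheduled run will see
second_run = train(10)                 # picks up at step 4, not step 0
print(first_run, resumed_from, second_run, state["step"])
```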
Dynamic vs. Static GPU Allocation
Traditional GPU sharing approaches require partitioning hardware up front: GPUs are statically sliced before any jobs run, and those slices are locked in place. Changing the allocation means taking machines offline.
ClearML’s fractional GPU technology works differently. Fractions are allocated on the fly, per task, at the queue level. You define the fraction in the resource profile, and ClearML handles the slicing dynamically when the task runs. Memory limits are enforced at the driver level, so workloads sharing a GPU are fully isolated; one container can’t see or consume another’s allocation. This works across MIG and non-MIG GPUs on bare metal, VMs, and Kubernetes.
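As a rough picture of per-task allocation, the sketch below places fractional requests onto GPUs as tasks arrive, with no partitioning done in advance. The first-fit strategy is an assumption chosen for simplicity, not a description of how ClearML places tasks, and the sketch does not model the driver-level memory enforcement that provides isolation. `Fraction` is used instead of floats so that shares add up exactly.

```python
from fractions import Fraction


def place(gpus: list, fraction: Fraction):
    """First-fit: put the task on the first GPU with enough free share."""
    for i, used in enumerate(gpus):
        if used + fraction <= 1:
            gpus[i] = used + fraction
            return i
    return None  # no GPU has room, so the task waits in the queue


# Two physical GPUs, nothing sliced ahead of time.
gpus = [Fraction(0), Fraction(0)]

# Tasks arrive with the fraction defined by their (hypothetical) profile.
requests = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 2),
            Fraction(1, 4), Fraction(1, 2)]
gpu_assignments = [place(gpus, r) for r in requests]
print(gpu_assignments, gpus)

# Both GPUs are now fully used; a further quarter-GPU task has to wait.
overflow = place(gpus, Fraction(1, 4))
```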
Static carve-outs are also available when the business requires them. IT can dedicate specific hardware to a particular group so that no other group’s jobs run on it. This is useful for compliance, regulated workloads, or teams that need guaranteed exclusive access to specific hardware. Both models are available; most teams use dynamic sharing for efficiency and static dedication for compliance-critical workloads.

How It All Connects
The compute layer doesn’t operate in isolation; it builds directly on the layers configured in previous videos:
- Access rules ensure only the right groups can submit to the right queues. When a resource policy creates a queue, an access rule is automatically created scoping that queue to the policy’s user group.
- Administrator vaults govern what containers and storage configurations those tasks run with.
- Service accounts handle agent identity, so automated pipelines submit to the right queue under a governed identity.
A newly provisioned team’s full setup (compute access, container policy, credential governance, and queue permissions) flows from group membership alone. Add the user to the right group, and the platform does the rest.
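The chain from group membership to queue access can be modeled in a few lines. Everything here is hypothetical (the `Platform` class, its method names, and the queue naming scheme) and covers only the queue and access-rule part of the flow; it shows the shape of the idea, where creating a policy yields a queue plus a matching access rule, and submit rights then follow from group membership.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical in-memory model of policies, queues, and access rules."""
    queues: dict = field(default_factory=dict)        # queue -> profile
    access_rules: dict = field(default_factory=dict)  # queue -> group
    members: dict = field(default_factory=dict)       # group -> set of users

    def create_policy(self, group: str, profile: str) -> str:
        """Creating a policy yields a queue and an access rule scoped to the group."""
        queue = f"{group}-{profile}"  # naming scheme is an assumption
        self.queues[queue] = profile
        self.access_rules[queue] = group
        return queue

    def add_user(self, user: str, group: str) -> None:
        self.members.setdefault(group, set()).add(user)

    def can_submit(self, user: str, queue: str) -> bool:
        """Submit rights come from membership in the queue's group."""
        group = self.access_rules.get(queue)
        return user in self.members.get(group, set())


platform = Platform()
queue = platform.create_policy("genai", "8xGPU")
platform.add_user("alice", "genai")
platform.add_user("bob", "dev")

alice_ok = platform.can_submit("alice", queue)
bob_ok = platform.can_submit("bob", queue)
print(queue, alice_ok, bob_ok)
```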
GenAI-Specific Considerations
GenAI workloads create specific resource management pressures worth calling out separately:
- Production inference needs consistent, low-latency GPU access. Reservations give serving endpoints scheduling priority whenever GPUs free up, regardless of what training is happening; where capacity must be guaranteed outright, a static carve-out is the stronger option. Combined with queue access rules, only the serving service account can submit to the production inference queue.
- LLM fine-tuning is resource-intensive and long-running; a single job may need 8 GPUs for days. Without quotas, one team’s fine-tuning run can consume the entire cluster. A dedicated fine-tuning profile with per-team limits prevents this while still enabling the work.
- Experimentation workloads need to be bounded. A lower-priority profile with reasonable limits lets teams iterate freely without risking production stability.
Visibility: The Orchestration Dashboard
The Orchestration Dashboard gives real-time visibility into everything configured above: current resource usage per pool, pending and running jobs per profile, and policy utilization per group. It tracks GPU and CPU usage, idle worker counts, and resource utilization over time, down to individual workers and groups. When someone asks about GPU usage this month, you can answer with specific data (which teams, which profiles, which pools) in seconds.

Closing
Resource pools, profiles, and policies give you a compute governance layer that adapts as your business changes. Teams can move fast without waiting for approvals because they operate within guardrails that reflect your actual business priorities. A business unit needs higher priority for a critical project this month? Adjust the policy priority. Bursting to cloud while upgrading on-prem hardware? The pool priority handles the routing. A new team spins up and needs governed access to shared GPUs? Create a policy, connect a profile, assign a group; the queue appears, the access rule is created automatically, and the governance follows.
No single layer does everything. Identity, configuration, automation, compute, and access work together, each one reinforcing the others. That’s what makes the platform governable without becoming a bottleneck.
Learn More
Find the full Enterprise AI Infrastructure Security video series on YouTube. Get in touch if you would like to discuss how ClearML can support your organization’s AI infrastructure and compute governance requirements.