By Erez Schnaider, Technical Product Marketing Manager, ClearML
The GPU-as-a-Service market is experiencing hyper growth. Yet across telecommunications companies, cloud service providers (CSPs), and enterprise organizations, GPU infrastructure has been viewed as a necessary cost center rather than a strategic asset.
This perspective is changing as energy optimization technologies and multi-tenant capabilities transform GPU infrastructure into monetization engines and competitive differentiators.
The challenge spans all sectors: data center energy consumption is set to nearly triple by 2030, reaching 945 TWh globally according to the International Energy Agency, with GPU-intensive AI workloads as the primary driver. For example, a 30,000-GPU facility’s annual electricity costs alone can reach $25.35 million, representing 30-40% of total operating costs.
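A cost figure of that magnitude follows from simple arithmetic. The sketch below shows one way to arrive at it, assuming an average draw of roughly 700W per GPU and an illustrative industrial electricity rate; both are assumptions for illustration, not figures from ClearML or the IEA.

```python
# Back-of-the-envelope estimate of annual electricity cost for a GPU facility.
# The per-GPU draw and electricity rate are illustrative assumptions.

def annual_electricity_cost(num_gpus: int,
                            watts_per_gpu: float,
                            usd_per_kwh: float,
                            hours_per_year: float = 8760) -> float:
    """Return the yearly electricity cost in USD for an always-on fleet."""
    kw_total = num_gpus * watts_per_gpu / 1000   # fleet draw in kW
    kwh_per_year = kw_total * hours_per_year     # annual energy in kWh
    return kwh_per_year * usd_per_kwh

cost = annual_electricity_cost(num_gpus=30_000,
                               watts_per_gpu=700,    # assumed average draw
                               usd_per_kwh=0.1378)   # assumed industrial rate
print(f"${cost / 1e6:.2f}M per year")  # → $25.35M per year
```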
This impacts telcos and CSPs seeking revenue diversification, as well as enterprises facing the high capital and operational expenses of scaling AI initiatives internally.
Meanwhile, EU Energy Efficiency Directive regulations mandate annual sustainability reports for data centers larger than 500kW.
The ClearML and EAR Joint Solution: Transforming Infrastructure Economics
ClearML and Energy Aware Runtime (EAR) have partnered to address this challenge with an integrated energy-optimized GPU-as-a-Service platform. This joint solution allows telecommunications companies, cloud service providers, and enterprises to transform their GPU infrastructure into a secure, energy-optimized, multi-tenant offering.
For telcos and CSPs, this generates new revenue streams while delivering customers a full-stack AI platform with secure, remote, on-demand access and scalable deployment capabilities, billed on consumption of compute, storage, energy, and other metrics.
For enterprises, it enables internal teams and business units to access GPU resources efficiently while optimizing costs and energy consumption across the organization. Advanced cluster monitoring and GPU management features maximize utilization and serve as an ideal control plane for orchestrating high-performance workloads.
The combined solution pairs ClearML with EAR’s real-time energy optimization engine, ensuring workloads run at maximum GPU performance per watt whether or not power constraints are in place, and includes per-task electricity and CO2 reporting alongside actionable hints for improving energy and resource utilization.
Market Opportunity Meets Operational Imperative
McKinsey estimates the addressable GPU-as-a-Service market for telecommunications companies could reach $35-$70 billion globally by 2030. Leading operators are already capitalizing: NVIDIA’s 2024 telecommunications survey found that 49% of telcos are actively adopting AI, with 84% planning to offer AI services to customers. Of those implementing AI, 77% report both increased annual revenue and reduced operational costs.
ClearML with EAR directly addresses this opportunity by providing out-of-the-box GPU-as-a-Service with secure multi-tenancy and per-tenant billing for simplified invoicing. The platform includes silicon-agnostic support for NVIDIA, AMD, ARM, and Intel solutions, ensuring maximum flexibility for diverse infrastructure investments.
Energy Optimization Drives Up to 50% Performance Improvements
Energy optimization capabilities can deliver remarkable efficiency gains. EAR improves GPU performance per watt by up to 50% when no power limit is in place, and maximizes GPU performance under dynamic power limits through smart power capping. The platform also eliminates power peaks from variable workloads through intelligent power management.
The ClearML and EAR solution extends these benefits by providing detailed reporting of electricity savings per workload and per system, alongside per workload reporting of GPU resource utilization and performance metrics.
Real-time energy insights enable continuous optimization by tracking electricity consumption and CO2 emissions per user or task, providing hints to improve GPU resource utilization. This monitoring helps AI factories and HPC centers operating under energy or sustainability constraints maximize throughput within their power envelope.
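To illustrate what per-task energy and CO2 accounting involves, here is a minimal sketch assuming average power draw and runtime are sampled for each workload. The data model and the grid carbon intensity figure are assumptions for illustration, not EAR's actual reporting internals.

```python
# Illustrative per-task energy and CO2 accounting.
from dataclasses import dataclass

@dataclass
class TaskEnergyReport:
    task_id: str
    avg_power_w: float      # mean power draw sampled during the run
    runtime_hours: float

    @property
    def energy_kwh(self) -> float:
        """Energy consumed by the task in kWh."""
        return self.avg_power_w / 1000 * self.runtime_hours

    def co2_kg(self, grid_kg_per_kwh: float = 0.4) -> float:
        # 0.4 kg CO2/kWh is an assumed grid carbon intensity
        return self.energy_kwh * grid_kg_per_kwh

report = TaskEnergyReport("train-llm-007", avg_power_w=650, runtime_hours=12)
print(f"{report.energy_kwh:.1f} kWh, {report.co2_kg():.2f} kg CO2")
# → 7.8 kWh, 3.12 kg CO2
```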
Complete AI Infrastructure Platform for AI Builders with Multi-Tenant Security
ClearML’s platform supports secure multi-tenancy, delivering every tenant a complete AI development workbench that spans data-centric workflows, model training, and deployment into production. Each tenant works within an isolated network with role-based access control (RBAC), per-tenant SSO/IdP integration, and LDAP and Active Directory support.
Key platform capabilities include:
- Experiment management and visualization for collaborative AI development
- Model training lifecycle management with automated versioning
- Data catalog, versioning, vector database, and hyper-datasets for comprehensive data operations
- One-click AI service deployment supporting Model-as-a-Service, Machine Learning Operations, and agentic AI
- Cloud spillover capabilities for handling demand peaks
- Integrated orchestration with GPU optimization and management
This one-stop shop offering for AI builders includes MLOps, LLMOps, and GenAI with integrated orchestration, eliminating the operational complexity that often prevents organizations from scaling AI initiatives effectively.
Multi-Tenant Revenue Generation at Scale
Secure multi-tenancy capabilities enable telecommunications companies and cloud service providers to monetize GPU infrastructure investments immediately. With dynamic fractional GPUs, quota/over-quota policies, and resource allocation management dashboards, providers can maximize cluster GPU utilization while ensuring fair resource distribution across tenants.
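A quota/over-quota policy of this kind can be sketched as follows: each tenant is guaranteed up to its quota of GPUs, and idle capacity is shared among tenants requesting more. The allocation logic is a simplified assumption for illustration, not ClearML's actual scheduler.

```python
# Illustrative quota/over-quota GPU allocation.
def allocate(total_gpus, quotas, demands):
    """Grant min(demand, quota) first, then hand out leftover capacity."""
    grants = {t: min(demands.get(t, 0), q) for t, q in quotas.items()}
    leftover = total_gpus - sum(grants.values())
    # Tenants still wanting more share the leftover, first come first served.
    for t in quotas:
        extra = demands.get(t, 0) - grants[t]
        take = min(extra, leftover)
        grants[t] += take
        leftover -= take
    return grants

# Tenant "a" exceeds its quota of 8 using capacity "b" left idle.
print(allocate(16, quotas={"a": 8, "b": 8}, demands={"a": 12, "b": 2}))
# → {'a': 12, 'b': 2}
```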
Billing and management features include:
- Per-tenant billing API for invoicing and chargeback support with usage details for computing time, data storage, API calls, and custom events
- Control plane that eliminates the need for direct Kubernetes access for users
- One-click self-serve compute with automated provisioning after initial Vault setup
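To make the chargeback model concrete, here is a minimal sketch of per-tenant usage aggregation. The event schema, metric names, and unit rates are illustrative assumptions, not ClearML's billing API.

```python
# Illustrative per-tenant usage aggregation for invoicing/chargeback.
from collections import defaultdict

RATES = {  # assumed unit prices
    "gpu_hours": 2.50,        # USD per GPU-hour
    "storage_gb_days": 0.02,
    "api_calls": 0.0001,
}

def aggregate_charges(events):
    """Sum usage per tenant and price it with the rate card above."""
    totals = defaultdict(lambda: defaultdict(float))
    for e in events:
        totals[e["tenant"]][e["metric"]] += e["quantity"]
    return {
        tenant: round(sum(RATES[m] * q for m, q in metrics.items()), 2)
        for tenant, metrics in totals.items()
    }

events = [
    {"tenant": "acme", "metric": "gpu_hours", "quantity": 120},
    {"tenant": "acme", "metric": "api_calls", "quantity": 50_000},
    {"tenant": "globex", "metric": "gpu_hours", "quantity": 40},
]
print(aggregate_charges(events))  # → {'acme': 305.0, 'globex': 100.0}
```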
The solution supports HPC environments through Slurm and PBS compatibility, enabling providers to serve both traditional high-performance computing workloads and modern AI applications through a unified platform.
Proven Performance with Leading Organizations
This ClearML and EAR solution builds on proven track records. ClearML is used by more than 2,100 customers worldwide to manage GPU clusters and optimize utilization, streamline AI/ML workflows, and deploy GenAI models effortlessly. The platform is trusted by more than 300,000 forward-thinking AI builders and IT teams at leading Fortune 500 companies, enterprises, academia, public sector agencies, and innovative startups worldwide.
EAR is installed at EDF (Électricité de France), the European Space Agency (ESA), Eurofusion, and many national supercomputers in Germany, Spain, and the Netherlands. The solution is 100% dynamic, scalable, and transparent, requiring no changes to workloads, workflows, or scripts, ensuring seamless integration with existing operations.
If you’d like to learn more, be sure to request a demo to speak with someone on our sales team.