The Cloud Exit: Cost, Security, and Performance Driving the Move Back to On-Premises

October 1, 2024

The last decade has seen organizations shift en masse to the cloud for software, storage, and compute, reaping benefits ranging from flexibility and lower up-front costs to easier maintenance. Lately, however, more and more companies have been re-evaluating their cloud strategies and opting to move their data back to on-premises infrastructure, driven by several key factors.

According to a 2024 report by Citrix, a business unit of Cloud Software Group, a staggering 42% of U.S. organizations are considering or have already moved at least half of their cloud-based workloads back to on-premises infrastructure. This trend, often referred to as “cloud exit,” is driven by a combination of financial, security, and operational challenges that companies face in the cloud, together with the need for tighter, more optimized control over infrastructure. Let’s take a look at each of these factors.

Financial Considerations

The cost of operating in the cloud can be unexpectedly high over the long term. While cloud migration often promises lower capital expenditures up front, recurring costs for storage, data egress, and underutilized resources can quickly add up. These variable costs must be actively controlled, or they can balloon well beyond initial forecasts.

Variable costs are not the only factor behind the cloud exit. The GPU gold rush has fueled demand for accelerated cloud instances as companies race to secure powerful machines to train ever-growing neural networks and develop customized large language models. Such machines, if rented, can cost tens of thousands of dollars each per month (and training LLMs may require several machines in parallel), and they usually require upfront reservations with annual commitments, which defeats the purpose of the cloud’s flexibility.

As highlighted in the Citrix report mentioned above, companies can achieve significant cost savings by transitioning to on-premises deployments. For example, Basecamp projected savings of $7 million over five years by shifting from cloud services to on-prem infrastructure, after previously spending $3.2 million annually on cloud solutions.

Security Concerns

With data now considered the “new gold,” companies are less willing to store it outside their own premises and risk knowledge leaks, loss of competitive edge, and exposure of proprietary IP. Data breaches such as the Capital One breach have demonstrated that unless you store your own data, you don’t have truly bullet-proof access control. On-premises infrastructure requires security architects to make conscious decisions about the security architecture (instead of relying on the cloud’s default offering), which results in better control over data access and security protocols.

Customized Features

Some companies outgrow the capabilities of cloud providers, as their unique requirements may not be fully addressed by the broad, generic offerings available. Specialized hardware, tailored software stacks, and performance needs often drive businesses to explore migrating away from cloud infrastructure.

A notable example is Dropbox, once a major customer of Amazon’s S3 storage service. As Dropbox’s product matured and its engineering team gained the expertise to handle custom storage solutions, the company transitioned nearly all of its storage to on-premises infrastructure. They even developed custom hardware to meet their specific performance needs, achieving greater control and optimization.

Hybrid Cloud Approach

A growing number of enterprises are adopting a hybrid cloud model, combining on-premises and cloud infrastructure to balance control, security, and scalability. This gives businesses a “best-of-both-worlds” approach: sensitive data and critical systems stay under the business’s full control on on-prem machines, while agile, dynamic services and workloads can run in the cloud for added flexibility.

A key advantage of a hybrid infrastructure is that it addresses “cloud spillover,” which occurs when on-premises capacity is fully utilized but additional workloads still need to be processed. In this scenario, excess workloads are shifted to cloud resources. This strategy keeps costs low during periods of moderate demand while ensuring that additional compute power is available when needed. However, if cloud spillover is not carefully managed, costs can rise significantly, especially if low-priority projects end up running on cloud resources. To this end, ClearML offers autoscalers for AWS, Azure, and GCP cloud instances. Our autoscalers spin up instances when compute is needed and automatically shut them down after a predetermined idle period, eliminating the risk of paying for idle machines. To avoid unnecessary expenses, it’s also essential to implement robust RBAC (Role-Based Access Control) and clear hardware resource policies to ensure that cloud resources are used efficiently.
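The spin-up/shut-down decision at the heart of such an autoscaler can be sketched in a few lines. The snippet below is a simplified illustration of the general pattern only, not ClearML’s actual implementation; the timeout value, function names, and timestamps are hypothetical.

```python
import time

IDLE_TIMEOUT_SEC = 15 * 60  # example policy: terminate after 15 idle minutes


def should_shut_down(last_task_finished_at: float, now: float,
                     idle_timeout: float = IDLE_TIMEOUT_SEC) -> bool:
    """Return True if the instance has been idle longer than the timeout."""
    return (now - last_task_finished_at) >= idle_timeout


def reconcile(pending_tasks: int, idle_since: list[float], now: float) -> dict:
    """One pass of a toy autoscaler loop: terminate instances that have
    sat idle past the timeout, and launch new ones only for queued work
    that the remaining idle instances cannot absorb."""
    to_terminate = [t for t in idle_since if should_shut_down(t, now)]
    kept_idle = len(idle_since) - len(to_terminate)
    to_launch = max(0, pending_tasks - kept_idle)
    return {"launch": to_launch, "terminate": len(to_terminate)}
```

For instance, with three queued tasks, one instance idle for twenty minutes, and one idle for a minute, a single pass would terminate the stale instance and launch two fresh ones, so no machine is left running (and billing) without work.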

Bridging the Cloud to On-prem Gap with ClearML

ClearML helps companies building AI products bridge the gap between cloud and on-prem in two main ways: infrastructure abstraction and native hybrid infrastructure support.

ClearML abstracts the compute infrastructure with the ClearML Agent, which sits on top of the compute hardware. When a task arrives for execution, the agent builds a sandboxed execution environment, applies on-the-fly modifications to code and parameters, and then executes the task. The workload thus uses the underlying hardware resources without having to know (or care) about the exact compute beneath it.
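The queue-and-agent pattern described above can be mimicked in plain Python to show why the task never needs to know where it runs. This is an illustrative toy, not ClearML’s real API; the class names, queue, and hardware labels are all made up for the example.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class Task:
    """Stand-in for a queued workload: a name plus its parameters."""
    name: str
    params: dict = field(default_factory=dict)

    def run(self) -> str:
        # Placeholder for the real workload's execution.
        return f"{self.name} done with {self.params}"


class Agent:
    """Toy stand-in for a compute agent: it knows which hardware it sits
    on, but the tasks it pulls and executes do not."""

    def __init__(self, hardware: str, queue: Queue):
        self.hardware = hardware
        self.queue = queue

    def pull_and_run(self) -> tuple[str, str]:
        task = self.queue.get()
        # A real agent could patch code and parameters here before
        # execution, mirroring the on-the-fly modifications above.
        return self.hardware, task.run()


jobs = Queue()
jobs.put(Task("train-model", {"lr": 0.01}))

# The identical task would run the same way if the agent's label were
# a cloud instance instead of an on-prem machine.
hardware, result = Agent("on-prem-gpu", jobs).pull_and_run()
```

The design point the toy makes is that only the agent binds to hardware; swapping `"on-prem-gpu"` for a cloud machine changes nothing in the task itself.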

From the AI builder’s point of view, this abstraction makes the task execution interface the single interface needed, whether the workload runs in the cloud or on-prem. This is a huge advantage: AI builders can focus on doing what they do best, building AI products, rather than learning new infrastructure quirks.

ClearML also makes transitions from cloud to on-prem easier: there is no steep learning curve and no code to tweak for different platforms. Because the user experience stays the same, the transition is frictionless.

ClearML’s hybrid infrastructure support ensures that ClearML works for any deployment type. ClearML supports both the control plane and compute whether they run in the cloud or on-prem (including air-gapped systems), without compromising on any features or functionality. As a hardware-agnostic platform, it leaves IT teams free to grow their computing power with any chip manufacturer. No matter where your workloads run, now or in the future, ClearML adapts to your needs and fits your unique deployment.

Navigating the Shift from Cloud to On-Prem with ClearML

As businesses continue to re-evaluate their cloud strategies, many are opting for a more balanced approach that leverages both on-premises infrastructure and cloud resources. As we have seen, financial pressures, security concerns, and performance reliability are prompting this shift. 

However, transitioning doesn’t have to be difficult. ClearML helps organizations seamlessly bridge the gap between cloud and on-prem environments by abstracting infrastructure and supporting hybrid deployments. With ClearML, companies can focus on building their AI products without worrying about the complexities of their underlying infrastructure, ensuring that they remain agile and future-proof, regardless of their deployment model.

ClearML offers a full AI Infrastructure Control Plane to assist organizations with the complexities of managing AI resources. To learn more about how ClearML can support your cloud exit and seamless transition to a hybrid compute infrastructure, please request a demo to speak to someone on our sales team.
