With all of the excitement around Arm’s high-performance processors, which are optimized for AI, our team wanted to test how easily ClearML would work with GPUs paired with Arm-based CPUs compared to GPUs paired with x86 chips.
We decided to run a project on AWS using a Graviton-based EC2 instance. We chose the AWS Graviton2 processor, which is paired with NVIDIA’s T4G Tensor Core GPUs for efficient inference. The Graviton2 chip is built around Arm’s 64-bit Neoverse N1 CPU, an Arm platform designed specifically for efficient, high-performance cloud computing workloads. Having successfully orchestrated ClearML workloads on AWS Graviton™, we can confirm that AI builders can take advantage of a more economical AWS computing option without added friction.
Testing ClearML on an AWS Graviton with an Arm-based CPU
In designing our experiment, we decided to run a model training job on an EC2 G5g machine on ClearML’s AI Platform using our own AWS autoscaler. To set up our instance, we found an Amazon Machine Image (AMI) that matched the AWS instance type and established a budget (just as we would for any x86 AWS instance).
With the compute and orchestration set up, we ran our training job, and it completed without any hiccups. Even though this was what we expected, it still felt great to confirm that ClearML worked on this hardware just as it does on x86. It shows that AI builders can run jobs on less-expensive AWS Graviton instances using Arm-based CPUs with no extra effort.
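To make the AMI-matching step concrete, here is a minimal Python sketch of the kind of check involved. It is not ClearML’s autoscaler code: the helper names and the small family table are our own illustrative assumptions, and the table covers only a handful of EC2 families.

```python
# Hypothetical helpers for illustration; not part of ClearML or the AWS SDK.
# Partial table of EC2 instance families and the AMI architecture they need.
ARCH_BY_FAMILY = {
    "g5g": "arm64",    # Graviton2 + NVIDIA T4G (the family used in this post)
    "t4g": "arm64",    # Graviton2, general purpose burstable
    "c7g": "arm64",    # Graviton3, compute optimized
    "g5": "x86_64",    # x86 CPU + NVIDIA A10G
    "g4dn": "x86_64",  # x86 CPU + NVIDIA T4
}


def required_architecture(instance_type: str) -> str:
    """Return the AMI architecture needed by an instance type like 'g5g.xlarge'."""
    family = instance_type.split(".", 1)[0].lower()
    if family not in ARCH_BY_FAMILY:
        raise ValueError(f"unknown instance family in this sketch: {family!r}")
    return ARCH_BY_FAMILY[family]


def ami_matches_instance(ami_architecture: str, instance_type: str) -> bool:
    """True when the AMI's architecture fits the chosen instance type."""
    return ami_architecture == required_architecture(instance_type)


if __name__ == "__main__":
    print(ami_matches_instance("arm64", "g5g.xlarge"))
    print(ami_matches_instance("x86_64", "g5g.xlarge"))
```

The same check applies to x86 instances, which is why picking the AMI for Graviton did not feel any different from our usual setup.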
Behind the Scenes: The Architecture That Enables the Magic
Although it works like magic, quite a lot went on behind the scenes to make the process seamless. Once a user sets up the orchestration and instance, ClearML’s silicon-agnostic design automates the rest by matching the container and bringing in the AI frameworks needed to support the hardware at runtime. NVIDIA offers Arm+GPU software solutions similar to its x86+GPU offerings. That software stack includes the CPU optimizations Arm has added to machine learning frameworks, including PyTorch. These improvements use Arm Kleidi technology available in the Arm Compute Library and the Arm KleidiAI Library.
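As a rough illustration of what matching the container to the hardware can look like, the sketch below picks an image based on the CPU architecture it is running on. This is not ClearML’s internal logic. The image names and the `pick_image` helper are hypothetical; only Python’s standard `platform.machine()` call is real.

```python
import platform
from typing import Optional

# Different operating systems report the same CPU family under different names.
ARCH_ALIASES = {
    "aarch64": "arm64",  # Linux on Arm (e.g. Graviton)
    "arm64": "arm64",    # macOS on Arm
    "x86_64": "amd64",   # Linux / macOS on x86
    "amd64": "amd64",    # Windows on x86
}

# Hypothetical image names, for illustration only.
IMAGE_BY_ARCH = {
    "arm64": "registry.example.com/training:latest-arm64",
    "amd64": "registry.example.com/training:latest-amd64",
}


def pick_image(machine: Optional[str] = None) -> str:
    """Choose a container image for the given (or current) CPU architecture."""
    machine = (machine or platform.machine()).lower()
    if machine not in ARCH_ALIASES:
        raise ValueError(f"unsupported architecture in this sketch: {machine!r}")
    return IMAGE_BY_ARCH[ARCH_ALIASES[machine]]


if __name__ == "__main__":
    # On a Graviton instance this selects the arm64 image.
    print(pick_image())
```

The point of the sketch is that the user never has to make this choice by hand: the selection depends only on what the machine reports about itself.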
On ClearML, AI builders have complete visibility over their entire AI workflow. In addition to monitoring the training job as it happens, ClearML brings the codebase and data into the container and monitors the performance of the instance itself in real time, displaying metrics, GPU utilization, CPU utilization, and network monitoring statistics.
The Benefits of Using ClearML + AWS Graviton with Arm-based CPUs
With ClearML, AI builders benefit from a silicon-agnostic platform that works on any CPU or GPU and can take advantage of more efficient computing options that are better optimized for their specific needs. AWS Graviton processors have been designed to deliver high performance at a lower price, and AI teams using Graviton EC2 instances can save up to 20% in cost compared to x86-based Amazon EC2 instances. In addition, Graviton processors use up to 60% less energy than comparable EC2 instances on other architectures.
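To put the “up to 20%” figure in perspective, here is a small back-of-the-envelope calculation in Python. The monthly budget is a made-up number, and because 20% is an upper bound, the result is a best case rather than a guarantee.

```python
def graviton_cost(x86_cost: float, savings_fraction: float = 0.20) -> float:
    """Estimate the cost of a workload on Graviton given its x86 cost.

    savings_fraction is the share saved (0.20 means 20%). The default is the
    best-case figure quoted above; real savings depend on the workload.
    """
    if not 0.0 <= savings_fraction <= 1.0:
        raise ValueError("savings_fraction must be between 0 and 1")
    return x86_cost * (1.0 - savings_fraction)


if __name__ == "__main__":
    monthly_x86_budget = 1000.0  # hypothetical monthly spend in USD
    best_case = graviton_cost(monthly_x86_budget)
    print(f"x86: ${monthly_x86_budget:.2f}, Graviton best case: ${best_case:.2f}")
```

Passing a smaller `savings_fraction` gives a more conservative estimate for workloads that do not reach the best-case figure.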
Closing
As we’ve experienced ourselves, there are no barriers to using AWS’s more cost-effective computing. The experience of using Graviton compute is exactly the same as running on x86, and the same holds true for any other Arm processor.
To learn more about ClearML, please request a demo to speak with our sales team. For more information about deploying AI on Arm CPUs, read about Arm Kleidi.
Stay tuned for our next Arm + ClearML blog about our experience running a model with llama.cpp on an Arm-based machine.