Case study

How Volkswagen’s Machine Learning Research Lab Uses ClearML to Increase Productivity and Decrease Time to Prototype

June 4, 2024

Client Overview

We recently caught up with Dr. Philip Becker-Ehmck, Research Scientist at the Machine Learning Research Lab, Volkswagen Group, to discover how ClearML has improved their MLOps workflow.

The Volkswagen Group is one of the world’s leading car makers, headquartered in Wolfsburg, Germany. It operates globally, with 114 production facilities in 19 European countries and ten countries in the Americas, Asia and Africa. The Group comprises ten brands from five European countries: Volkswagen, Volkswagen Commercial Vehicles, ŠKODA, SEAT, CUPRA, Audi, Lamborghini, Bentley, Porsche, and Ducati. In addition, the Volkswagen Group offers a wide range of further brands and business units including financial services.

Dr. Philip Becker-Ehmck

Research Scientist | Volkswagen Machine Learning Research Lab

The Challenge

Before adopting ClearML, the Volkswagen Machine Learning Research Lab was looking for a way to increase productivity and shorten time to prototype. The Lab wanted to minimize the time researchers spent setting up and managing infrastructure and experiments. It also needed to deploy workloads to internal compute clusters to use existing resources more efficiently.

The problem was that the Lab didn’t have a single end-to-end platform to automate and orchestrate the entire machine learning lifecycle. That caused several issues: there was no easy way to compare experiment results, no option for workload prioritization, and what Dr. Becker-Ehmck calls a “non-negligible administration overhead,” meaning the team had to extend open-source solutions themselves to fit their needs.

To increase team productivity, it was important to gain a single view into their training, models, and data, including metrics and logging. They sought to replace their full MLOps stack for orchestration, experiment management, dataops, and model management with an interface simple enough that their researchers could work without major technical support.

“For a machine learning researcher, a system that covers all these needs is the basis for their work. We also wanted to enable our users to easily access the compute resources interactively, and needed support for different queues and prioritization for very heterogeneous workloads,” said Dr. Becker-Ehmck. “In addition, we looked for seamless integration with market-leading machine learning frameworks as well as integrated user management, including Role-Based Access Control to maintain compliance.”

The team came across ClearML while actively scouting for MLOps solutions on https://mymlops.com (which they had found on the MachineLearning subreddit).

“We chose ClearML because it gave us an all-in-one solution that suited our needs the most while being intuitively usable,” Dr. Becker-Ehmck said. “Usability by researchers and ML engineers without Kubernetes knowledge was an absolute priority. In particular, advanced workflows like hyperparameter searches beyond random search and multi-step workflows needed to be manageable.”

He noted that ClearML provides all the tools users need to perform their jobs efficiently. “As well, we’ve seen a lot of systems come and go in the MLOps space, so we really appreciated the maturity of the platform and the community support and adoption it has. Lastly, we thought the pricing was very competitive, and that helped seal the deal for us.”

The Solution

When describing the Lab’s use of ClearML, Dr. Becker-Ehmck identified ClearML Orchestration and Experiment Management as the core features the Lab uses, along with dataset and model management for sharing artifacts within the team. It’s worth noting that ClearML recently added extensive new orchestration and scheduling capabilities for optimizing control of AI and ML workloads.

He said, “Hyperparameter search, especially with advanced optimization schemes (Optuna), is obviously essential to our research work and to applying the research in business cases. We also integrate extensively with other ML frameworks such as Hydra, PyTorch, and TensorBoard, and we use the ClearML Python SDK extensively for automation and custom workflows. Lastly, we are using ClearML for CI/testing of internal machine learning libraries.”

“It’s been great to have a joint platform for sharing information and artifacts. Experiments, models, metrics, and data are a big deal for us. We are now able to fully automate certain workflows much more easily for continuous delivery or for evaluation of new methodologies,” he said. “We also like that our existing projects didn’t need a lot of changes to work with ClearML.”

Integrating ClearML with the Lab’s Kubernetes (K8s) cluster was mostly seamless, aided by ClearML’s support via Slack. “Given ClearML is quite multi-faceted and in certain ways quite different from our previous solution, it required some re-learning and a little bit of a change in mindset,” he said. “To address that, we gave some hands-on tutorials for end users. In particular, we recommended specific best practices, and the company’s examples and documentation were incredibly helpful for that.”

The Results

By adopting ClearML, Dr. Becker-Ehmck said, the Lab was able to retire its heterogeneous MLOps stack, minimizing the administration work needed to enable rapid AI and ML research and prototyping.

Moreover, the team gained exactly what it was looking for: clear user- and project-based separation with RBAC, which simplified compliance with data handling guidelines. “We were also able to achieve a high cluster utilization, maximizing the use of our compute infrastructure,” he said.

In general, the Lab has reduced the maintenance time needed for its MLOps stack and gained fast and helpful support that was not available with its previous solution. “Lastly, we were able to implement some custom features that were essential to us,” Dr. Becker-Ehmck concluded. 

Next Steps

Get started with ClearML by using our free-tier servers or by hosting your own. Read our documentation. You’ll find more in-depth tutorials about ClearML on our YouTube channel, and we also have a very active Slack channel for anyone who needs help.

If you need to scale your ML pipelines and data abstraction, need unmatched performance and control, or want access to Hyper-Datasets, Role-Based Access Control, SSO and LDAP integration and other features, please request a demo. To learn more about ClearML, please visit: https://clear.ml/.
