By Yuval Gabay, MLOps Team Lead
Introduction
At WSC Sports, we use machine learning to automatically generate personalized sports highlights, which requires a robust, scalable training platform. Ours combines ClearML for experiment tracking and management, Kubernetes for orchestration, and ArgoCD for GitOps-based deployments. Together, they let us train models at scale on cloud resources, run experiments in parallel, and manage ML pipelines efficiently.

The Challenge of Scaling Machine Learning
Training complex machine learning models demands significant computational resources and often involves running numerous experiments with varying configurations. Managing these experiments, tracking results, and ensuring reproducibility can become challenging as the scale increases. We needed a solution that could:
- Provide a centralized platform for experiment tracking.
- Offer scalability to handle large datasets and complex models.
- Enable parallel execution of experiments.
- Streamline the deployment and management of training environments.
Our Setup: ClearML, Kubernetes, and ArgoCD
To address these challenges, we implemented a solution that integrates ClearML, Kubernetes, and ArgoCD.

ClearML: Experiment Tracking and Management
ClearML serves as our central hub for managing all machine learning experiments. It allows us to:
- Track every aspect of our experiments, including code, configurations, and results.
- Compare different runs and identify the best-performing models.
- Reproduce experiments with ease.
- Manage datasets and artifacts.
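To make that bookkeeping concrete, here is a minimal, stdlib-only Python sketch of what experiment tracking records and how runs get compared. It is not ClearML’s API: `RunRecord`, `fingerprint`, and `best_run` are hypothetical names, and the metric values are made up. In ClearML itself, calling `Task.init(project_name=..., task_name=...)` at the top of a training script starts this kind of capture automatically.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one experiment: what ran, with what, and how it scored."""
    name: str
    git_commit: str
    config: dict
    metrics: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Serialize code version + config deterministically (sorted keys), so two
        # runs with identical inputs get the same ID regardless of dict ordering.
        payload = json.dumps(
            {"commit": self.git_commit, "config": self.config}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def best_run(runs, metric, higher_is_better=True):
    """Compare runs on one metric and return the best; runs missing the metric are ignored."""
    scored = [r for r in runs if metric in r.metrics]
    if not scored:
        raise ValueError(f"no run reports metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r.metrics[metric])


if __name__ == "__main__":
    # Illustrative values only.
    runs = [
        RunRecord("lr-1e-3", "abc123", {"lr": 1e-3, "batch_size": 32}, {"val_score": 0.71}),
        RunRecord("lr-3e-4", "abc123", {"lr": 3e-4, "batch_size": 32}, {"val_score": 0.74}),
    ]
    winner = best_run(runs, "val_score")
    print(winner.name, winner.fingerprint())
```

Hashing the commit together with the configuration is what makes "reproduce this run" well defined: the same fingerprint means the same code and the same inputs.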
Editor’s Note: WSC Sports uses ClearML’s AI Development Center, a complete solution for managing the AI lifecycle. Whether customers are building data-centric workflows, training models, or deploying them into production, ClearML provides a unified, open-source platform designed for flexibility, scalability, and efficiency across every stage of AI development. Because the open-source architecture is fully cloud- and vendor-agnostic, it integrates with existing infrastructure and tools, so teams can focus on innovation without operational roadblocks.
Kubernetes: Orchestration and Scalability
Kubernetes provides the orchestration layer for our training infrastructure. It enables us to:
- Deploy and manage ClearML agents in a scalable manner.
- Dynamically allocate resources based on the demands of our experiments.
- Run experiments in parallel across multiple nodes.
- Ensure high availability and fault tolerance.
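The Python sketch below shows roughly what parallel experiments look like at the Kubernetes level: a hyperparameter grid expanded into one independent Pod per combination, each requesting a GPU and pinned to a node pool by cloud instance type. Assumptions: the helper names, the container image, `train.py`, and the CPU/memory figures are hypothetical, and in a ClearML deployment the agent normally creates pods for queued tasks rather than anyone writing them by hand. This is an illustration, not WSC Sports’ configuration.

```python
import itertools
import json


def training_pod_manifest(run_id, image, command, instance_type, gpus=1):
    """Build a Kubernetes Pod manifest (as a dict) for one hypothetical training run."""
    resources = {
        "requests": {"cpu": "4", "memory": "16Gi"},
        "limits": {"memory": "16Gi"},
    }
    if gpus:
        # GPUs are extended resources: setting only the limit makes the request equal it.
        resources["limits"]["nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"train-{run_id}", "labels": {"app": "training", "run-id": run_id}},
        "spec": {
            "restartPolicy": "Never",  # a finished training run should not be restarted
            # Pin the Pod to nodes of a given cloud instance type (well-known node label).
            "nodeSelector": {"node.kubernetes.io/instance-type": instance_type},
            "containers": [
                {"name": "trainer", "image": image, "command": command, "resources": resources}
            ],
        },
    }


def sweep_manifests(grid, image, instance_type):
    """Expand a hyperparameter grid into one independent Pod per combination."""
    keys = sorted(grid)  # fixed key order keeps names and CLI args deterministic
    manifests = []
    for i, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        args = [f"--{k}={v}" for k, v in zip(keys, values)]
        command = ["python", "train.py", *args]
        manifests.append(training_pod_manifest(f"sweep-{i}", image, command, instance_type))
    return manifests


if __name__ == "__main__":
    grid = {"lr": [1e-3, 3e-4], "batch_size": [16, 32]}
    pods = sweep_manifests(grid, "registry.example.com/trainer:latest", "g5.xlarge")
    print(json.dumps(pods[0], indent=2))
```

Because each Pod is independent, the scheduler is free to place them on different nodes at the same time. Kubernetes reads JSON as well as YAML, so the printed manifests can be passed to `kubectl apply -f`.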
ArgoCD: GitOps for Deployment and Configuration
ArgoCD automates the deployment and configuration of our Kubernetes environment using the GitOps methodology. This means:
- All infrastructure and application configurations are stored in Git.
- ArgoCD automatically synchronizes the desired state from Git to the Kubernetes cluster.
- Changes are made through Git pull requests, providing a clear audit trail and version control.
- Teams can add new instance types on their own, through the same pull-request workflow.
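One way to picture the self-service flow: the list of instance types lives in Git, and each entry becomes an ArgoCD `Application` with automated sync. The Python sketch below builds those manifests as dicts. The repository URL, the `agents/<instance-type>` directory layout, the `clearml-agents` namespace, and the function names are hypothetical; the `Application` fields themselves (`source`, `destination`, and `syncPolicy.automated` with `prune` and `selfHeal`) are standard ArgoCD.

```python
import json

ARGOCD_NAMESPACE = "argocd"


def agent_application(instance_type, repo_url, revision="main"):
    """Build an ArgoCD Application for a hypothetical agent pool of one instance type."""
    safe = instance_type.replace(".", "-")  # keep the name DNS-label friendly
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"clearml-agent-{safe}", "namespace": ARGOCD_NAMESPACE},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                # Hypothetical repo layout: one directory of manifests per instance type.
                "path": f"agents/{instance_type}",
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": "clearml-agents",
            },
            # Automated sync: ArgoCD applies Git changes itself, prunes resources
            # removed from Git, and reverts manual drift in the cluster.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def applications_from_git_list(instance_types, repo_url):
    """One Application per instance type listed in Git; duplicates are collapsed."""
    return [agent_application(t, repo_url) for t in sorted(set(instance_types))]


if __name__ == "__main__":
    apps = applications_from_git_list(
        ["g5.xlarge", "c6i.4xlarge"], "https://git.example.com/mlops/training-infra.git"
    )
    print(json.dumps(apps, indent=2))
```

ArgoCD’s `ApplicationSet` with a list generator implements the same one-entry-per-item pattern natively, so adding an instance type can come down to a one-line pull request.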
Benefits of Our Approach
Our integrated solution offers several significant benefits:
- Scalability: Kubernetes allows us to easily scale our training infrastructure to meet the demands of our machine learning workloads.
- Parallel Execution: We can run multiple experiments in parallel, significantly reducing the total time needed to explore different configurations.
- Organization: ClearML provides a centralized platform for tracking and managing all experiments, keeping them organized and reproducible.
- Self-Service: ArgoCD enables a self-service model, allowing teams to add different instance types and manage their environments efficiently.
- GitOps: Managing our infrastructure through GitOps ensures consistency, auditability, and ease of deployment.
- Cloud Resource Utilization: Because resources are allocated dynamically, we use cloud capacity efficiently, balancing cost and performance.
Implementing ML Pipelines
ML pipelines fit naturally into this setup. ClearML’s pipeline functionality lets us define and execute complex workflows, Kubernetes keeps those pipelines scalable and reliable, and ArgoCD manages the deployment of any pipeline-specific resources and configurations.
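Under the hood, a pipeline is a dependency graph of steps, and any steps whose dependencies are already satisfied can run in parallel, which is where Kubernetes scaling pays off. The sketch below uses Python’s standard `graphlib` module (Python 3.9+) to group steps into parallel stages. The step names are hypothetical, and this illustrates the scheduling idea rather than ClearML’s pipeline engine; in ClearML, the same dependency information is declared per step, for example through the `parents` argument of `PipelineController.add_step`.

```python
import graphlib


def execution_stages(dependencies):
    """Group pipeline steps into stages whose members can run in parallel.

    `dependencies` maps each step to the set of steps it depends on.
    """
    sorter = graphlib.TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the pipeline contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        stages.append(ready)
        # A real orchestrator would launch these steps (e.g. as pods) here and
        # mark each one done only when it finishes; this sketch marks them at once.
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    # Hypothetical pipeline: step -> steps it depends on.
    pipeline = {
        "preprocess": {"ingest"},
        "train_model_a": {"preprocess"},
        "train_model_b": {"preprocess"},
        "evaluate": {"train_model_a", "train_model_b"},
    }
    for i, stage in enumerate(execution_stages(pipeline)):
        print(f"stage {i}: {', '.join(stage)}")
```

Here the two training steps land in the same stage, so they can be dispatched to separate nodes at once, while `evaluate` waits for both.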

Conclusion
By combining ClearML, Kubernetes, and ArgoCD, WSC Sports has created a powerful and scalable machine learning platform. This solution enables us to efficiently train models, manage experiments, and deploy infrastructure at scale, ultimately driving innovation in our automated sports highlight generation. The GitOps approach with ArgoCD ensures a self-service and organized environment, empowering our teams to work efficiently and effectively.
Editor’s Note: If you’d like to see the power of ClearML in streamlining AI & ML workflows (as well as managing GPU clusters, optimizing utilization, and deploying GenAI models effortlessly), please request a demo.