Case study

How Incode Streamlines GPU Orchestration and Optimizes Compute with ClearML

January 6, 2025

We recently spoke with Alex Golunov, VP of ML Engineering at Incode, about the company’s use of ClearML. Here’s what he had to say. 

About Incode

Incode is a leading provider of identity verification solutions that combine AI and machine learning to help businesses distinguish legitimate users from fraudulent ones. At the core of Incode’s technology are more than 100 proprietary machine learning models that authenticate each onboarding session.

Addressing Growing Pains

Building and maintaining that many models required a growing team, and as the team grew, it faced significant challenges in managing its expanding training and testing datasets. The complexity extended to hardware management, particularly GPU resources distributed across cloud providers and bare metal machines, which needed efficient orchestration to handle the computational demands of model training. The team also needed robust experiment tracking to monitor and compare model iterations, along with a reliable system for remote execution of training tasks. Without a centralized MLOps platform, they struggled with unreliable testing, repetitive work, and difficulty coordinating resources. Scaling the infrastructure while maintaining data quality and experimental reproducibility became increasingly critical as the datasets grew into the terabyte range. That’s when they turned to ClearML.

GPU Orchestration with ClearML

ClearML provides an easy-to-use yet flexible GPU orchestration layer through ClearML agents. These agents can be deployed across different machines and cloud providers, where they automatically pick up queued experiments and execute them on available GPU resources. The system supports dynamic resource allocation, queue management, and priority settings, which suits teams working with distributed computing resources.

Dynamic GPU Allocation

“One of the most important features for us was Dynamic GPU Allocation, available under the ClearML Enterprise plan,” said Alex Golunov, VP of ML Engineering at Incode. “This feature lets you connect the agent to multiple queues, each configured to run with specific resource allocations—an essential capability for multi-GPU nodes.” Here’s an example of Incode’s configuration:

clearml-agent daemon \
    --detached \
    --docker \
    --dynamic-gpus \
    --queue onprem.1xA100=1 onprem.2xA100=2 onprem.4xA100=4 \
    --gpus 0,1,2,3,4,5,6,7
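
To make the idea concrete, here is a minimal sketch of how per-queue GPU counts could map onto an eight-GPU node. It is a toy model, not ClearML's actual scheduler: the `Node` class and its methods are hypothetical, and only the queue names and GPU counts come from the command above.

```python
from dataclasses import dataclass, field

# Queue names and GPU counts mirror the agent command above.
QUEUE_GPUS = {"onprem.1xA100": 1, "onprem.2xA100": 2, "onprem.4xA100": 4}


@dataclass
class Node:
    """Toy model of one multi-GPU machine (hypothetical, not a ClearML class)."""
    free_gpus: list = field(default_factory=lambda: list(range(8)))

    def try_launch(self, queue_name):
        """Reserve the GPUs a queue asks for, or return None if too few are free."""
        needed = QUEUE_GPUS[queue_name]
        if len(self.free_gpus) < needed:
            return None
        assigned, self.free_gpus = self.free_gpus[:needed], self.free_gpus[needed:]
        return assigned

    def release(self, gpus):
        """Return GPUs to the free pool when a task finishes."""
        self.free_gpus = sorted(self.free_gpus + gpus)


node = Node()
first = node.try_launch("onprem.4xA100")   # reserves GPUs 0-3
second = node.try_launch("onprem.2xA100")  # reserves GPUs 4-5
third = node.try_launch("onprem.4xA100")   # only 2 GPUs left, so it must wait
print(first, second, third)
```

In this sketch a task from the four-GPU queue simply waits until enough GPUs are released, while smaller tasks can still use whatever capacity remains on the node.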

AWS Autoscaler

Another important feature for Incode is AWS Autoscaler. This powerful automation tool enables dynamic resource management by automatically provisioning and terminating EC2 instances based on workload demands, all while staying within a user-defined resource budget. The autoscaling functionality intelligently monitors queue lengths and resource utilization, ensuring optimal cost efficiency by scaling computing resources up or down as needed.
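
The scaling decision described above can be illustrated with a small sketch. The policy below is an assumption made for illustration (one instance per queued task, capped by a maximum instance count standing in for the resource budget); the function name and parameters are hypothetical and do not reflect the ClearML AWS Autoscaler's internals.

```python
def plan_scaling(queued_tasks, running_instances, idle_instances, max_instances):
    """Return (instances_to_start, instances_to_stop) for one polling cycle.

    Hypothetical policy: one instance per queued task, capped by the budget;
    idle instances are stopped only when nothing is waiting in the queue.
    `running_instances` counts every live instance, idle ones included.
    """
    if queued_tasks > 0:
        # Idle instances can absorb queued work before new ones are started.
        unserved = max(0, queued_tasks - idle_instances)
        headroom = max(0, max_instances - running_instances)
        return min(unserved, headroom), 0
    return 0, idle_instances


# Five tasks waiting, two busy instances, budget of four: start two more.
print(plan_scaling(queued_tasks=5, running_instances=2, idle_instances=0, max_instances=4))
# Empty queue with two idle instances: stop both.
print(plan_scaling(queued_tasks=0, running_instances=3, idle_instances=2, max_instances=4))
```

Capping new instances at the remaining headroom is what keeps spending inside the budget even when the queue is long.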

Incode Architecture

Let’s examine Incode’s current high-level architecture, which illustrates the comprehensive integration of its systems and showcases how different components interact to create a robust and scalable infrastructure:

[Figure: ClearML and Incode Architecture]

Data Management

Efficient collaboration and reproducible experiments depend on robust data management. “At Incode, we face the complex challenge of managing and coordinating an extensive collection of datasets, encompassing hundreds of distinct data collections that contain hundreds of millions of individual data points,” said Golunov. “These diverse data points span multiple formats and types, including high-resolution images, videos, structured and unstructured text documents, and comprehensive tabular datasets. The total volume of our data infrastructure has reached a significant milestone, currently exceeding 10 terabytes and continuing to grow. To effectively manage, organize, and maintain this substantial amount of data while ensuring seamless access and version control, we are using the sophisticated Hyperdatasets solution, an enterprise-grade data management system provided by ClearML under their Enterprise plan.”

Incode’s Experience with ClearML Hyperdatasets:

  • Through implementing version control in Hyperdatasets, they’ve been able to track all changes in their datasets and quickly revert to previous versions when needed, which has been crucial for their iterative development process
  • The metadata management capabilities have helped them maintain detailed records of their dataset properties and annotations, making it easier for the team to understand and utilize different data collections
  • With their growing team, the granular access control has been essential in managing who can access and modify specific datasets, ensuring data security and integrity
  • Incode frequently uses the advanced querying features to filter their massive datasets, which has significantly improved their workflow efficiency when working with specific data subsets
  • The seamless integration with their existing ML pipelines has saved them considerable development time and reduced potential integration issues
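
As a rough illustration of the version control and metadata querying described in this list, the sketch below models a dataset as a series of immutable snapshots that can be filtered by metadata. It is a simplified stand-in written for this article, not the Hyperdatasets API: the `VersionedDataset` class, its methods, and the sample frames are all hypothetical.

```python
class VersionedDataset:
    """Toy append-only dataset: each commit stores an immutable snapshot."""

    def __init__(self):
        self._versions = [()]  # version 0 is the empty dataset

    def commit(self, frames):
        """Add frames on top of the latest snapshot and return the new version id."""
        self._versions.append(self._versions[-1] + tuple(frames))
        return len(self._versions) - 1

    def query(self, version, **conditions):
        """Return frames in a version whose metadata matches every condition."""
        return [
            frame for frame in self._versions[version]
            if all(frame["metadata"].get(k) == v for k, v in conditions.items())
        ]


# Made-up sample frames, for illustration only.
ds = VersionedDataset()
v1 = ds.commit([{"source": "a.jpg", "metadata": {"doc_type": "passport"}}])
v2 = ds.commit([{"source": "b.jpg", "metadata": {"doc_type": "id_card"}}])

print(ds.query(v2, doc_type="passport"))  # frames matching the filter in the latest version
print(ds.query(v1, doc_type="id_card"))   # the earlier version does not contain b.jpg yet
```

Because older snapshots are never modified, reverting amounts to reading from an earlier version id.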

Building An Advanced Data Pipeline

Document reading and analysis is a cornerstone of Incode’s identity verification platform. Over years of research and development, the team has built an automated solution that combines document detection, classification, and Optical Character Recognition (OCR). Keeping this pipeline performing well requires constant refinement and maintenance, and its effectiveness depends on data quality and continuous data improvement. With ClearML Hyperdatasets, Incode has established a scalable infrastructure for data collection and labeling. This pipeline generates the high-quality training data used to train and continuously improve the company’s machine learning models.

The pipeline contains the following components:

  1. Data Lake:
    • The process begins with data stored in a Data Lake. This is where the initial images along with initial model predictions are kept.
  2. Hyperdataset:
    • The data from the Data Lake is transferred to the Hyperdataset. This step involves organizing and preparing the data for further processing, focusing on creating a structured dataset.
  3. Bounding Box Labelling:
    • Images from the Hyperdataset are sent to the bounding box labelling process. Here, each image is analyzed to identify specific areas of interest, which are marked with bounding boxes. The classes of the objects contained within these boxes are also identified. This helps in isolating parts of the image that contain text or other crucial information.
  4. Text Snippet Creation:
    • From the bounding box-labelled images, snippets of text are created. This involves extracting and cropping the specific portions of images that contain text inside the bounding boxes.
  5. Text Annotation Labelling:
    • The text snippets are manually reviewed and annotated. In this process, humans interpret the text within the snippets and create accurate annotations. These annotations are essential for training and validation in machine learning models focused on text recognition.

[Figure: Advanced Data Pipeline]

At the end of the process, the result is well-structured data organized into Framegroups that is later used for training multiple ML models.
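
The labelling stages of the pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Incode's implementation: the `Box` and `Snippet` classes, the function names, the toy image, and the sample transcriptions are all made up for the example, and images are represented as plain lists of pixel rows.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A labelled region of interest: pixel bounds plus an object class."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: str


@dataclass
class Snippet:
    """A cropped text region and its (initially empty) human annotation."""
    pixels: list
    label: str
    text: str = ""


def crop(image, box):
    """Cut the box out of an image stored as a list of pixel rows."""
    return [row[box.x0:box.x1] for row in image[box.y0:box.y1]]


def make_snippets(image, boxes):
    """Text snippet creation: one crop per bounding box."""
    return [Snippet(crop(image, b), b.label) for b in boxes]


def annotate(snippets, transcriptions):
    """Text annotation labelling: attach reviewer-supplied text to each snippet."""
    for snippet, text in zip(snippets, transcriptions):
        snippet.text = text
    return snippets


# A 4x6 toy "image" whose pixel values are just column indices.
image = [[col for col in range(6)] for _ in range(4)]
boxes = [Box(0, 0, 3, 2, "name"), Box(3, 2, 6, 4, "date_of_birth")]
records = annotate(make_snippets(image, boxes), ["JANE DOE", "1990-01-01"])
for record in records:
    print(record.label, record.text, record.pixels)
```

Each resulting record pairs a crop with its class and transcription, which is roughly the kind of grouped, structured sample a text recognition model can train on.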

Conclusion

The implementation of ClearML has been transformative for Incode’s ML operations. Through its comprehensive suite of tools, including GPU orchestration, AWS Autoscaler, and Hyperdatasets, the company has achieved significant improvements in their workflow efficiency and scalability. The platform has enabled Incode to:

  • Effectively manage and distribute GPU resources across their infrastructure, optimizing resource utilization and reducing computational bottlenecks
  • Automatically scale their AWS infrastructure based on demand, ensuring cost-effectiveness while maintaining high performance
  • Successfully handle their extensive datasets, which now exceed 10 terabytes, with robust version control and metadata management
  • Streamline their ML pipeline processes, from data collection to model training and deployment

The adoption of ClearML has not only solved Incode’s initial challenges but has also positioned the team to handle future growth and complexity in its ML operations. The platform’s enterprise-grade features have proven invaluable in maintaining Incode’s position as a leading provider of identity verification solutions.

If you’d like to learn more about how ClearML can help streamline your organization’s machine learning workflows, please request a demo.
