Case study

Leveraging ClearML Tasks and Hyperdatasets for Efficient Camera Trap Data Management and Analysis

May 13, 2025

By Jennifer Zhuge, Software/AI Engineer, Wildlife Protection Solutions

 

Wildlife Protection Solutions (WPS) empowers frontline conservationists with free, cutting-edge technological solutions to protect endangered species and ecosystems. Camera traps are fundamental tools in our mission, capturing vast quantities of image data that provide critical insights into wildlife populations and habitat health. WPS’s global wildlife preserve camera trap network funnels more than 65,000 photos a day into our object detection and species classification pipeline. These real-time insights are often the difference between thwarting a poaching attempt and losing an endangered animal. For our team, managing machine learning (ML) workflows efficiently is critical to advancing conservation and building a thriving planet for all.

Camera trap projects often span multiple locations and years, generating terabytes of visual data accompanied by essential metadata such as location, time, and environmental readings. At this scale, data management, processing, and analysis all become significant operational hurdles. To overcome these challenges, we are developing Akili, a platform designed to streamline the camera trap research workflow, particularly for non-technical users. A critical component of Akili’s architecture is ClearML, an open-source AI/machine learning platform. This brief describes how Akili leverages two core ClearML features, Tasks and Hyperdatasets, to establish a robust, reproducible, and scalable framework for camera trap data science.

ClearML: The MLOps Backbone for Akili

ClearML excels at automating and orchestrating workflows, offering features like detailed experiment tracking, robust data versioning, model repository management, and distributed task execution.

ClearML Hyperdatasets: Agile and Versioned Data Subsetting

Managing large, dynamic camera trap datasets is a major challenge. Manually curating specific data subsets for different experimental needs (e.g., data from specific parks or containing certain species) is laborious and error-prone. A ClearML Hyperdataset addresses this because it is not a copy of the data. Instead, it acts as a versioned definition (essentially a query or a collection of pointers) that specifies a particular subset of the underlying data. That data typically resides in cost-effective object storage (e.g., S3, GCS, or Azure Blob Storage).
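The "query, not copy" idea can be made concrete with a small plain-Python sketch. It does not use the ClearML SDK. The `Frame` record, the `matches` and `select_frames` helpers, the `s3://example-bucket/...` URIs, and the park and species values are all hypothetical. Each frame holds only a pointer to an image in object storage plus its metadata. A subset is defined purely by metadata criteria, so selecting data never reads or duplicates image bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One camera trap image: a pointer into object storage plus its metadata."""
    uri: str                                      # where the image lives; pixels are never read here
    metadata: dict = field(default_factory=dict)  # e.g. park, species, timestamp


def matches(metadata, criteria):
    """True if every criterion holds; a criterion may be one value or a set of values."""
    for key, wanted in criteria.items():
        accepted = wanted if isinstance(wanted, (set, frozenset)) else {wanted}
        if metadata.get(key) not in accepted:
            return False
    return True


def select_frames(pool, **criteria):
    """Define a subset by metadata and return only pointers (URIs), never copies."""
    return [frame.uri for frame in pool if matches(frame.metadata, criteria)]


# Hypothetical pool; the images themselves stay in object storage.
pool = [
    Frame("s3://example-bucket/park_a/0001.jpg", {"park": "park_a", "species": "elephant"}),
    Frame("s3://example-bucket/park_a/0002.jpg", {"park": "park_a", "species": "lion"}),
    Frame("s3://example-bucket/park_b/0003.jpg", {"park": "park_b", "species": "elephant"}),
]

park_a_elephants = select_frames(pool, park="park_a", species="elephant")
elephants_or_lions = select_frames(pool, species={"elephant", "lion"})
```

Here `park_a_elephants` holds a single URI and `elephants_or_lions` holds all three. Neither subset required copying any image.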

  1. Ensuring Data Versioning for Reproducibility: When a ClearML Task is configured to use a Hyperdataset, it explicitly records the specific version of that Hyperdataset definition. This establishes an immutable link between the code execution (Task) and the precise data slice it operated on. Even if the underlying data pool grows or the Hyperdataset definition is later updated (creating a new version), past experiments remain fully reproducible because they are tied to the exact Hyperdataset version they originally used. This is critical for debugging models and understanding performance shifts over time.
  2. Storage Efficiency: Rather than copying potentially terabytes of data for each minor experimental variation or train/validation/test split, Hyperdatasets let Akili’s Tasks access only the required data by filtering frames with metadata queries. Because Hyperdatasets primarily store metadata and pointers, they are extremely lightweight. This minimizes storage costs and simplifies data management compared with maintaining multiple copies of large datasets.
  3. Annotation Features: ClearML’s intuitive annotation UI and SDK simplify manual review and correction of automated annotations. ClearML annotations also have a very versatile metadata field, which lets us track details such as whether each tag was made by a human or by a specific model. Reviewer notes can also be stored in annotation metadata, which is particularly useful for documenting difficult annotation decisions.
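The provenance tracking described in item 3 can be modelled in plain Python as below. This is a sketch, not ClearML's annotation API. The `Annotation` class, the `source` and `reviewer_note` keys, the `human:`/`model:` prefix convention, and the model name `detector-v3` are all hypothetical. The query it shows is a useful one for review work: find model-made tags on frames that no human has confirmed or corrected yet.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A label on one frame, with free-form metadata about who made it and why."""
    frame_uri: str
    label: str
    metadata: dict = field(default_factory=dict)


def made_by_human(annotation):
    # Hypothetical convention: humans are recorded as "human:<id>", models as "model:<name>".
    return annotation.metadata.get("source", "").startswith("human:")


def needs_review(annotations, model_source):
    """Tags from `model_source` on frames that carry no human annotation yet."""
    human_frames = {a.frame_uri for a in annotations if made_by_human(a)}
    return [a for a in annotations
            if a.metadata.get("source") == model_source
            and a.frame_uri not in human_frames]


# Hypothetical example data (not real detections).
annotations = [
    Annotation("s3://example-bucket/park_a/0001.jpg", "elephant",
               {"source": "model:detector-v3", "score": 0.91}),
    Annotation("s3://example-bucket/park_a/0002.jpg", "hyena",
               {"source": "model:detector-v3", "score": 0.48}),
    Annotation("s3://example-bucket/park_a/0002.jpg", "lion",
               {"source": "human:reviewer-1",
                "reviewer_note": "Low light; corrected the model's tag."}),
]

pending = needs_review(annotations, "model:detector-v3")
```

Only the elephant tag remains pending, because frame `0002` already carries a human correction. The reviewer's reasoning is kept alongside that correction.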

ClearML Tasks: Capturing and Tracking Every Execution

A Task represents a single, logged execution of a script or piece of code. When code instrumented with the ClearML SDK runs, it automatically logs a wealth of information, creating a comprehensive and traceable record. Within the Akili platform, Tasks encapsulate distinct stages of the camera trap data workflow:

  1. Data Ingestion & Preprocessing: Scripts responsible for fetching raw data, applying initial quality filters, extracting metadata, standardizing image formats, or performing augmentations are run as ClearML Tasks. ClearML automatically logs the exact code version (Git commit), execution parameters, library dependencies, console outputs, and any resulting artifacts (e.g., processed data files, metadata summaries).
  2. Model Training: Running training as a Task automatically logs the crucial details. These include the training script itself, all hyperparameters (such as learning rate, batch size, and optimizer settings), the specific Hyperdataset version used, system metrics (CPU/GPU usage), performance logs (accuracy and loss over epochs), and the final trained model file. This allows Akili users to meticulously compare experiments, pinpoint optimal configurations, and retrieve specific model checkpoints.
  3. Model Evaluation: When model evaluation is run as a Task, ClearML logs the evaluation script, the model being evaluated, the test dataset version, and detailed performance metrics (e.g., precision-recall curves, F1 scores).
  4. Inference & Deployment: Applying trained models to incoming camera trap data for automated analysis (e.g., object detection and species classification) can also be run as a Task. The generated predictions are added as annotations on the Hyperdataset frames, and analysis results are stored as artifacts linked to the Task.
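In ClearML itself, this capture begins when a script calls `Task.init(project_name=..., task_name=...)`. The dependency-free sketch below shows what ends up linked together for a training Task. The `TaskRecord` class, its field names, the `log_metric` method, the commit hash, the artifact path, and the loss values are hypothetical placeholders. They do not mirror ClearML's internal schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """Hypothetical, simplified stand-in for what one logged Task ties together."""
    name: str
    git_commit: str                               # exact code version
    params: dict                                  # hyperparameters
    dataset_version: str                          # data slice the run used
    metrics: dict = field(default_factory=dict)   # series name -> [(iteration, value), ...]
    artifacts: dict = field(default_factory=dict) # artifact name -> storage location

    def log_metric(self, series, iteration, value):
        self.metrics.setdefault(series, []).append((iteration, value))

    def last(self, series):
        """Most recently logged value of a metric series."""
        return self.metrics[series][-1][1]


task = TaskRecord(
    name="species-classifier-train",
    git_commit="0123abc",
    params={"lr": 1e-3, "batch_size": 32, "optimizer": "adam"},
    dataset_version="African_Species_Train_v1.2",
)
for epoch, loss in enumerate([0.9, 0.6, 0.45]):  # placeholder values, not real results
    task.log_metric("train/loss", epoch, loss)
task.artifacts["model"] = "models/classifier_epoch2.pt"  # hypothetical path
```

Code version, parameters, data version, metrics, and the model artifact all sit in one record. A later reader can therefore answer "what produced this model?" from that record alone.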

The primary advantage of using Tasks in Akili is the inherent traceability and reproducibility. Every significant computational step becomes a fully documented experiment. Researchers can effortlessly review past runs, understand the exact conditions of execution, clone Tasks to iterate with modified parameters, or roll back to previous successful runs, ensuring transparency and scientific integrity.
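Cloning a Task to iterate on its parameters can be sketched in three steps: copy the recorded configuration, override a few values, and remember the parent. ClearML provides cloning through `Task.clone`. The plain-dict task representation, the `clone_with` and `best_run` helpers, the metric name, and the scores below are hypothetical illustrations, not that API.

```python
import copy


def clone_with(task, new_id, **param_overrides):
    """Copy a task's configuration, override some parameters, and record lineage.

    The clone keeps the same code version and dataset version, so the only
    difference between the two runs is the parameters that were changed.
    Metrics are cleared because the clone has not been executed yet.
    """
    clone = copy.deepcopy(task)  # deep copy so the original's params are untouched
    clone["id"] = new_id
    clone["parent"] = task["id"]
    clone["params"].update(param_overrides)
    clone["metrics"] = {}
    return clone


def best_run(tasks, metric, higher_is_better=True):
    """Pick the executed task with the best value of `metric`."""
    done = [t for t in tasks if metric in t["metrics"]]
    pick = max if higher_is_better else min
    return pick(done, key=lambda t: t["metrics"][metric])


base = {
    "id": "task-001",
    "parent": None,
    "git_commit": "0123abc",
    "dataset_version": "African_Species_Train_v1.2",
    "params": {"lr": 1e-3, "batch_size": 32},
    "metrics": {"val/f1": 0.81},  # placeholder score
}
variant = clone_with(base, "task-002", lr=3e-4)
variant["metrics"]["val/f1"] = 0.84  # stands in for the clone's own run (placeholder)
```

Because the clone inherits the code and data versions, the two runs differ only in the learning rate. That isolation is what makes a comparison between them meaningful.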

The Synergy: Tasks Utilizing Hyperdatasets

A typical workflow illustrates the synergy between Hyperdatasets and Tasks:

  1. A researcher defines a Hyperdataset version (e.g., “African_Species_Train_v1.2”) that specifies the criteria for the desired data subset.
  2. A model training script is launched as a ClearML Task, using “African_Species_Train_v1.2”.
  3. During execution, ClearML automatically logs:
    • The Task details (code version, parameters, environment).
    • The exact Hyperdataset version used (v1.2).
    • All associated metrics, logs, artifacts, and the final trained model.

This ensures that every result generated within the Akili platform has a clear and verifiable origin. Each result is linked to the specific code execution and the precise data version it used, so it can be reproduced exactly.
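The lineage guarantee rests on one detail: a dataset version must always resolve to the same frames, even after new images arrive. One way to sketch this is to freeze the matching pointers when a version is committed and to have each Task store only the version's name. The `DatasetVersion` and `VersionRegistry` classes, their `commit`/`resolve` methods, and the sample data are hypothetical. This is not how ClearML implements Hyperdatasets internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    criteria: tuple    # sorted (key, value) pairs describing the subset
    frame_uris: tuple  # pointers frozen at commit time


class VersionRegistry:
    def __init__(self):
        self._versions = {}

    def commit(self, name, pool, **criteria):
        """Evaluate the criteria now and freeze the result under `name`."""
        if name in self._versions:
            raise ValueError(f"version {name!r} already exists and is immutable")
        uris = tuple(uri for uri, meta in pool
                     if all(meta.get(k) == v for k, v in criteria.items()))
        version = DatasetVersion(name, tuple(sorted(criteria.items())), uris)
        self._versions[name] = version
        return version

    def resolve(self, name):
        return self._versions[name].frame_uris


registry = VersionRegistry()
pool = [  # hypothetical (uri, metadata) pairs
    ("s3://example-bucket/park_a/0001.jpg", {"region": "africa", "split": "train"}),
    ("s3://example-bucket/park_b/0002.jpg", {"region": "africa", "split": "train"}),
]
registry.commit("African_Species_Train_v1.2", pool, region="africa", split="train")
training_task = {"id": "task-001", "dataset_version": "African_Species_Train_v1.2"}

# Later, new images arrive and a new version is committed.
pool.append(("s3://example-bucket/park_a/0003.jpg", {"region": "africa", "split": "train"}))
registry.commit("African_Species_Train_v1.3", pool, region="africa", split="train")

# Reproducing the old Task still yields exactly the frames it trained on.
frames = registry.resolve(training_task["dataset_version"])
```

Re-committing an existing name raises an error rather than silently changing what an earlier Task pointed to.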

By strategically integrating ClearML Tasks and Hyperdatasets, the Akili platform provides Wildlife Protection Solutions with an MLOps framework tailored to the unique demands of camera trap research. Hyperdatasets enable flexible, efficient, and rigorously versioned management of dynamic image datasets, while Tasks ensure meticulous tracking and reproducibility of all computational processes. Akili, underpinned by ClearML, marks a significant advancement in applying AI technology to the critical mission of protecting global biodiversity.

Editor’s Note: Wildlife Protection Solutions uses ClearML’s AI Development Center, a complete solution for managing the AI lifecycle. Whether customers are building data-centric workflows, training models, or deploying them into production, ClearML provides a unified, open-source platform designed for flexibility, scalability, and efficiency at every stage of AI development. Fully cloud- and vendor-agnostic, the open-source architecture supports seamless integration with existing infrastructure and tools, so teams can focus on innovation without operational roadblocks. If you’d like to see the power of ClearML in streamlining AI & ML workflows (as well as managing GPU clusters, optimizing utilization, and deploying GenAI models effortlessly), please request a demo.
