By the Nucleai Team
At Nucleai, our mission is to deliver automated, high-throughput spatial profiling across different modalities (e.g., H&E, IHC, mIF, spatial transcriptomics) with AI-powered deep learning models that support drug discovery, biomarker discovery, and translational research. From tissue segmentation to neighborhood scoring and TIL proximity mapping, our R&D scientists and technologists collaborate on dozens of concurrent PyTorch-based pipelines.
To manage this complexity, we’ve standardized on ClearML as our end-to-end AI/ML orchestration and observability backbone. ClearML empowers us to run reproducible experiments, serve models reliably, debug with deep visibility, and scale EC2 compute intelligently, all with critical resource tracking and PyTorch-native logging.
1. Make the Parent Task the Single Source of Truth
At Nucleai, each major pipeline – ingesting patch data, training, validation, inference, and aggregation – starts with a single orchestrator task defined via PipelineController or @PipelineDecorator. In the ClearML UI, this task is visually represented as a directed acyclic graph (DAG) that shows step dependencies and status from enqueue through aggregation, and clicking any node reveals related scalars, logs, and artifacts.
Using this architecture means every tensor queue step, training task, and aggregation step is logged under one roof, making lineage traceable and orchestration transparent.
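To make this concrete, here is a minimal sketch of an orchestrator task built with ClearML’s @PipelineDecorator. The project, step names, and placeholder step bodies are illustrative, not Nucleai’s actual pipeline; a lightweight stand-in class lets the sketch run even without ClearML installed or a server configured.

```python
# Minimal sketch of a decorator-based orchestrator task. Project, queue, and
# step names are illustrative, not Nucleai's production pipeline.
try:
    from clearml import PipelineDecorator
except ImportError:
    # Lightweight stand-in so the sketch runs without ClearML installed.
    class PipelineDecorator:
        @staticmethod
        def component(**_kw):
            return lambda fn: fn

        @staticmethod
        def pipeline(**_kw):
            return lambda fn: fn


@PipelineDecorator.component(return_values=["patches"], cache=True)
def ingest_patches(slide_id):
    # Step 1: pull patch data for a slide (placeholder logic).
    return [f"{slide_id}-patch-{i}" for i in range(4)]


@PipelineDecorator.component(return_values=["model_path"])
def train(patches):
    # Step 2: train on the ingested patches (placeholder logic).
    return f"model-trained-on-{len(patches)}-patches"


@PipelineDecorator.component(return_values=["report"])
def aggregate(model_path):
    # Step 3: aggregate outputs into a final report (placeholder logic).
    return {"model": model_path, "status": "done"}


@PipelineDecorator.pipeline(name="spatial-profiling", project="nucleai-demo", version="0.1")
def run_pipeline(slide_id="S001"):
    # Each call below becomes a node in the DAG shown in the ClearML UI,
    # with edges inferred from the data passed between steps.
    patches = ingest_patches(slide_id)
    model_path = train(patches)
    return aggregate(model_path)
```

Because every step is declared under the one pipeline function, ClearML logs each component as a child of the orchestrator task, which is what makes the single-source-of-truth lineage possible.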
2. Capture Scalability Metrics as Core Scalars
In addition to the standard summary writer, which is supported out of the box, ClearML’s built-in logging (via report_scalar()) allows us to track DataLoader batch throughput alongside model outputs. We define scalars such as “queue_in” for the pre-forward batch size and “queue_out” for the post-forward size. This creates a clear, numerical view of tensor flow per iteration.
In the background, the ClearML Agent automatically collects GPU utilization and CPU load metrics, so users can overlay resource usage with queue activity and loss curves in the SCALARS tab.
The overlay of queue_in, queue_out and GPU utilization lets us identify bottlenecks (e.g., training starvation, queue blocking, I/O latency, or GPU under-utilization), and quickly tune DataLoader prefetching, batch sizing, and patch ordering.
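The per-iteration bookkeeping behind those scalars can be sketched as below. In a real run, `report` would be ClearML’s `Logger.current_logger().report_scalar`; here we inject any callable with the same keyword signature (a hypothetical stand-in) so the logic is readable and testable without a ClearML server. The title and series names follow the “queue_in”/“queue_out” convention described above.

```python
# Sketch of per-iteration queue scalar reporting. `report` is any callable
# accepting ClearML's report_scalar keywords (title, series, value, iteration);
# in production it would be Logger.current_logger().report_scalar.
def log_queue_scalars(report, iteration, pre_forward_batch, post_forward_batch):
    report(title="tensor_queue", series="queue_in",
           value=len(pre_forward_batch), iteration=iteration)
    report(title="tensor_queue", series="queue_out",
           value=len(post_forward_batch), iteration=iteration)


if __name__ == "__main__":
    records = []
    fake_report = lambda **kw: records.append(kw)
    # One training iteration: 32 patches enter the forward pass, 30 survive
    # post-forward filtering.
    log_queue_scalars(fake_report, iteration=0,
                      pre_forward_batch=range(32), post_forward_batch=range(30))
    print(records)
```

Calling this once per iteration is enough for the SCALARS tab to overlay queue_in and queue_out against the agent-collected GPU and CPU metrics.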

3. Compare Runs via MultiRun Dashboard
ClearML’s Compare View supports overlaying metrics from multiple pipeline runs, whether successive hyperparameter scans or validation passes, aligning queue scalars, resource usage, and loss on the same axes. Debug media (segmentation overlays, patch-level density tables, per-patch embeddings) are aligned by iteration and hyper-block to simplify visual forensic analysis across architectures or augmentations.
Default compare layouts, such as “GPU + queue throughput + loss,” can be saved and shared across team members to maintain consistent dashboards.
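The core of any multi-run overlay is aligning two runs’ scalar series on a shared iteration axis. A minimal sketch of that alignment, assuming each series has already been exported as an `{iteration: value}` map (e.g., pulled via the ClearML SDK), looks like this:

```python
# Align two runs' scalar series on the iterations both runs actually logged,
# so the curves can be plotted against the same x-axis. Input format is an
# assumed {iteration: value} map per run, e.g. extracted from scalars exported
# via the ClearML SDK.
def align_series(run_a, run_b):
    shared = sorted(set(run_a) & set(run_b))
    return [(it, run_a[it], run_b[it]) for it in shared]


if __name__ == "__main__":
    loss_run_a = {0: 1.0, 1: 0.8, 2: 0.6}
    loss_run_b = {1: 0.9, 2: 0.7, 3: 0.5}
    print(align_series(loss_run_a, loss_run_b))  # only iterations 1 and 2 overlap
```

The Compare View does this alignment for you in the UI; the sketch is only meant to show why iteration-keyed logging (rather than wall-clock timestamps) makes cross-run comparison straightforward.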
4. Use Autoscaler-Managed EC2 Agents for Cost-Efficient Scaling
ClearML’s AWS Autoscaler application monitors our task queues (e.g., train-gpu and infer-spot) and spins up EC2 instances within predefined budgets whenever the queue load exceeds idle agents. When instances remain idle beyond the configured max_idle_time, the autoscaler shuts them down automatically, eliminating “zombie” workers and unused capacity.
We ensure that agents only join task queues after warm-up steps, including loading reference datasets, pulling Docker images, and loading local caches. This guarantees that when a training task becomes available, the EC2 instance is fully prepared and resource-ready, minimizing startup jitter.
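The scaling policy described in this section boils down to two decisions, sketched below. This is an illustration of the policy, not ClearML’s actual autoscaler code; the function names are hypothetical, though `max_idle_time` mirrors the configuration setting mentioned above.

```python
# Illustrative sketch of the two autoscaler decisions described above; not
# ClearML's implementation. Function names are hypothetical.
def instances_to_launch(pending_tasks, idle_agents, max_new_instances):
    # Scale up only when queued tasks exceed idle agents, capped by the
    # remaining instance budget.
    return max(0, min(pending_tasks - idle_agents, max_new_instances))


def should_terminate(idle_minutes, max_idle_time, has_running_task):
    # Never reap a busy worker; shut one down only once it has sat idle
    # longer than the configured max_idle_time (minutes).
    return (not has_running_task) and idle_minutes > max_idle_time
```

Combined with the warm-up steps above (datasets cached, Docker images pulled before the agent joins the queue), a freshly launched instance starts useful work almost immediately, and an idle one never lingers past its threshold.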

5. Why This Combination Works So Well for Nucleai
We run thousands of spatially-resolved model runs across hundreds of GPU-backed EC2 instances every day. This architecture gives us full observability:
- End-to-end visibility from patch-level queue behavior to final aggregated outputs.
- Fast root-cause debugging: e.g., tracing a spike in segmentation jitter back to a queue bottleneck.
- Cost-aware scalability: the autoscaler avoids idle EC2 spend; warm-started agents avoid wasted startup time.
- Complete reproducibility: every task logs its date, code, hyperparameters, and scalars in a consistent framework.
Setting up this system once allows our spatial biology scientists and ML engineers to collaborate fluidly. They can quickly interpret how queue logic, batch sizing, and LR scheduling affect GPU load and segmentation metrics, allowing them to iterate with confidence using cloned pipelines and compare views.
The result is a highly efficient, resource-aware ML platform where metrics are actionable, lineage is preserved, and EC2 costs stay controlled. May your queues stay full, your GPUs stay busy, and your experiments stay transparent!
Editor’s note: If you’d like to see how you can orchestrate and scale your AI/ML workflows and workloads with ClearML, please request a demo.