As organizations move from AI experimentation to production deployments, the challenge often shifts from model selection to operationalization. Modern model ecosystems, particularly those developed by NVIDIA, offer powerful capabilities, but running them reliably across infrastructure, scaling them for production workloads, and integrating them into enterprise workflows requires orchestration, infrastructure management, and governance.
ClearML addresses this operational layer by providing an integrated platform for deploying, managing, and scaling AI models across heterogeneous infrastructure. When combined with NVIDIA’s open models, including NVIDIA Cosmos™ family for physical AI and NVIDIA Nemotron for agentic AI, ClearML enables organizations to deploy advanced models as production services while maintaining visibility, governance, and reproducibility across the entire AI pipeline. To learn more about ClearML’s validated NVIDIA Metropolis Blueprint for Video Search and Summarization (VSS) for building vision AI agents, read the blog article.
NVIDIA Cosmos and Nemotron Models in Modern AI Workloads
NVIDIA’s open model families spans multiple domains, including large language models, vision-language models, speech models, and retrieval systems. Within this ecosystem, models such as NVIDIA Cosmos Reason 2 focus on visual reasoning and physical AI workloads, while Nemotron models, built and optimized using NVIDIA NeMo, provide large language model, speech, retrieval-augmented generation (RAG), and safety capabilities designed for enterprise generative AI applications.
Cosmos Reason 2 is a vision-language model designed to reason over visual inputs. It combines a Vision Transformer encoder with a transformer-based language model and supports a large context window capable of analyzing complex scenes, temporal events, and spatial relationships. This enables applications involving industrial vision, robotics, and large-scale video analysis.
Nemotron models, by contrast, focus on language understanding and generation tasks such as conversational interfaces, summarization, reasoning over enterprise knowledge bases, and RAG workflows. In many production AI systems, these two model categories are complementary: Cosmos models process visual inputs and generate structured insights, while Nemotron models interpret those insights and generate language-based responses or summaries.
Both Cosmos and Nemotron models can be deployed using NVIDIA NIM microservices, which provide optimized inference environments and standardized APIs for model access.
While these services simplify model packaging and inference optimization, organizations still require a platform to manage deployment, infrastructure allocation, orchestration, and operational visibility. This is where ClearML plays a central role.
Deploying NVIDIA Cosmos and Nemotron Models with ClearML
ClearML provides infrastructure orchestration and application deployment capabilities that allow NVIDIA models packaged as NIM microservices, including both Cosmos and Nemotron models, to be deployed quickly and managed centrally.
Through ClearML’s application deployment framework, models can be launched from a curated catalog and assigned to available GPU infrastructure. When a model is deployed, ClearML provisions the corresponding containerized inference service, assigns it to a GPU resource pool, configures networking, and exposes the model through a secure endpoint managed by the ClearML application gateway.
This deployment workflow eliminates much of the manual infrastructure work typically associated with running large models. Instead of configuring container orchestration environments, networking rules, or infrastructure automation scripts, deployment can be initiated directly from the ClearML interface.
ClearML also operates across multiple infrastructure environments, including bare-metal servers, virtual machines, and Kubernetes clusters, allowing organizations to run NVIDIA models wherever their compute resources reside.
By coordinating deployments through a single platform, teams can run multimodal AI systems that combine vision-language reasoning from Cosmos models with language reasoning from Nemotron models within the same operational environment.
Flexible GPU Infrastructure Support
Modern AI deployments frequently span a wide range of GPU hardware configurations. Early experimentation may occur on workstation-class GPUs, while production inference workloads often run on large datacenter accelerators.
ClearML is designed to support this diversity of infrastructure. Models can be scheduled onto available GPU resources through ClearML’s resource queues, allowing organizations to run smaller models on GPUs such as NVIDIA RTX-class or even edge devices on a dedicated network while deploying larger models or multiple concurrent services on higher-capacity accelerators such as NVIDIA Blackwell and NVIDIA Hopper GPUs.
This flexibility allows teams to align model deployments with the hardware they already operate. Rather than requiring a single standardized GPU configuration, ClearML enables organizations to run different models across heterogeneous GPU environments while maintaining centralized visibility and management.
Orchestrating Full AI Pipelines
Deploying models is only one part of the production AI lifecycle. Real-world systems typically involve multi-stage pipelines that include ingestion, preprocessing, inference, indexing, and downstream retrieval or reasoning.
ClearML pipelines provide a mechanism for encoding these workflows as versioned directed acyclic graphs (DAGs). Each stage of a pipeline can run on designated compute resources, and ClearML automatically tracks execution history, artifacts, metrics, and intermediate outputs. This orchestration layer allows organizations to operationalize reference architectures such as the NVIDIA VSS Blueprint.
The VSS architecture processes video streams through several AI services, including visual caption generation, audio transcription, embedding generation, and retrieval-augmented question answering. In this workflow, Cosmos Reason 2 can function as the visual reasoning component that analyzes video frames and produces descriptive captions or scene interpretations. These outputs are then indexed in a vector database and queried through natural language interfaces.
ClearML pipelines coordinate the execution of each stage, ensuring that ingestion, inference, indexing, and query processing operate as a unified workflow.

Use Case: Intelligent Video Search and Summarization
One enterprise application enabled by the combination of ClearML and NVIDIA Cosmos models is intelligent video search and summarization.
One enterprise application enabled by the combination of ClearML and NVIDIA Cosmos models is intelligent video search and summarization.
Organizations across sectors such as transportation, industrial operations, and public infrastructure frequently manage large archives of video data or millions of live video streams. Traditional search approaches rely on manual tagging or limited metadata, which restricts the ability to retrieve meaningful insights from complex visual scenes.
By deploying Cosmos Reason2-8B through ClearML, video streams can be automatically analyzed to produce structured descriptions of events and activities. These descriptions can then be embedded and indexed in a vector database.
Users can interact with the system using natural language queries such as:
- “Show all events where vehicles entered a restricted area.”
- “Find scenes where equipment remained idle for more than ten minutes.”
- “Summarize unusual activity during the night shift.”
ClearML pipelines coordinate the ingestion, caption generation, embedding creation, and indexing stages required to enable these queries, allowing organizations to turn large video archives into searchable knowledge systems.

Use Case: Multimodal Enterprise Knowledge Systems
A second enterprise use case involves multimodal knowledge systems that combine visual understanding with language-based reasoning.
In environments such as logistics operations, manufacturing facilities, or large enterprise campuses, organizations generate large volumes of visual data from cameras and operational monitoring systems. Cosmos models can analyze these inputs and produce structured representations of events or scenes.
These outputs can then be integrated into a retrieval pipeline that feeds a Nemotron language model. Employees or analysts can query the system using natural language prompts, asking questions such as:
- “What safety incidents occurred in the warehouse yesterday?”
- “Show examples where forklifts crossed pedestrian paths.”
- “Summarize activity across loading zones during peak hours.”
In this architecture, Cosmos models provide visual reasoning capabilities while Nemotron models generate conversational responses or summaries based on indexed information. ClearML pipelines orchestrate the entire process, enabling multimodal AI systems that combine perception, retrieval, and language reasoning.
Observability and Governance
Production AI deployments require visibility into both infrastructure usage and model behavior. ClearML provides built-in monitoring capabilities that allow teams to track GPU utilization, inference latency, and request throughput directly within the platform interface.
Security and governance features are integrated into the deployment environment as well. Access to deployed model endpoints can be authenticated and controlled through role-based access policies, and administrative logs track deployment activity, endpoint usage, and configuration changes.
These capabilities enable organizations to operate AI systems within environments that require operational accountability, auditability, and controlled access to resources.
Closing: From Models to Production Systems
NVIDIA’s model ecosystem (Cosmos vision-language models and Nemotron large language models) provides powerful building blocks for modern AI systems that combine perception, reasoning, and natural language interaction.
ClearML complements these capabilities by providing the infrastructure coordination layer required to deploy, orchestrate, and operate these models across heterogeneous environments. By integrating model deployment, pipeline orchestration, infrastructure scheduling, and observability within a unified platform, ClearML enables organizations to move from isolated AI experiments to production systems capable of supporting complex, multimodal workloads at scale. To learn more, book some time with a ClearML solution engineer.