# CLI

The `clearml-serving` utility is a CLI tool for model deployment and orchestration.

This page provides a reference for `clearml-serving`'s CLI commands:

* `list` - List running Serving Services
* `create` - Create a new Serving Service
* `metrics` - Configure the inference metrics Service
* `config` - Configure a new Serving Service
* `model` - Configure model endpoints for a running Service

## Global Parameters

```
clearml-serving [-h] [--debug] [--id ID] {list,create,metrics,config,model}
```

|Name|Description|Optional|
|---|---|---|
|`--id`|Serving Service (Control plane) Task ID to configure (if not provided, the running control plane Task is automatically detected)|No|
|`--debug`|Print debug messages|Yes|
**Service ID**

The Serving Service's ID (`--id`) is required to execute the `metrics`, `config`, and `model` commands.
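
For example, to target a specific Serving Service explicitly (`<service_id>` below is a placeholder for your control plane Task ID):

```bash
# <service_id> is a placeholder for the Serving Service's Task ID
clearml-serving --id <service_id> model list
```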

## list

List running Serving Services.

```
clearml-serving list [-h]
```

## create

Create a new Serving Service.

```
clearml-serving create [-h] [--name NAME] [--tags TAGS [TAGS ...]] [--project PROJECT]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Serving service's name. Default: `Serving-Service`|No|
|`--project`|Serving service's project. Default: `DevOps`|No|
|`--tags`|User tags for the serving service. Tags are useful for labeling and organizing services|Yes|
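
As an illustrative sketch (the service name, project, and tags below are placeholders, not required values):

```bash
clearml-serving create --name "my serving service" --project "DevOps" --tags prod latest
```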

## metrics

Configure inference metrics Service.

```
clearml-serving metrics [-h] {add,remove,list}
```

### add

Add/modify metric for a specific endpoint.

```
clearml-serving metrics add [-h] --endpoint ENDPOINT [--log-freq LOG_FREQ]
                            [--variable-scalar VARIABLE_SCALAR [VARIABLE_SCALAR ...]]
                            [--variable-enum VARIABLE_ENUM [VARIABLE_ENUM ...]]
                            [--variable-value VARIABLE_VALUE [VARIABLE_VALUE ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. "model/1") or a prefix (e.g. "model/*"). Note: this overrides any metrics previously logged for the endpoint|No|
|`--log-freq`|Request logging frequency, between 0.0 and 1.0. For example, 1.0 logs all requests, and 0.5 logs half of them. If not specified, the global logging frequency is used (see `config --metric-log-freq`)|Yes|
|`--variable-scalar`|Add a float (scalar) argument to the metric logger, `<name>=<histogram>`. Example: with specific buckets, "x1=0,0.2,0.4,0.6,0.8,1", or with min/max/num_buckets, "x1=0.0/1.0/5"|Yes|
|`--variable-enum`|Add an enum (string) argument to the metric logger, `<name>=<optional_values>`. Example: "detect=cat,dog,sheep"|Yes|
|`--variable-value`|Add a non-sampled scalar argument to the metric logger, `<name>`. Example: "latency"|Yes|
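
Putting these options together, a hypothetical invocation (reusing the endpoint, frequency, and variable examples from the table above) might look like:

```bash
clearml-serving --id <service_id> metrics add --endpoint "model/1" \
    --log-freq 0.5 \
    --variable-scalar "x1=0,0.2,0.4,0.6,0.8,1" \
    --variable-enum "detect=cat,dog,sheep" \
    --variable-value "latency"
```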

### remove

Remove metric from a specific endpoint.

```
clearml-serving metrics remove [-h] [--endpoint ENDPOINT]
                               [--variable VARIABLE [VARIABLE ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. "model/1") or a prefix (e.g. "model/*")|No|
|`--variable`|Remove a (scalar/enum) argument from the metric logger, `<name>`. Example: "x1"|Yes|
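
For example, to remove the scalar variable defined above (the endpoint and variable names are placeholders):

```bash
clearml-serving --id <service_id> metrics remove --endpoint "model/1" --variable "x1"
```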

### list

List metrics logged on all endpoints.

```
clearml-serving metrics list [-h]
```

## config

Configure a new Serving Service.

```
clearml-serving config [-h] [--base-serving-url BASE_SERVING_URL]
                       [--triton-grpc-server TRITON_GRPC_SERVER]
                       [--kafka-metric-server KAFKA_METRIC_SERVER]
                       [--metric-log-freq METRIC_LOG_FREQ]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--base-serving-url`|External base serving service URL. Example: `http://127.0.0.1:8080/serve`|Yes|
|`--triton-grpc-server`|External ClearML-Triton serving container gRPC address. Example: `127.0.0.1:9001`|Yes|
|`--kafka-metric-server`|External Kafka service URL. Example: `127.0.0.1:9092`|Yes|
|`--metric-log-freq`|Set the default metric logging frequency, between 0.0 and 1.0. For example, 1.0 means 100% of all requests are logged|Yes|
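
A hypothetical full configuration, reusing the example values from the table above (the service ID is a placeholder):

```bash
clearml-serving --id <service_id> config \
    --base-serving-url http://127.0.0.1:8080/serve \
    --triton-grpc-server 127.0.0.1:9001 \
    --kafka-metric-server 127.0.0.1:9092 \
    --metric-log-freq 1.0
```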

## model

Configure model endpoints for an already running Service.

```
clearml-serving model [-h] {list,remove,upload,canary,auto-update,add}
```

### list

List current models.

```
clearml-serving model list [-h]
```

### remove

Remove model by its endpoint name.

```
clearml-serving model remove [-h] [--endpoint ENDPOINT]
```

**Parameter**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model endpoint name|No|
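
For example (the endpoint name is a placeholder):

```bash
clearml-serving --id <service_id> model remove --endpoint "my_model"
```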

### upload

Upload and register model files/folder.

```
clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project PROJECT
                             [--framework {scikit-learn,xgboost,lightgbm,tensorflow,pytorch}]
                             [--publish] [--path PATH] [--url URL]
                             [--destination DESTINATION]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Specify the name under which the model will be registered|No|
|`--tags`|Add tags to the newly created model|Yes|
|`--project`|Specify the project in which the model will be registered|No|
|`--framework`|Specify the model framework. Options are: "scikit-learn", "xgboost", "lightgbm", "tensorflow", "pytorch"|Yes|
|`--publish`|Publish the newly created model (change the model state to "published", i.e. locked and ready to deploy)|Yes|
|`--path`|Specify a model file/folder to be uploaded and registered|Yes|
|`--url`|Specify the URL of an already uploaded model (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|Yes|
|`--destination`|Specify the target destination for the model upload (e.g. `s3://bucket/folder/`, `gs://bucket/folder/`)|Yes|
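
A sketch of uploading and registering a local scikit-learn model file (the model name, project, and path are placeholders):

```bash
clearml-serving --id <service_id> model upload \
    --name "my sklearn model" --project "serving examples" \
    --framework scikit-learn --path model.pkl --publish
```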

### canary

Add model Canary/A/B endpoint.

```
clearml-serving model canary [-h] [--endpoint ENDPOINT] [--weights WEIGHTS [WEIGHTS ...]]
                             [--input-endpoints INPUT_ENDPOINTS [INPUT_ENDPOINTS ...]]
                             [--input-endpoint-prefix INPUT_ENDPOINT_PREFIX]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model canary serving endpoint name (e.g. my_model/latest)|Yes|
|`--weights`|Model canary weights, in the order of the matching model endpoints (e.g. 0.2 0.8)|Yes|
|`--input-endpoints`|Model endpoint prefixes; can also include a version (e.g. my_model, my_model/v1)|Yes|
|`--input-endpoint-prefix`|Model endpoint prefix, ordered lexicographically or by version `<int>` (e.g. my_model/1, my_model/v1), where the first weight matches the last version|Yes|
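
As a sketch, a canary endpoint over all versions of a hypothetical `my_model` endpoint; per the note above, the first weight (0.1) matches the last (highest) version:

```bash
clearml-serving --id <service_id> model canary --endpoint "my_model/latest" \
    --weights 0.1 0.9 --input-endpoint-prefix my_model
```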

### auto-update

Add/Modify model auto-update service.

```
clearml-serving model auto-update [-h] [--endpoint ENDPOINT] --engine ENGINE
                                  [--max-versions MAX_VERSIONS] [--name NAME]
                                  [--tags TAGS [TAGS ...]] [--project PROJECT]
                                  [--published] [--preprocess PREPROCESS]
                                  [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                                  [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                                  [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                                  [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                                  [--aux-config AUX_CONFIG [AUX_CONFIG ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--max-versions`|Maximum number of versions to store (and create endpoints) for the model. The highest number is the latest version|Yes|
|`--name`|Specify the model name to be selected and auto-updated. Note: selection uses regular expressions; use "^name$" for an exact match|Yes|
|`--tags`|Specify the model tags to be selected and auto-updated|Yes|
|`--project`|Specify the model project to be selected and auto-updated|Yes|
|`--published`|Only select published models for auto-update|Yes|
|`--preprocess`|Specify pre/post-processing code to be used with the model (point to a local file/folder). This should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [rows x columns x channels, etc.]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--input-name`|Specify the model layer into which input is pushed. Example: layer_0|Yes|
|`--output-size`|Specify the model matrix output size [rows x columns x channels, etc.]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--output-name`|Specify the model layer from which results are pulled. Example: layer_99|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: you can also pass a full configuration file (e.g. a Triton "config.pbtxt")|Yes|
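
As a sketch, assuming a scikit-learn model whose name and project are placeholders (along with the endpoint and preprocessing file):

```bash
# keep up to 2 versions; newly published models matching the
# name/project selection below are deployed automatically
clearml-serving --id <service_id> model auto-update --engine sklearn \
    --endpoint "my_model_auto" --preprocess preprocess.py \
    --name "^train sklearn model$" --project "serving examples" \
    --published --max-versions 2
```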

### add

Add/Update model.

```
clearml-serving model add [-h] --engine ENGINE --endpoint ENDPOINT [--version VERSION]
                          [--model-id MODEL_ID] [--preprocess PREPROCESS]
                          [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                          [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                          [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                          [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                          [--aux-config AUX_CONFIG [AUX_CONFIG ...]] [--name NAME]
                          [--tags TAGS [TAGS ...]] [--project PROJECT] [--published]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--version`|Model endpoint version (default: None)|Yes|
|`--model-id`|Specify a model ID to be served|No|
|`--preprocess`|Specify pre/post-processing code to be used with the model (point to a local file/folder). This should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [rows x columns x channels, etc.]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--input-name`|Specify the model layer into which input is pushed. Example: layer_0|Yes|
|`--output-size`|Specify the model matrix output size [rows x columns x channels, etc.]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--output-name`|Specify the model layer from which results are pulled. Example: layer_99|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: you can also pass a full configuration file (e.g. a Triton "config.pbtxt")|Yes|
|`--name`|Instead of specifying `--model-id`, select based on model name|Yes|
|`--tags`|Instead of specifying `--model-id`, select based on model tags|Yes|
|`--project`|Instead of specifying `--model-id`, select based on model project|Yes|
|`--published`|Instead of specifying `--model-id`, select only published models|Yes|
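
A minimal sketch for a static endpoint (the endpoint name, preprocessing file, and model ID are placeholders):

```bash
clearml-serving --id <service_id> model add --engine sklearn \
    --endpoint "my_model" --preprocess preprocess.py \
    --model-id <model_id>
```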