ClearmlJob
class automation.ClearmlJob()
Create a new Task based on base_task_id, with a different set of parameters.
Parameters
base_task_id (str) – base task ID to clone from
parameter_override (dict) – dictionary of parameters and values to set for the cloned task
task_overrides (dict) – Task-object-specific overrides, for example {'script.version_num': None, 'script.branch': 'main'}
configuration_overrides (Optional[Mapping[str, Union[str, Mapping]]]) – Optional, override Task configuration objects. Expected: a dictionary mapping configuration object name to configuration object content. Examples:
{'config_section': dict(key='value')}, {'config_file': 'configuration file content'}, {'OmegaConf': YAML.dumps(full_hydra_dict)}
tags (list) – additional tags to add to the newly cloned task
parent (str) – set the parent task field of the newly created Task; default: base_task_id.
kwargs (dict) – additional Task creation parameters
disable_clone_task (bool) – if False (default), clone the base task. If True, use base_task_id directly (the base task must be in draft mode, i.e. "created" status).
allow_caching (bool) – if True, check for a previously executed Task with the same specification; if one exists, use it and set the internal is_cached flag. Default: False (always create a new Task).
target_project (str) – optional; the name of the target project in which to create the cloned Task.
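A minimal sketch of cloning a base Task with overridden parameters. It assumes the class is importable as `from clearml.automation import ClearmlJob` and that the base Task keeps its hyperparameters in an `Args` section (the usual section for argparse arguments; check your own Task). The helpers `make_parameter_override` and `create_variant` are hypothetical illustrations, not part of the ClearML API.

```python
from typing import Any, Dict, Mapping, Optional


def make_parameter_override(params: Mapping[str, Any], section: str = "Args") -> Dict[str, str]:
    """Hypothetical helper: prefix flat names with a section, e.g. lr -> Args/lr.

    The "Args" section name is an assumption about the base Task's layout.
    Values are converted to strings as a conservative choice.
    """
    return {"{}/{}".format(section, name): str(value) for name, value in params.items()}


def create_variant(base_task_id: str, params: Mapping[str, Any],
                   target_project: Optional[str] = None):
    """Hypothetical helper: clone base_task_id with overridden hyperparameters."""
    # Imported lazily so this module loads even where ClearML is not installed.
    from clearml.automation import ClearmlJob

    return ClearmlJob(
        base_task_id=base_task_id,
        parameter_override=make_parameter_override(params),
        task_overrides={"script.branch": "main"},  # example taken from the parameter docs
        tags=["variant"],
        allow_caching=True,  # reuse an identical, previously executed Task if one exists
        target_project=target_project,
    )
```

Usage would look like `job = create_variant("<base task id>", {"lr": 0.01, "batch_size": 64})`, where the parameter names are placeholders for whatever the base Task actually defines.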
abort
abort()
Abort the currently running job (can be called multiple times).
Return type
()
delete
delete()
Delete the current temporary job (before launching). Returns False if the Job/Task could not be deleted.
Return type
bool
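Because abort() targets jobs that are already enqueued or running while delete() is meant for jobs that were never launched, a caller that just wants a job gone has to pick between them. The sketch below uses only the status predicates documented in this section; `cancel_job` is a hypothetical helper, not a ClearML method.

```python
def cancel_job(job) -> bool:
    """Hypothetical helper: stop a job whether or not it was launched.

    A finished job is left alone, an enqueued or running job is aborted,
    and a job that was never launched is deleted.
    Returns False only if the delete could not be performed.
    """
    if job.is_stopped():
        return True  # already finished: nothing to cancel
    if job.is_pending() or job.is_running():
        job.abort()  # documented as safe to call multiple times
        return True
    return job.delete()  # False if the Job/Task could not be deleted
```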
elapsed
elapsed()
Return the time in seconds since the job started, or -1 if the job is still pending.
Return type
float
Returns
Seconds from start.
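The -1 sentinel matters when elapsed() drives a time budget: a naive comparison would never flag a pending job, which is the intent, but it is easy to get wrong with other checks. A sketch of a per-job time limit; both helpers are hypothetical, and the budget is in seconds to match elapsed().

```python
def exceeded_budget(elapsed_seconds: float, budget_seconds: float) -> bool:
    """Return True if a started job has run longer than budget_seconds.

    elapsed() returns -1 while the job is pending, so a negative value is
    treated as "not started" and never counts as over budget.
    """
    if elapsed_seconds < 0:
        return False
    return elapsed_seconds > budget_seconds


def abort_if_over_budget(job, budget_seconds: float) -> bool:
    """Hypothetical helper: abort a running job that exceeded its time budget."""
    if job.is_running() and exceeded_budget(job.elapsed(), budget_seconds):
        job.abort()
        return True
    return False
```

Checking is_running() first avoids calling abort() on a job that has already finished.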
get_console_output
get_console_output(number_of_reports=1)
Return a list of console outputs reported by the Task, starting from the most recent ones.
Parameters
number_of_reports (int) – number of reports to return; default 1 (only the last, most recent, console output)
Return type
Sequence[str]
Returns
List of strings; each entry corresponds to one report.
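A common use of get_console_output() is scanning recent output for an error marker. This sketch assumes a single report may contain several lines, so it splits each entry before matching; `find_lines` is a hypothetical helper.

```python
from typing import Iterable, List


def find_lines(reports: Iterable[str], needle: str) -> List[str]:
    """Return every console line that contains needle.

    Assumption: one report may hold several lines, so each report is
    split into lines before matching.
    """
    matches = []
    for report in reports:
        for line in report.splitlines():
            if needle in line:
                matches.append(line)
    return matches
```

For example, `find_lines(job.get_console_output(number_of_reports=5), "Traceback")` would return the matching lines from the last five reports.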
get_metric
get_metric(title, series)
Retrieve a specific scalar metric from the running Task.
Parameters
title (str) – Graph title (metric)
series (str) – Series on the specific graph (variant)
Return type
(float, float, float)
Returns
A tuple of (min value, max value, last value).
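Since get_metric() returns (min, max, last), comparing jobs usually means indexing into the third element. The sketch below picks the best job by last value; it assumes an unreported metric may come back with None entries and skips those. `best_by_last_value` is a hypothetical helper, and metric names in the usage note are placeholders.

```python
from typing import Optional, Sequence, Tuple

# (min value, max value, last value), as returned by get_metric()
MetricTuple = Tuple[Optional[float], Optional[float], Optional[float]]


def best_by_last_value(values: Sequence[MetricTuple], maximize: bool = True) -> Optional[int]:
    """Return the index of the entry with the best last value, or None.

    Entries whose last value is None are skipped (assumption: a metric that
    has not been reported yet may be returned as None). Ties keep the first.
    """
    candidates = [(t[2], i) for i, t in enumerate(values) if t[2] is not None]
    if not candidates:
        return None
    pick = max if maximize else min
    return pick(candidates, key=lambda c: c[0])[1]
```

Usage: `metrics = [job.get_metric("validation", "accuracy") for job in jobs]`, then `jobs[best_by_last_value(metrics)]` (after checking for None).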
is_aborted
is_aborted()
Return True if the job has executed and was aborted.
Return type
bool
Returns
True if the task is currently in the aborted state.
is_cached_task
is_cached_task()
Return type
bool
Returns
True if the internal Task is a cached one, False otherwise.
is_completed
is_completed()
Return True if the job has executed and completed successfully.
Return type
bool
Returns
True if the task is currently in the completed or published state.
is_failed
is_failed()
Return True if the job has executed and failed.
Return type
bool
Returns
True if the task is currently in the failed state.
is_pending
is_pending()
Return True if the job is waiting for execution.
Return type
bool
Returns
True if the task is currently queued.
is_running
is_running()
Return True if the job is currently running (a pending job is not considered running).
Return type
bool
Returns
True if the task is currently in progress.
is_stopped
is_stopped(aborted_nonresponsive_as_running=False)
Return True if the job has finished executing (for any reason).
Parameters
aborted_nonresponsive_as_running (bool) – (default: False) If True, ignore the stopped state if the backend non-responsive watchdog sets this Task to stopped. This can happen if an instance running the job is killed without warning (e.g. spot instances).
Return type
bool
Returns
True if the task is currently in one of these states: stopped, completed, failed, published.
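The predicates overlap: is_stopped() is also True for completed, failed and published Tasks. When logging a single label per job, check the specific end states before the catch-all. A sketch; `describe_job` and its label strings are hypothetical.

```python
def describe_job(job) -> str:
    """Hypothetical helper: collapse the status predicates into one label.

    Specific end states are checked before is_stopped(), which is also
    True for completed, failed and published Tasks.
    """
    if job.is_pending():
        return "pending"
    if job.is_running():
        return "running"
    if job.is_completed():
        return "completed"
    if job.is_failed():
        return "failed"
    if job.is_aborted():
        return "aborted"
    if job.is_stopped():
        return "stopped"
    # Neither queued, running nor finished, e.g. a created Task not yet launched.
    return "other"
```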
iterations
iterations()
Return the last iteration value of the current job, or -1 if the job has not started yet.
Return type
int
Returns
Task last iteration.
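iterations() and elapsed() can be combined into a rough throughput estimate. Both use -1 as a "not started" sentinel, so the sketch returns None instead of a misleading number in that case. `iterations_per_second` is a hypothetical helper.

```python
from typing import Optional


def iterations_per_second(last_iteration: int, elapsed_seconds: float) -> Optional[float]:
    """Rough throughput estimate from iterations() and elapsed().

    Returns None before the job starts (either value is negative) or when
    no time has elapsed yet.
    """
    if last_iteration < 0 or elapsed_seconds <= 0:
        return None
    return last_iteration / elapsed_seconds
```

Usage: `iterations_per_second(job.iterations(), job.elapsed())`.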
launch
launch(queue_name=None)
Send the Job for execution on the requested execution queue.
Parameters
queue_name (Optional[str]) – name of the execution queue to enqueue the Task into
Return type
bool
Returns
False if the Task is not in "created" status (i.e. cannot be enqueued) or could not be enqueued.
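When launching a batch of jobs, the boolean result of launch() is the only failure signal. This sketch collects failed launches and skips cached jobs, on the assumption that a cached job (allow_caching=True with a matching earlier run) already refers to an executed Task and so is not in "created" status. `launch_all` is a hypothetical helper.

```python
from typing import Iterable, List


def launch_all(jobs: Iterable, queue_name: str) -> List:
    """Hypothetical helper: enqueue each job and return those that failed.

    Cached jobs are skipped instead of enqueued (assumption: they already
    reference an executed Task).
    """
    failed = []
    for job in jobs:
        if job.is_cached_task():
            continue
        # launch() returns False if the Task is not in "created" status
        # or could not be enqueued.
        if not job.launch(queue_name=queue_name):
            failed.append(job)
    return failed
```

The queue name is deployment specific; "default" in a call such as `launch_all(jobs, "default")` is only a placeholder.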
ClearmlJob.register_hashing_callback
classmethod register_hashing_callback(a_function)
Customize the dict used for hashing the Task. The provided function is called with a dict representing a Task and may return a modified version of that representation dict.
Parameters
a_function (Callable[[dict], dict]) – function manipulating the representation dict of a Task
Return type
None
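A hashing callback receives a dict and returns a dict, so a pure function that drops selected keys is the simplest case. The keys present in the Task representation are not documented here, so any key names you pass are assumptions to verify against your ClearML version. Presumably, ignoring a key lets Tasks that differ only in it hash identically, which affects reuse under allow_caching=True. `make_key_filter` and `install_filter` are hypothetical helpers.

```python
from typing import Callable, Iterable


def make_key_filter(ignored_keys: Iterable[str]) -> Callable[[dict], dict]:
    """Build a hashing callback that drops selected top-level keys.

    A new dict is returned; the representation passed in is not modified.
    """
    ignored = frozenset(ignored_keys)

    def callback(task_repr: dict) -> dict:
        return {k: v for k, v in task_repr.items() if k not in ignored}

    return callback


def install_filter(ignored_keys: Iterable[str]) -> None:
    """Register the filter for all ClearmlJob instances (hypothetical helper)."""
    # Imported lazily so make_key_filter can be used without ClearML.
    from clearml.automation import ClearmlJob

    ClearmlJob.register_hashing_callback(make_key_filter(ignored_keys))
```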
started
started()
Return True if the job has already started or ended; False if it is created or pending.
Return type
bool
Returns
False if the task is currently in draft mode or pending.
status
status(force=False)
Return the Job Task current status. Options are: “created”, “queued”, “in_progress”, “stopped”, “published”, “publishing”, “closed”, “failed”, “completed”, “unknown”.
Parameters
force (bool) – force a status update; otherwise, the state is refreshed at most once per second
Return type
str
Returns
Task status (a Task.TaskStatusEnum value, as a string).
status_message
status_message()
Get the status message of the task. Note that the message is updated only after BaseJob.status() is called.
Return type
str
Returns
The status message of the corresponding task as a string
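Because status_message() is only refreshed by status(), reading both in the right order matters. The sketch also classifies the documented status strings; treating "stopped", "published", "closed", "failed" and "completed" as terminal, and "publishing" as still in flight, is an assumption rather than something this reference states. `is_terminal` and `status_with_message` are hypothetical helpers.

```python
from typing import Tuple

# Assumption: statuses after which the Task will not run again on its own.
TERMINAL_STATUSES = frozenset({"stopped", "published", "closed", "failed", "completed"})


def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES


def status_with_message(job, force: bool = False) -> Tuple[str, str]:
    """Return (status, status message) for a job."""
    # status() must be called first: it is what refreshes the message.
    status = job.status(force=force)
    return status, job.status_message()
```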
task_id
task_id()
Return the Task id.
Return type
str
Returns
The Task ID.
ClearmlJob.update_status_batch
classmethod update_status_batch(jobs)
Update the status of multiple jobs in one batch.
Parameters
jobs (Sequence[BaseJob]) – the jobs whose status to update
Return type
()
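update_status_batch() is a classmethod taking many jobs, which suits dashboards that poll a whole sweep. This sketch refreshes a list of jobs and tallies them by status. It assumes that, right after the batch update, status() without force=True serves the refreshed state instead of querying each Task again. The optional `update_batch` parameter exists only so the function can be exercised without ClearML; `refresh_and_count` is a hypothetical helper.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence


def refresh_and_count(jobs: Sequence, update_batch: Optional[Callable] = None) -> Dict[str, int]:
    """Refresh the status of many jobs in one call, then count them by status."""
    if update_batch is None:
        # Imported lazily so the function can be tested without ClearML.
        from clearml.automation import ClearmlJob
        update_batch = ClearmlJob.update_status_batch
    update_batch(jobs)
    # Assumption: status() can now reuse the batch-refreshed state.
    return dict(Counter(job.status() for job in jobs))
```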
wait
wait(timeout=None, pool_period=30.0, aborted_nonresponsive_as_running=False)
Wait until the task is fully executed (i.e., aborted/completed/failed).
Parameters
timeout (Optional[float]) – maximum time (minutes) to wait for the Task to finish
pool_period (float) – check the task status every pool_period seconds
aborted_nonresponsive_as_running (bool) – (default: False) If True, ignore the stopped state if the backend non-responsive watchdog sets this Task to stopped. This can happen if an instance running the job is killed without warning (e.g. spot instances).
Return type
bool
Returns
True, if Task finished.
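Calling wait(timeout=...) on several jobs one after another allows up to len(jobs) × timeout in total. A sketch that shares one deadline across all jobs, recomputing the remaining minutes before each call. Note the mixed units: timeout is in minutes, pool_period in seconds, and each wait may overshoot by up to one pool_period. Once the deadline has passed, the sketch checks is_stopped() instead of calling wait(), since the behavior of timeout=0 is not documented here. `wait_for_all` is a hypothetical helper.

```python
import time
from typing import List, Optional, Sequence


def wait_for_all(jobs: Sequence, timeout_minutes: Optional[float] = None,
                 pool_period: float = 30.0) -> List:
    """Wait for several jobs under one shared deadline; return unfinished jobs."""
    deadline = None
    if timeout_minutes is not None:
        deadline = time.monotonic() + timeout_minutes * 60.0
    unfinished = []
    for job in jobs:
        if deadline is None:
            finished = job.wait(timeout=None, pool_period=pool_period)
        else:
            remaining_minutes = (deadline - time.monotonic()) / 60.0
            if remaining_minutes > 0:
                finished = job.wait(timeout=remaining_minutes, pool_period=pool_period)
            else:
                # Out of time: check the current state instead of waiting.
                finished = job.is_stopped()
        if not finished:
            unfinished.append(job)
    return unfinished
```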
worker
worker()
Return the ID of the worker currently executing this Job, or None if the job is pending.
Return type
Optional[str]
Returns
ID of the worker executing (or that executed) the job, or None if the job is still pending.