DatasetVersion
class datasetversion.DatasetVersion()
DatasetVersion represents a specific version in a dataset.
Do not instantiate directly. Use the DatasetVersion.get_version method instead.
BulkContext
class BulkContext
A context manager for modifying frames (i.e. add/update/remove) in bulk. Use DatasetVersion.get_bulk_context to obtain one.
The bulk context allows modifying the version by adding, updating, or deleting frames one at a time, while the actual update request is sent in bulk.
The update request (flush) happens every flush_threshold updates, or upon __exit__.
Create a bulk context for automatically flushing frames.
Parameters
dv (DatasetVersion) – DatasetVersion object to use.
flush_threshold (Optional[int]) – If provided, flush every flush_threshold updates.
log (Optional[Logger]) – Optional external logger.
refresh_version_stats (Optional[bool]) – Automatically refresh version statistics (default: True).
auto_upload_destination (Optional[str]) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Union[str, Path, None]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
allow_update (bool) – If False (default), all frame operations use the "add" action and "update" is not used (i.e. even frames collected using the BulkContext.update_frame() call will be added, not updated). This is an advanced setting; change it only if you understand the limitations of using update. Note that when using update, the provided frame data is merged with the existing indexed frame data. This means frame fields cannot be removed when using the update operation.
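The flush behavior described above (send buffered updates every flush_threshold updates, or when the context exits) can be illustrated with a small stand-in. This is only a sketch of the pattern, not the library's implementation: the class name BufferedUpdates, the record method, and the send_batch callback are hypothetical, and flushing on exit even after an exception is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Operation = Tuple[str, str]  # (action, frame_id): a simplified stand-in for a frame update


class BufferedUpdates:
    """Hypothetical stand-in illustrating the flush_threshold pattern (not the library's code)."""

    def __init__(self, send_batch: Callable[[List[Operation]], None],
                 flush_threshold: Optional[int] = None) -> None:
        self._send_batch = send_batch
        self._flush_threshold = flush_threshold
        self._pending: List[Operation] = []

    def __enter__(self) -> "BufferedUpdates":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Assumption: leaving the context always sends whatever is still buffered.
        self.flush()

    def record(self, action: str, frame_id: str) -> None:
        self._pending.append((action, frame_id))
        # Flush once the number of buffered updates reaches the threshold.
        if self._flush_threshold and len(self._pending) >= self._flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send_batch(self._pending)
            self._pending = []


# Collect the batches that would have been sent to the server.
sent_batches: List[List[Operation]] = []
with BufferedUpdates(sent_batches.append, flush_threshold=2) as bulk:
    for frame_id in ("a", "b", "c"):
        bulk.record("add", frame_id)
```

With a threshold of 2 and three recorded updates, the first two go out together and the third is sent when the with block exits.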
add_frame
BulkContext.add_frame(frame, warn_on_duplicate_frames=False)
NOTICE! If a frame already contains a frame.id field, it will update (overwrite) the existing frame with that ID. If not provided, frame.id is generated based on the source URI. If a local file should be uploaded but has already been uploaded previously, the existing URI for that file is reused; otherwise the file is uploaded.
Only available if the version is still in draft (writable) mode.
Parameters
frame (DatasetVersion.Frame) – The frame to add to the version.
warn_on_duplicate_frames (Optional[bool]) – If True, issue a warning when adding a frame with an ID that was previously added to this instance (default: False).
Return type
None
delete_frame
BulkContext.delete_frame(frame, delete_sources=False)
Delete a frame from the current DatasetVersion.
The frame may be represented by an ID string or a DatasetVersion.Frame object. Frames are deleted by their IDs; all other frame attributes (if any exist) are ignored.
Only available if version is still in draft (writable) mode.
Parameters
frame (Union[FrameGroup, SingleFrame, str, ForwardRef]) – The frame to delete (frame object or ID string).
delete_sources (bool) – Delete sources associated with the deleted frames in the dataset. Supported source locations are: s3, gs and azure. If a connection cannot be established with the cloud provider, or a source deletion fails, the operation will abort.
Return type
None
flush
BulkContext.flush()
Send any outstanding version changes.
Any updates made using this BulkContext are sent to the server.
Return type
None
update_frame
BulkContext.update_frame(frame)
Update an existing frame in the current DatasetVersion.
Find the frame by its ID, and change its properties to match those of the frame object passed in frame. Frames exist in a version if they were previously added (e.g. by update_frame), or if they exist in a parent version. If the frame object does not have an ID, a new frame is created.
Only available if version is still in draft (writable) mode.
Parameters
frame (DatasetVersion.Frame ) – The frame to update.
Return type
None
version_id
property version_id
Version ID string of this specific dataset/version
Return type
str
version_name
property version_name
Dataset version name, not necessarily unique
Return type
str
dataset_id
property dataset_id
Dataset ID string of this specific dataset
Return type
str
dataset_name
property dataset_name
Dataset name, must be a unique name
Return type
str
draft
property draft
Draft flag of the dataset/version, i.e. is this version still writable or is it locked and cannot be changed.
Return type
bool
last_updated
property last_updated
Return the timestamp of the last updated frame in the dataset version
Return type
datetime
comment
property comment
Return the string comment of the specific Dataset Version
Return type
str
DatasetVersion.create_new_dataset
classmethod create_new_dataset(dataset_name=None, description=None, tags=None, raise_if_exists=False, dataset_project=None)
Create a new dataset in the system and return a Dataset object for it.
Parameters
dataset_name (str ) – The name of the new dataset.
description (str ) – A free text to describe the dataset.
tags (list ) – A list of tags (short strings) to classify the dataset.
raise_if_exists (bool) – If False (the default) and there is a dataset with the name dataset_name, return the existing Dataset. If True and there is a dataset with the name dataset_name, raise a ValueError exception.
dataset_project (str) – A project name for the newly created dataset.
Return type
Dataset
Returns
A new Dataset object for the newly created dataset.
DatasetVersion.get_current
classmethod get_current(dataset_id=None, dataset_name=None, auto_upload_destination=None, local_dataset_root_path=None, dataset_project=None)
Return a DatasetVersion object for the current write-enabled version of the dataset.
Parameters
dataset_id (str ) – The ID of the dataset of the version to retrieve.
dataset_name (str ) – The name of the dataset of the version to retrieve.
auto_upload_destination (str) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
dataset_project (Optional[str]) – The project of the dataset to retrieve.
Return type
DatasetVersion
Returns
DatasetVersion object representing the selected version.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
DatasetVersion.remove_version
classmethod remove_version(dataset_id=None, dataset_name=None, version_id=None, version_name=None, force=False, dataset_project=None)
Remove a dataset’s version from the system.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Parameters
dataset_id (str) – The ID of the dataset that contains the version to be removed.
dataset_name (str) – The name of the dataset that contains the version to be removed.
version_id (str) – The ID of the version to be removed.
version_name (str) – The name of the version to be removed.
force (bool) – If True, delete even if the version is published. Default: False.
dataset_project (str) – The project of the dataset that contains the version to be removed.
Return type
None
DatasetVersion.get_version
classmethod get_version(dataset_id=None, dataset_name=None, version_id=None, version_name=None, auto_upload_destination=None, local_dataset_root_path=None, raise_on_multiple=False, dataset_project=None)
Return a DatasetVersion object for a specific version.
If no version name/id is provided, the current version of the dataset is returned.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Parameters
dataset_id (str ) – The ID of the dataset of the version to retrieve.
dataset_name (str ) – The name of the dataset of the version to retrieve.
version_id (str ) – [optional] The ID of the version to retrieve.
version_name (str ) – [optional] The name of the version to retrieve.
auto_upload_destination (str) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
raise_on_multiple (bool) – Raise an error if multiple versions are found.
dataset_project (str) – The project of the dataset of the version to retrieve.
Return type
DatasetVersion
Returns
DatasetVersion object representing the selected version.
DatasetVersion.get_single_frame
classmethod get_single_frame(frame_id, dataset_id=None, dataset_name=None, version_id=None, version_name=None, dataset_project=None)
Return a SingleFrame / FrameGroup object with the requested frame_id (UUID) from a specific dataset version.
Parameters
frame_id (str ) – The UUID of the requested frame id
dataset_id (str ) – The ID of the dataset of the version to retrieve.
dataset_name (str ) – The name of the dataset of the version to retrieve.
version_id (str ) – The ID of the version to retrieve.
version_name (str ) – The name of the version to retrieve.
dataset_project (str ) – The project of the dataset of the version to retrieve.
Return type
Union[FrameGroup, SingleFrame]
Returns
SingleFrame / FrameGroup object representing the requested frame.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
DatasetVersion.get_frames_by_source
classmethod get_frames_by_source(source_uri, dataset_id=None, dataset_name=None, version_id=None, version_name=None, dataset_project=None)
Return a list of SingleFrame / FrameGroup objects with the requested source_uri pattern from a specific dataset version.
Parameters
source_uri (str ) – Source uri match pattern. Examples: ‘/home/folder/’ or ‘/folder/’ or ‘https://domain.com/folder/’ or ‘s3://bucket/folder/*’ etc.
dataset_id (str ) – The ID of the dataset of the version to retrieve.
dataset_name (str ) – The name of the dataset of the version to retrieve.
version_id (str ) – The ID of the version to retrieve.
version_name (str ) – The name of the version to retrieve.
dataset_project (str ) – The project of the dataset of the version to retrieve.
Return type
List[Union[SingleFrame, FrameGroup]]
Returns
A list of SingleFrame / FrameGroup objects representing the requested frames.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
DatasetVersion.get_frames_by_ids
classmethod get_frames_by_ids(frame_ids, projection=None, dataset_id=None, dataset_name=None, version_id=None, version_name=None, dataset_project=None)
Return a list of SingleFrame / FrameGroup objects with the requested frame IDs from a specific dataset version.
Calling DatasetVersion.get_frames_by_ids as a class method is deprecated starting with version 3.8 and will be removed by Q4 2023, in favor of the instance method call dataset_version.get_frames_by_ids.
Parameters
frame_ids (Collection[str]) – A collection of frame ID strings.
projection (Optional[Collection[str]]) – Used to select which parts of the frame will be returned. Each string represents a field or sub-field (using dot-separated notation). To specify a specific array element, use the array index as a field name. To specify all array elements, use '*'. For the fields supported for projection, see the schema at backend_api.services.frames.Frame. If this argument is set, the returned values are dictionaries representing each frame. For example:
dataview.get_iterator(projection=['id', 'dataset.id', 'sources'])
# will return an iterator that yields dictionaries with the following fields:
# {
# 'id': '514504adbb6a91620eefa3e21ecfcc31',
# 'dataset': {
# 'id': 'df3638ec95454589bf86ba97f344f697'
# },
# 'sources': [
# {
# 'id': 'Frame',
# 'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
# 'timestamp': 0,
# 'preview': {
# 'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
# 'timestamp': 0
# }
# }
# ]
# }
dataset_id (str) – The ID of the dataset of the version to retrieve.
dataset_name (str ) – The name of the dataset of the version to retrieve.
version_id (str ) – The ID of the version to retrieve.
version_name (str ) – The name of the version to retrieve.
dataset_project (str ) – The project of the dataset of the version to retrieve.
Return type
Union[List[Union[allegroai.dataframe.singleframe.SingleFrame, allegroai.dataframe.framegroup.FrameGroup]], List[dict]]
When calling this method as an instance method, dataset_id, dataset_name, dataset_project, version_id, and version_name are not required.
Returns
A list of SingleFrame / FrameGroup objects or a list of dicts representing the requested frames.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
DatasetVersion.create_snapshot
classmethod create_snapshot(version_name=None, version_id=None, dataset_name=None, dataset_id=None, publish_name=None, publish_comment=None, publish_metadata=None, child_name=None, child_comment=None, child_metadata=None, dataset_project=None)
Publishes the specified version and creates a draft child version
Parameters
version_name (str ) – The name of the draft version for the snapshot.
version_id (str ) – The ID of the draft version for the snapshot.
dataset_name (str ) – The name of the dataset.
dataset_id (str ) – The ID of the dataset to create the version in.
publish_name (str ) – New name for the published version. The default value is ‘snapshot <date-time>’.
publish_comment (str ) – New comment for the published version. The default value is ‘published at <date-time> by <user>’.
publish_metadata (dict ) – User-specified metadata object for the published version. Keys can not include ‘$’ and ‘.’.
child_name (str ) – Name for the child version. If not provided then the name of the parent version is taken.
child_comment (str ) – Comment for the child version.
child_metadata (dict ) – User-specified metadata object for the child version. Keys must not include ‘$’ and ‘.’.
dataset_project (str ) – The project of the dataset
Return type
DatasetVersion
Returns
DatasetVersion object representing the new draft child version.
If no version_name/id is provided, the current version of the dataset is the snapshot version.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
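The "mutually exclusive" rule above is keyed on non-None values, so even an empty string counts as a provided value. A minimal sketch of such a check follows, assuming nothing about how the library implements it: the UsageError class here is a local stand-in defined only so the example runs on its own, and check_mutually_exclusive is a hypothetical helper.

```python
from typing import Optional


class UsageError(Exception):
    """Local stand-in for the library's UsageError, so the sketch runs on its own."""


def check_mutually_exclusive(**selectors: Optional[str]) -> None:
    """Raise UsageError if more than one of the given selectors is non-None."""
    provided = [name for name, value in selectors.items() if value is not None]
    if len(provided) > 1:
        raise UsageError("{} are mutually exclusive".format(" and ".join(provided)))


# Passing at most one selector of each pair is accepted.
check_mutually_exclusive(dataset_id="df3638ec95454589bf86ba97f344f697", dataset_name=None)
check_mutually_exclusive(version_id=None, version_name=None)
```

Calling check_mutually_exclusive(dataset_id="x", dataset_name="y") would raise the stand-in UsageError.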
DatasetVersion.create_version
classmethod create_version(version_name, description=None, dataset_id=None, dataset_name=None, parent_version_ids=None, parent_version_names=None, raise_if_exists=False, auto_upload_destination=None, local_dataset_root_path=None, dataset_project=None)
Create a new version in a dataset with a specific name.
If a version by that name already exists and is in draft mode (i.e. writable), return that one, unless raise_if_exists is True, in which case raise ValueError.
Parameters
version_name (str ) – The name of the new version.
description (str ) – Description of the new dataset version
dataset_id (str ) – The ID of the dataset to create the version in.
dataset_name (str ) – The name of the dataset to create the version in.
parent_version_ids (list) – A list of the new version's parent IDs. All IDs must belong to existing versions in this dataset. Currently only a single parent per version is supported; this is a list for future compatibility.
parent_version_names (list) – A list of the new version's parent names. All names must belong to existing versions in this dataset. Currently only a single parent per version is supported; this is a list for future compatibility.
raise_if_exists (bool) – If True and a version named version_name already exists, raise ValueError. If False and a version by that name already exists, return it.
auto_upload_destination (str) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
dataset_project (str) – The project of the dataset to create the version in.
Return type
DatasetVersion
Returns
New DatasetVersion object representing the new version.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
DatasetVersion.get_versions
classmethod get_versions(dataset_name=None, dataset_id=None, only_published=False, only_draft=False, dataset_project=None)
Return a list of all versions in a dataset.
Parameters
dataset_name (str ) – The name of the dataset. If several datasets with this name exist, select an arbitrary one.
dataset_id (str ) – The ID of the dataset to list.
only_published (bool) – If True, return only published versions. If False, return all versions.
only_draft (bool) – If True, return only draft (write-enabled) versions. If False, return all versions.
dataset_project (Optional[str]) – The project of the dataset to list.
Return type
List[DatasetVersion]
Returns
A list of DatasetVersion objects, one for each version of the dataset. Versions are sorted by update time, from the most recently updated ([0]) to the oldest.
DatasetVersion.get_datasets
classmethod get_datasets()
Return a list of all the datasets in the system, sorted by creation time.
Return type
List[datasets.Dataset]
Returns
A list of datasets.Dataset objects, one for each dataset. Datasets are sorted by creation time, from the oldest to the newest.
get_iterator
get_iterator(projection=None)
Get an iterator for this version.
Parameters
projection (Optional [ Sequence [ str ] ] ) – Used to select which parts of the frame will be returned. Each string represents a field or sub-field (using dot-separated notation). In order to specify a specific array element, use array index as a field name. To specify all array elements, use ‘*’. If this argument is set, the values the iterator returns are dictionaries representing each frame
For example:
version.get_iterator(projection=['id', 'dataset.id', 'sources'])
# will return an iterator that yields dictionaries with the following fields:
# {
# 'id': '514504adbb6a91620eefa3e21ecfcc31',
# 'dataset': {
# 'id': 'df3638ec95454589bf86ba97f344f697'
# },
# 'sources': [
# {
# 'id': 'Frame',
# 'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
# 'timestamp': 0,
# 'preview': {
# 'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
# 'timestamp': 0
# }
# }
# ]
# }
Return type
Generator[Union["DatasetVersion.Frame", dict]]
Returns
An iterator over all of the version's frames.
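The dot-separated projection notation can be pictured as selecting nested keys from a frame dictionary. The sketch below is a simplified, local illustration only: project is a hypothetical helper, it handles plain nested dictionaries but not array indices or '*', and the sample frame (including its 'version' and 'meta' values) is made up for the example. The actual selection happens in the library and backend.

```python
from typing import Any, Dict, Sequence


def project(frame: Dict[str, Any], projection: Sequence[str]) -> Dict[str, Any]:
    """Keep only the fields named in `projection` (dot-separated paths into nested dicts)."""
    result: Dict[str, Any] = {}
    for field in projection:
        parts = field.split(".")
        source: Any = frame
        target = result
        # Walk down to the parent of the requested field, mirroring the nesting in the result.
        for part in parts[:-1]:
            source = source[part]
            target = target.setdefault(part, {})
        target[parts[-1]] = source[parts[-1]]
    return result


# Made-up sample frame, loosely shaped like the dictionaries shown above.
sample_frame = {
    "id": "514504adbb6a91620eefa3e21ecfcc31",
    "dataset": {"id": "df3638ec95454589bf86ba97f344f697", "version": "example-version"},
    "sources": [{"id": "Frame", "timestamp": 0}],
    "meta": {"weather": "sunny"},
}

projected = project(sample_frame, ["id", "dataset.id", "sources"])
```

Here projected keeps 'id', the nested 'dataset.id', and the whole 'sources' list, while 'meta' and 'dataset.version' are left out.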
add_frames
add_frames(frames, warn_on_duplicate_frames=False, batch_size=1000, refresh_version_stats=True, auto_upload_destination=None, local_dataset_root_path=None, force_upload=False, progress_report=1, register_on_upload_failure=False, upload_retries=5, src_to_dst_mapping=None)
Add frames to this DatasetVersion.
NOTICE! If frames already contain a frame.id field, they will update (overwrite) existing frames. If not provided, frame.id is generated based on the source URI. If a local file should be uploaded but has already been uploaded previously, the existing URI for that file is reused; otherwise the file is uploaded.
Only available if the version is still in draft (writable) mode.
Parameters
frames (list) – A list of new frames to save.
warn_on_duplicate_frames (bool) – If True, issue a warning when adding a frame with an ID that was previously added to this instance (default: False).
batch_size (int) – Number of frames in a single add request (default: 1000). batch_size trades upload speed against reliability. It does not limit the number of frames per call, and in most cases there is no need to change it.
refresh_version_stats (Optional[bool]) – Automatically call commit_version after adding frames to refresh this version's statistics.
auto_upload_destination (Optional[str]) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage. Examples: 's3://bucket/datasets/', 'gs://bucket/dataset', 'azure://bucket/dataset', 'http://clearml-server/bucket/dataset'. Notes:
1. The uploaded files keep the same structure inside the destination storage, under dataset_id/version_name.version_id/ folders.
2. If a file content hash is already registered, the frame is automatically linked to the existing remote file instead of re-uploading the local copy.
3. Inside the dataset/version folder, files are stored in the same path as on the local storage, relative to the provided local_dataset_root_path.
local_dataset_root_path (Union[str, Path, None]) – Required if auto_upload_destination is provided. Mutually exclusive with src_to_dst_mapping. It should point to the common folder for all local source files. This root folder is used to detect the relative path of a single source file to be uploaded to the remote storage. Example: auto_upload_destination='s3://bucket/datasets/', local_dataset_root_path='/home/user/data/' will make sure a file '/home/user/data/images/01/1.jpg' is uploaded to 's3://bucket/datasets/dataset_id/version_id/images/01/1.jpg'.
force_upload (Optional[bool]) – If True and auto_upload_destination is provided, force the upload of the frames.
progress_report (Optional[int]) – Report progress every progress_report frames uploaded/registered, at batch_size granularity (default: report every batch).
register_on_upload_failure – If True, register the frames even when they fail to upload.
upload_retries (int) – The number of times the upload of a frame should be retried in case of failure, before marking the frame as failed on upload and continuing to upload the other frames.
src_to_dst_mapping – A dictionary mapping the source of the frames to the upload destination. Each source found in the dictionary will be uploaded to the corresponding destination. Mutually exclusive with auto_upload_destination.
Return type
List[Dict]
Returns
A list containing the frames that failed to upload or register. Each entry in the list is a dictionary with the following key-value pairs:
- ‘frame’ - the frame that failed to be added
- ‘error’ - a string that describes the error
- ‘error_type’ - can be ‘upload’, ‘validation’ or ‘register’. Indicates where the error occurred
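Since the returned list uses the 'frame', 'error' and 'error_type' keys described above, a caller might group failures by where they occurred before deciding what to retry. The following is a minimal sketch under that assumption: group_failures is a hypothetical helper, and the sample entries (with plain strings standing in for frame objects) are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_failures(failed: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group the failed frames by the stage reported in 'error_type'."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for entry in failed:
        grouped[entry["error_type"]].append(entry["frame"])
    return dict(grouped)


# Made-up sample shaped like the documented return value; strings stand in for frames.
sample_failures = [
    {"frame": "frame-1", "error": "connection reset", "error_type": "upload"},
    {"frame": "frame-2", "error": "missing source", "error_type": "validation"},
    {"frame": "frame-3", "error": "timeout", "error_type": "upload"},
]

failures_by_type = group_failures(sample_failures)
```

In this sample, the two 'upload' failures end up under one key, which makes it easy to retry only those frames.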
update_frames
update_frames(frames, batch_size=1000, refresh_version_stats=True, without_fields=None)
Update existing frames in this DatasetVersion.
Find each frame by its ID, and change its properties to match those of the frame object passed in frames. Frames exist in a version if they were previously added (e.g. by update_frames), or if they exist in a parent version.
If the frame object does not have an ID, a new frame is created.
Only available if the version is still in draft (writable) mode.
Parameters
frames (list) – A list of frames to update.
batch_size (int) – Number of frames in a single update request (default: 1000). batch_size trades upload speed against reliability. It does not limit the number of frames per call, and in most cases there is no need to change it.
refresh_version_stats (Optional[bool]) – Automatically call commit_version after updating frames to refresh this version's statistics.
without_fields (Optional[List[str]]) – A list of fields to filter out of the frame object when sending the update call. These fields correspond to the fields in allegroai.backend_api.services.datasets.Frame. When this list is provided, the call generates an update operation; otherwise an add operation is used (see add_frames). Use a non-None value (such as [] or False) in this parameter to specify an update operation without providing any fields.
Note: when using an update operation, removed frame fields are ignored (e.g. update cannot be used to remove a field from the meta structure).
For example, to avoid sending the metadata:
dataset_version.update_frames(frames, without_fields=["meta"])
Return type
None
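The limitation noted for the update operation (fields cannot be removed) follows from merge semantics: keys missing from the update simply keep their existing values. The sketch below illustrates this with a hypothetical merge_update helper and made-up frame data. It assumes a recursive merge of nested dictionaries, which is an assumption about the backend rather than documented behavior.

```python
from typing import Any, Dict


def merge_update(existing: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return `existing` with `update` merged in; keys absent from `update` are kept."""
    merged = dict(existing)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested structures such as 'meta' are merged key by key.
            merged[key] = merge_update(merged[key], value)
        else:
            merged[key] = value
    return merged


# Made-up frame data for illustration.
indexed_frame = {"id": "f1", "meta": {"weather": "sunny", "camera": "front"}}
update_payload = {"id": "f1", "meta": {"weather": "rainy"}}  # 'camera' was dropped locally

merged_frame = merge_update(indexed_frame, update_payload)
```

Under these assumptions, merged_frame still contains meta['camera'] even though the update payload no longer has it, which is why removing a field calls for an add (overwrite) rather than an update.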
delete_frames
delete_frames(frames, batch_size=1000, refresh_version_stats=True, delete_sources=False)
Delete frames from this DatasetVersion.
Frames may be represented by an ID string or a DatasetVersion.Frame object. Frames are deleted by their IDs; all other frame attributes (if any exist) are ignored.
Only available if version is still in draft (writable) mode.
Parameters
frames (Sequence[Union[FrameGroup, SingleFrame, dict, ForwardRef]]) – A list of frame objects or frame IDs (strings).
batch_size (int) – Number of frame IDs in a single delete request (default: 1000). batch_size trades request speed against reliability. It does not limit the number of frames per call, and in most cases there is no need to change it.
refresh_version_stats (Optional[bool]) – Automatically call commit_version after deleting frames to refresh this version's statistics.
delete_sources (bool) – Delete sources associated with the deleted frames in the dataset. Supported source locations are: s3, gs and azure. If a connection cannot be established with the cloud provider, or a source deletion fails, the operation will abort.
Return type
None
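The statement that batch_size "does not limit the number of frames per call" can be read as: one call is split into several requests of at most batch_size items each. A minimal sketch of that splitting follows, with iter_batches as a hypothetical helper and made-up frame IDs; it is not the library's own code.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 2500 made-up frame IDs are split into three requests' worth of IDs.
frame_ids = ["frame-{}".format(i) for i in range(2500)]
batch_sizes = [len(batch) for batch in iter_batches(frame_ids, batch_size=1000)]
```

All 2500 IDs are still processed; only the number of items per request is capped.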
get_bulk_context
get_bulk_context(flush_threshold=None, log=None, refresh_version_stats=True, auto_upload_destination=None, local_dataset_root_path=None, allow_update=False)
Get a context manager for bulk updates to this version.
The bulk context allows adding, editing, and removing frames on this version in bulk instead of one by one.
There can only be one BulkContext per DatasetVersion. A second call to get_bulk_context will return the same object.
Only available if the version is still in draft (writable) mode.
Parameters
flush_threshold (int) – Commit the updates to the frames every flush_threshold updates. An update is a call to one of BulkContext.add_frame, BulkContext.update_frame, or BulkContext.delete_frame.
log (Optional[Logger]) – Logger object for the context to log to. Defaults to the datasetversion module logger.
refresh_version_stats (Optional[bool]) – Automatically call commit_version to refresh this version's statistics.
auto_upload_destination (Optional[str]) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage. Examples: 's3://bucket/datasets/', 'gs://bucket/dataset', 'azure://bucket/dataset', 'http://clearml-server/bucket/dataset'. Notes:
1. The uploaded files keep the same structure inside the destination storage, under dataset_id/version_name.version_id/ folders.
2. If a file content hash is already registered, the frame is automatically linked to the existing remote file instead of re-uploading the local copy.
3. Inside the dataset/version folder, files are stored in the same path as on the local storage, relative to the provided local_dataset_root_path.
local_dataset_root_path (Union[str, Path, None]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files. This root folder is used to detect the relative path of a single source file to be uploaded to the remote storage. Example: auto_upload_destination='s3://bucket/datasets/', local_dataset_root_path='/home/user/data/' will make sure a file '/home/user/data/images/01/1.jpg' is uploaded to 's3://bucket/datasets/dataset_id/version_id/images/01/1.jpg'.
allow_update (bool) – If False (default), all frame operations use the "add" action and "update" is not used (i.e. even frames collected using the BulkContext.update_frame() call will be added, not updated). This is an advanced setting; change it only if you understand the limitations of using update. Note that when using update, the provided frame data is merged with the existing indexed frame data. This means frame fields cannot be removed when using the update operation.
Return type
BulkContext
Returns
A bulk update context manager for this DatasetVersion.
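The relationship between local_dataset_root_path and the uploaded location can be sketched as a relative-path computation. The helper below, remote_uri, is hypothetical and only mirrors the example given for the parameter (dataset_id/version_id/ followed by the path relative to the root). The exact folder naming used by the library may differ, for instance the notes mention a version_name.version_id folder.

```python
from pathlib import PurePosixPath


def remote_uri(local_file: str, local_dataset_root_path: str, auto_upload_destination: str,
               dataset_id: str, version_id: str) -> str:
    """Build the destination URI by keeping the file's path relative to the local root."""
    # relative_to raises ValueError when the file is not under the root folder.
    relative = PurePosixPath(local_file).relative_to(PurePosixPath(local_dataset_root_path))
    base = auto_upload_destination.rstrip("/")
    return "{}/{}/{}/{}".format(base, dataset_id, version_id, relative.as_posix())


# Values taken from the example in the parameter description.
uri = remote_uri(
    local_file="/home/user/data/images/01/1.jpg",
    local_dataset_root_path="/home/user/data/",
    auto_upload_destination="s3://bucket/datasets/",
    dataset_id="dataset_id",
    version_id="version_id",
)
```

A file outside the root folder has no relative path, which is one way to see why local_dataset_root_path should point to the common folder of all local source files.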
flush
flush(refresh_version_stats=True)
Send any outstanding version changes.
If a BulkContext was obtained by get_bulk_context, any updates made using it are sent to the server. If not, this is a no-op.
Parameters
refresh_version_stats (Optional[bool]) – Automatically call commit_version to refresh this version's statistics.
Return type
None
commit_version
commit_version(**kwargs)
Commit this draft DatasetVersion, with all the changes made so far.
Committing a version merges the changes made to it with the parent version. Further changes to the version are still possible. This step is required before publishing the version.
This is a blocking method and may take time to finish.
Return type
CallResult
Parameters
kwargs (Any ) –
publish_version
publish_version()
Publish this DatasetVersion.
After a version is published, it is no longer a draft version and no further changes to it are allowed.
Return type
bool
Returns
True if successful, False otherwise.
get_stats
get_stats()
Returns this version’s statistics
Return type
None
get_parent
get_parent()
Returns the ID of this version’s parent version
Return type
str
get_metadata
get_metadata()
Return type
dict
Returns
Metadata (dict) of user-defined values stored for this dataset version.
set_metadata
set_metadata(metadata)
Store metadata (dict) of user-defined values for this dataset version.
Parameters
metadata (dict ) – key/value dictionary (with support for nested dictionaries)
Return type
bool
Returns
True if successful (locked/published versions cannot change version metadata)
set_masks_labels
set_masks_labels(mask_value_label_mapping)
Store a global (dataset version wide) lookup for per pixel mask values to labels. For example:
{
(0,0,0): ["background"],
(1,1,1): ["person", "sitting"],
(2,2,2): ["cat"],
}
The pixel mask label lookup is stored as a property on the dataset version metadata. Specifically: dataset.get_metadata()['mask_labels'] = {…}
Parameters
mask_value_label_mapping (dict ) – key/value dictionary. Key is a tuple of integers, and value is a list/tuple of strings
Return type
bool
Returns
True if successful (locked/published versions cannot change version metadata)
get_masks_labels
get_masks_labels()
Get the global (dataset version wide) lookup for per pixel mask values to labels. For example:
{
(0,0,0): ["background"],
(1,1,1): ["person", "sitting"],
(2,2,2): ["cat"],
}
The pixel mask label lookup is stored as a property on the dataset version metadata. Specifically: dataset.get_metadata()['mask_labels'] = {…}
Return type
Dict[tuple, tuple]
Returns
key/value dictionary. Key is a tuple of integers, and value is a list/tuple of strings.
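Once such a mapping is available locally, resolving the labels for a pixel value is a plain dictionary lookup on the tuple. The sketch below uses the mapping from the example above. The labels_for_pixel helper is hypothetical, and returning an empty list for unmapped values is a choice made for this illustration.

```python
from typing import Dict, List, Sequence, Tuple

MaskLabels = Dict[Tuple[int, ...], List[str]]

# Mapping copied from the example above.
mask_labels: MaskLabels = {
    (0, 0, 0): ["background"],
    (1, 1, 1): ["person", "sitting"],
    (2, 2, 2): ["cat"],
}


def labels_for_pixel(mapping: MaskLabels, pixel: Sequence[int]) -> List[str]:
    """Return the labels for a pixel value, or an empty list if the value is not mapped."""
    return list(mapping.get(tuple(pixel), []))


person_labels = labels_for_pixel(mask_labels, [1, 1, 1])
unknown_labels = labels_for_pixel(mask_labels, (9, 9, 9))
```

Converting the pixel to a tuple first lets callers pass lists or other sequences, since only tuples can be used as dictionary keys.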