Dataset
class datasetversion.Dataset()
A dataset representation.
Used to manage a dataset and its versions.
Do not instantiate directly. Use Dataset.get or Dataset.create methods instead.
id
property id
The dataset’s id.
Return type
str
name
property name
The dataset’s name.
Return type
str
Dataset.create
classmethod create(dataset_name, comment=None, tags=None, raise_if_exists=False, dataset_project=None)
Create a new dataset in the system and return a Dataset object for it.
Parameters
dataset_name (str ) – The name of the new dataset.
comment (str ) – Free text describing the dataset.
tags (list ) – A list of tags (short strings) to classify the dataset. If the dataset already exists, these tags will be added to its list of tags.
raise_if_exists (bool ) – If False (the default) and there is a dataset with the name dataset_name, return the existing Dataset. If True and there is a dataset with the name dataset_name, raise a ValueError exception.
dataset_project (str ) – A project name for the newly created dataset.
Return type
Dataset
Returns
A new Dataset object for the newly created dataset.
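The get-or-create behaviour controlled by raise_if_exists can be sketched as follows. This is a minimal sketch, not the library's implementation: ensure_dataset is a hypothetical helper, and dataset_cls stands for the Dataset class documented here. Its import path is not given in this section, so the class is passed in as an argument.

```python
def ensure_dataset(dataset_cls, name, project=None, tags=None, strict=False):
    """Return a Dataset for `name`, creating it when it does not exist yet.

    Hypothetical helper. `dataset_cls` is assumed to be the Dataset class
    documented above; `strict` maps directly onto `raise_if_exists`.
    """
    return dataset_cls.create(
        dataset_name=name,
        tags=list(tags) if tags else None,
        raise_if_exists=strict,  # True: ValueError if the name is already taken
        dataset_project=project,
    )

# Usage (assumes Dataset has already been imported from the SDK):
# dataset = ensure_dataset(Dataset, "my-dataset", project="my-project", tags=["raw"])
```

With strict left at False, calling the helper twice with the same name is expected to return the existing dataset the second time, with any new tags added to it.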
Dataset.get
classmethod get(dataset_id=None, dataset_name=None, dataset_project=None)
Return a Dataset object for an existing dataset.
Parameters
dataset_id (Optional[str]) – The ID of the dataset.
dataset_name (Optional[str]) – The name of the dataset.
dataset_project (Optional[str]) – The project of the dataset.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Return type
Dataset
Returns
A new Dataset object for the dataset. If dataset_name is set and there are several datasets with that name, return an arbitrary one.
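Because dataset_id and dataset_name are mutually exclusive, it can help to check the arguments before making the call. The helper below is a hypothetical sketch: dataset_lookup_kwargs is not part of the SDK, and it raises a plain ValueError locally rather than the UsageError the documented call raises.

```python
def dataset_lookup_kwargs(dataset_id=None, dataset_name=None, dataset_project=None):
    """Build keyword arguments for Dataset.get, rejecting ambiguous lookups early.

    Hypothetical helper; only the documented mutual-exclusion rule is checked.
    """
    if dataset_id is not None and dataset_name is not None:
        raise ValueError("pass either dataset_id or dataset_name, not both")
    candidates = {
        "dataset_id": dataset_id,
        "dataset_name": dataset_name,
        "dataset_project": dataset_project,
    }
    # Drop unset values so the call falls back to the documented defaults.
    return {key: value for key, value in candidates.items() if value is not None}

# Usage (assumes Dataset has already been imported from the SDK):
# dataset = Dataset.get(**dataset_lookup_kwargs(dataset_name="my-dataset"))
```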
Dataset.delete
classmethod delete(dataset_id=None, dataset_name=None, delete_all_versions=False, force=False, delete_sources=False, show_progress=True, dataset_project=None)
Delete a dataset from the system.
If several datasets with the name dataset_name exist, delete an arbitrary one. Note that delete_sources has no effect in this case.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Parameters
dataset_id (str ) – The ID of the dataset.
dataset_name (str ) – The name of the dataset.
delete_all_versions (bool ) – If True, delete the dataset with all of its versions. If False, expect the dataset to have no versions. If there are, raise an exception. Default: False.
force (bool ) – If True, and delete_all_versions is True, delete also published versions. If False, and delete_all_versions is True, raise an exception if there is a published version in the dataset. If delete_all_versions is False, this has no effect. Default: False.
delete_sources (bool ) – Delete sources associated with the deleted frames in the dataset. Supported source locations are: s3, gs and azure. In case a connection cannot be established with the cloud provider or a source deletion failed, the operation will abort. This parameter is ignored if delete_all_versions is False.
show_progress (bool ) – If True, show progress bar when deleting sources. If False, disable the progress bar. This parameter is ignored if delete_sources is False. Note that tqdm needs to be installed for this to work.
dataset_project (str ) – The project name of the dataset.
Return type
None
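The interaction between the delete flags can be summarised in a few lines of code. This is a sketch of the rules as stated above, not the SDK's own logic: effective_delete_flags is a hypothetical helper, it contacts no server, and it does not model the case where several datasets share a name.

```python
def effective_delete_flags(delete_all_versions=False, force=False,
                           delete_sources=False, show_progress=True):
    """Resolve which flags actually take effect, per the documented rules.

    Hypothetical helper for reasoning about a Dataset.delete call.
    """
    sources = delete_sources and delete_all_versions
    return {
        "delete_all_versions": delete_all_versions,
        # force has no effect unless all versions are being deleted
        "force": force and delete_all_versions,
        # delete_sources is ignored when delete_all_versions is False
        "delete_sources": sources,
        # show_progress is ignored when no sources are being deleted
        "show_progress": show_progress and sources,
    }
```

For example, passing force=True and delete_sources=True without delete_all_versions=True resolves to every flag being off, which matches the description above: the call then only succeeds on a dataset that has no versions.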
create_version
create_version(version_name, description=None, parent_version_ids=None, parent_version_names=None, raise_if_exists=False, auto_upload_destination=None, local_dataset_root_path=None)
Create and return a new DatasetVersion for this Dataset.
parent_version_ids and parent_version_names are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Parameters
version_name (str ) – The new version name.
description (str ) – A free text to describe the version.
parent_version_ids (list ) – A list of the new version's parent IDs. All IDs must belong to existing versions in this dataset. Currently, only a single parent per version is supported; the parameter is a list for future compatibility.
parent_version_names (list ) – A list of the new version's parent names. All names must belong to existing versions in this dataset. Currently, only a single parent per version is supported; the parameter is a list for future compatibility.
raise_if_exists (bool ) – If False (the default) and a version with the name version_name exists in this dataset, return that version. If True, raise a ValueError exception.
auto_upload_destination (str ) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
Return type
DatasetVersion
Returns
A new DatasetVersion object with the name version_name in this Dataset.
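The single-parent rule and the pairing of auto_upload_destination with local_dataset_root_path can be wrapped in a small helper. The sketch below is illustrative only: create_child_version and its parameter names are hypothetical, and dataset is assumed to be a Dataset object obtained from Dataset.get or Dataset.create.

```python
def create_child_version(dataset, version_name, parent_name=None,
                         upload_destination=None, local_root=None):
    """Create (or fetch) a version that has at most one parent.

    Hypothetical helper around the documented create_version method.
    """
    if upload_destination is not None and local_root is None:
        # Documented requirement: local_dataset_root_path must accompany
        # auto_upload_destination.
        raise ValueError("local_root is required when upload_destination is set")
    return dataset.create_version(
        version_name=version_name,
        # The API takes a list, but only a single parent is currently supported.
        parent_version_names=[parent_name] if parent_name is not None else None,
        raise_if_exists=False,  # return the existing version if the name is taken
        auto_upload_destination=upload_destination,
        local_dataset_root_path=local_root,
    )

# Usage (assumes `dataset` was returned by Dataset.get or Dataset.create):
# version = create_child_version(dataset, "v2", parent_name="v1")
```

Only parent_version_names is passed, never parent_version_ids, so the mutual-exclusion rule between the two cannot be violated by this helper.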
get_version
get_version(version_id=None, version_name=None, auto_upload_destination=None, local_dataset_root_path=None, raise_on_multiple=False)
Return a DatasetVersion object of a version in this dataset.
version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Parameters
version_id (str ) – The ID of the version to get.
version_name (str ) – The name of the version to get. If several versions exist with that name, return an arbitrary one.
auto_upload_destination (str ) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
raise_on_multiple (bool ) – Raise an error if multiple versions are found.
Return type
DatasetVersion
Returns
A DatasetVersion object of the desired version from this dataset.
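When version names are not guaranteed to be unique, raise_on_multiple avoids silently receiving an arbitrary match. A minimal sketch, assuming dataset is a Dataset object; get_unique_version is a hypothetical helper, and the exception type raised on multiple matches is not stated above, so it is left to propagate unchanged.

```python
def get_unique_version(dataset, version_name):
    """Fetch a version by name, failing instead of picking an arbitrary match.

    Hypothetical helper around the documented get_version method.
    """
    return dataset.get_version(version_name=version_name, raise_on_multiple=True)

# Usage (assumes `dataset` was returned by Dataset.get or Dataset.create):
# version = get_unique_version(dataset, "v1")
```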
get_versions
get_versions(only_published=False)
Return a list of all the versions of this Dataset.
Parameters
only_published (bool ) – If True, return only published versions. If False, return all versions.
Return type
List[DatasetVersion]
Returns
A list of DatasetVersion objects for all the versions in this dataset.
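The only_published flag makes it easy to summarise how many versions are still unpublished. The sketch below assumes dataset is a Dataset object; version_counts is a hypothetical helper that relies only on the length of the returned lists, not on any DatasetVersion attribute.

```python
def version_counts(dataset):
    """Count all, published, and unpublished versions of a dataset.

    Hypothetical helper; it calls the documented get_versions method twice.
    """
    total = len(dataset.get_versions(only_published=False))
    published = len(dataset.get_versions(only_published=True))
    return {"total": total, "published": published, "unpublished": total - published}
```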
delete_version
delete_version(version_id=None, version_name=None, force=False, delete_sources=False, show_progress=True)
Delete a version from this dataset.
version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Parameters
version_id (str ) – The ID of the version to delete.
version_name (str ) – The name of the version to delete. If several versions with this name exist in this dataset, delete an arbitrary one.
force (bool ) – If True, delete even if version is published. Default: False.
delete_sources (bool ) – Delete sources associated with the deleted frames in the dataset. Supported source locations are: s3, gs and azure. In case a connection cannot be established with the cloud provider or a source deletion failed, the operation will abort. If multiple versions with the same version_name are found, this parameter is ignored.
show_progress (bool ) – If True, show progress bar when deleting sources. If False, disable the progress bar. This parameter is ignored if delete_sources is False. Note that tqdm needs to be installed for this to work.
Return type
None
add_tags
add_tags(tags)
Add tags (short strings) to classify the dataset. Existing tags are not deleted.
Parameters
tags (Union[str, Sequence[str]]) – The tags to add to the dataset.
Return type
None
remove_tags
remove_tags(tags=None)
Remove tags from the dataset.
Parameters
tags (Union[str, List[str], None]) – The tags to remove from the dataset. If None (default), remove all tags.
Return type
None
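Since add_tags only appends and remove_tags with no argument clears everything, replacing a dataset's tags takes two calls. A minimal sketch, assuming dataset is a Dataset object; replace_tags is a hypothetical helper, not an SDK method.

```python
def replace_tags(dataset, new_tags):
    """Replace all tags on `dataset` with `new_tags` (a string or a sequence of strings).

    Hypothetical helper built on the documented remove_tags and add_tags methods.
    """
    dataset.remove_tags()  # tags=None (the default) removes every tag
    if new_tags:
        dataset.add_tags(new_tags)  # add_tags never deletes existing tags

# Usage (assumes `dataset` was returned by Dataset.get or Dataset.create):
# replace_tags(dataset, ["cleaned", "2024"])
```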
get_dataset_webpage
get_dataset_webpage()
Return the Hyper Dataset’s web page address. For example: https://<your_web_server>/datasets/73757bd349634b86ae4b66ef5ed412df
Return type
str
Returns
An HTTP or HTTPS URL link.
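If the returned address follows the example format shown above, with the path ending in /datasets/ followed by the ID, the ID can be recovered with the standard library. This is a sketch under that assumption; dataset_id_from_webpage is a hypothetical helper, and other URL layouts simply yield None.

```python
from urllib.parse import urlparse

def dataset_id_from_webpage(url):
    """Extract the trailing ID from a URL shaped like .../datasets/<id>.

    Hypothetical helper; returns None when the URL does not match that shape.
    """
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) >= 2 and parts[-2] == "datasets":
        return parts[-1]
    return None

# Usage (assumes `dataset` was returned by Dataset.get or Dataset.create):
# dataset_id = dataset_id_from_webpage(dataset.get_dataset_webpage())
```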