Dataset

class datasetversion.Dataset()

A dataset representation.

Used to manage a dataset and its versions.

danger

Do not instantiate directly. Use Dataset.get or Dataset.create methods instead.


id

property id

The dataset’s id.

  • Return type

    str


name

property name

The dataset’s name.

  • Return type

    str


project

property project

The dataset’s project ID. May be None if no project was specified when the dataset was created.

  • Return type

    Optional[str]


Dataset.create

classmethod create(dataset_name, comment=None, tags=None, raise_if_exists=False, dataset_project=None)

Create a new dataset in the system and return a Dataset object for it.

  • Parameters

    • dataset_name (str ) – The name of the new dataset.

    • comment (str ) – Free text describing the dataset.

    • tags (list ) – A list of tags (short strings) to classify the dataset. If the dataset already exists, these tags will be added to its list of tags.

    • raise_if_exists (bool ) – If False (the default) and a dataset named dataset_name already exists, return the existing Dataset. If True and such a dataset exists, raise a ValueError exception.

    • dataset_project (str ) – A project name for the newly created dataset.

  • Return type

    Dataset

  • Returns

    A new Dataset object for the newly created dataset.
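
The raise_if_exists flag switches create between "get or create" and strict creation. The following is a minimal sketch of how the two modes might be combined to learn whether a dataset was newly created. The helper name create_or_reuse is hypothetical, and the Dataset class is passed in as dataset_cls so that no import path is assumed; only the documented create parameters are used.

```python
def create_or_reuse(dataset_cls, name, project=None, tags=None):
    """Try a strict create first, then fall back to the existing dataset.

    `dataset_cls` is expected to be the SDK's Dataset class (an assumption:
    it is passed in so this sketch does not depend on an import path).
    Returns a (dataset, created) tuple.
    """
    try:
        # raise_if_exists=True is documented to raise ValueError on a name clash.
        dataset = dataset_cls.create(
            dataset_name=name,
            dataset_project=project,
            tags=tags,
            raise_if_exists=True,
        )
        return dataset, True
    except ValueError:
        # raise_if_exists=False (the default) returns the existing Dataset;
        # any tags passed here are added to its existing tags.
        dataset = dataset_cls.create(
            dataset_name=name,
            dataset_project=project,
            tags=tags,
            raise_if_exists=False,
        )
        return dataset, False
```

Note that the check and the fallback are two separate calls, so this sketch makes no claim about behavior when another process creates the same dataset in between.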


Dataset.get

classmethod get(dataset_id=None, dataset_name=None, dataset_project=None)

Return a Dataset object for an existing dataset.

  • Parameters

    • dataset_id (Optional[str]) – The ID of the dataset

    • dataset_name (Optional[str]) – The name of the dataset.

    • dataset_project (Optional[str]) – The project of the dataset.

      info

      dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

  • Return type

    Dataset

  • Returns

    A new Dataset object for the dataset. If dataset_name is set and there are several datasets with that name, return an arbitrary one.
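
Because dataset_id and dataset_name are mutually exclusive, a caller may want to reject the ambiguous case before the SDK is called at all. A minimal sketch, assuming the Dataset class is supplied as dataset_cls; the wrapper name get_dataset and its use of ValueError are illustrative choices, not part of the SDK.

```python
def get_dataset(dataset_cls, dataset_id=None, dataset_name=None, dataset_project=None):
    """Fetch a dataset by ID or by name, rejecting the both-set case early.

    `dataset_cls` stands in for the SDK's Dataset class (an assumption).
    """
    if dataset_id is not None and dataset_name is not None:
        # The SDK is documented to raise UsageError here; this local check
        # fails earlier, before any call is made.
        raise ValueError("dataset_id and dataset_name are mutually exclusive")
    # A name lookup may match several datasets, in which case the one
    # returned is arbitrary; an ID lookup avoids that ambiguity.
    return dataset_cls.get(
        dataset_id=dataset_id,
        dataset_name=dataset_name,
        dataset_project=dataset_project,
    )
```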


Dataset.delete

classmethod delete(dataset_id=None, dataset_name=None, delete_all_versions=False, force=False, delete_sources=False, show_progress=True, dataset_project=None)

Delete a dataset from the system.

If several datasets with the name dataset_name exist, an arbitrary one is deleted. Note that delete_sources has no effect in this case.

info

dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

  • Parameters

    • dataset_id (str ) – The ID of the dataset.

    • dataset_name (str ) – The name of the dataset.

    • delete_all_versions (bool ) – If True, delete the dataset together with all of its versions. If False, the dataset is expected to have no versions; if it has any, an exception is raised. Default: False.

    • force (bool ) – If True and delete_all_versions is True, also delete published versions. If False and delete_all_versions is True, raise an exception if the dataset contains a published version. If delete_all_versions is False, this parameter has no effect. Default: False.

    • delete_sources (bool ) – Delete sources associated with the deleted frames in the dataset. Supported source locations are: s3, gs and azure. If a connection to the cloud provider cannot be established or a source deletion fails, the operation aborts. This parameter is ignored if delete_all_versions is False.

    • show_progress (bool ) – If True, show a progress bar while deleting sources. If False, disable the progress bar. This parameter is ignored if delete_sources is False. Note that tqdm must be installed for this to work.

    • dataset_project (str ) – The project name of the dataset.

  • Return type

    None
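
Several delete flags only take effect in combination with others, so a call can silently do less than intended. The sketch below restates those documented rules as a pure function that reports which requested flags would be ignored. The function name ignored_delete_flags is hypothetical and it does not call the SDK.

```python
def ignored_delete_flags(delete_all_versions=False, force=False, delete_sources=False):
    """Return the names of requested flags that the documented rules ignore.

    show_progress is left out on purpose: it defaults to True and only
    matters while sources are being deleted.
    """
    ignored = []
    if not delete_all_versions:
        if force:
            # force has no effect unless delete_all_versions is True.
            ignored.append("force")
        if delete_sources:
            # delete_sources is ignored unless delete_all_versions is True.
            ignored.append("delete_sources")
    return ignored
```

For example, requesting force=True and delete_sources=True without delete_all_versions=True yields both names, which signals that the call would neither remove published versions nor delete any sources.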


create_version

create_version(version_name, description=None, parent_version_ids=None, parent_version_names=None, raise_if_exists=False, auto_upload_destination=None, local_dataset_root_path=None)

Create and return a new DatasetVersion for this Dataset.

info

parent_version_ids and parent_version_names are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

  • Parameters

    • version_name (str ) – The new version name.

    • description (str ) – A free text to describe the version.

    • parent_version_ids (list ) – A list of the new version’s parent IDs. Every ID must belong to an existing version in this dataset. Currently only a single parent per version is supported; the parameter is a list for future compatibility.

    • parent_version_names (list ) – A list of the new version’s parent names. Every name must belong to an existing version in this dataset. Currently only a single parent per version is supported; the parameter is a list for future compatibility.

    • raise_if_exists (bool ) – If False (the default) and a version with the name version_name exists in this dataset, return that version. If True, raise a ValueError exception.

    • auto_upload_destination (str ) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.

    • local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.

  • Return type

    DatasetVersion

  • Returns

    A new DatasetVersion object with the name version_name in this Dataset.
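
The constraints on create_version (one parent at most, parent IDs and names mutually exclusive, local_dataset_root_path required alongside auto_upload_destination) can be checked before the call. A minimal sketch under those documented rules; the helper name version_kwargs and the use of ValueError are assumptions for illustration, and the returned dictionary is meant to be passed as dataset.create_version(**kwargs).

```python
def version_kwargs(version_name, parent_id=None, parent_name=None,
                   auto_upload_destination=None, local_dataset_root_path=None):
    """Build keyword arguments for create_version, validating documented rules."""
    if parent_id is not None and parent_name is not None:
        # parent_version_ids and parent_version_names are mutually exclusive.
        raise ValueError("give a parent ID or a parent name, not both")
    if auto_upload_destination is not None and local_dataset_root_path is None:
        # local_dataset_root_path is required when auto-upload is requested.
        raise ValueError("local_dataset_root_path is required with auto_upload_destination")

    kwargs = {"version_name": version_name}
    # Only a single parent is currently supported, so each list has one item.
    if parent_id is not None:
        kwargs["parent_version_ids"] = [parent_id]
    if parent_name is not None:
        kwargs["parent_version_names"] = [parent_name]
    if auto_upload_destination is not None:
        kwargs["auto_upload_destination"] = auto_upload_destination
    if local_dataset_root_path is not None:
        kwargs["local_dataset_root_path"] = local_dataset_root_path
    return kwargs
```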


get_version

get_version(version_id=None, version_name=None, auto_upload_destination=None, local_dataset_root_path=None, raise_on_multiple=False)

Return a DatasetVersion object of a version in this dataset.

info

version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

  • Parameters

    • version_id (str ) – The ID of the version to get.

    • version_name (str ) – The name of the version to get. If several versions exist with that name, return an arbitrary one.

    • auto_upload_destination (str ) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.

    • local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.

    • raise_on_multiple (bool ) – Raise an error if multiple versions are found.

  • Return type

    DatasetVersion

  • Returns

    A DatasetVersion object of the desired version from this dataset.


get_versions

get_versions(only_published=False)

Return a list of all the versions of this dataset.

  • Parameters

    only_published (bool ) – If True, return only published versions. If False, return all versions.

  • Return type

    List[DatasetVersion]

  • Returns

    A list of DatasetVersion objects for all the versions in this dataset.
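
The only_published flag makes it straightforward to compare published versions against the full list. A minimal sketch, assuming dataset is an object obtained from Dataset.get or Dataset.create; the helper name version_counts is hypothetical, and only get_versions is called.

```python
def version_counts(dataset):
    """Summarize how many versions of a dataset are published."""
    all_versions = dataset.get_versions(only_published=False)
    published = dataset.get_versions(only_published=True)
    return {
        "total": len(all_versions),
        "published": len(published),
        # Everything not returned by the published-only query.
        "unpublished": len(all_versions) - len(published),
    }
```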


delete_version

delete_version(version_id=None, version_name=None, force=False, delete_sources=False, show_progress=True)

Delete a version from this dataset.

info

version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

  • Parameters

    • version_id (str ) – The ID of the version to delete.

    • version_name (str ) – The name of the version to delete. If several versions with this name exist in this dataset, delete an arbitrary one.

    • force (bool ) – If True, delete even if version is published. Default: False.

    • delete_sources (bool ) – Delete sources associated with the deleted frames in the dataset. Supported source locations are: s3, gs and azure. If a connection to the cloud provider cannot be established or a source deletion fails, the operation aborts. If multiple versions with the same version_name are found, this parameter is ignored.

    • show_progress (bool ) – If True, show a progress bar while deleting sources. If False, disable the progress bar. This parameter is ignored if delete_sources is False. Note that tqdm must be installed for this to work.

  • Return type

    None


add_tags

add_tags(tags)

Add tags (short strings) to classify the dataset. Existing tags are not deleted.

  • Parameters

    tags (Union[str, Sequence[str]]) – The tags to add to the dataset

  • Return type

    None


remove_tags

remove_tags(tags=None)

Remove tags from the dataset.

  • Parameters

    tags (Union[str, List[str], None]) – The tags to remove from the dataset. If None (the default), remove all tags.

  • Return type

    None
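
Since add_tags keeps existing tags and remove_tags with no argument clears them all, replacing a dataset's tags takes two calls. A minimal sketch, assuming dataset is an object obtained from Dataset.get or Dataset.create; the helper name replace_tags is hypothetical.

```python
def replace_tags(dataset, new_tags):
    """Replace all tags on a dataset with `new_tags` (a string or a sequence)."""
    # Normalize a single tag so a string is not split into characters.
    if isinstance(new_tags, str):
        new_tags = [new_tags]
    new_tags = list(new_tags)

    # tags=None (the default) removes every existing tag.
    dataset.remove_tags()
    if new_tags:
        # add_tags only appends, so the result is exactly new_tags.
        dataset.add_tags(new_tags)
    return new_tags
```

The two calls are not atomic, so the dataset is briefly untagged between them.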


get_dataset_webpage

get_dataset_webpage()

Return the Hyper Dataset’s web page address. For example: https://<your_web_server>/datasets/73757bd349634b86ae4b66ef5ed412df

  • Return type

    str

  • Returns

    An HTTP or HTTPS URL.