Dataset

class datasetversion.Dataset()

A dataset representation.

Used to manage a dataset and its versions.

danger

Do not instantiate this class directly. Use the Dataset.get or Dataset.create class methods instead.


id

property id

The dataset’s id.

  • Return type

    str


name

property name

The dataset’s name.

  • Return type

    str


Dataset.create

classmethod create(dataset_name, comment=None, tags=None, raise_if_exists=False, dataset_project=None)

Create a new dataset in the system and return a Dataset object for it.

  • Parameters

    • dataset_name (str) – The name of the new dataset.

    • comment (str) – Free text describing the dataset.

    • tags (list) – A list of tags (short strings) to classify the dataset. If the dataset already exists, these tags are added to its existing tags.

    • raise_if_exists (bool) – If False (the default) and a dataset named dataset_name already exists, return the existing Dataset. If True and a dataset named dataset_name already exists, raise a ValueError exception.

    • dataset_project (str) – A project name for the newly created dataset.

  • Return type

    Dataset

  • Returns

    A Dataset object for the newly created dataset, or for the existing dataset if one with that name already exists and raise_if_exists is False.
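
The two behaviors of raise_if_exists can be wrapped as follows. This is a minimal sketch, not part of the SDK: get_or_create_dataset and create_new_dataset are hypothetical helper names, and the Dataset class is passed in as dataset_cls so that the sketch does not assume an import path.

```python
def get_or_create_dataset(dataset_cls, name, project=None, tags=None):
    """Return the dataset called `name`, creating it if it does not exist."""
    return dataset_cls.create(
        dataset_name=name,
        dataset_project=project,
        tags=tags,               # added to the existing tags if the dataset exists
        raise_if_exists=False,   # reuse an existing dataset with this name
    )


def create_new_dataset(dataset_cls, name, project=None):
    """Create `name`, or return None if a dataset with that name already exists."""
    try:
        return dataset_cls.create(
            dataset_name=name,
            dataset_project=project,
            raise_if_exists=True,  # documented to raise ValueError on a name clash
        )
    except ValueError:
        return None
```

With the SDK installed, these would be called as get_or_create_dataset(Dataset, "my dataset"), where Dataset is the class documented here and "my dataset" is a placeholder name.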


Dataset.get

classmethod get(dataset_id=None, dataset_name=None)

Return a Dataset object for an existing dataset.

  • Parameters

    • dataset_id (Optional[str]) – The ID of the dataset.

    • dataset_name (Optional[str]) – The name of the dataset.

  • Return type

    Dataset

info

dataset_id and dataset_name are mutually exclusive. Setting both to non-None values raises a UsageError exception.

  • Returns

    A Dataset object for the dataset. If dataset_name is set and several datasets have that name, an arbitrary one is returned.
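
Because dataset_id and dataset_name are mutually exclusive, a caller may prefer to validate its arguments before reaching the SDK. The sketch below shows one way to do that. fetch_dataset is a hypothetical helper, the Dataset class is passed in as dataset_cls, and rejecting the case where neither argument is given is this helper's own choice rather than documented SDK behavior.

```python
def fetch_dataset(dataset_cls, dataset_id=None, dataset_name=None):
    """Look up a dataset by exactly one of its ID or its name."""
    # True when both are None or both are set: either way the call is ambiguous.
    if (dataset_id is None) == (dataset_name is None):
        raise ValueError("pass exactly one of dataset_id or dataset_name")
    if dataset_id is not None:
        return dataset_cls.get(dataset_id=dataset_id)
    # Lookup by name may return an arbitrary match if names are not unique.
    return dataset_cls.get(dataset_name=dataset_name)
```

Looking up by ID is the safer choice when several datasets may share a name.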


Dataset.delete

classmethod delete(dataset_id=None, dataset_name=None, delete_all_versions=False, force=False, delete_sources=False, show_progress=True)

Delete a dataset from the system.

If several datasets named dataset_name exist, an arbitrary one is deleted. Note that delete_sources has no effect in this case.

  • Parameters

    • dataset_id (Optional[str]) – The ID of the dataset.

    • dataset_name (Optional[str]) – The name of the dataset.

    • delete_all_versions (bool) – If True, delete the dataset together with all of its versions. If False, the dataset is expected to have no versions; if it has any, an exception is raised. Default: False.

    • force (bool) – If True and delete_all_versions is True, also delete published versions. If False and delete_all_versions is True, raise an exception if the dataset contains a published version. If delete_all_versions is False, this parameter has no effect. Default: False.

    • delete_sources (bool) – Delete the sources associated with the deleted frames in the dataset. Supported source locations are s3, gs, and azure. If a connection to the cloud provider cannot be established or a source deletion fails, the operation aborts. This parameter is ignored if delete_all_versions is False.

    • show_progress (bool) – If True, show a progress bar while deleting sources. If False, disable the progress bar. This parameter is ignored if delete_sources is False. Note that tqdm must be installed for this to work.

  • Return type

    None

info

dataset_id and dataset_name are mutually exclusive. Setting both to non-None values raises a UsageError exception.
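
The interaction between delete_all_versions, force, and delete_sources can be summarized in a small wrapper. This is a sketch under stated assumptions: purge_dataset is a hypothetical helper, and the Dataset class is passed in as dataset_cls. It identifies the dataset by ID because deleting by name may pick an arbitrary dataset, in which case delete_sources has no effect.

```python
def purge_dataset(dataset_cls, dataset_id, include_published=False,
                  delete_sources=False):
    """Delete a dataset and every version in it, identified by ID."""
    dataset_cls.delete(
        dataset_id=dataset_id,
        delete_all_versions=True,       # needed whenever the dataset has versions
        force=include_published,        # True also deletes published versions
        delete_sources=delete_sources,  # ignored unless delete_all_versions=True
        show_progress=False,            # the progress bar requires tqdm
    )
```

With include_published left at False, the call is expected to raise if the dataset contains a published version, which makes accidental deletion of published data less likely.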


create_version

create_version(version_name, description=None, parent_version_ids=None, parent_version_names=None, raise_if_exists=False, auto_upload_destination=None, local_dataset_root_path=None)

Create and return a new DatasetVersion for this Dataset.

  • Parameters

    • version_name (str) – The new version name.

    • description (Optional[str]) – Free text describing the version.

    • parent_version_ids (Optional[List[str]]) – A list of the new version’s parent IDs. All IDs must belong to existing versions in this dataset. Currently only a single parent per version is supported; the parameter is a list for future compatibility.

    • parent_version_names (Optional[List[str]]) – A list of the new version’s parent names. All names must belong to existing versions in this dataset. Currently only a single parent per version is supported; the parameter is a list for future compatibility.

    • raise_if_exists (bool) – If False (the default) and a version named version_name exists in this dataset, return that version. If True, raise a ValueError exception.

    • auto_upload_destination (Optional[str]) – If specified, any local file linked by a SingleFrame/FrameGroup is automatically uploaded to the destination storage.

    • local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder of all local source files.

  • Return type

    DatasetVersion

info

parent_version_ids and parent_version_names are mutually exclusive. Setting both to non-None values raises a UsageError exception.

  • Returns

    A new DatasetVersion object with the name version_name in this Dataset.
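
Since only a single parent is currently supported while the parameter is a list, a thin wrapper can hide the list from callers. The sketch below is illustrative only: create_child_version is a hypothetical helper, and dataset is any Dataset object obtained through Dataset.get or Dataset.create.

```python
def create_child_version(dataset, version_name, parent_name=None,
                         description=None):
    """Create `version_name`, optionally derived from one named parent."""
    # The SDK takes a list, but only one parent is currently supported.
    parents = [parent_name] if parent_name is not None else None
    return dataset.create_version(
        version_name=version_name,
        description=description,
        parent_version_names=parents,
        raise_if_exists=True,  # raise ValueError instead of reusing a version
    )
```

The wrapper passes only parent_version_names, so it cannot trigger the UsageError raised when both parent arguments are set.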


get_version

get_version(version_id=None, version_name=None, auto_upload_destination=None, local_dataset_root_path=None, raise_on_multiple=False)

Return a DatasetVersion object of a version in this dataset.

  • Parameters

    • version_id (Optional[str]) – The ID of the version to get.

    • version_name (Optional[str]) – The name of the version to get. If several versions with that name exist, an arbitrary one is returned (see raise_on_multiple).

    • auto_upload_destination (Optional[str]) – If specified, any local file linked by a SingleFrame/FrameGroup is automatically uploaded to the destination storage.

    • local_dataset_root_path (Optional[Union[str, pathlib2.Path]]) – Required if auto_upload_destination is provided. It should point to the common folder of all local source files.

    • raise_on_multiple (bool) – If True, raise an error if multiple versions are found.

  • Return type

    allegroai.datasetversion.DatasetVersion

info

version_id and version_name are mutually exclusive. Setting both to non-None values raises a UsageError exception.

  • Returns

    A DatasetVersion object of the desired version from this dataset.
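
Two of the rules above lend themselves to a guard: local_dataset_root_path is required whenever auto_upload_destination is given, and a lookup by name can be ambiguous. The following sketch enforces both. open_version is a hypothetical helper, and checking the path requirement before calling the SDK is this helper's own choice.

```python
def open_version(dataset, version_name, upload_to=None, local_root=None):
    """Fetch a version by name, failing if the name is ambiguous."""
    if upload_to is not None and local_root is None:
        raise ValueError("local_root is required when upload_to is given")
    return dataset.get_version(
        version_name=version_name,
        auto_upload_destination=upload_to,
        local_dataset_root_path=local_root,
        raise_on_multiple=True,  # do not silently pick an arbitrary version
    )
```

A call might look like open_version(ds, "v1", upload_to="s3://my-bucket/data", local_root="/data/raw"), where the bucket URL and the local path are placeholders.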


get_versions

get_versions(only_published=False)

Return a list of all the versions of this dataset.

  • Parameters

    only_published (bool) – If True, return only published versions. If False, return all versions.

  • Return type

    List[DatasetVersion]

  • Returns

    A list of DatasetVersion objects for all the versions in this dataset.
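
As a small illustration of the only_published flag, the sketch below counts versions by publication state. count_versions is a hypothetical helper. It uses only the lengths of the returned lists, so it assumes nothing about the attributes of DatasetVersion.

```python
def count_versions(dataset):
    """Return (total, published, unpublished) version counts for a dataset."""
    total = len(dataset.get_versions(only_published=False))
    published = len(dataset.get_versions(only_published=True))
    return total, published, total - published
```

Note that this makes two calls, so the counts can disagree if versions are created or published between them.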


delete_version

delete_version(version_id=None, version_name=None, force=False, delete_sources=False, show_progress=True)

Delete a version from this dataset.

  • Parameters

    • version_id (Optional[str]) – The ID of the version to delete.

    • version_name (Optional[str]) – The name of the version to delete. If several versions with this name exist in this dataset, an arbitrary one is deleted.

    • force (bool) – If True, delete the version even if it is published. Default: False.

    • delete_sources (bool) – Delete the sources associated with the deleted frames in the dataset. Supported source locations are s3, gs, and azure. If a connection to the cloud provider cannot be established or a source deletion fails, the operation aborts. If multiple versions with the same version_name are found, this parameter is ignored.

    • show_progress (bool) – If True, show a progress bar while deleting sources. If False, disable the progress bar. This parameter is ignored if delete_sources is False. Note that tqdm must be installed for this to work.

  • Return type

    None

info

version_id and version_name are mutually exclusive. Setting both to non-None values raises a UsageError exception.


add_tags

add_tags(tags)

Add tags (short strings) to classify the dataset. Existing tags are not deleted.

  • Parameters

    tags (Union[str, Sequence[str]]) – The tags to add to the dataset

  • Return type

    None


remove_tags

remove_tags(tags)

Remove tags from the dataset.

  • Parameters

    tags (Union[str, List[str]]) – The tags to remove from the dataset

  • Return type

    None
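
add_tags and remove_tags can be combined to update a dataset's tags in one step. The sketch below is an assumption-laden convenience, not part of the SDK: retag and _as_list are hypothetical helpers. A single string is wrapped into a list first so that both methods always receive a list, which both signatures accept.

```python
def _as_list(tags):
    """Wrap a single tag string in a list; copy any other sequence."""
    return [tags] if isinstance(tags, str) else list(tags)


def retag(dataset, add=(), remove=()):
    """Add and then remove tags on a dataset, skipping empty requests."""
    to_add, to_remove = _as_list(add), _as_list(remove)
    if to_add:
        dataset.add_tags(to_add)        # existing tags are kept
    if to_remove:
        dataset.remove_tags(to_remove)
```

For example, retag(ds, add=["reviewed"], remove="draft") marks a dataset as reviewed and drops its draft tag. Both tag names are placeholders.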


get_dataset_webpage

get_dataset_webpage()

Return the Hyper Dataset’s web page address. For example: https://<your_web_server>/datasets/73757bd349634b86ae4b66ef5ed412df

  • Return type

    str

  • Returns

    An HTTP or HTTPS URL.