DataView

class dataview.DataView(name=None, iteration_order='sequential', iteration_infinite=False, maximum_number_of_frames=None, auto_connect_with_task=True, queries=None, **kwargs)

DataView object creates a single view over a mixture of dataset versions.

For example, filtering rules can be balanced (for every answer from one filter, return two answers from another). Additionally, augmentation instructions are sent together with the data iterator, to be executed in-flight. This provides flexibility and reproducibility, and it also enables multiple training jobs to run with zero code interference, for large-scale hyper-parameter optimization.

Create a new DataView.

Parameters
- name (str ) – Name the current DataView. Important when multiple DataViews are used. For example one DataView name is ‘Test’ and the other ‘Validation’.
- iteration_order (IterationOrder ) – The order in which frames are iterated on with this DataView.
- iteration_infinite (bool ) – If True, the DataView may return any frame more than once, until maximum_number_of_frames is reached. If the total number of unique frames is less than maximum_number_of_frames, no duplicate frames will be returned.
  info
  Duplicate frames would still vary in augmentations, if used, due to the random nature of augmentation operation and parameter selection.
- maximum_number_of_frames (int ) – The maximum number of frames to be returned when iterating on this DataView.
- auto_connect_with_task (bool ) – If True, the DataView will be automatically connected with the main Task context. Default: True. Optional: readonly, disabling DataView changes from the UI. This means that if the dataview was changed in the UI, the change will have no effect on the code!
- queries (list [ Union [ FilterRule , Query ] ] ) – Initial queries (filter rules) for this DataView.

info

DataView access is lazy: the dataview's validity is checked only when an iterator is needed.

RoiQuery

class RoiQuery(label=None, count_range=None, conf_range=None, must_not=None)

A single query on the dataview.

Method generated by attrs for class DataView.RoiQuery.

Return type
None

label

label

The label of the ROI. Only ROIs with this label are matched.

Possible values are:

A single string - The ROI must have a label equal to this string.
A sequence of strings - The ROI must have all the labels in the sequence.
A white-space separated list of labels as a single string - The ROI must have all the labels in the list.
A Lucene query - The ROI must have labels that match the query.

Examples:

None or ‘*’ or ‘’: ROIs with any labels are matched.
‘cat’: selects only frames with ROIs whose labels contain the word ‘cat’.
‘cat AND dog’: selects only frames with ROIs whose labels contain both ‘cat’ and ‘dog’.

Type
list[str] or str or None

count_range

count_range

A count constraint over the query.

If provided, limits query results to frames where the number of ROIs matching the query is within the given range.

Type
tuple(int, int)

conf_range

conf_range

A confidence filter over the query.

If provided, frames match the query only if they contain ROIs that both match RoiQuery.label and whose confidence is within the given range.

Type
tuple(float, float)

must_not

must_not

If True, negate the entire ROI selection rule: frames that do not match the query will be returned.

Type
bool
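
As a rough illustration of how the four attributes combine, the sketch below models the matching rules in plain Python. It is not the library's implementation: `SketchRoi`, `SketchRoiQuery` and `frame_matches` are hypothetical names, labels are compared as plain strings only (no Lucene parsing), and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SketchRoi:
    """Hypothetical stand-in for an ROI: a set of labels and a confidence."""
    labels: Sequence[str]
    confidence: float = 1.0


@dataclass
class SketchRoiQuery:
    """Hypothetical stand-in mirroring the RoiQuery attributes."""
    label: Optional[Sequence[str]] = None            # None matches any labels
    count_range: Optional[Tuple[int, int]] = None    # (min, max), assumed inclusive
    conf_range: Optional[Tuple[float, float]] = None  # (min, max), assumed inclusive
    must_not: bool = False


def frame_matches(rois: List[SketchRoi], query: SketchRoiQuery) -> bool:
    """Return True if a frame holding these ROIs satisfies the query."""
    def roi_matches(roi: SketchRoi) -> bool:
        # The ROI must carry every label named in the query.
        if query.label is not None and not set(query.label) <= set(roi.labels):
            return False
        if query.conf_range is not None:
            low, high = query.conf_range
            if not low <= roi.confidence <= high:
                return False
        return True

    matched = sum(1 for roi in rois if roi_matches(roi))
    if query.count_range is not None:
        low, high = query.count_range
        result = low <= matched <= high
    else:
        result = matched > 0
    # must_not flips the outcome of the whole rule.
    return result != query.must_not


frame = [SketchRoi(["black", "occluded", "cat"], 0.9), SketchRoi(["dog"], 0.4)]
print(frame_matches(frame, SketchRoiQuery(label=["black", "cat"])))
print(frame_matches(frame, SketchRoiQuery(label=["cat"], conf_range=(0.0, 0.5))))
print(frame_matches(frame, SketchRoiQuery(label=["dog"], must_not=True)))
```

The first query matches because one ROI carries both labels; the second fails on confidence; the third fails because `must_not` negates a rule that would otherwise match.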

is_none

is_none()

name

property name

Return type
str
Returns
Return DataView name (str)

DataView.get

classmethod get(dataview_id=None, dataview_name=None)

Get a previously defined dataview from the server.

Parameters
- dataview_id (str ) – The ID of the dataview.
- dataview_name (str ) – The name of the dataview.
Returns
A new DataView object for the requested dataview
Return type
DataView

info

dataview_id and dataview_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

clone

clone()

Clone this dataview into a new one.

The new dataview is an SDK-only clone of this dataview; it has no representation in the backend.

Returns
The clone DataView object.
Return type
DataView

add_query

add_query(dataset_id=None, dataset_name=None, version_id=None, version_name=None, weight=1.0, roi_query=None, roi_count_range=None, roi_conf_range=None, roi_query_must_not=None, frame_query=None, source_query=None, query_object=None, dataset_project=None)

Add a new query to the dataview.

info

DataView access is lazy: the dataview is actually created only when an iterator is needed.

Parameters
- dataset_id (str ) – The ID of the dataset used as input for this query.
- dataset_name (str ) – The name of the dataset used as input for this query.
  info
  dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
- version_id (str ) – The ID of the version used as input for this query.
- version_name (str ) – The name of the version used as input for this query.
  danger
  Version names are not unique. The query is applied to a single version; if several versions share the name, the last updated one will be selected!
  info
  version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
  info
  If dataset version is not specified, the last updated version in the dataset will be selected
- weight (float ) – Weight of the rule. Measured relative to other queries. For example, two queries with the same weight will cause the dataview to have the exact same number of frames from each query, regardless of the number of frames in the input version. This is crucial to remove inherent bias in datasets.
- roi_query (Union [ list [ str ] , str ] ) – A list of labels, a string representing a single label, or a Lucene query. A frame is returned by the query only if it contains an ROI with ALL the labels. If roi_query==DataView.EmptyRois, only frames with no ROIs at all are selected.
  info
  All the ROIs in the frame are returned even if only a single one matches the query.
  ROI is matched with the query if it has all the labels in the query.
  Example: [“black”, “cat”] will match [“black”, “occluded”, “cat”] but will not match [“black”, “dog”] or [“cat”]
  Example: DataView.EmptyRois will only match frames with zero rois
  Example: “car” will return all the frames with “car” label
  Example: Lucene query - “person OR car” will return all the frames with “car” or “person” label
- frame_query (str ) – Lucene query on the frame (can be mixed with roi query). Note that this is a direct Lucene query, so escape any special Lucene character with a backslash (\). Example: ‘(src:directory_name) AND (meta.key:my_value) AND (width:>50)’ will match any frame whose src field contains ‘directory_name’, whose width is over 50px and whose meta.key contains my_value.
  info
  A query such as ‘meta.cat_type:”white cat”’ will match frames in which the meta.cat_type field contains “white”, “cat” or both. In order to match frames in which meta.cat_type==”white cat”, use ‘meta.cat_type.keyword:”white cat”’.
- source_query (str ) – Lucene query on the frame sources (can be mixed with any other query). Note that this is a direct Lucene query, so escape any special Lucene character with a backslash (\). Example: ‘(sources.uri:directory_name) AND (sources.preview.uri:https*)’ returns frames whose source URI contains directory_name and whose preview link has an https prefix.
- roi_count_range (tuple ( int , int ) ) – (min, max) occurrences of the matched item.
- roi_conf_range (tuple ( float , float ) ) – (min, max) confidence of matched annotation must be in this range.
- roi_query_must_not (bool ) – If True, negates the ROI query, i.e. returns only frames that do not answer the ROI query terms. Example: roi_query=[“black”, “cat”] with roi_query_must_not=True will match [“black”, “occluded”, “dog”] but will not match [“black”, “partial”, “cat”].
- query_object (Query ) – An instance of dataview.Query storing all the information on a specific rule. If query_object is passed, all other fields should be None, as they will be ignored.
- dataset_project (str ) – The project of the dataset used as input for this query.
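
The effect of `weight` can be pictured with a small simulation. This is only a sketch under assumptions: the real selection happens on the server, `balanced_sample` is a hypothetical helper, and cycling through the smaller pool is just one way to model frames repeating.

```python
from itertools import cycle, islice


def balanced_sample(pools, weights, total):
    """Draw `total` frame ids from several query results in proportion to the weights.

    Each pool is the list of frame ids one query matched. Smaller pools are
    cycled, so their frames repeat instead of being under-represented.
    """
    weight_sum = float(sum(weights))
    picked = []
    for pool, weight in zip(pools, weights):
        quota = int(round(total * weight / weight_sum))
        picked.append(list(islice(cycle(pool), quota)))
    return picked


cats = ["cat_%d" % i for i in range(100)]  # a query that matched 100 frames
dogs = ["dog_%d" % i for i in range(2)]    # a query that matched 2 frames

equal = balanced_sample([cats, dogs], weights=[1.0, 1.0], total=200)
print([len(part) for part in equal])

skewed = balanced_sample([cats, dogs], weights=[3.0, 1.0], total=200)
print([len(part) for part in skewed])
```

With equal weights both queries contribute the same number of frames even though one of them matched only two unique frames.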

add_multi_query

add_multi_query(dataset_id=None, dataset_name=None, version_id=None, version_name=None, weight=1.0, roi_queries=None, frame_query=None, source_query=None, dataset_project=None)

Add a new query with multiple label queries to the dataview.

info

DataView access is lazy: the dataview is actually created only when an iterator is needed.

Parameters
- dataset_id (str ) – The ID of the dataset used as input for this query.
- dataset_name (str ) – The name of the dataset used as input for this query.
  info
  dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
- version_id (str ) – The ID of the version used as input for this query.
- version_name (str ) – The name of the version used as input for this query.
  danger
  Version names are not unique. The query is applied to a single version; if several versions share the name, the last updated one will be selected!
  info
  version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
  info
  If dataset version is not specified, the last updated version in the dataset will be selected
- weight (float ) – Weight of the rule. Measured relative to other queries. For example, two queries with the same weight will cause the dataview to have the exact same number of frames from each query, regardless of the number of frames in the input version. This is crucial to remove inherent bias in datasets.
- roi_queries (list [ RoiQuery or dict ] ) – A list of RoiQuery objects or dictionaries with ‘label’, ‘count_range’ and ‘conf_range’ keys. Each item in the list is a query on the frame’s ROIs. Only frames that match all queries are returned by this dataview. If roi_queries==DataView.EmptyRois, only frames with no ROIs at all are selected.
- frame_query (str ) – Lucene query on the frame (can be mixed with roi query). Example: ‘(src:directory_name) AND (meta.key:my_value) AND (width:>50)’ will match any frame with src field containing ‘directory_name’, width over 50px and meta.key contains my_value.
  info
  A query such as ‘meta.cat_type:”white cat”’ will match frames in which the meta.cat_type field contains “white”, “cat” or both. In order to match frames in which meta.cat_type==”white cat”, use ‘meta.cat_type.keyword:”white cat”’.
- source_query (str ) – Lucene query on the frame sources (can be mixed with any other query). Note that this is a direct Lucene query, so escape any special Lucene character with a backslash (\). Example: ‘(sources.uri:directory_name) AND (sources.preview.uri:https*)’ returns frames whose source URI contains directory_name and whose preview link has an https prefix.
- dataset_project (str ) – The project of the dataset used as input for this query.

add_queries

add_queries(queries)

Add a list of queries to the dataview.

info

DataView access is lazy: the dataview is actually created only when an iterator is needed.

Parameters
queries (Query ) – A list of Query objects (or a single Query object) representing the new queries to add.
Return type
None

get_queries

get_queries()

Get this dataview’s queries.

Usage example: Create a new DataView based on the returned queries.

dataview = DataView(queries=source_dataview.get_queries())

Returns
A list of this dataview’s queries.
Return type
list[Query]

create_version

create_version(version_name, description=None, dataset_id=None, dataset_name=None, parent_version_ids=None, parent_version_names=None, raise_if_exists=False, auto_upload_destination=None, local_dataset_root_path=None, dataset_project=None)

Create a new version in a dataset with a specific name.

If a version by that name already exists and is in draft mode (i.e. writable), that version is returned, unless raise_if_exists is True, in which case a ValueError is raised.

After the version is created, all the frames from the current DataView will be added to it.

Parameters
- version_name (str ) – The name of the new version.
- description (Optional [ str ] ) – Description of the new dataset version
- dataset_id (Optional [ str ] ) – The ID of the dataset to create the version in.
- dataset_name (Optional [ str ] ) – The name of the dataset to create the version in.
- parent_version_ids (Optional [ List [ str ] ] ) – A list of the new version’s parent IDs. All IDs must be IDs of existing versions in this dataset. Currently only a single parent per version is supported. This is a list for future compatibility.
- parent_version_names (Optional [ List [ str ] ] ) – A list of the new version’s parent names. All names must be names of existing versions in this dataset. Currently only a single parent per version is supported. This is a list for future compatibility.
- raise_if_exists (bool ) – If True and a version by that name already exists, raise ValueError. If False and a version by that name already exists, return it.
- auto_upload_destination (Optional [ str ] ) – If specified any local file linked by a SingleFrame/FrameGroup, will be automatically uploaded to the destination storage.
- local_dataset_root_path (Optional [ Union [ str , Path ] ] ) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files
- dataset_project (Optional [ str ] ) – The project of dataset to create the version in.
Return type
DatasetVersion
Returns
New DatasetVersion object representing the new version.

info

dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.

get_versions

get_versions()

Get a list of this dataview’s dataset versions.

The dataview’s versions are the versions specified in all the queries, and the versions registered as inputs to the dataview.

Return type
List[DatasetVersion]
Returns
A list of allegroai.DatasetVersion objects, one for each version of the dataview.

get_datasets

get_datasets()

Get a list of this dataview’s dataset versions. Equivalent to get_versions()

The dataview’s versions are the versions specified in all the queries, and the versions registered as inputs to the dataview.

Return type
List[DatasetVersion]
Returns
A list of allegroai.DatasetVersion objects, one for each version of the dataview.

add_mapping_rule

add_mapping_rule(from_labels, to_label, dataset_id=None, dataset_name=None, version_id=None, version_name=None, dataset_project=None)

Add new mapping to the dataview.

Mapping automatically converts label names to canonical names in the ROIs returned for frames while iterating over the dataview. This is used to make sure that different naming in different datasets will not produce two different classes for the same object.

Example: If one dataset has ROIs with the label ‘pedestrian’ and another has the ROIs with the label ‘person’, we can use both in a single dataview to create a person detector by adding mapping from ‘pedestrian’ to ‘person’

info

If this Dataview was not created from an existing dataview in the server, this function triggers the creation of such dataview.

info

Label mapping is performed after the frame is matched against the dataview’s queries. For that reason, the queries must be defined according to the dataset’s original labels.

Parameters
- dataset_id (str ) – The ID of the dataset to apply the mapping rule to.
- dataset_name (str ) – The name of the dataset to apply the mapping rule to.
  info
  dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
- version_id (str ) – The ID of the version to apply the mapping rule to.
- version_name (str ) – The name of the version to apply the mapping rule to.
  info
  version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
  info
  If dataset version is not specified, the last updated version in the dataset will be selected
- from_labels (str or list [ str ] ) – Label or a list of labels to map to to_label. The ROI must match all of the labels for the mapping to take place.
- dataset_project (str ) – The project of the dataset to apply the mapping rule to.
- to_label (str ) – Label to change from_labels to.
Return type
()
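
The mapping behaviour described above can be sketched in a few lines of plain Python. `apply_mapping` is a hypothetical helper, not part of the SDK, and the choice to drop the matched labels while keeping the ROI's other labels is an assumption made for the illustration.

```python
def apply_mapping(roi_labels, rules):
    """Rewrite ROI labels using (from_labels, to_label) rules.

    A rule fires only when the ROI carries every label in from_labels;
    the matched labels are then replaced by to_label.
    """
    labels = list(roi_labels)
    for from_labels, to_label in rules:
        if isinstance(from_labels, str):
            from_labels = [from_labels]
        if all(label in labels for label in from_labels):
            labels = [label for label in labels if label not in from_labels]
            if to_label not in labels:
                labels.append(to_label)
    return labels


rules = [("pedestrian", "person"), (["black", "cat"], "black_cat")]
print(apply_mapping(["pedestrian"], rules))
print(apply_mapping(["black", "cat", "occluded"], rules))
print(apply_mapping(["cat"], rules))
```

The last call leaves the labels untouched, because an ROI labelled only ‘cat’ does not carry all the labels of the second rule.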

set_labels

set_labels(label_dict, *label_dicts, force=False, **labels)

Set dataview label enumeration.

Label enumeration maps label strings to integers, for later use within the network.

Parameters
- label_dict (Mapping [ str: int ] ) – Mapping from a label string to its integer representation in the network. e.g. {‘cat’: 0, ‘dog’: 1, ‘hound’: 1}.
- label_dicts (Mapping [ str: int ] ) – Deprecated, use label_dict instead
- force (bool ) – If True, set the labels even when the task is running remotely.
- labels (int ) – Deprecated, use label_dict instead
Return type
None
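
To make the idea of a label enumeration concrete, the following sketch translates ROI label strings into the integer ids a network would consume. `labels_to_ids` is a hypothetical helper, and using -1 for labels missing from the enumeration is an assumption, not documented SDK behaviour.

```python
# Example mapping from the text: two labels may share one integer id.
label_dict = {"cat": 0, "dog": 1, "hound": 1}


def labels_to_ids(roi_labels, enumeration, default=-1):
    """Translate label strings to the integer ids used by the network."""
    return [enumeration.get(label, default) for label in roi_labels]


print(labels_to_ids(["dog", "hound", "cat", "bird"], label_dict))
```

Mapping ‘dog’ and ‘hound’ to the same id is how several labels can be merged into a single class.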

add_augmentation_affine

add_augmentation_affine(operations=('bypass', 'scale', 'rotate', 'shear', 'reflect-horiz', 'reflect-vert'), strength=1.0)

Add affine based augmentation instruction to the dataview.

Augmentations are selected by the backend, where randomness is generated and can be reproduced (both in operation selection and in additional parameters)

The actual execution of the augmentation is performed by the worker, while getting the data from the frame using ImageFrame

Parameters
- operations (Sequence [ Augmentation.Affine ] ) – A sequence of affine operations. One will be picked randomly and uniformly.
- strength (float ) – Augmentation operation strength. This is how we scale (multiply) the random [0-1] parameters passed to an augmentation action.
Returns
True if augmentation was added to the dataview.
Return type
bool
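
The selection rule (one operation picked uniformly, random parameters scaled by `strength`, reproducible from a seed) can be sketched locally as follows. In the SDK this selection is made by the backend; `pick_augmentation` and `augmentation_plan` are hypothetical names, and a single scalar parameter per operation is a simplification.

```python
import random

AFFINE_OPS = ("bypass", "scale", "rotate", "shear", "reflect-horiz", "reflect-vert")


def pick_augmentation(rng, operations=AFFINE_OPS, strength=1.0):
    """Pick one operation uniformly and scale a random [0, 1) parameter by strength."""
    operation = rng.choice(operations)
    parameter = strength * rng.random()
    return operation, parameter


def augmentation_plan(seed, count, strength=1.0):
    """Generate `count` augmentation choices from a fixed seed."""
    rng = random.Random(seed)
    return [pick_augmentation(rng, strength=strength) for _ in range(count)]


plan_a = augmentation_plan(seed=1337, count=5, strength=0.5)
plan_b = augmentation_plan(seed=1337, count=5, strength=0.5)
print(plan_a == plan_b)  # the same seed reproduces the same choices
```

Because the generator is seeded, re-running a job yields the same sequence of operations and parameters, which is what makes the augmentations reproducible.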

add_augmentation_pixel

add_augmentation_pixel(operations=('bypass', 'blur', 'noise', 'recolor'), strength=1.0)

Add pixel based augmentation instruction to the dataview.

Augmentations are selected by the backend, where randomness is generated and can be reproduced (both in operation selection and in additional parameters)

The actual execution of the augmentation is performed by the worker, while getting the data from the frame using ImageFrame

Parameters
- operations (Sequence [ Augmentation.Pixel ] ) – A sequence of pixel operations. One will be picked randomly and uniformly.
- strength (float ) – Augmentation operation strength. This is how we scale (multiply) the random [0-1] parameters passed to an augmentation action.
Returns
True if augmentation was added to the dataview.
Return type
bool

add_augmentation_custom

add_augmentation_custom(operations, strength=1.0, arguments=None)

Add custom augmentation instruction to the dataview.

Augmentations are selected by the backend, where randomness is generated and can be reproduced (both in operation selection and in additional parameters)

The actual execution of the augmentation is performed by the worker, while getting the data from the frame using ImageFrame

Parameters
- operations (Sequence ( str ) ) – A sequence of custom operation names. Each has to be registered with ImageFrame.register_custom_augmentation. One will be picked randomly and uniformly.
- strength (float ) – Augmentation operation strength. This is how we scale (multiply) the random [0-1] parameters passed to an augmentation action.
- arguments (Mapping ( str: Mapping ( str: Any ) ) ) – Arguments for custom operations. For each operation in operations there may be an entry in arguments whose key equals the operation name. The value of the entry is a mapping from name to any JSON-able value. The entry of the selected operation is returned as-is from the server, i.e. it will not be randomly chosen. This mapping is later accessible from the augmentation code when it is executed.
```
from allegroai import ImageFrame, DataView
from my_augmentation_library import MyCustomAugmentation, MyCoolAugmentation

ImageFrame.register_custom_augmentation('custom_aug', MyCustomAugmentation)
ImageFrame.register_custom_augmentation('cool_aug', MyCoolAugmentation)

aug_arguments = {
    'custom_aug': {
        'color': 'blue',
        'count': 42,
    },
    'cool_aug': {
        'vector': [1, 2, 5],
        'verify': True,
    },
}

dv = DataView()
dv.add_augmentation_custom(
    operations=('custom_aug', 'cool_aug'),
    arguments=aug_arguments,
)
```
Returns
True if augmentation was added to the dataview.
Return type
bool

set_iteration_parameters

set_iteration_parameters(order=None, infinite=None, maximum_number_of_frames=None, random_seed=None)

Set dataview general iteration parameters.

Parameters
- order (IterationOrder ) – The order in which frames are iterated on with this DataView.
  None means not-applicable (ignored)
- infinite (bool ) – if True, dataview infinitely returns frames (with duplicates, of course).
  None means not-applicable (ignored)
  info
  Duplicate frames would still vary in augmentations, if used, due to the random nature of augmentation operation and parameter selection.
- maximum_number_of_frames (int or None ) – Limit the total number of frames the dataview returns. Note: zero or negative values mean an ‘unlimited’ number of frames.
  None means not-applicable (ignored)
- random_seed (int or None ) – Random seed for any randomness needed (e.g. order, augmentation, etc.). Default random seed is fixed for easy reproducibility.
  None means not-applicable (ignored)
Returns
True if at least one of the specified dataview iteration parameters was set
Return type
bool

get_iteration_parameters

get_iteration_parameters()

Get dataview iteration parameters (includes general and video iteration parameters)

Returns
A dictionary specifying iteration parameters
Return type
dict

set_video_parameters

set_video_parameters(minimum_time_between_consecutive_frames=0, sequence_minimum_time=0)

Set dataview video specific iteration parameters.

These settings are only relevant to video content (frames sharing the same source uri with different timestamps).

danger

This method resets any argument that is not specified to its default value.

Parameters
- minimum_time_between_consecutive_frames (int ) – When the frames contain a positive timestamp (i.e. video), make sure two consecutive frames are at least minimum_time_between_consecutive_frames in milliseconds apart.
- sequence_minimum_time (int ) – When the frame contains a positive timestamp (i.e. videos), expand the selected frame (based on the filters) with enough frames so that we end up with a sequence of at least sequence_minimum_time length (in timebase/milliseconds)
Returns
True if at least one of the specified dataview iteration parameters was set
Return type
bool
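
As a rough picture of what minimum_time_between_consecutive_frames implies, the sketch below thins a list of timestamps so that kept frames are at least the given number of milliseconds apart. `thin_by_time` is a hypothetical helper working on bare timestamps; the actual selection is done by the dataview, not by user code.

```python
def thin_by_time(timestamps_ms, minimum_gap_ms):
    """Keep timestamps so that consecutive kept ones are at least minimum_gap_ms apart."""
    kept = []
    for timestamp in sorted(timestamps_ms):
        if not kept or timestamp - kept[-1] >= minimum_gap_ms:
            kept.append(timestamp)
    return kept


frames = [0, 40, 80, 120, 160, 200, 240]  # one frame every 40 ms
print(thin_by_time(frames, minimum_gap_ms=100))
```

With a gap of 0 (the default value of the parameter) every frame is kept.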

set_random_seed

set_random_seed(random_seed)

Set the random seed for this dataview.

The random seed is always fixed so the entire run is reproducible. The default random seed is fixed to 1337.

Parameters
random_seed (int ) – Random seed for any randomness needed (e.g. order, augmentation, etc.).
Return type
None

get_random_seed

get_random_seed()

Get the random seed for this dataview.

The random seed is always fixed so the entire run is reproducible. The default random seed is fixed to 1337.

Returns
Random seed for any randomness needed (e.g. order, augmentation, etc.).
Return type
int

to_dict

to_dict(projection=None)

Gets the DataView as a list of dictionaries. Each dictionary in the list represents a frame. Warning: This function pulls all the frames from the backend at the same time. It is recommended to use DataView.get_iterator(projection=projection or [‘*’]) when possible, to avoid this.

Example:

dataview.to_dict(projection=['id', 'dataset.id', 'sources'])
# will return a list containing dictionaries with the following fields:
# [
#   {
#     'id': '514504adbb6a91620eefa3e21ecfcc31',
#     'dataset': {
#       'id': 'df3638ec95454589bf86ba97f344f697'
#     },
#     'sources': [
#       {
#         'id': 'Frame',
#         'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#         'timestamp': 0,
#         'preview': {
#           'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#           'timestamp': 0
#         }
#       }
#     ]
#   },
#   some other dictionaries with the same format here
# ]

Parameters
projection (Optional [ Sequence [ str ] ] ) – Used to select which parts of the frame will be returned. Each string represents a field or sub-field (using dot-separated notation). In order to specify a specific array element, use array index as a field name. To specify all array elements, use ‘*’. To see supported fields for projection, see the schema at backend_api.services.frames.Frame
Return type
List[dict[str, str]]
Returns
A list of dictionaries representing each frame

get_iterator

get_iterator(query_cache_size=None, query_queue_depth=None, allow_repetition=False, projection=None, **kwargs)

Get an iterator for this DataView.

The iterator will yield frames from the dataview according to its queries (see add_query and add_multi_query), and its iteration parameters (see set_iteration_parameters and set_video_parameters).

The yielded frames are only the frame’s metadata. If DataView is image-based, you can wrap the SingleFrame/FrameGroup with ImageFrame for builtin augmentation support.

info

Every function call will create a new iterator for the DataView!

info

Iterator length will return the expected number of frames based on the specific queries

If DataView infinite flag is set, len(iterator) will return 2^32

If the maximum_number_of_frames was set, len(iterator) will return maximum_number_of_frames

If allow_repetition is True, a limit to the maximum number of returned frames will be calculated automatically. This limit can be thought of as an entire epoch, as it guarantees that all the unique frames in each rule are covered (that said, some rules might have repetitions as part of the rule ratio balance). len(iterator) will return this synthetic epoch limit, and the iterator will raise StopIteration when reaching this limit (as expected of an iterator).

Parameters
- query_cache_size (int ) – The requested number of metadata frames in every API call to the server. A large value is slower per request, but faster on average for frames returned by the iterator.
- query_queue_depth (int ) – Number of API request results to store in the return queue. The maximum total number of metadata frames stored in the return queue is query_cache_size * query_queue_depth.
- allow_repetition (bool ) – If True, the length of the returned iterator is limited so that it covers all the unique frames from all the different queries of the DataView. For example, if there are two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition sets the DataView maximum_number_of_frames to 200 and unsets the infinite flag of the DataView. As a result, len(iterator) returns 200 and StopIteration is raised after 200 frames (as expected).
- projection (Optional [ Sequence [ str ] ] ) – Used to select which parts of the frame will be returned. Each string represents a field or sub-field (using dot-separated notation). In order to specify a specific array element, use array index as a field name. To specify all array elements, use ‘*’. To see supported fields for projection, see the schema at backend_api.services.frames.Frame. If this argument is set, the values the iterator returns are dictionaries representing each frame
  For example:
```
dataview.get_iterator(projection=['id', 'dataset.id', 'sources'])
# will return an iterator that yields dictionaries with the following fields:
#  {
#    'id': '514504adbb6a91620eefa3e21ecfcc31',
#    'dataset': {
#      'id': 'df3638ec95454589bf86ba97f344f697'
#    },
#    'sources': [
#      {
#        'id': 'Frame',
#        'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#        'timestamp': 0,
#        'preview': {
#          'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#          'timestamp': 0
#        }
#      }
#    ]
#  }
```
- kwargs (int ) –
Returns
An iterator over the DataView’s frames.
Return type
Generator((SingleFrame, FrameGroup, dict))
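
The 200-frame figure in the allow_repetition example can be reproduced with a short calculation. This is a sketch under an assumption: the text only gives the equal-weight case, so the generalization to unequal weights in `synthetic_epoch_limit` (a hypothetical helper) is a guess at the rule rather than the SDK's documented formula.

```python
import math


def synthetic_epoch_limit(unique_counts, weights=None):
    """Smallest total frame count in which every query can show all of its unique frames.

    Query i receives the share weights[i] / sum(weights) of the frames, so it
    needs at least unique_counts[i] / share frames in total.
    """
    if weights is None:
        weights = [1.0] * len(unique_counts)
    weight_sum = float(sum(weights))
    return max(
        math.ceil(count * weight_sum / weight)
        for count, weight in zip(unique_counts, weights)
    )


# Two equally weighted queries with 100 and 2 unique frames.
print(synthetic_epoch_limit([100, 2]))
```

Each query supplies half of the frames, so covering the 100 unique frames of the larger query takes 200 frames in total, during which the 2 frames of the smaller query repeat.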

to_list

to_list(allow_repetition=False, auto_synthetic_epoch_limit=None)

Get a list of frames for this DataView.

The returned list will hold frames from the DataView according to its queries (see add_query and add_multi_query), and its iteration parameters (see set_iteration_parameters and set_video_parameters).

The yielded frames are only the frame’s metadata (SingleFrame/FrameGroup).

info

Every function call will create a new list of frames for the DataView!

Parameters
- allow_repetition (bool ) – If True, the length of the returned iterator is limited so that it covers all the unique frames from all the different queries of the DataView. For example, if there are two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition sets the DataView maximum_number_of_frames to 200 and unsets the infinite flag of the DataView. As a result, len(iterator) returns 200 and StopIteration is raised after 200 frames (as expected).
- auto_synthetic_epoch_limit (Optional[int]) – deprecated, use allow_repetition instead
Returns
A list of SingleFrame, FrameGroup generated from the DataView’s query.
Return type
list((SingleFrame, FrameGroup))

split_to_lists

split_to_lists(ratio, allow_repetition=False, seed=42, frame_id_fn=None)

Partition the frames represented by the DataView into non-overlapping lists (e.g. train/validation/test) according to the split weights.

It is guaranteed that there is no overlap between the lists.
It is guaranteed that frame partitioning is based solely on the weights and the seed, and is independent of the DataView query, so partitions are consistent.

The resulting frames are uniformly split based on the identifier returned by frame_id_fn(frame), combined with the seed number.

The yielded frames are only the frame’s metadata (SingleFrame/FrameGroup).

info

Every function call will create a new list of frames for the DataView!

Parameters
- ratio (list ( int ) ) – List of (integer) weights for the partitions. The split is based on the ratio between each weight and the total sum of weights. For example, a train/val/test split of 60% / 20% / 20% is achieved with ratio=[3, 1, 1], which results in three partitions: the first has 3/5 of the DataView’s frames, the second 1/5, and the third the remaining 1/5.
- allow_repetition (bool ) – If True, the length of the returned iterator is limited so that it covers all the unique frames from all the different queries of the DataView. For example, if there are two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition sets the DataView maximum_number_of_frames to 200 and unsets the infinite flag of the DataView. As a result, len(iterator) returns 200 and StopIteration is raised after 200 frames (as expected).
- seed (int ) – Random number to add to the frame identifier, controlling the randomness in the selection criteria of the frame partitioning
- frame_id_fn (lambda ) – User-provided frame identifier function used for consistent partitioning of the DataView frames. Default is (lambda frame: frame.id)
Returns
A list of partitions, where each partition is a list of SingleFrame/FrameGroup generated from the DataView’s query.
Return type
list(list((SingleFrame, FrameGroup)))
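
One way to obtain partitions that depend only on the frame identifier, the weights and the seed is to hash the identifier together with the seed. The sketch below shows that general technique; `split_ids` is a hypothetical helper and the use of MD5 is an arbitrary choice, so this is not necessarily how split_to_lists is implemented.

```python
import hashlib


def split_ids(frame_ids, ratio, seed=42):
    """Assign every frame id to exactly one partition using a seeded hash."""
    total = sum(ratio)
    partitions = [[] for _ in ratio]
    for frame_id in frame_ids:
        key = ("%s:%d" % (frame_id, seed)).encode("utf-8")
        slot = int(hashlib.md5(key).hexdigest(), 16) % total
        upper = 0
        # Walk the cumulative weights to find the partition owning this slot.
        for index, weight in enumerate(ratio):
            upper += weight
            if slot < upper:
                partitions[index].append(frame_id)
                break
    return partitions


ids = ["frame_%04d" % i for i in range(1000)]
train, val, test = split_ids(ids, ratio=[3, 1, 1])
print(len(train) + len(val) + len(test))

# A frame lands in the same partition no matter which other frames are present.
subset_train = split_ids(ids[:100], ratio=[3, 1, 1])[0]
print(set(subset_train) <= set(train))
```

Because the assignment is computed per frame, changing the queries of the DataView does not move a frame from one partition to another.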

to_json

to_json(json_file, allow_repetition=False, auto_synthetic_epoch_limit=None)

Store DataView data to json file. Stored DataView frames contain only the metadata.

Parameters
- json_file (Union[str, BytesIO]) – Path (str) or BytesIO object, to store the DataView data to (in JSON format).
- allow_repetition (bool ) – If True, the length of the returned iterator is limited so that it covers all the unique frames from all the different queries of the DataView. For example, if there are two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition sets the DataView maximum_number_of_frames to 200 and unsets the infinite flag of the DataView. As a result, len(iterator) returns 200 and StopIteration is raised after 200 frames (as expected).
- auto_synthetic_epoch_limit (Optional[int]) – deprecated, use allow_repetition instead
Return type
bool
Returns
True if successful
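
The arithmetic behind the allow_repetition example (100 and 2 unique frames giving 200) can be checked with a short sketch. It assumes the queries are sampled in equal proportion, which is one reading that produces the documented figure; `frames_to_cover_all` is a hypothetical helper and not part of the DataView API.

```python
def frames_to_cover_all(unique_counts):
    """Frames needed so every query's unique frames appear at least once.

    Assumes queries are interleaved in equal proportion, so the largest
    query sets the pace and every other query repeats to keep up.
    """
    if not unique_counts:
        return 0
    return max(unique_counts) * len(unique_counts)


limit = frames_to_cover_all([100, 2])
print(limit)  # 200
```

Under this assumption the smaller query is repeated 50 times while the larger one is traversed once.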

from_json_to_list

from_json_to_list(json_file)

Load DataView data from a json file. The returned frames are only the frame’s metadata and of type SingleFrame/FrameGroup.

Parameters
json_file (Union[str, BytesIO]) – Path (str) or BytesIO object, to load the DataView data from (in JSON format).
Returns
A list of SingleFrame, FrameGroup generated from JSON file DataView data.
Return type
list((SingleFrame, FrameGroup))

prefetch_files

prefetch_files(num_workers=None, wait=False, query_cache_size=None, get_previews=False, get_masks=True)

Prefetch the DataView’s files (the files/URIs pointed to by frames in the DataView). This is like calling SingleFrame.get_local_source(), get_local_preview(), and get_local_mask() on all the frames in the DataView. Prefetching is done in the background, and the function call returns immediately unless wait is True.

Parameters
- num_workers (Optional[int]) – Number of prefetch workers. If None (default), the number of workers is set to the CPU count.
- wait (bool) – If True, return only after all files have been prefetched to local storage.
- query_cache_size (Optional[int]) – The requested number of metadata frames in every API call to the server.
- get_previews (Optional[bool]) – Prefetch preview files (default False)
- get_masks (Optional[bool]) – Prefetch masks (default True)
Return type
bool
Returns
True if pre-fetching started

get_mapping_rules

get_mapping_rules()

Get current dataview mappings

Note: if dataview was created from existing dataview, this function triggers the creation of the dataview

Return type
List[MappingRule]
Returns
list of mapping objects

get_labels

get_labels()

Get dataview label enumeration (i.e. dictionary of label string to id)

Example: {'person': 1, 'background': 0, 'pedestrian': 1}

Notice: every function call will create a new iterator for the dataview!

Return type
Dict[str, int]
Returns
dictionary of string to integer
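
As the example enumeration shows, several label strings may share one ID. A minimal sketch of grouping label names by ID, using a plain dictionary in the shape documented above rather than a live DataView (the variable names are illustrative):

```python
from collections import defaultdict

# Same shape as the documented return value: label string -> integer ID
labels = {"person": 1, "background": 0, "pedestrian": 1}

# Invert the mapping: integer ID -> list of label strings sharing that ID
ids_to_names = defaultdict(list)
for name, label_id in labels.items():
    ids_to_names[label_id].append(name)

print(dict(ids_to_names))
```

Here "person" and "pedestrian" both resolve to ID 1, so they end up in the same group.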

store

store(project_name=None, name=None, description=None, tags=None)

Stores the current dataview in the system for future use

Parameters
- project_name (Optional[str]) – project name (string)
- name (Optional[str]) – dataview name (string)
- description (Optional[str]) – dataview description (string)
- tags (Optional[List[str]]) – list tags (strings) for this dataview
Return type
None

connect

connect(task)

Connect current dataview with a specific task

When running in debug mode (i.e. locally), the task is updated with the dataview object. When running remotely (i.e. from a daemon), the dataview is updated from the task. Notice! When running remotely, the dataview defined in the code is ignored and is loaded from the task object instead.

Parameters
task (Task ) – Task object
Return type
None

Raises a ValueError exception if the DataView object is already connected to another task.

update_task

update_task(task)

Set this dataview’s contents into a task. This action does not set a reference to the provided data view, but rather copies all data into the appropriate internal structure maintained by the task object and updates task information in the backend using the API. This function will set the dataview’s contents even when running remotely. Note that attempting to update a task that the DataView is already connected to will raise a ValueError.

Parameters
task (Task ) – The task to set the content into
Return type
None

get_id

get_id()

Return the dataview ID in the system (for future use).

Return type
str
Returns
dataview ID (string)

has_id

has_id()

Return type
bool
Returns
True iff the data view is stored under an ID in the server.

get_count

get_count()

Returns the total number of unique frames returned by this dataview, as well as the number of unique frames returned from each of this dataview’s queries (rules)

Return type
Tuple[int, Sequence[int]]

get_sources

get_sources()

Returns a list of source URI links from all the frames in the current DataView.

Return type
Sequence[str]
Returns
list of URI strings
