DataView
class dataview.DataView(name=None, iteration_order='sequential', iteration_infinite=False, maximum_number_of_frames=None, auto_connect_with_task=True, queries=None, **kwargs)
A DataView object creates a single view over a mixture of dataset versions.
For example, it can balance filtering rules (for every result from one filter, return two results from another). Additionally, augmentation instructions are sent together with the data iterator, to be executed in-flight. This provides not only flexibility and reproducibility, but also makes it possible to run multiple training jobs without any code changes, for large-scale hyperparameter optimization.
Create a new DataView.
Parameters
name (str ) – The name of this DataView. This is important when multiple DataViews are used, for example, when one DataView is named ‘Test’ and another ‘Validation’.
iteration_order (IterationOrder ) – The order in which frames are iterated on with this DataView.
iteration_infinite (bool ) – If True, the dataview may return any frame more than once, up until maximum_number_of_frames are reached. If the total number of unique frames is less than maximum_number_of_frames, no duplicate frames will be returned.
Note: Duplicate frames still vary in their augmentations, if used, due to the random nature of augmentation operation and parameter selection.
maximum_number_of_frames (int ) – The maximum number of frames to be returned when iterating on this DataView.
auto_connect_with_task (bool ) – If True, the DataView is automatically connected with the main Task context. Default: True. Optional: readonly, which disables DataView changes from the UI. This means that if the dataview is changed in the UI, the change has no effect on the code!
queries (list [ FilterRule ] ) – Initial queries (filter rules) for this DataView.
DataView access is lazy: the dataview's validity is only checked when an iterator is needed.
RoiQuery
class RoiQuery(label=None, count_range=None, conf_range=None, must_not=None)
A single query on the dataview.
label
label
The label of the ROI. Only ROIs with this label are matched.
Possible values are:
A single string - The ROI must have a label equal to this string.
A sequence of strings - The ROI must have all the labels in the sequence.
A white-space separated list of labels as a single string - The ROI must have all the labels in the list.
A lucene query - The ROI must have labels that match the query.
Examples:
None or ‘*’ or ‘’: ROIs with any labels are matched.
‘cat’: selects only frames with ROIs whose labels contain the word ‘cat’.
‘cat AND dog’: selects only frames with ROIs whose labels contain both ‘cat’ and ‘dog’.
Type
list[str] or str or None
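The non-Lucene label forms above can be modeled client-side in a few lines. The sketch below uses hypothetical helpers (`normalize_label_query` and `roi_matches` are illustrations, not allegroai functions): None, ‘*’ and ‘’ match any ROI, a string is split on whitespace, and an ROI must carry every requested label. Lucene expressions such as ‘cat AND dog’ are evaluated by the server and are not modeled here.

```python
from typing import Iterable, Optional, Sequence, Union

# Hypothetical helpers modeling RoiQuery.label; not part of allegroai.
LabelQuery = Optional[Union[str, Sequence[str]]]


def normalize_label_query(label: LabelQuery) -> frozenset:
    """Return the set of labels an ROI must carry; an empty set matches any ROI.

    Lucene queries (e.g. 'cat AND dog') are evaluated server-side and not handled here.
    """
    if label is None:
        return frozenset()
    if isinstance(label, str):
        if label.strip() in ("", "*"):
            return frozenset()
        # A single label, or a whitespace-separated list of labels.
        return frozenset(label.split())
    # A sequence of labels.
    return frozenset(label)


def roi_matches(roi_labels: Iterable[str], label: LabelQuery) -> bool:
    """An ROI matches when it has ALL the requested labels; extra labels are allowed."""
    return normalize_label_query(label) <= set(roi_labels)


print(roi_matches(["black", "occluded", "cat"], ["black", "cat"]))  # True
print(roi_matches(["black", "dog"], ["black", "cat"]))  # False
print(roi_matches(["cat"], "black cat"))  # False
print(roi_matches(["dog"], "*"))  # True
```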
count_range
count_range
A count constraint over the query.
If provided, limits query results to frames where the number of ROIs matching the query is within the given range.
Type
tuple(int, int)
conf_range
conf_range
A confidence filter over the query.
If provided, frames match the query only if they contain ROIs that both match RoiQuery.label and whose confidence is within the given range.
Type
tuple(float, float)
must_not
must_not
If True, negate the entire ROI selection rule: frames that do not match the query are returned.
Type
bool
is_none
is_none()
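The attributes above combine into a frame-level decision. The following is a minimal sketch assuming that the confidence filter applies to each label-matching ROI, that count_range counts the ROIs passing both filters, that range bounds are inclusive, and that without count_range at least one matching ROI is required. `Roi` and `SimpleRoiQuery` are hypothetical classes, not the RoiQuery implementation, which is evaluated on the server.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple, Union


@dataclass
class Roi:
    labels: Sequence[str]
    confidence: float = 1.0


@dataclass
class SimpleRoiQuery:
    """Hypothetical stand-in for RoiQuery with the same four attributes."""

    label: Optional[Union[str, Sequence[str]]] = None
    count_range: Optional[Tuple[int, int]] = None
    conf_range: Optional[Tuple[float, float]] = None
    must_not: bool = False

    def _required_labels(self) -> Set[str]:
        if isinstance(self.label, str):
            if self.label.strip() in ("", "*"):
                return set()  # any label
            return set(self.label.split())
        return set(self.label or ())

    def _roi_matches(self, roi: Roi) -> bool:
        # Label rule: the ROI must carry every requested label.
        if not self._required_labels() <= set(roi.labels):
            return False
        # Confidence rule (bounds assumed inclusive).
        if self.conf_range is not None:
            low, high = self.conf_range
            if not low <= roi.confidence <= high:
                return False
        return True

    def frame_matches(self, rois: List[Roi]) -> bool:
        count = sum(1 for roi in rois if self._roi_matches(roi))
        if self.count_range is not None:
            low, high = self.count_range
            matched = low <= count <= high
        else:
            matched = count > 0  # at least one matching ROI
        # must_not negates the whole rule.
        return not matched if self.must_not else matched


frame = [Roi(["cat"], 0.9), Roi(["cat"], 0.3), Roi(["dog"], 0.8)]
print(SimpleRoiQuery(label=["cat"], count_range=(2, 5)).frame_matches(frame))  # True
print(SimpleRoiQuery(label=["cat"], count_range=(2, 5), conf_range=(0.5, 1.0)).frame_matches(frame))  # False
```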
name
property name
Return type
str
Returns
The DataView name (str).
DataView.get
classmethod get(dataview_id=None, dataview_name=None)
Get a previously defined dataview from the server.
Parameters
dataview_id (str ) – The ID of the dataview.
dataview_name (str ) – The name of the dataview.
Returns
A new DataView object for the requested dataview.
Return type
DataView
dataview_id and dataview_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
clone
clone()
Clone this dataview into a new one.
The new dataview is an SDK-only clone of this dataview; it has no representation in the backend.
Returns
The cloned DataView object.
Return type
DataView
add_query
add_query(dataset_id=None, dataset_name=None, version_id=None, version_name=None, weight=1.0, roi_query=None, roi_count_range=None, roi_conf_range=None, roi_query_must_not=None, frame_query=None, source_query=None, query_object=None, dataset_project=None)
Add a new query to the dataview.
DataView access is lazy: the dataview is only created when an iterator is needed.
Parameters
dataset_id (str ) – The ID of the dataset used as input for this query.
dataset_name (str ) – The name of the dataset used as input for this query.
Note: dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
version_id (str ) – The ID of the version used as input for this query.
version_name (str ) – The name of the version used as input for this query.
Warning: Version names are not unique. The query is applied to a single version: the last updated version with that name will be selected!
Note: version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Note: If the dataset version is not specified, the last updated version in the dataset will be selected.
weight (float ) – Weight of the rule. Measured relative to other queries. For example, two queries with the same weight will cause the dataview to have the exact same number of frames from each query, regardless of the number of frames in the input version. This is crucial to remove inherent bias in datasets.
roi_query (Union [ list [ str ] , str ] ) – A list of labels, or a string representing a single label or a Lucene query. A frame is returned by the query only if it contains an ROI with ALL the labels. If roi_query==DataView.EmptyRois, only frames with no ROIs at all are selected.
Note: All the ROIs in the frame are returned, even if only a single one matches the query.
An ROI matches the query if it has all the labels in the query.
Example: [“black”, “cat”] will match [“black”, “occluded”, “cat”] but will not match [“black”, “dog”] or [“cat”].
Example: DataView.EmptyRois will only match frames with zero ROIs.
Example: “car” will return all the frames with the “car” label.
Example: the Lucene query “person OR car” will return all the frames with the “car” or “person” label.
frame_query (str ) – Lucene query on the frame (can be mixed with the ROI query). Notice: this is a direct Lucene query, so escape any special Lucene character with a backslash (\). Example: ‘(src:directory_name) AND (meta.key:my_value) AND (width:>50)’ will match any frame whose src field contains ‘directory_name’, whose width is over 50px, and whose meta.key contains my_value.
Note: A query such as ‘meta.cat_type:”white cat”’ will match frames in which the meta.cat_type field contains “white”, “cat” or both. In order to match frames in which meta.cat_type==”white cat”, use ‘meta.cat_type.keyword:”white cat”’.
source_query (str ) – Lucene query on the frame sources (can be mixed with any other query). Notice: this is a direct Lucene query, so escape any special Lucene character with a backslash (\). Example: ‘(sources.uri:directory_name) AND (sources.preview.uri:https*)’ returns frames whose source URI contains directory_name and whose preview link has an https prefix.
roi_count_range (tuple ( int , int ) ) – (min, max) occurrences of the matched item.
roi_conf_range (tuple ( float , float ) ) – (min, max) confidence of matched annotation must be in this range.
roi_query_must_not (bool ) – If True, negates the ROI query, i.e. returns only frames that do not match the ROI query terms. Example: roi_query=[“black”, “cat”] with roi_query_must_not=True will match [“black”, “occluded”, “dog”] but will not match [“black”, “partial”, “cat”].
query_object (Query ) – An instance of dataview.Query storing all the information on a specific rule. If query_object is passed, all other fields should be None, as they will be ignored.
dataset_project (str ) – The project of the dataset used as input for this query.
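The weight semantics, including a 1:2 ratio like the one in the class description, can be illustrated with smooth weighted round-robin, a scheduling technique swapped in here for illustration; the backend's actual sampling algorithm is not documented in this reference. `weighted_mix` is a hypothetical helper that assumes each rule is a non-empty list of frame IDs, and it cycles small rules so they repeat rather than run out, matching the statement that equal weights yield equal frame counts regardless of version size.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, Sequence


def weighted_mix(rules: Dict[str, Sequence[str]], weights: Dict[str, float]) -> Iterator[str]:
    """Yield frame ids from several rules, each rule's share following its weight.

    Smooth weighted round-robin: on every step each rule gains credit equal to its
    weight; the rule with the most credit emits its next frame and pays back the
    total weight. Rules are cycled (each must be non-empty), so a small rule
    repeats instead of running dry.
    """
    sources = {name: cycle(frames) for name, frames in rules.items()}
    credit = {name: 0.0 for name in rules}
    total = sum(weights[name] for name in rules)
    while True:
        for name in rules:
            credit[name] += weights[name]
        chosen = max(rules, key=credit.__getitem__)
        credit[chosen] -= total
        yield next(sources[chosen])


# Equal weights: a 1000-frame rule and a 2-frame rule contribute equally.
rules = {"big": [f"b{i}" for i in range(1000)], "small": ["s0", "s1"]}
print(list(islice(weighted_mix(rules, {"big": 1.0, "small": 1.0}), 6)))
# ['b0', 's0', 'b1', 's1', 'b2', 's0']
```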
add_multi_query
add_multi_query(dataset_id=None, dataset_name=None, version_id=None, version_name=None, weight=1.0, roi_queries=None, frame_query=None, source_query=None, dataset_project=None)
Add a new query with multiple label queries to the dataview.
DataView access is lazy: the dataview is only created when an iterator is needed.
Parameters
dataset_id (str ) – The ID of the dataset used as input for this query.
dataset_name (str ) – The name of the dataset used as input for this query.
Note: dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
version_id (str ) – The ID of the version used as input for this query.
version_name (str ) – The name of the version used as input for this query.
Warning: Version names are not unique. The query is applied to a single version: the last updated version with that name will be selected!
Note: version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Note: If the dataset version is not specified, the last updated version in the dataset will be selected.
weight (float ) – Weight of the rule. Measured relative to other queries. For example, two queries with the same weight will cause the dataview to have the exact same number of frames from each query, regardless of the number of frames in the input version. This is crucial to remove inherent bias in datasets.
roi_queries (list [ RoiQuery or dict ] ) – A list of RoiQuery objects or dictionaries with ‘label’, ‘count_range’ and ‘conf_range’ keys. Each item in the list is a query on the frame’s ROIs. Only frames that match all queries are returned by this dataview. If roi_queries==DataView.EmptyRois, only frames with no ROIs at all are selected.
frame_query (str ) – Lucene query on the frame (can be mixed with the ROI queries). Example: ‘(src:directory_name) AND (meta.key:my_value) AND (width:>50)’ will match any frame whose src field contains ‘directory_name’, whose width is over 50px, and whose meta.key contains my_value.
Note: A query such as ‘meta.cat_type:”white cat”’ will match frames in which the meta.cat_type field contains “white”, “cat” or both. In order to match frames in which meta.cat_type==”white cat”, use ‘meta.cat_type.keyword:”white cat”’.
source_query (str ) – Lucene query on the frame sources (can be mixed with any other query). Notice: this is a direct Lucene query, so escape any special Lucene character with a backslash (\). Example: ‘(sources.uri:directory_name) AND (sources.preview.uri:https*)’ returns frames whose source URI contains directory_name and whose preview link has an https prefix.
dataset_project (str ) – The project of the dataset used as input for this query.
add_queries
add_queries(queries)
Add a list of queries to the dataview.
DataView access is lazy: the dataview is only created when an iterator is needed.
Parameters
queries (Query ) – A list of Query objects (or a single Query object) representing new queries to add.
Return type
None
get_queries
get_queries()
Get this dataview’s queries.
Usage example: Create a new DataView based on the returned queries.
dataview = DataView(queries=source_dataview.get_queries())
Returns
A list of this dataview’s queries.
Return type
list[Query]
create_version
create_version(version_name, description=None, dataset_id=None, dataset_name=None, parent_version_ids=None, parent_version_names=None, raise_if_exists=False, auto_upload_destination=None, local_dataset_root_path=None, dataset_project=None)
Create a new version in a dataset with a specific name.
If a version by that name already exists and is in draft mode (i.e. writable), return it, unless raise_if_exists is True, in which case raise ValueError.
After the version is created, all the frames from the current DataView are added to it.
Parameters
version_name (str ) – The name of the new version.
description (Optional [ str ] ) – Description of the new dataset version
dataset_id (Optional [ str ] ) – The ID of the dataset to create the version in.
dataset_name (Optional [ str ] ) – The name of the dataset to create the version in.
parent_version_ids (Optional [ List [ str ] ] ) – A list of the new version’s parent IDs. All IDs must be IDs of existing versions in this dataset. Currently, only a single parent per version is supported; this is a list for future compatibility.
parent_version_names (Optional [ List [ str ] ] ) – A list of the new version’s parent names. All names must be names of existing versions in this dataset. Currently, only a single parent per version is supported; this is a list for future compatibility.
raise_if_exists (bool ) – If True and a version named version_name already exists, raise ValueError. If False and a version by that name already exists, return it.
auto_upload_destination (Optional [ str ] ) – If specified, any local file linked by a SingleFrame/FrameGroup will be automatically uploaded to the destination storage.
local_dataset_root_path (Optional [ Union [ str , Path ] ] ) – Required if auto_upload_destination is provided. It should point to the common folder for all local source files.
dataset_project (Optional [ str ] ) – The project of the dataset to create the version in.
Return type
DatasetVersion
Returns
A new DatasetVersion object representing the new version.
dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
get_versions
get_versions()
Get a list of this dataview’s dataset versions.
The dataview’s versions are the versions specified in all the queries, and the versions registered as inputs to the dataview.
Return type
List[DatasetVersion]
Returns
A list of allegroai.DatasetVersion objects, one for each version of the dataview.
get_datasets
get_datasets()
Get a list of this dataview’s dataset versions. Equivalent to get_versions().
The dataview’s versions are the versions specified in all the queries, and the versions registered as inputs to the dataview.
Return type
List[DatasetVersion]
Returns
A list of allegroai.DatasetVersion objects, one for each version of the dataview.
add_mapping_rule
add_mapping_rule(from_labels, to_label, dataset_id=None, dataset_name=None, version_id=None, version_name=None, dataset_project=None)
Add a new mapping rule to the dataview.
Mapping automatically converts label names to canonical names in the ROIs returned for frames while iterating over the dataview. This ensures that different naming in different datasets does not produce two different classes for the same object.
Example: If one dataset has ROIs with the label ‘pedestrian’ and another has ROIs with the label ‘person’, both can be used in a single dataview to create a person detector by adding a mapping from ‘pedestrian’ to ‘person’.
If this DataView was not created from an existing dataview on the server, this function triggers the creation of such a dataview.
Label mapping is performed after the frame is matched against the dataview’s queries. For that reason, the queries must be defined according to the dataset’s original labels.
Parameters
dataset_id (str ) – The ID of the dataset to apply the mapping rule to.
dataset_name (str ) – The name of the dataset to apply the mapping rule to.
Note: dataset_id and dataset_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
version_id (str ) – The ID of the version to apply the mapping rule to.
version_name (str ) – The name of the version to apply the mapping rule to.
Note: version_id and version_name are mutually exclusive. Setting both to non-None values will raise a UsageError exception.
Note: If the dataset version is not specified, the last updated version in the dataset will be selected.
from_labels (str or list [ str ] ) – A label or a list of labels to map to to_label. The ROI must match all of the labels for the mapping to take place.
dataset_project (str ) – The project of the dataset to apply the mapping rule to.
to_label (str ) – The label to change from_labels to.
Return type
()
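A minimal sketch of the two-step order described above: queries match the original labels, then the mapping renames labels in the returned ROIs. `apply_mapping` is hypothetical; it assumes the first matching rule wins, that the matched labels are replaced by to_label, and that any other labels on the ROI are kept. The server-side behavior for multi-label ROIs may differ.

```python
from typing import List, Sequence, Tuple, Union

# (from_labels, to_label); from_labels is a label or a list of labels.
MappingRule = Tuple[Union[str, Sequence[str]], str]


def apply_mapping(roi_labels: Sequence[str], rules: Sequence[MappingRule]) -> List[str]:
    """Rename an ROI's labels with the first rule whose from_labels are all present.

    Assumptions: the first matching rule wins, the matched labels are replaced by
    to_label, and any other labels on the ROI are kept.
    """
    for from_labels, to_label in rules:
        required = [from_labels] if isinstance(from_labels, str) else list(from_labels)
        if all(label in roi_labels for label in required):
            kept = [label for label in roi_labels if label not in required]
            return kept + [to_label]
    return list(roi_labels)


rules = [("pedestrian", "person")]
frames = [["pedestrian"], ["person"], ["car"]]
# 1) Queries are matched against the ORIGINAL labels...
selected = [labels for labels in frames if {"pedestrian", "person"} & set(labels)]
# 2) ...and only then is the mapping applied to the returned ROIs.
mapped = [apply_mapping(labels, rules) for labels in selected]
print(mapped)  # [['person'], ['person']]
```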
set_labels
set_labels(label_dict, *label_dicts, force=False, **labels)
Set dataview label enumeration.
Label enumeration maps label strings to integers, for later use within the network.
Parameters
label_dict (Mapping [ str: int ] ) – Mapping from a label string to its integer representation in the network. e.g. {‘cat’: 0, ‘dog’: 1, ‘hound’: 1}.
label_dicts (Mapping [ str: int ] ) – Deprecated, use label_dict instead.
force (bool ) – If True, set the labels even when the task is running remotely.
labels (int ) – Deprecated, use label_dict instead.
Return type
None
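Once set, the enumeration is typically used to turn ROI labels into training targets. The sketch below is a hypothetical helper (`encode_labels` is not an allegroai function) that also shows the many-to-one case from the example dictionary, where ‘dog’ and ‘hound’ share id 1. Skipping labels that are missing from the enumeration is an assumption of this sketch.

```python
from typing import Dict, List, Mapping


def encode_labels(roi_labels: List[str], label_dict: Mapping[str, int]) -> List[int]:
    """Turn ROI label strings into network class ids.

    Assumption: labels missing from the enumeration are skipped.
    """
    return [label_dict[label] for label in roi_labels if label in label_dict]


label_dict: Dict[str, int] = {"cat": 0, "dog": 1, "hound": 1}
print(encode_labels(["cat", "hound", "zebra"], label_dict))  # [0, 1]
# 'dog' and 'hound' share id 1, so the network sees two classes, not three.
num_classes = len(set(label_dict.values()))
print(num_classes)  # 2
```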
add_augmentation_affine
add_augmentation_affine(operations=('bypass', 'scale', 'rotate', 'shear', 'reflect-horiz', 'reflect-vert'), strength=1.0)
Add affine based augmentation instruction to the dataview.
Augmentations are selected by the backend, where randomness is generated and can be reproduced (both in operation selection and in additional parameters).
The actual execution of the augmentation is performed by the worker, while getting the data from the frame using ImageFrame.
Parameters
operations (Sequence [ Augmentation.Affine ] ) – A sequence of affine operations. One will be picked randomly and uniformly.
strength (float ) – Augmentation operation strength. This is how we scale (multiply) the random [0-1] parameters passed to an augmentation action.
Returns
True if the augmentation was added to the dataview.
Return type
bool
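The two properties described above, a uniform choice of one operation and a [0-1] parameter scaled by strength, both driven by a reproducible random source, can be sketched with Python's `random.Random`. This is not the backend's selection code: `pick_augmentation` and `augmentation_plan` are hypothetical names, and the seed 1337 is borrowed from the documented default random seed.

```python
import random
from typing import List, Sequence, Tuple

# Default affine operations, copied from the add_augmentation_affine signature.
AFFINE_OPERATIONS = ("bypass", "scale", "rotate", "shear", "reflect-horiz", "reflect-vert")


def pick_augmentation(rng: random.Random, operations: Sequence[str], strength: float) -> Tuple[str, float]:
    """Pick one operation uniformly and scale a random [0, 1) parameter by strength."""
    operation = rng.choice(operations)
    parameter = rng.random() * strength
    return operation, parameter


def augmentation_plan(seed: int, count: int, strength: float) -> List[Tuple[str, float]]:
    rng = random.Random(seed)  # a fixed seed makes the whole plan reproducible
    return [pick_augmentation(rng, AFFINE_OPERATIONS, strength) for _ in range(count)]


print(augmentation_plan(1337, 5, 0.5) == augmentation_plan(1337, 5, 0.5))  # True
```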
add_augmentation_pixel
add_augmentation_pixel(operations=('bypass', 'blur', 'noise', 'recolor'), strength=1.0)
Add pixel based augmentation instruction to the dataview.
Augmentations are selected by the backend, where randomness is generated and can be reproduced (both in operation selection and in additional parameters).
The actual execution of the augmentation is performed by the worker, while getting the data from the frame using ImageFrame.
Parameters
operations (Sequence [ Augmentation.Pixel ] ) – A sequence of pixel operations. One will be picked randomly and uniformly.
strength (float ) – Augmentation operation strength. This is how we scale (multiply) the random [0-1] parameters passed to an augmentation action.
Returns
True if the augmentation was added to the dataview.
Return type
bool
add_augmentation_custom
add_augmentation_custom(operations, strength=1.0, arguments=None)
Add custom augmentation instruction to the dataview.
Augmentations are selected by the backend, where randomness is generated and can be reproduced (both in operation selection and in additional parameters).
The actual execution of the augmentation is performed by the worker, while getting the data from the frame using ImageFrame.
Parameters
operations (Sequence ( str ) ) – A sequence of custom operation names. Each has to be registered with ImageFrame.register_custom_augmentation. One will be picked randomly and uniformly.
strength (float ) – Augmentation operation strength. This is how we scale (multiply) the random [0-1] parameters passed to an augmentation action.
arguments (Mapping ( str: Mapping ( str: Any ) ) ) – Arguments for custom operations. For each operation in operations, there may be an entry in arguments whose key equals the operation name. The value of the entry is a mapping from names to any JSON-serializable value. The entry of the selected operation is returned as-is from the server, i.e. it is not randomly chosen. This mapping is later accessible from the augmentation code when it is executed.
Example:
from allegroai import ImageFrame, DataView
# my_augmentation_library stands for your own module of augmentation classes
from my_augmentation_library import MyCustomAugmentation, MyCoolAugmentation

# Register each augmentation under the name used in `operations`
ImageFrame.register_custom_augmentation('custom_aug', MyCustomAugmentation)
ImageFrame.register_custom_augmentation('cool_aug', MyCoolAugmentation)

# Per-operation arguments, keyed by operation name and passed as-is
aug_arguments = {
    'custom_aug': {
        'color': 'blue',
        'count': 42,
    },
    'cool_aug': {
        'vector': [1, 2, 5],
        'verify': True,
    },
}

dv = DataView()
# One of the registered operations is picked randomly and uniformly
dv.add_augmentation_custom(
    operations=('custom_aug', 'cool_aug'),
    arguments=aug_arguments,
)
Returns
True if the augmentation was added to the dataview.
Return type
bool
set_iteration_parameters
set_iteration_parameters(order=None, infinite=None, maximum_number_of_frames=None, random_seed=None)
Set dataview general iteration parameters.
Parameters
order (IterationOrder ) – The order in which frames are iterated on with this DataView. None means not applicable (ignored).
infinite (bool ) – If True, the dataview returns frames infinitely (and therefore with duplicates). None means not applicable (ignored).
Note: Duplicate frames still vary in their augmentations, if used, due to the random nature of augmentation operation and parameter selection.
maximum_number_of_frames (int or None ) – Limit on the total number of frames the dataview returns. Note: zero or negative values mean an unlimited number of frames. None means not applicable (ignored).
random_seed (int or None ) – Random seed for any randomness needed (e.g. order, augmentation, etc.). The default random seed is fixed for easy reproducibility. None means not applicable (ignored).
Returns
True if at least one of the specified dataview iteration parameters was set.
Return type
bool
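The distinction between None (leave unchanged) and a zero or negative maximum_number_of_frames (unlimited) is easy to mix up. Below is a hypothetical model of those documented rules, operating on a plain dictionary rather than a DataView; `update_iteration_parameters` is not an allegroai function, and storing "unlimited" as None is a modeling choice of this sketch.

```python
from typing import Any, Dict


def update_iteration_parameters(current: Dict[str, Any], **updates: Any) -> bool:
    """Apply only the arguments that are not None; return True if anything was set."""
    changed = False
    for key, value in updates.items():
        if value is None:
            continue  # None means "not applicable": keep the current value
        if key == "maximum_number_of_frames" and value <= 0:
            value = None  # modeling choice: zero or negative is stored as "no limit"
        current[key] = value
        changed = True
    return changed


params = {"order": "sequential", "infinite": False, "maximum_number_of_frames": 1000, "random_seed": 1337}
print(update_iteration_parameters(params, infinite=None))  # False
print(update_iteration_parameters(params, maximum_number_of_frames=0))  # True
print(params["maximum_number_of_frames"])  # None
```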
get_iteration_parameters
get_iteration_parameters()
Get the dataview iteration parameters (including general and video iteration parameters).
Returns
A dictionary specifying iteration parameters
Return type
dict
set_video_parameters
set_video_parameters(minimum_time_between_consecutive_frames=0, sequence_minimum_time=0)
Set dataview video specific iteration parameters.
These settings are only relevant to video content (frames sharing the same source uri with different timestamps).
This method resets any argument that is not passed to its default value.
Parameters
minimum_time_between_consecutive_frames (int ) – When the frames contain a positive timestamp (i.e. video), make sure two consecutive frames are at least minimum_time_between_consecutive_frames milliseconds apart.
sequence_minimum_time (int ) – When the frame contains a positive timestamp (i.e. video), expand the selected frame (based on the filters) with enough frames to end up with a sequence of at least sequence_minimum_time in length (in timebase/milliseconds).
Returns
True if at least one of the specified dataview iteration parameters was set.
Return type
bool
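The two video settings can be pictured on a list of frame timestamps. In this sketch (hypothetical helpers, not the backend logic), `thin_by_min_gap` keeps frames at least a minimum gap apart, and `expand_to_sequence` grows a selected frame into a sequence spanning at least a minimum length; forward-only expansion is an assumption.

```python
from typing import List, Sequence


def thin_by_min_gap(timestamps_ms: Sequence[int], min_gap_ms: int) -> List[int]:
    """Keep a frame only if it is at least min_gap_ms after the previously kept one."""
    kept: List[int] = []
    for ts in sorted(timestamps_ms):
        if not kept or ts - kept[-1] >= min_gap_ms:
            kept.append(ts)
    return kept


def expand_to_sequence(timestamps_ms: Sequence[int], selected_ms: int, min_length_ms: int) -> List[int]:
    """Grow a sequence from the selected frame until it spans at least min_length_ms.

    Assumption: expansion goes forward in time only and stops early if the video ends.
    """
    sequence: List[int] = []
    for ts in sorted(t for t in timestamps_ms if t >= selected_ms):
        sequence.append(ts)
        if ts - selected_ms >= min_length_ms:
            break
    return sequence


video = list(range(0, 1000, 40))  # a 25 fps clip: one frame every 40 ms
print(thin_by_min_gap(video, 100)[:4])  # [0, 120, 240, 360]
print(expand_to_sequence(video, 200, 150))  # [200, 240, 280, 320, 360]
```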
set_random_seed
set_random_seed(random_seed)
Set the random seed for this dataview.
The random seed is always fixed so the entire run is reproducible. The default random seed is fixed to 1337.
Parameters
random_seed (int ) – Random seed for any randomness needed (e.g. order, augmentation, etc.).
Return type
None
get_random_seed
get_random_seed()
Get the random seed for this dataview.
The random seed is always fixed so the entire run is reproducible. The default random seed is fixed to 1337.
Returns
Random seed for any randomness needed (e.g. order, augmentation, etc.).
Return type
int
to_dict
to_dict(projection=None)
Gets the DataView as a list of dictionaries. Each dictionary in the list represents a frame. Warning: This function pulls all the frames from the backend at the same time. It is recommended to use DataView.get_iterator(projection=projection or [‘*’]) when possible, to avoid this.
Example:
dataview.to_dict(projection=['id', 'dataset.id', 'sources'])
# will return a list containing dictionaries with the following fields:
# [
#     {
#         'id': '514504adbb6a91620eefa3e21ecfcc31',
#         'dataset': {
#             'id': 'df3638ec95454589bf86ba97f344f697'
#         },
#         'sources': [
#             {
#                 'id': 'Frame',
#                 'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#                 'timestamp': 0,
#                 'preview': {
#                     'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#                     'timestamp': 0
#                 }
#             }
#         ]
#     },
#     ...  (some other dictionaries with the same format here)
# ]
Parameters
projection (Optional [ Sequence [ str ] ] ) – Used to select which parts of the frame will be returned. Each string represents a field or sub-field (using dot-separated notation). In order to specify a specific array element, use array index as a field name. To specify all array elements, use ‘*’. To see supported fields for projection, see the schema at backend_api.services.frames.Frame
Return type
List[dict[str, str]]
Returns
A list of dictionaries representing each frame
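The projection strings behave like paths into the frame document. The following client-side sketch uses a hypothetical `project` helper (the real selection is performed by the server against the backend_api.services.frames.Frame schema) and supports dot-separated fields, ‘*’ for all array elements, and a numeric index; the result shape for an index is an assumption. The sample frame mirrors the example output above, plus a `width` field.

```python
import copy
from typing import Any, Dict, List, Sequence


def _select(value: Any, path: List[str]) -> Any:
    """Return the part of value reached by path, keeping the surrounding structure."""
    if not path:
        return copy.deepcopy(value)  # copy so merging never mutates the input frame
    head, rest = path[0], path[1:]
    if isinstance(value, list):
        if head == "*":
            return [_select(item, rest) for item in value]
        # Assumption: an index keeps a one-element list holding that element.
        return [_select(value[int(head)], rest)]
    return {head: _select(value[head], rest)}  # fields are assumed to exist


def _merge(into: Any, extra: Any) -> Any:
    """Deep-merge two partial projections of the same frame."""
    if isinstance(into, dict) and isinstance(extra, dict):
        for key, value in extra.items():
            into[key] = _merge(into[key], value) if key in into else value
        return into
    if isinstance(into, list) and isinstance(extra, list) and len(into) == len(extra):
        return [_merge(a, b) for a, b in zip(into, extra)]
    return extra


def project(frame: Dict[str, Any], projection: Sequence[str]) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    for field in projection:
        result = _merge(result, _select(frame, field.split(".")))
    return result


URI = "https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg"
frame = {
    "id": "514504adbb6a91620eefa3e21ecfcc31",
    "dataset": {"id": "df3638ec95454589bf86ba97f344f697"},
    "width": 640,
    "sources": [{"id": "Frame", "uri": URI, "timestamp": 0, "preview": {"uri": URI, "timestamp": 0}}],
}
print(project(frame, ["id", "dataset.id"]))
# keeps only the uri and timestamp of every source
print(project(frame, ["sources.*.uri", "sources.*.timestamp"]))
```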
get_iterator
get_iterator(query_cache_size=None, query_queue_depth=None, allow_repetition=False, projection=None, **kwargs)
Get an iterator for this DataView.
The iterator will yield frames from the dataview according to its queries (see add_query and add_multi_query) and its iteration parameters (see set_iteration_parameters and set_video_parameters).
The yielded frames are only the frame’s metadata.
If the DataView is image-based, you can wrap the SingleFrame/FrameGroup with ImageFrame for built-in augmentation support.
Every function call will create a new iterator for the DataView!
The iterator length is the expected number of frames based on the specific queries:
If the DataView infinite flag is set, len(iterator) will return 2^32.
If maximum_number_of_frames was set, len(iterator) will return maximum_number_of_frames.
If allow_repetition is True, a limit on the maximum number of returned frames is calculated automatically. This limit can be thought of as an entire epoch, as it guarantees that all unique frames in each rule are covered (although some rules might repeat frames as part of the rule ratio balance). len(iterator) will return this synthetic epoch limit, and the iterator will raise StopIteration when reaching this limit (as expected of an iterator).
Parameters
query_cache_size (int ) – The requested number of metadata frames in every API call to the server. A large value is slower per request, but faster in average for frames returned by the iterator.
query_queue_depth (int ) – Number of API request results to store in the return queue. The maximum total number of metadata frames stored in the return queue is query_cache_size * query_queue_depth.
allow_repetition (bool ) – If True, the length of the returned iterator is limited to cover all the unique frames from all the different queries of the DataView. For example, given two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition will set the DataView maximum_number_of_frames to 200 frames and unset the infinite flag of the DataView. This will cause len(iterator) to return 200, and StopIteration will be raised after 200 frames (as expected).
projection (Optional [ Sequence [ str ] ] ) – Used to select which parts of the frame will be returned. Each string represents a field or sub-field (using dot-separated notation). In order to specify a specific array element, use the array index as a field name. To specify all array elements, use ‘*’. To see supported fields for projection, see the schema at backend_api.services.frames.Frame. If this argument is set, the values the iterator returns are dictionaries representing each frame.
For example:
dataview.get_iterator(projection=['id', 'dataset.id', 'sources'])
# will return an iterator that yields dictionaries with the following fields:
# {
#     'id': '514504adbb6a91620eefa3e21ecfcc31',
#     'dataset': {
#         'id': 'df3638ec95454589bf86ba97f344f697'
#     },
#     'sources': [
#         {
#             'id': 'Frame',
#             'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#             'timestamp': 0,
#             'preview': {
#                 'uri': 'https://clearml-public.s3.amazonaws.com/datasets/food_dataset/pizza/3724187.jpg',
#                 'timestamp': 0
#             }
#         }
#     ]
# }
kwargs (int ) –
Returns
An iterator over the DataView’s frames.
Return type
Generator((SingleFrame, FrameGroup, dict))
to_list
to_list(allow_repetition=False, auto_synthetic_epoch_limit=None)
Get a list of frames for this DataView.
The returned list will hold frames from the DataView according to its queries (see add_query and add_multi_query) and its iteration parameters (see set_iteration_parameters and set_video_parameters).
The returned frames are only the frame’s metadata (SingleFrame/FrameGroup).
Every function call will create a new list of frames for the DataView!
Parameters
allow_repetition (bool ) – If True, the length of the returned iterator is limited to cover all the unique frames from all the different queries of the DataView. For example, given two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition will set the DataView maximum_number_of_frames to 200 frames and unset the infinite flag of the DataView. This will cause len(iterator) to return 200, and StopIteration will be raised after 200 frames (as expected).
auto_synthetic_epoch_limit (Optional [ int ] ) – Deprecated, use allow_repetition instead.
Returns
A list of SingleFrame, FrameGroup generated from the DataView’s query.
Return type
list((SingleFrame, FrameGroup))
split_to_lists
split_to_lists(ratio, allow_repetition=False, seed=42, frame_id_fn=None)
Partition the frames represented by the DataView into non-overlapping lists (e.g. train/validation/test) according to the split weights.
It is guaranteed that there is no overlap between the lists.
It is guaranteed that frame partitioning is based solely on the weights and the seed, and is independent of the DataView query; this means that partitions are consistent.
The resulting frames are uniformly split based on the identifier returned by frame_id_fn(frame), combined with the seed number.
The returned frames are only the frame’s metadata (SingleFrame/FrameGroup).
Every function call will create a new list of frames for the DataView!
Parameters
ratio (list ( int ) ) – List of integer weights for the partitions. The split is based on the ratio between each weight and the total sum of weights. For example, a train/val/test split of 60% / 20% / 20% is achieved with ratio=[3, 1, 1], resulting in three partitions: the first holds 3/5 of the DataView’s frames, the second 1/5, and the third the remaining 1/5.
allow_repetition (bool ) – If True, the length of the returned iterator is limited to cover all the unique frames from all the different queries of the DataView. For example, given two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition will set the DataView maximum_number_of_frames to 200 frames and unset the infinite flag of the DataView. This will cause len(iterator) to return 200, and StopIteration will be raised after 200 frames (as expected).
seed (int ) – Random number added to the frame identifier, controlling the randomness of the frame partitioning.
frame_id_fn (lambda ) – User-provided frame identifier function used for consistent partitioning of the DataView frames. Default is (lambda frame: frame.id)
Returns
A list of partitions, where each partition is a list of SingleFrame/FrameGroup generated from the DataView’s query.
Return type
list(list((SingleFrame, FrameGroup)))
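Consistent, query-independent partitions are commonly obtained by hashing each frame's identifier together with the seed. The sketch below assumes SHA-256 and a modulo over the sum of the ratio weights; the library's actual hash function is not documented here, and `split_by_hash` is a hypothetical helper. Because a frame's partition depends only on its identifier, the seed and the weights, a frame keeps its partition when the query changes.

```python
import hashlib
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_by_hash(frames: Sequence[T], ratio: Sequence[int], seed: int = 42,
                  frame_id_fn: Callable[[T], str] = lambda frame: frame.id) -> List[List[T]]:
    """Assign each frame to a partition from a hash of (seed, frame id).

    ratio must contain non-negative integers with a positive sum.
    """
    total = sum(ratio)
    partitions: List[List[T]] = [[] for _ in ratio]
    for frame in frames:
        digest = hashlib.sha256(f"{seed}:{frame_id_fn(frame)}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % total
        # Walk the cumulative weights to find the partition that owns this bucket.
        upper = 0
        for index, weight in enumerate(ratio):
            upper += weight
            if bucket < upper:
                partitions[index].append(frame)
                break
    return partitions


ids = [f"frame-{i}" for i in range(10)]
train, val, test = split_by_hash(ids, [3, 1, 1], frame_id_fn=str)
print(len(train), len(val), len(test))
```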
to_json
to_json(json_file, allow_repetition=False, auto_synthetic_epoch_limit=None)
Store DataView data to json file. Stored DataView frames contain only the metadata.
Parameters
json_file (Union [ str , BytesIO ] ) – Path (str) or BytesIO object to store the DataView data to (in JSON format).
allow_repetition (bool ) – If True, the length of the returned iterator is limited to cover all the unique frames from all the different queries of the DataView. For example, given two queries, one with 100 unique frames and the other with 2 unique frames, allow_repetition will set the DataView maximum_number_of_frames to 200 frames and unset the infinite flag of the DataView. This will cause len(iterator) to return 200, and StopIteration will be raised after 200 frames (as expected).
auto_synthetic_epoch_limit (Optional [ int ] ) – Deprecated, use allow_repetition instead.
Return type
bool
Returns
True if successful
from_json_to_list
from_json_to_list(json_file)
Load DataView data from a json file. The returned frames are only the frame’s metadata and of type SingleFrame/FrameGroup.
Parameters
json_file (Union [ str , BytesIO ] ) – Path (str) or BytesIO object to load the DataView data from (in JSON format).
Returns
A list of SingleFrame, FrameGroup generated from JSON file DataView data.
Return type
list((SingleFrame, FrameGroup))
prefetch_files
prefetch_files(num_workers=None, wait=False, query_cache_size=None, get_previews=False, get_masks=True)
Prefetch the DataView’s files (the files/URIs pointed to by frames in the DataView), i.e. the equivalent of calling SingleFrame.get_local_source(), .get_local_preview() and .get_local_mask() on all the frames in the DataView. Prefetching is done in the background, and the function call returns immediately.
Parameters
num_workers (Optional [ int ] ) – Number of workers. If None (default), the number of workers is set to the CPU count.
wait (bool ) – If True, return after all files were prefetched to local storage.
query_cache_size (Optional [ int ] ) – The requested number of metadata frames in every API call to the server.
get_previews (Optional [ bool ] ) – Prefetch preview files (default False).
get_masks (Optional [ bool ] ) – Prefetch masks (default True).
Return type
bool
Returns
True if pre-fetching started
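The num_workers and wait semantics resemble a standard background thread pool. Below is a generic sketch using `concurrent.futures`, not the library's implementation; `prefetch` and `fake_fetch` are hypothetical, and no real download is performed.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def prefetch(uris: Iterable[str], fetch: Callable[[str], str],
             num_workers: Optional[int] = None, wait: bool = False) -> List[Future]:
    """Fetch every URI on background threads; block until done only if wait is True."""
    executor = ThreadPoolExecutor(max_workers=num_workers or os.cpu_count())
    futures = [executor.submit(fetch, uri) for uri in uris]
    # shutdown(wait=False) returns immediately while queued downloads keep running.
    executor.shutdown(wait=wait)
    return futures


def fake_fetch(uri: str) -> str:
    """Hypothetical downloader: pretend to cache the file and return its local path."""
    return "/tmp/cache/" + uri.rsplit("/", 1)[-1]


futures = prefetch(["https://example.com/a.jpg", "https://example.com/b.jpg"], fake_fetch, wait=True)
print(sorted(f.result() for f in futures))  # ['/tmp/cache/a.jpg', '/tmp/cache/b.jpg']
```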
get_mapping_rules
get_mapping_rules()
Get the current dataview mappings.
Note: if dataview was created from existing dataview, this function triggers the creation of the dataview
Return type
List[None]
Returns
A list of mapping objects.
get_labels
get_labels()
Get the dataview label enumeration (i.e. a dictionary mapping label strings to IDs).
Example: {‘person’: 1, ‘background’: 0, ‘pedestrian’: 1}
Notice: every function call will create a new iterator for the dataview!
Return type
Dict[str, int]
Returns
A dictionary mapping strings to integers.
store
store(project_name=None, name=None, description=None, tags=None)
Stores the current dataview in the system for future use.
Parameters
project_name (Optional [ str ] ) – Project name (string).
name (Optional [ str ] ) – Dataview name (string).
description (Optional [ str ] ) – Dataview description (string).
tags (Optional [ List [ str ] ] ) – List of tags (strings) for this dataview.
Return type
None
connect
connect(task)
Connect current dataview with a specific task
When running in debug mode (i.e. locally), the task is updated with the dataview object. When running remotely (i.e. from a daemon), the dataview is updated from the task. Notice! When running remotely, the dataview is ignored and loaded from the task object, regardless of the code.
Parameters
task (Task ) – Task object
Return type
None
Raises
ValueError – If the DataView object is already connected to another task.
update_task
update_task(task)
Set this dataview’s contents into a task. This action does not set a reference to the provided data view, but rather copies all data into the appropriate internal structure maintained by the task object and updates task information in the backend using the API. This function will set the dataview’s contents even when running remotely. Note that attempting to update a task that the DataView is already connected to will raise a ValueError.
Parameters
task (Task ) – The task to set the content into
Return type
None
get_id
get_id()
Return the dataview ID in the system (for future use).
Return type
str
Returns
dataview ID (string)
has_id
has_id()
Return type
bool
Returns
True iff the data view is stored under an ID in the server.
get_count
get_count()
Returns the total number of unique frames returned by this dataview, as well as the number of unique frames returned from each of this dataview’s queries (rules).
Return type
Tuple[int, Sequence[int]]
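Because the total counts unique frames, it can be smaller than the sum of the per-query counts when queries overlap. Below is a minimal sketch with sets of frame IDs; `count_frames` is a hypothetical helper that only mirrors the shape of the return value.

```python
from typing import List, Sequence, Set, Tuple


def count_frames(rule_results: Sequence[Set[str]]) -> Tuple[int, List[int]]:
    """Return (total unique frames, unique frames per rule) for sets of frame ids."""
    per_rule = [len(ids) for ids in rule_results]
    total = len(set().union(*rule_results))  # a frame returned by two rules counts once
    return total, per_rule


print(count_frames([{"f1", "f2", "f3"}, {"f3", "f4"}]))  # (4, [3, 2])
```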
get_sources
get_sources()
Returns a list of source URI links from all the frames in the current DataView.
Returns
A list of URI strings.
Return type
Sequence[str]