In this tutorial, we manage the CIFAR dataset with the clearml-data CLI, and then use ClearML's Dataset class to ingest the data.
Before we can register the CIFAR dataset with clearml-data, we need to obtain a local copy of it.
Download the data with a short Python script.
The script prints the path to the downloaded data, which is needed later on.
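A minimal sketch of such a download script follows. It assumes the CIFAR-10 variant of the dataset and its public archive URL, neither of which is stated in the text above, and the function name download_cifar is hypothetical. It relies on StorageManager.get_local_copy from the clearml package, which downloads a remote file into the local cache and returns the local path.

```python
# Assumed archive URL for CIFAR-10; the tutorial text does not state which URL it uses.
CIFAR_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def download_cifar(url=CIFAR_URL):
    """Download the archive through ClearML's cache and return the local path."""
    # Imported lazily so the module loads even where clearml is not installed.
    from clearml import StorageManager

    # Downloads into the local ClearML cache; archives are extracted by default.
    dataset_path = StorageManager.get_local_copy(remote_url=url)
    # Print the path so it can be copied for the following steps.
    print("Dataset path: {}".format(dataset_path))
    return dataset_path
```

Calling download_cifar() requires network access and an installed clearml package; the printed path is the value referred to as dataset_path in the next steps.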
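To create the dataset, run the clearml-data create command with a project name and a dataset name.
The command prints the ID of the new dataset. In this example, ee1c35f60f384e65bc800f42f0aca5ec is the dataset ID.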
Add the files we just downloaded to the dataset with the clearml-data add command, passing dataset_path to its --files option.
dataset_path is the path that was printed earlier, the location of the downloaded dataset.
There's no need to specify a dataset_id, since the clearml-data session stores it.
Use the clearml-data close command to upload the files (by default, they are uploaded to the ClearML Server).
This command sets the dataset task's status to completed, so it will no longer be modifiable. This ensures future reproducibility.
The information about the dataset, including a list of files and their sizes, can be viewed in the WebApp, in the dataset task's ARTIFACTS tab.
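The create, add, and close steps above can be scripted from Python as a rough sketch. The project name dataset_examples, the dataset name cifar_dataset, and the path /path/to/cifar are placeholders chosen for illustration, and both helper functions are hypothetical. The sketch only prints the commands unless dry_run is turned off, in which case it assumes the clearml-data CLI is installed and configured.

```python
import shlex
import subprocess


def clearml_data_commands(project, name, dataset_path):
    """Return the three clearml-data invocations as argument lists."""
    return [
        # Create a new dataset; the CLI prints the new dataset ID.
        ["clearml-data", "create", "--project", project, "--name", name],
        # Add the downloaded files; the session remembers the dataset ID.
        ["clearml-data", "add", "--files", dataset_path],
        # Upload the files and mark the dataset as completed.
        ["clearml-data", "close"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command, and execute it as well when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder values; replace the path with the one printed by the download step.
commands = clearml_data_commands("dataset_examples", "cifar_dataset", "/path/to/cifar")
run_commands(commands)  # dry run: prints the commands without executing them
```

Keeping the commands as argument lists avoids shell quoting problems when the dataset path contains spaces.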
Now that we have a new dataset registered, we can consume it.
The data_ingestion.py example script demonstrates using the dataset within Python code.
The get_local_copy method returns a path to the cached, downloaded copy of the dataset. We then pass that path to PyTorch's dataset object.
The script then trains a neural network to classify images using the dataset created above.
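The ingestion step described above can be sketched as follows. This is not the data_ingestion.py script itself: the function name load_cifar_trainset and the default dataset and project names are assumptions, and the sketch assumes the registered files are the extracted CIFAR-10 archive, which torchvision's CIFAR10 class can read directly.

```python
def load_cifar_trainset(dataset_name="cifar_dataset", dataset_project="dataset_examples"):
    """Fetch the registered dataset from ClearML and wrap it as a torchvision dataset."""
    # Imported lazily so the module loads even where these packages are not installed.
    from clearml import Dataset
    from torchvision import datasets, transforms

    # Look up the dataset by name and project, then get a cached local copy of its files.
    dataset_path = Dataset.get(
        dataset_name=dataset_name, dataset_project=dataset_project
    ).get_local_copy()

    # download=False: the files come from ClearML, not from torchvision's own downloader.
    return datasets.CIFAR10(
        root=dataset_path,
        train=True,
        download=False,
        transform=transforms.ToTensor(),
    )
```

The returned object can be passed to a torch.utils.data.DataLoader for training, which requires the clearml, torch, and torchvision packages and access to the ClearML Server holding the dataset.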