The dataset_creation.py script demonstrates how to do the following:
- Create a dataset and add files to it
- Upload the dataset to the ClearML Server
- Finalize the dataset
We first need to obtain a local copy of the CIFAR dataset.
The script downloads the data, and dataset_path contains the path to the downloaded data.
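The download step can be sketched as follows. This is a minimal sketch, assuming the ClearML SDK's `StorageManager.get_local_copy` is used to fetch and cache the archive; the helper name `download_cifar` is hypothetical, and the URL in the usage note is an assumption rather than something stated in the text above.

```python
def download_cifar(remote_url):
    """Fetch remote_url into the local ClearML cache and return the local path."""
    # Imported inside the function so that defining the helper does not
    # itself require a configured ClearML environment.
    from clearml import StorageManager

    # Downloads the file (or reuses a cached copy) and returns its local path.
    dataset_path = StorageManager.get_local_copy(remote_url=remote_url)
    return dataset_path
```

For example, `dataset_path = download_cifar("https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz")`, assuming that URL is where the CIFAR-10 archive is hosted.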
The script then creates a data processing task called cifar_dataset in the dataset examples project, which can be viewed in the WebApp.
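Creating the dataset might look like the sketch below. It assumes the SDK's `Dataset.create` call with its `dataset_name` and `dataset_project` arguments; the wrapper `create_dataset` is a hypothetical helper added for illustration.

```python
def create_dataset(name, project):
    """Register a new, empty dataset under the given project and return it."""
    # Imported lazily so the helper can be defined without ClearML configured.
    from clearml import Dataset

    # Creates the dataset together with its backing data processing task.
    return Dataset.create(dataset_name=name, dataset_project=project)
```

With the names used in the text, this would be called as `dataset = create_dataset("cifar_dataset", "dataset examples")`.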
Next, the script adds the downloaded files to the current dataset.
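Adding the files can be sketched as a single call. The sketch assumes `dataset` is the object returned by `Dataset.create` and that its `add_files` method is given the download location through the `path` argument; `add_downloaded_files` is a hypothetical helper name.

```python
def add_downloaded_files(dataset, dataset_path):
    """Register the files found at dataset_path with the given dataset."""
    # add_files records the files in the dataset; nothing is uploaded
    # until the dataset's upload step runs.
    return dataset.add_files(path=dataset_path)
```

Any cleaning or preprocessing of the data would normally happen before this call, so that only the prepared files are registered.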
The script then uploads the dataset, by default to the ClearML Server. The dataset's destination can be changed by specifying the target storage with the output_url parameter of the upload method.
Finally, the script runs the finalize command to close the dataset and set the dataset task's status to completed. A dataset can only be finalized if it doesn't have any pending uploads.
After a dataset has been closed, it can no longer be modified. This ensures future reproducibility.
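The upload and finalize steps can be sketched together, since finalizing only succeeds once uploads are done. This is a minimal sketch, assuming `dataset` is the object returned by `Dataset.create` and that its `upload` method accepts `output_url`; the helper `upload_and_finalize` is hypothetical.

```python
def upload_and_finalize(dataset, output_url=None):
    """Upload the dataset's files, then close the dataset to further changes."""
    # With output_url=None the files go to the default destination
    # (the ClearML Server); a storage URL redirects the upload.
    dataset.upload(output_url=output_url)

    # Closing the dataset marks its task as completed; no files can be
    # added or removed afterwards.
    dataset.finalize()
```

Calling `upload_and_finalize(dataset)` keeps the default destination, while something like `upload_and_finalize(dataset, output_url="s3://my-bucket/datasets")` would target external storage (the bucket name here is made up for illustration).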
The information about the dataset, including a list of files and their sizes, can be viewed in the WebApp, in the dataset task's ARTIFACTS tab.
Now that we have a new dataset registered, we can consume it!
The data_ingestion.py script demonstrates data ingestion using the dataset created in the first script.
The script gets the dataset and uses the get_local_copy method to return a path to the cached, read-only local dataset.
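Fetching the read-only copy can be sketched as follows, assuming the SDK's `Dataset.get` lookup by name and project followed by `get_local_copy`; the helper `get_readonly_copy` is a hypothetical name used only for this illustration.

```python
def get_readonly_copy(name, project):
    """Return the path of the cached, read-only local copy of a dataset."""
    # Imported lazily so the helper can be defined without ClearML configured.
    from clearml import Dataset

    # Look up the registered dataset by its name and project.
    dataset = Dataset.get(dataset_name=name, dataset_project=project)

    # The returned folder lives in the local cache and must not be modified.
    return dataset.get_local_copy()
```

For the dataset created earlier, the call would be `get_readonly_copy("cifar_dataset", "dataset examples")`.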
If you need a modifiable copy of the dataset, use the following code:
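A minimal sketch of this, assuming the SDK's `get_mutable_local_copy` method, which copies the dataset into a folder you choose; the helper name `get_modifiable_copy` and the target folder in the usage note are hypothetical.

```python
def get_modifiable_copy(name, project, target_folder):
    """Copy a dataset into target_folder and return the path of that copy."""
    # Imported lazily so the helper can be defined without ClearML configured.
    from clearml import Dataset

    dataset = Dataset.get(dataset_name=name, dataset_project=project)

    # Unlike the cached copy, files under target_folder can be edited freely
    # without affecting the registered dataset.
    return dataset.get_mutable_local_copy(target_folder)
```

A call such as `get_modifiable_copy("cifar_dataset", "dataset examples", "path/to/download")` would place the editable copy in the given folder.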
The script then creates a neural network to train a model to classify images from the dataset that was created above.