In short - automate everything :) From training models to data processing to deploying to production.
Basically, track everything. There is nothing that is not worth having visibility to. If you are afraid of clutter, use the archive option, and set up your own cleanup service.
- Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iterative improvement of your code / workflow.
- Create per-project leaderboards based on custom columns (hyper parameters and performance accuracy), and bookmark them (full URL will always reproduce the same view & table).
- Share experiments with your colleagues and team-leaders. Invite more people to see how your project is progressing, and suggest they add metric reporting for their own. These metrics can later be part of your own in-house monitoring solution, don't let good data go to waste :)
In order to define a Task in ClearML we have two options
- Run the actual code with
task.initcall. This will create and auto-populate the Task in CleaML (including Git Repo / Python Packages / Command line etc.).
- Register local / remote code repository with
clearml-task. See details.
Once we have a Task in ClearML, we can clone and edit its definitions in the UI, then launch it on one of our nodes with ClearML Agent.
- Create daily / weekly cron jobs for retraining best performing models on.
- Create data monitoring & scheduling and launch inference jobs to test performance on any new coming dataset.
- Once there are two or more experiments that run after another, group them together into a pipeline.
Use ClearML Data to version your data, then link it to running experiments for easy reproduction. Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder / S3 / Gs / Azure). ClearML Data supports efficient Dataset storage and caching, differentiable & compressed.
Use ClearML Agent to scale work. Install the agent machines (remote or local) and manage training workload with it.
Improve team collaboration by transparent resource monitoring, always know what is running where.