Dealing with multiple GPUs in any setup reminds me of my attempt at learning juggling. One ball is easy; two or more and it all goes a bit wobbly. The same thing happens when you have scientists and multiple GPUs. Everything starts out nice and easy and, as things progress, it gets... well. Wobbly.
As usual in ClearML, we are largely agnostic as to which path you take to solve this. The approach I have found works reasonably well is using the --gpus flag. This allows one queue to have one GPU and another queue to have the other. Handy if you have a small GPU for testing but a large one for training. You can spin up the two agents on the machine by doing:
clearml-agent daemon --gpus 0 --queue training
clearml-agent daemon --gpus 1 --queue testing
Of course, you can also specify multiple GPUs with the --gpus flag. Another nice thing you can do is logically separate a single GPU into two:
CLEARML_WORKER_NAME=gpu0A clearml-agent daemon --gpus 0 --queue training
CLEARML_WORKER_NAME=gpu0B clearml-agent daemon --gpus 0 --queue testing
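And for the multi-GPU case mentioned above, a sketch of how it might look (the queue name here is illustrative; the --gpus flag accepts a comma-separated list of device indices):

```shell
# One agent serving a queue with two GPUs assigned to it
clearml-agent daemon --gpus 0,1 --queue big_training
```

Tasks pulled from that queue would then see both devices, which is useful for multi-GPU training jobs.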
Obviously, when sharing a GPU between agents, you are still constrained by the amount of VRAM on the GPU itself. Sadly, ClearML can’t change that. Yet 😉