Make Your Keyboard Great Again! – User Story

August 9, 2020

How I made my keyboard a notification hub

This is a really cool project using Allegro Trains we stumbled upon and thought it would be cool to share!
Republished here with the author’s permission – original post on Medium here.

Authored by Bob Lagos

We are all familiar with this scenario: you work on your training code, fix “all” of the bugs (the ones you know about), wait for a few iterations, see that the batch size wasn’t wrong and nothing blows up, and then happily go home.

However, when you come back into the office the next day and look at your loss and test accuracy, you’re horrified to find that the experiment crashed on the first test cycle because you pointed your test set at the wrong folder 🙁

All those precious GPU hours have gone down the drain and you have to start training from scratch. Well, there comes a moment in every data scientist’s career where she says, NO MORE! And so, I’ve written a small service that can run in the background, monitoring my experiments’ status. Once it detects an experiment has failed it’ll light up my keyboard, even when I’m home, to let me know “Houston…we have a problem!”

I’ve used this cool open-source package called Allegro Trains. If you aren’t using it yet, go check it out! It can do more than just save you some grief 🙂 This work was inspired by this awesome post about making your keyboard a notification center. Also, a shoutout to this great Havit configuration tool, which did the heavy lifting of reverse-engineering the USB protocol and provides the tool we use to communicate with the keyboard.

Where do you start?

You obviously start by cloning the repo. You’ll need to have a project in Allegro Trains, with a few experiments in it. The service monitors this project (or your entire Trains environment) to see if any experiment failed. Once an experiment fails it will trigger the keyboard light pattern.

To start, make sure you have Trains configured on the machine that will run the service. Then check keyboard compatibility: run the command lsusb | grep 04d9 and make sure a supported keyboard shows up in the output.

This demo code was tested on the Havit HV-390L but should work on other compatible keyboards, such as Sharkoon models. Lastly, you need to compile the hv-kb390l-control utility; check out the repo for exact instructions on how to do so.
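The compatibility check can also be scripted. The following is a minimal sketch, not part of the original repo: `find_devices` and `connected_devices` are hypothetical helper names, and it assumes 04d9 is the vendor half of the `ID vvvv:pppp` field that lsusb prints for each device.

```python
import re
import subprocess

# Matches the "ID vvvv:pppp" field that lsusb prints for every device.
USB_ID = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def find_devices(lsusb_output, vendor_id="04d9"):
    """Return (vendor, product) pairs found in lsusb text for one vendor ID."""
    wanted = vendor_id.lower()
    return [(vendor.lower(), product.lower())
            for vendor, product in USB_ID.findall(lsusb_output)
            if vendor.lower() == wanted]


def connected_devices(vendor_id="04d9"):
    """Run lsusb (must be installed) and filter its output by vendor ID."""
    result = subprocess.run(["lsusb"], capture_output=True, text=True, check=True)
    return find_devices(result.stdout, vendor_id)
```

Unlike a plain grep, this only matches the vendor field, so a device whose product ID or name happens to contain 04d9 is not reported. An empty list from `connected_devices()` means no matching keyboard was found.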

Monitor at the palm of your hand (literally)

tl;dr — if you don’t care how this service works, skip to the end to check how to execute it. The monitoring service can be found in the monitor_project.py file.

Input parameters:

import argparse

parser = argparse.ArgumentParser(description='TRAINS monitor failing experiments')
parser.add_argument('--project', type=str, default='all',
                    help="The name of the project to monitor, use 'all' for all projects")
parser.add_argument('--refresh-rate', type=float, default=10.,
                    help='Set refresh rate of the monitoring service, default every 10.0 sec')
parser.add_argument('--kb-effect', type=int, default=2,
                    help='Set alert keyboard lighting effect type')
parser.add_argument('--kb-color', type=str, default='#ff00c8',
                    help='Set alert LED color, use CSS RGB color scheme. '
                         'Example: #000000 is black, Default: #ff00c8 magenta rocks!')
args = parser.parse_args()

You can choose which project to monitor (omit --project, or pass all, to monitor your entire environment), the project status sampling interval (in seconds), and optionally the lighting effect and the alert color (in 3-byte hex format; for instance, #ff00c8 is magenta).
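The alert code further down uses separate red, green and blue values, so the --kb-color string has to be split somewhere. Here is a minimal sketch of that conversion; `parse_hex_color` is a hypothetical helper name, and whether the keyboard tool expects decimal integers is an assumption rather than something the post states.

```python
def parse_hex_color(color):
    """Convert a CSS-style '#rrggbb' string into (red, green, blue) integers."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color like #ff00c8, got {!r}".format(color))
    # Each channel is two hex digits; int(..., 16) raises ValueError on bad digits.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


red, green, blue = parse_hex_color("#ff00c8")
```

In the service this would be called once with args.kb_color right after parsing the arguments, so a malformed color fails at startup instead of at the first alert.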

Next comes the main service loop, which is divided into 3 parts:

First, it fetches all the tasks that meet a specific filter, in this case experiments (tasks) that failed after the previous check (a specific UTC timestamp). Notice that it ignores archived tasks and tasks that are running “manually”, i.e. without trains-agent. If you want to include manual tasks (i.e. those in development), just remove “-development” from the system_tags list:

failed_tasks = Task.get_tasks(
    project_name=args.project
    if args.project.lower() != 'all' else None,
    task_filter={
        'status': ['failed'], 'order_by': ['-last_update'],
        'page_size': 1, 'page': 0,
        'status_changed': ['>{}'.format(
            datetime.utcfromtimestamp(previous_timestamp)), ],
        'system_tags': ['-archived', '-development']})
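To make the filter easier to inspect, it can be built by a small function. This is only a sketch: `build_task_filter` and its `include_development` switch are hypothetical names, and the dictionary keys are copied from the snippet above rather than from any API reference.

```python
from datetime import datetime, timezone


def build_task_filter(previous_timestamp, include_development=False):
    """Build a task_filter dict for tasks that failed after previous_timestamp."""
    # Naive UTC datetime, the same value datetime.utcfromtimestamp() returns.
    since = datetime.fromtimestamp(
        previous_timestamp, tz=timezone.utc).replace(tzinfo=None)
    system_tags = ["-archived"]
    if not include_development:
        # A leading "-" excludes tasks that carry the tag.
        system_tags.append("-development")
    return {
        "status": ["failed"],
        "order_by": ["-last_update"],
        "page_size": 1,
        "page": 0,
        "status_changed": [">{}".format(since)],
        "system_tags": system_tags,
    }
```

Passing `include_development=True` drops the “-development” exclusion, which corresponds to the manual edit described in the text.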

Second is the code that handles what happens when failed tasks are detected:

if failed_tasks:
    print('Experiment id={} failed, '
          'raising alert'.format(failed_tasks[0].id))
    # we have to first set the color in static effect (mode==1)
    # only then we can change to the requested effect mode
    os.system('{kbcmd} {effect} {red} {green} {blue}'.format(
        kbcmd=os.path.join(os.path.dirname(__file__),
                           'hv-kb390l-control', 'hv-kb390l-control'),
        effect=1, red=red, blue=blue, green=green))
    os.system('{kbcmd} {effect} {red} {green} {blue}'.format(
        kbcmd=os.path.join(os.path.dirname(__file__),
                           'hv-kb390l-control', 'hv-kb390l-control'),
        effect=args.kb_effect, red=red, blue=blue, green=green))
    print('')

It makes a system call to the compiled hv-kb390l-control tool to flash the keyboard in the chosen color.
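If you would rather avoid building a shell string, the same two calls can be made with subprocess and argument lists. The sketch below is one possible variant, not the author's code: `build_kb_commands` and `raise_alert` are hypothetical names, and the argument order (effect, red, green, blue) is taken from the format string above.

```python
import subprocess


def build_kb_commands(tool_path, effect, red, green, blue):
    """Return two argument lists: static color first, then the requested effect."""
    color = [str(red), str(green), str(blue)]
    return [
        [tool_path, "1"] + color,          # effect 1 (static) sets the color
        [tool_path, str(effect)] + color,  # then switch to the requested effect
    ]


def raise_alert(tool_path, effect, red, green, blue):
    """Run both commands in order and return their exit codes."""
    return [subprocess.run(command, check=False).returncode
            for command in build_kb_commands(tool_path, effect, red, green, blue)]
```

Because the arguments are passed as a list, a tool path containing spaces needs no quoting, and the returned exit codes make it possible to notice when the tool itself fails.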

This is the customizable part: you can change the code to do anything you want!

Third, it just waits for the next sample point:

sleep(args.refresh_rate)
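Put together, the three parts form a simple polling loop. The skeleton below is a sketch under assumptions: `monitor`, `fetch_failed`, `alert`, `max_cycles`, `clock` and `wait` are hypothetical names introduced so the loop can be tried without Trains or a keyboard, and the point at which previous_timestamp is refreshed is a guess, since the post does not show it.

```python
from time import sleep, time


def monitor(fetch_failed, alert, refresh_rate=10.0, max_cycles=None,
            clock=time, wait=sleep):
    """Poll for failures: fetch_failed(since) returns a list, alert(task) reacts."""
    previous_timestamp = clock()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        check_started = clock()
        # Part 1: fetch tasks that failed since the previous check.
        failed_tasks = fetch_failed(previous_timestamp)
        # Part 2: react to the most recent failure, if any.
        if failed_tasks:
            alert(failed_tasks[0])
        # Assumption: the next check starts where this one began.
        previous_timestamp = check_started
        # Part 3: wait for the next sample point.
        wait(refresh_rate)
        cycles += 1
    return previous_timestamp
```

In the real service, `fetch_failed` would wrap the Task.get_tasks call and `alert` would wrap the keyboard commands; leaving `max_cycles` as None keeps the loop running indefinitely.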

Simple, yet powerful 🙂

Running the service

To run the service, all you need to do is run:

python monitor_project.py --project "my important project"

No more grief over lost GPU time! With this mini service you’ll be notified when something in your experiment went wrong!

Hey Stranger.
Sorry to tell you that this post refers to an older version of ClearML (which used to be called Trains).

We haven’t updated this yet, so some commands may be different.
As always, if you need any help, feel free to join us on Slack.
