Today we are happy to introduce our new, fully open-source, “zero integration” model and project management tool for machine and deep learning. We call it TRAINS.
At allegro.ai we build an end-to-end AI platform and solution for enterprise companies. Since R&D is a central part of these pipelines, TRAINS is one of the technological cornerstones of our solution.
TRAINS was born from engaging with hundreds of companies and realizing that a significant number of them, big and small alike, were not yet at the stage of worrying about robust, scalable continuous deployment. Rather, they were still in the phase of rapid experimentation and prototyping (a.k.a. “find me a model that works”).
In that preliminary stage, efforts are concentrated on frequently trying out new models and repositories. However, without the benefit of a unifying experimentation management tool, ongoing research operations in organizations are usually left to the discretion of each researcher.
While this allows for zero-overhead prototyping, the side effects of such non-uniformity are detrimental in the long run: reduced collaboration, lost work, irreproducible training, and a negative effect on overall morale. One of the CTOs we interviewed shared the following story:
“…The demo presentation required tuning some of the models we developed, using onsite footage we all annotated together as a company-wide effort. There were still training sessions that had to be set up, but they would finish before the deadline. Just then, the unthinkable happened: Our deep learning research wizard had a family emergency and all but disappeared. Nobody else knew where the ‘good’ data was, how to run his scripts, and which version of the code was the one that actually worked. We were going to have to use the base model, and the demo was going to be horrible…”
Our full-blown platform fits with and really shines for organizations with more deep-learning mileage than the current average, but we still wanted to help companies that aren’t there yet to push ahead with their machine learning experiments and projects.
Naturally, such companies are really interested in tools that would boost their productivity and solve the obstacles they encounter on a daily basis. At the same time, these are small teams in prototyping stages with few resources, so minimal integration costs are a must. The ideal product to chaperone them from prototype to an alpha version would be one that allows them to work with it as if it were not there, which would mean absolutely no workflow restrictions or introduction of new APIs.
Our solution for such a need is TRAINS: an “automagical” experiment and model management tool. A research team in the prototyping stage can set up and start storing insightful entries on their on-premises TRAINS server in a matter of minutes, by adding only two lines of code (one of which is “from trains import Task”).
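A minimal sketch of those two lines, dropped into an existing training script (this assumes the `trains` package is installed and a TRAINS server, or the public demo server, is reachable; the project and experiment names are illustrative):

```python
# The only TRAINS-specific lines in an existing training script:
from trains import Task
task = Task.init(project_name='My Project', task_name='My Experiment')

# ...the rest of the training script stays exactly as it was.
# From here on, TRAINS hooks into the run in the background.
```

Everything else about the script, its arguments, frameworks, and outputs, is left untouched.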
But why did we make it open-source?
Apart from helping companies, we wanted to help our community as well. Our vision is for TRAINS to seamlessly integrate with any DL/AI toolchain and workflow. We acknowledge that deep learning R&D and operations practices are not well established yet, and we want TRAINS to remain relevant as paradigms shift in the field. For these simple reasons we made TRAINS open source and “free as in speech”.
TRAINS is not like any other tool in your stack.
“Zero integration” means you get to keep doing what already works for you, and when it comes to tools, that makes all the difference between an integration headache and added value.
It all starts when TRAINS automatically links your experiment to the training code (git commit + local diff + Python package versions), and carries on with automatic storage of Jupyter notebooks as Python code. TRAINS also logs the arguments passed at execution time (or a dictionary of hyperparameters), as well as everything printed to the console; it records where models are stored, and lets you automatically upload those models to shared storage. The fun doesn’t stop there: TRAINS automatically logs everything written to TensorBoard (or Matplotlib) in the background, without interrupting your day-to-day workflow.
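As a hedged sketch of the hyperparameter-dictionary case mentioned above (again assuming the `trains` package and a reachable server; the parameter names and values are made up for illustration):

```python
from trains import Task

# Register the script as an experiment on the TRAINS server.
task = Task.init(project_name='Examples', task_name='Hyperparameter logging')

# Connect a dictionary of hyperparameters to the experiment; TRAINS records
# the values, and they become comparable across experiments in the web UI.
params = {'batch_size': 64, 'learning_rate': 1e-3, 'epochs': 10}
params = task.connect(params)

# Console output is also captured into the experiment log automatically.
print('training with', params)
```

Command-line arguments parsed with `argparse` are picked up without even this much; the explicit dictionary form is for scripts that keep their configuration in code.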
When the time comes to compare experiment results and model performance, TRAINS really shines.
With the ability to sort experiments according to a specific metric as well as perform in-depth comparisons, you can not only compare performance graphs, but also associate them with hyperparameter comparison, codebase comparison, and even specific Python package versions.
You get all of the above with only two lines of integration code; this is pure monkey-patching magic.
Remember, “automagical” does not stop at integration — many more aspects of machine- and deep-learning are going to get the automagical treatment. Join the community and follow us on our project page to get word on the latest and greatest as it comes out.
This blog post is based on an article that was originally published on HeartBeat. Read the original article here.
Hey Stranger.
Sorry to tell you that this post refers to an older version of ClearML (which used to be called Trains).
We haven’t updated this yet, so some commands may be different.
As always, if you need any help, feel free to join us on Slack.