Five Things You Can Do with Jupyter and ClearML – Guest Blog Post

November 2, 2020

Get more out of your AI/ML IDE

Originally written by Henok Yemam. Republished with the author's approval.

The positive effects of Artificial Intelligence (AI) on our everyday lives are hard to dispute, given our ever-increasing reliance on its applications. In the early days of the internet, the cost of infrastructure limited who could access it. Today, by contrast, open-source platforms have significantly lowered the barrier to entry for AI, so practically anyone with an internet connection and a decent laptop can use these tools. Most people who hear about AI are enamored of fancy machine learning (ML) algorithms, while the infrastructure that gives these algorithms a playground in which to flex their muscles is often overlooked. This is to be expected, just as people are more excited about computer games than about a computer's motherboard. I would argue that software such as PyCharm, Visual Studio Code, Jupyter Notebook and the open-source ClearML is the motherboard of AI, whereas algorithms are the computer games.

Plenty of blog posts and articles are dedicated to explaining the latter, so allow me to tell you five things you can do with a combination of the former: Jupyter Notebook/JupyterLab and ClearML (formerly Allegro Trains).

    1. AutoLogging
      Jupyter is the archetypal experimentation IDE for proofs of concept. It lets you import modules, execute your code and display graphs and charts inline. However, this freedom comes at the cost of messy notebooks and reproducibility issues. This is where ClearML's automatic logging comes in handy. Once you add two lines of code to the top cell of your notebook, debugging becomes a matter of reviewing your logs on the ClearML server. This saves you time and makes your work reproducible. Additionally, all of your packages and their current versions are recorded neatly on the ClearML server. Last but not least, your executable code is saved as a .py file so that others can reproduce your work on their own machines.

    2. Versioning
      The need for constant experimentation makes versioning unavoidable in ML. This article, which I encourage you to read, discusses ways to work around Jupyter's lack of versioning. However, if you don't want to add extra steps to your Jupyter code, just use the two lines of code mentioned earlier and ClearML will take care of versioning for you. You can also read my article comparing ML project management with GitHub vs. ClearML.

    3. Hyperparameter Optimization
      Machine learning's power comes from extracting useful information from structured data. It is difficult to know whether an ML model will perform well on a given dataset without experimenting with various features, hyperparameters and algorithms. You can do all of this in Jupyter, but because producing a viable model is hard, you are likely to end up going around in a self-defeating circle of manual experiments. ClearML's Hyperparameter Optimization feature can swiftly automate this search for the best ML model. Check out this tutorial on how to use ClearML Hyperparameter Optimization.

    4. Live-tracking
      Picking the best model may also mean picking the simplest one: the model that is least GPU/CPU-intensive. Although you could hack this together in Jupyter with a memory profiler and your own code that gathers your ML/DL model's metrics, ClearML can take care of that for you with the ClearML Agent. Once the agent's worker daemons are running, your only job is to enjoy the fruits of your labor by watching a dashboard that tracks, live, your model's performance as well as network and compute resources. This makes picking the best-performing model much simpler and more straightforward.

    5. Team collaboration
      I would say this is the main reason the Allegro AI team designed ClearML Free. AI is the epitome of an interdisciplinary field. This is evident in how the work is divided: data engineers build data pipelines, data scientists run experiments, and machine learning engineers deploy models. In reality, this process is not a straight line; it requires collaboration through shared code, data and ongoing modeling experiments. The workflow commonly adopted for team collaboration today is Jupyter (or another IDE) plus GitHub. However, ClearML is a superior alternative for team collaboration because of its out-of-the-box features such as environment replication, cross-platform compatibility, data sharing, versioning, and remote control over a shared server. ClearML's ease of integration with platforms that have already won the hearts of the AI community also makes it an easy choice.
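
For the Hyperparameter Optimization item, ClearML's own optimizer is the `HyperParameterOptimizer` class in `clearml.automation`. It clones a base experiment and runs variants of it, typically on ClearML agents. The plain-Python sketch below is not that API. It only illustrates the loop such a tool automates: sample hyperparameters, score each candidate, keep every result, and return the best. `validation_score` is a hypothetical toy objective standing in for training and evaluating a real model, and the parameter ranges are arbitrary.

```python
import random


def validation_score(learning_rate: float, max_depth: int) -> float:
    """Hypothetical stand-in for training a model and scoring it on
    held-out data. This toy surface peaks at lr=0.1, depth=6."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 6) ** 2


def random_search(n_trials: int, seed: int = 0):
    """Sample n_trials random configurations; return (best score, best params, all trials)."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    trials = []
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.001, 0.3),
            "max_depth": rng.randint(2, 10),
        }
        trials.append((validation_score(**params), params))
    # Keep every trial so runs can be compared later, then pick the best.
    best_score, best_params = max(trials, key=lambda t: t[0])
    return best_score, best_params, trials


best_score, best_params, trials = random_search(n_trials=20)
print(f"best score {best_score:.4f} with {best_params}")
```

Random search is used here only for brevity. ClearML's optimizer also offers other strategies, such as grid search and Optuna-based search.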
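
For the Live-tracking item, the manual approach the author mentions (profiling inside the notebook yourself) might look like the minimal sketch below. It uses only the standard library: `time.perf_counter` measures wall time and `tracemalloc` measures peak memory allocated through Python. `train_step` is a hypothetical stand-in for real training code. Note that `tracemalloc` does not see GPU memory and may miss memory that native extensions allocate outside Python's allocators. ClearML's automatic resource monitoring instead reports machine-level statistics such as CPU, GPU, memory and network usage.

```python
import time
import tracemalloc


def train_step(n: int) -> float:
    """Hypothetical stand-in for one training step on n samples."""
    samples = [float(i) for i in range(n)]  # allocates a list of n floats
    return sum(x * x for x in samples) / n


def profile(fn, *args):
    """Run fn(*args); return (result, elapsed seconds, peak bytes traced)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
        tracemalloc.stop()  # also clears the traces before the next call
    return result, elapsed, peak


# Gather per-step metrics by hand, as you would in a notebook cell.
metrics = []
for step, size in enumerate([10_000, 100_000]):
    result, seconds, peak_bytes = profile(train_step, size)
    metrics.append(
        {"step": step, "result": result, "seconds": seconds, "peak_kib": peak_bytes / 1024}
    )
    print(metrics[-1])
```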

 

I hope the five pointers above have given you an incentive to try the simple yet powerful combination of Jupyter and ClearML. I also hope to come across your exciting experiments with these tools. It took me some time to find a balanced development stack, but once I nailed it, I never looked back! Feel free to reach out to me on LinkedIn or Twitter to chat more about these tools or to collaborate.