Annotation: Expectations vs. Reality

January 15, 2019

There is a lot of media hype over the use of deep learning in computer vision (and rightly so – the potential is huge). However, the reality on the ground is that the overwhelming majority of companies are still at the very early stages of adopting this new technology. As is often the case with new technologies, some industries have been quicker to get on the adoption curve than others.

One of the bigger challenges of building computer vision products is not the AI or algorithms, but rather annotation and labelling.

Annotating or labelling images and videos correctly is essential to properly train a neural network. Every data scientist knows that having “ground truth” datasets is key. Creating high-quality ground truth is a burdensome, labor-intensive job. Annotators often must be familiar with the vision domain they are working in. For example, the roads and road signs in Australia are very different from those in France. Defining what to annotate, and how, is also detail-oriented and error-prone. Is a partially occluded person considered a person? Is part of an arm considered an arm? Moreover, even within the same domain, the different tasks or use cases for which a ground-truth dataset is created may require different annotation specifics.

Follow the Money

The economics of an annotation project are also very tricky. It is difficult to determine which images or frames need to be annotated, and how many of them. To maximize the return on investment in an annotation project, this mix must be optimized.

Training a neural network from its initial architecture to an end model, or detector, is an iterative process. Another challenge, therefore, lies in the fact that it is hard to readily observe whether your annotations improved or degraded your model.

Keep it to Yourself

Building ground-truth datasets through the current best-practice manual annotation processes itself presents IP security challenges, because so many of the projects are handled by third parties and involve manual labor by many individuals. Data represents one of your key proprietary assets as well as your intellectual property. In the world of deep learning and training of neural networks, this means you should ideally retain total control over your data at all times. This total control must be applied regardless of which data hosting service you may use or where the data resides.

Deep learning, like many technology-driven scientific processes, can benefit from a sophisticated combination of automation and manual intelligence work. Allegro’s automated annotation solution integrates annotation workflows into the core training pipeline and automates ground-truth creation as well as quality control of the vast majority of your data sets. Allegro brings visibility into the marginal returns of each new training iteration. This visibility enables you to understand when those returns have levelled off, so that you can halt the annotation and training process and thereby optimize your spending and ROI.
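
To make the idea of marginal returns levelling off concrete, here is a minimal Python sketch. It is not Allegro's implementation; the function names, the `min_gain` and `patience` thresholds, and the validation scores are all hypothetical and chosen purely for illustration. It assumes you record one validation score (for example, accuracy or mAP) after each annotate-and-retrain iteration.

```python
from typing import List, Sequence


def marginal_gains(scores: Sequence[float]) -> List[float]:
    """Return the change in validation score between consecutive iterations."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def has_levelled_off(
    scores: Sequence[float],
    min_gain: float = 0.005,  # hypothetical threshold: smallest gain worth paying for
    patience: int = 2,        # hypothetical: how many weak iterations in a row to require
) -> bool:
    """True when each of the last `patience` iterations gained less than `min_gain`."""
    gains = marginal_gains(scores)
    if len(gains) < patience:
        return False  # not enough history to judge yet
    return all(gain < min_gain for gain in gains[-patience:])


# Made-up validation scores, one per annotation-and-training iteration.
scores = [0.52, 0.61, 0.68, 0.71, 0.712, 0.713]

print(has_levelled_off(scores[:4]))  # still improving by several points per iteration
print(has_levelled_off(scores))      # the last two iterations each added under 0.005
```

In practice the threshold would depend on what an additional batch of annotations costs and on how noisy the validation metric is, so a rule like this is a starting point rather than a stopping criterion to apply blindly.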