Guest Post: Who Will Profit From The Revolution In Computer Vision?

April 2, 2019
This article by Dr. Amir Konigsberg was originally published on Forbes. Dr. Konigsberg is Founder and President of Twiggle, and Advisory Board Member at Allegro.ai.

Self-driving vehicles, weather-forecasting drones, fulfillment robots and robotic surgery are already transforming the lives of millions of people. It is deep learning computer vision (DL CV) — visual sensors coupled with the ability to make instantaneous, human-like sense of streaming video — that makes these applications possible.

One might think that an acute focus on DL CV applications would be sufficient to yield the necessary breakthroughs and successful industry deployments. But, surprisingly, a lack of product-development and operations-management tooling is hindering companies from rapidly delivering transformative products.

To get the most value from their growing DL CV investments and to secure a competitive edge, companies must investigate ways to improve their development tools, methodologies and workflows.

Learning Like A Human

A toddler meets his Uncle Joe for the first time and focuses on the facial features that distinguish Joe from all other people. Tomorrow, Uncle Joe might show up in different clothes. Next week, he might arrive wearing a hat, but the child will still recognize him without a problem.

Deep learning computer vision mimics the human learning process. Deep learning data scientists capture, classify and label masses of images and videos of people's faces like Uncle Joe's (in different expressions, positions, etc.) and present them to an artificial neural network (a "deep learning model"); as they do, the model learns to correctly pick Joe's face out of a crowd. Data scientists continue training the model until its ability to identify Uncle Joe (and, perhaps, thousands of other people) reaches human-or-better accuracy. The trained model can then be used in a camera-equipped car, smart city, drone or security system to identify people automatically.
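
To make the process concrete, here is a minimal sketch of the kind of training loop described above, written in PyTorch with torchvision (assumed available). The dataset path, network choice and hyperparameters are illustrative assumptions, not details from the original article.

```python
# A minimal sketch of supervised training for face identification.
# "data/faces/train" is a hypothetical folder with one subfolder per identity
# ("uncle_joe", "neighbor_sue", ...), each containing labeled face images.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/faces/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# An untrained network: full of potential, but lacking experience.
model = models.resnet18(weights=None, num_classes=len(train_set.classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Repeated exposure to labeled examples is what gradually teaches the model
# to pick a specific face out of a crowd.
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

In practice this loop is repeated until validation accuracy reaches the target level, which is where the iteration costs discussed below come in.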

A Challenging New Paradigm Of Product Management

Deep learning is significantly different from traditional software/IT projects that center on hand-written algorithms. Rapid development in the field has made it relatively easy to obtain an untrained deep learning model for certain problem sets, but like a newborn baby's brain, the model is full of potential yet lacking in experience. To acquire the experience that will enable it to achieve human-like capabilities, the model must be exposed to vast amounts of data from which to learn. Therefore, most of the work of a DL CV project is not software engineering but obtaining, refining and correctly labeling masses of relevant images and using them systematically to train the model.

Model training is tedious, time-consuming and expensive. Data scientists and engineers enter an unpredictable, dreary and iterative series of training exercises, tweaking the model and refining the data each time, throwing away inferior results in favor of improved outcomes. Each iteration is computationally intensive, usually ties up pricey GPUs (sometimes a lot of them) and can take hours, days or even weeks.
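
As a rough illustration of that tweak-and-discard cycle, the toy sketch below (PyTorch assumed, synthetic data standing in for a real, GPU-hungry workload) trains the same small network under a few candidate settings and keeps only the best-scoring result.

```python
# Toy illustration of the iterative training cycle: tweak one knob, retrain,
# keep the best result and throw the rest away. Data and model are stand-ins.
import torch
from torch import nn

X = torch.randn(512, 64)            # stand-in features
y = torch.randint(0, 5, (512,))     # stand-in labels (5 classes)

best_score, best_state = float("-inf"), None
for lr in (1e-2, 1e-3, 1e-4):       # the "tweak" made in each iteration
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 5))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(50):             # hours, days or weeks on a real dataset
        loss = nn.functional.cross_entropy(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    score = (model(X).argmax(dim=1) == y).float().mean().item()
    if score > best_score:          # inferior results are discarded
        best_score, best_state = score, model.state_dict()
```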

If you are undertaking a research project, you may have enough time to wait for the basic training of your deep learning computer vision model. If so, you can get by with just a couple of data scientists and engineers, a modest quantity of expensive equipment and a spreadsheet to keep track of everything.

However, if you are creating an actual product and want to achieve a competitive time-to-market edge by producing a state-of-the-art DL CV model, you will have to scale up significantly, with many more experts working in parallel. You will have to keep scrupulous track of the outcomes of each training iteration across the team, working out what works and what does not and making incremental progress, iteration by iteration. That consumes an enormous amount of people, budget and compute time — and you will probably fall behind competitors who have found a better way.
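
That "scrupulous tracking" often starts as a shared run log. Below is a sketch of spreadsheet-style bookkeeping in plain Python, where each finished training run appends its configuration and outcome to a common CSV file; the file name and fields are assumptions made for illustration.

```python
# Sketch of spreadsheet-style experiment tracking: every finished training
# run appends its settings and outcome to a shared CSV the team can inspect.
import csv
import datetime
import os

def log_run(config: dict, val_accuracy: float, path: str = "experiments.csv") -> None:
    """Append one training iteration's settings and result to the shared log."""
    row = {"timestamp": datetime.datetime.now().isoformat(),
           "val_accuracy": val_accuracy, **config}
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# Example: record the outcome of one iteration (values are hypothetical).
log_run({"model": "resnet18", "lr": 1e-4, "epochs": 10, "augmentation": "flip"}, 0.87)
```

A log like this works for a couple of researchers; it is exactly the approach that breaks down once many experts are iterating in parallel, as the next section argues.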

Costly In More Ways Than You Think

Getting into the deep learning computer vision race will make or break careers and companies — not in the distant future but very soon. However, the ante is steep. Companies have to commit to building a first-class team of DL CV talent, which is especially difficult because such people are scarce, expensive and hard to evaluate. Companies also have to make a serious investment in on-premise or cloud-resident GPU clusters.

But DL CV also demands a lot more. Companies have discovered that traditional software engineering techniques, versioning tools, repositories and other teamwork-supporting infrastructure are not applicable to DL CV. They must develop an entirely new infrastructure that essentially compresses decades of development of the traditional software engineering tool chest into a couple of years or less.

This is no easy task. Big technology companies like Google and Amazon, which have made the most progress, have ended up investing years of precious time not on DL CV applications themselves but on constructing the enabling technologies, services and workflows needed to deliver real solutions.

In my experience founding and building deep-tech companies and working with artificial intelligence development teams, I have witnessed tremendous investment in the creation and maintenance of comprehensive platforms, often at the cost of product and time-to-market momentum. If you cannot match Google's or Amazon's drawing power and deep pockets for top talent, you will be hard-pressed to create a best-of-breed platform.

Build Solutions, Not Infrastructure

Companies that want to enter the DL CV market don't have to build these supporting platforms themselves. Instead, they should look for existing tools that provide productivity-boosting capabilities such as project scalability, economical use of compute infrastructure, native support for images and video with contextual understanding, teamwork orchestration and workflow, job distribution, annotation tools for ground-truth creation and debugging, and automatic network optimization.

This market is moving fast. Focusing resources, talent and attention on end solutions rather than supporting infrastructure is the key to staying ahead.
