In this talk I'll explain why standard annotation tools didn't work for us across multiple projects, walk through our repeated failed attempts to build a flexible annotation system, and show how we finally arrived at ipyannotator - the infinitely hackable annotation framework - and why you should use it, too.
Though far less glamorous than developing new machine learning models, the annotation process and the tooling it requires are often among the most critical aspects of real-world machine learning projects.
Many breakthroughs in machine learning applications, such as image classification, text understanding, and recommender systems, belong to the class of supervised machine learning. These methods often require large collections of input-output pairs from which a model learns. An example of an input-output pair is an animal image together with the species label (name) provided by a human annotator.
The main challenge in creating ML datasets is the cost of acquiring annotations (targets), which is typically much higher than the cost of acquiring the inputs. It is well known that the prediction quality of ML models depends critically on the number of training samples available for learning.
The goal of annotation can thus be framed as generating as many annotations as possible, with sufficient quality, under a given budget constraint.
We plan to release ipyannotator as an open-source project before the conference, but the lessons learned and the concepts presented are independent of our concrete implementation.