Vision-based AI automation industries have massive needs for training data, which is typically prepared by professional annotation service providers or internal annotation teams.
Managing such teams has become more and more challenging due to the ongoing pandemic that is forcing teams to work from home. As a result, annotation teams are finding it difficult to maintain the same quality and delivery speed as they did in the office.
In this article, we propose team management techniques for computer vision engineers and service companies to deliver and ensure fast and high-quality annotations.
Annotation teams across the world are forced to work from home. Image source: NYT
- Crowdsourcing vs professional services
- Managing professional services during a pandemic
Crowdsourcing vs Professional Services
During the early days of computer vision, image annotation was a simple process where crowdsourced teams were mainly asked to tag an image or put bounding boxes on different objects. At the time, Amazon Mechanical Turk became the go-to platform to organize simple microtasks, where the annotation quality did not play a crucial role.
The world has seen an explosion of AI applications such as autonomous driving, robotics, aerial imaging, and retail automation. As a result, annotation instructions have become more complicated, and annotation quality has become the number one priority when labeling datasets. This has given rise to specialized annotation companies with dedicated annotators and management teams.
On the other hand, many companies prefer in-house annotations for reasons such as accuracy and privacy, even if it means investing in such a workforce.
Regardless of the setup of an annotation team, whether in-house or part of a specialized annotation service provider, the pandemic makes it very difficult to have dedicated office-based teams in regions where most of the annotation teams are located. These regions are Eastern Europe, South East Asia, South Asia, Africa, and South America.
Teams located in these regions have to adjust all their operations to “work from home” mode, making team management resemble the crowdsourcing model.
Crowdsourcing previously failed to provide good results for high-quality training data. One of the main reasons crowdsourcing models like Mechanical Turk do not perform well is that the software provided by Amazon lacks the features needed to manage such a workforce. The tools also do not cover the entire training data pipeline (e.g., fast annotation tools, crowd management tools, predictions, active learning, transfer learning, and integration with CV pipelines).
Nevertheless, with annotators working from home, management teams will be forced to adopt a hybrid, specialized, semi-crowdsourced workforce.
Hence, managing professional annotation teams from home without sacrificing the annotation quality and speed is rapidly becoming a problem during the pandemic, and annotation teams will need to adopt new solutions to adapt to the change.
Managing Annotation Teams Remotely
Below, I would like to discuss all the efforts that we have made so that computer vision engineers and project managers can efficiently manage projects with hundreds and thousands of annotators and millions of images without compromising the quality of the annotations, all while working remotely.
I will cover some of the key functionalities of our software (SuperAnnotate) that make managing teams smooth and efficient, whether they work from the office or remotely.
Editors and Automation
Automation is an essential advantage in the annotation process and becomes especially valuable when managing a project remotely. Automation in the annotation field is often assumed to rely entirely on AI-based features, which still leaves a large portion of the process tied to the tedious task of manual corrections. However, automating repetitive manual work, such as changing the classes and attributes within the editor, often becomes a crucial driver of efficiency during the annotation process.
Automatic class and attribute changes for face keypoint annotation
Within SuperAnnotate, it is possible to minimize the number of human errors with the use of a set workflow, annotation guidelines, vector templates, automated task distribution, and QA mechanisms, which are very valuable for remote teams.
Teams and Roles
SuperAnnotate has three levels of quality assurance (QA):
- Level 1: instance-based. Within each image under the QA mode, the annotator/manager can visually see all the classes and attributes directly on the image canvas. This makes it easier to hover over objects and check the distinct classes for each object.
- Level 2: bulk QA. If your project requires a specialized team of QAs apart from annotators, the system allows you to send your images to QA, after which the QA can either approve them or reject them and send them back for corrections. You can also track the status of those images. This way, you can divide your team into annotators and QAs. A centralized and specialized QA process lets you focus your quality control efforts within a relatively small group of experts.
- Level 3: communication. The comment/chat feature within the platform allows you to place a comment on an instance directly. This way, the annotators are always aware of the project requirements and updates as well as recurring errors. The entire communication happens within the platform, making the communication process more efficient.
Automatic Task Distribution
One of the most important aspects of managing a crowdsourcing team is task distribution. Once you create a project with your images, our system allows the annotators and QAs to request their share of images, so there is no need to assign or reassign them one by one or in bulk.
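The request-based distribution described above can be illustrated with a minimal sketch. This is not SuperAnnotate's implementation or API; the `TaskPool` class, its `request_batch` method, and the batch size are hypothetical names chosen only to show the idea of contributors pulling work from a shared pool instead of a manager assigning it.

```python
from collections import deque
from typing import Deque, Dict, List


class TaskPool:
    """Hypothetical pull-based pool: contributors request images themselves."""

    def __init__(self, image_ids: List[str]) -> None:
        # Images nobody has requested yet, served in upload order.
        self._unassigned: Deque[str] = deque(image_ids)
        # Which images each user has pulled so far.
        self.assignments: Dict[str, List[str]] = {}

    def request_batch(self, user: str, size: int) -> List[str]:
        """Hand the next `size` unassigned images (or fewer) to `user`."""
        batch: List[str] = []
        while self._unassigned and len(batch) < size:
            batch.append(self._unassigned.popleft())
        self.assignments.setdefault(user, []).extend(batch)
        return batch

    def remaining(self) -> int:
        """Number of images that no one has requested yet."""
        return len(self._unassigned)


pool = TaskPool(["img_001", "img_002", "img_003"])
first = pool.request_batch("annotator_a", size=2)
second = pool.request_batch("annotator_b", size=5)
```

Because each image leaves the pool as soon as it is requested, two contributors cannot receive the same image, and a contributor who finishes early simply requests another batch.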
The next major component of project management is data handling. To help you scale your projects, we introduced six distinct image statuses for sorting and organizing images. This way, you can keep track of your project's progress no matter how large the task. The statuses are:
- Not started: These are the images that haven’t been edited yet.
- In progress: These are the images that have been saved at least once and are currently in progress.
- Quality check: These are the images that have been sent to the QA.
- Pending: These are the images that the QA has sent back to the annotator for correction.
- Completed: These are the images that the QA, the Admin, or the upgraded Annotator completed.
- Skipped: These are the images that do not need to be annotated.
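One way to reason about the six statuses is as a small state machine. The sketch below is an illustration under stated assumptions, not SuperAnnotate's actual logic: the status names come from the list above, but the `ALLOWED` transition table and the `transition` helper are hypothetical, and in particular the rules for when an image may be skipped or resubmitted are guesses.

```python
from enum import Enum


class ImageStatus(Enum):
    NOT_STARTED = "Not started"
    IN_PROGRESS = "In progress"
    QUALITY_CHECK = "Quality check"
    PENDING = "Pending"
    COMPLETED = "Completed"
    SKIPPED = "Skipped"


# Assumed transition rules; the article only describes what each status means.
ALLOWED = {
    ImageStatus.NOT_STARTED: {ImageStatus.IN_PROGRESS, ImageStatus.SKIPPED},
    ImageStatus.IN_PROGRESS: {ImageStatus.QUALITY_CHECK, ImageStatus.SKIPPED},
    # The QA either approves the image or sends it back for correction.
    ImageStatus.QUALITY_CHECK: {ImageStatus.COMPLETED, ImageStatus.PENDING},
    # Assumption: a corrected image goes back to the QA.
    ImageStatus.PENDING: {ImageStatus.QUALITY_CHECK},
    ImageStatus.COMPLETED: set(),
    ImageStatus.SKIPPED: set(),
}


def transition(current: ImageStatus, target: ImageStatus) -> ImageStatus:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value!r} to {target.value!r}")
    return target


status = ImageStatus.NOT_STARTED
status = transition(status, ImageStatus.IN_PROGRESS)
status = transition(status, ImageStatus.QUALITY_CHECK)
status = transition(status, ImageStatus.PENDING)  # QA sent it back
```

Keeping the rules in a single table makes it easy to audit which moves a remote contributor can trigger and to reject anything else.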
Distributing your project instructions can be a hassle, especially in remote working conditions. At SuperAnnotate, we suggest three solutions to this problem:
- Document Attachment: You can attach an instruction file to your projects, which will automatically be distributed to all project contributors.
- Visual Instructions (Pin Image): You can pin the benchmark annotations, the best examples done during the course of the project, or the recurring mistakes done by many users. Those images will be automatically distributed to all project contributors.
- Comment/Chat: By having a live comment/chat system in the editor space, you can point out the changes or the errors by being specific down to an instance level. If the mistake is systematic across many annotators, the admin can pin that image and redistribute the image that has been annotated incorrectly with their comments to everyone.
Another critical aspect of project management in remote working conditions is the analytics system. Without having the actual overview of the project, it is hardly possible to objectively evaluate how the project is progressing. We integrated a detailed dashboard system to allow complete user and data monitoring during the entire course of the project.
- Project Status: the amount of completed and remaining work.
- Image Status: This section gives a full breakdown of the project by image statuses. You can see the ratio of completed, in progress, sent back, and skipped images at all times.
- Aggregated work progress: This section shows the aggregation of all data within your project, such as the total number of completed classes and attributes and the total number of hours spent in the scope of the project by the annotators, QAs, and Admins.
- User statistics: individual results of each user by speed, velocity, total instance and image count, time spent, and user role.
- Class Status: the total number of instances belonging to a particular class in the scope of the project.
- Attribute Status: total number of instances with a specific attribute.
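The kind of aggregation behind the image status and class status panels can be sketched in a few lines. The record layout (`status` plus a list of per-instance class names) and the `summarize` function are assumptions made for illustration; they do not reflect the platform's export format or API.

```python
from collections import Counter
from typing import Dict, Iterable, List, TypedDict


class ImageRecord(TypedDict):
    status: str          # one of the image statuses listed earlier
    classes: List[str]   # class name of every annotated instance in the image


def summarize(records: Iterable[ImageRecord]) -> Dict[str, dict]:
    """Compute the share of images per status and the instance count per class."""
    items = list(records)
    total = len(items)
    status_counts = Counter(r["status"] for r in items)
    class_counts = Counter(name for r in items for name in r["classes"])
    # Avoid division by zero for an empty project.
    status_ratio = {s: n / total for s, n in status_counts.items()} if total else {}
    return {"status_ratio": status_ratio, "class_counts": dict(class_counts)}


report = summarize([
    {"status": "Completed", "classes": ["car", "car", "person"]},
    {"status": "Completed", "classes": ["car"]},
    {"status": "In progress", "classes": ["person"]},
    {"status": "Skipped", "classes": []},
])
```

Per-user statistics could be produced the same way by adding a hypothetical `user` field to each record and grouping on it.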
The user and data dashboards not only help you deliver your project on time and within budget but also provide sufficient information for quality control.
Working remotely puts your data security at risk. To reduce that risk, we implement several data security mechanisms, such as:
- Upload/Download restrictions: There are 7 different roles within the team (team creator, team admin, project admin, annotator, QA, customer, viewer), and we restrict all annotators, QAs, and project admins from uploading or downloading data. We also disabled downloading images from the labeling editor, so you cannot download images one by one.
- JWT authentication: We use an independent authentication server based on the AWS Cognito service, which supports multi-factor authentication and encryption of data at rest and in transit.
- Amazon Cognito is HIPAA eligible and PCI DSS, SOC, ISO/IEC 27001, ISO/IEC 27017, ISO/IEC 27018, and ISO 9001 compliant.
- Anonymization of faces and car plates is provided upon request.
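The upload/download restriction can be pictured as a simple role check. The sketch below is hypothetical and is not the platform's code: the role names and the three restricted roles are taken from the list above, while `TRANSFER_RESTRICTED`, `is_transfer_restricted`, and `guard_transfer` are illustrative names. It makes no claim about what the remaining four roles are allowed to do beyond the fact that the text does not list them as restricted.

```python
from enum import Enum


class Role(Enum):
    TEAM_CREATOR = "team creator"
    TEAM_ADMIN = "team admin"
    PROJECT_ADMIN = "project admin"
    ANNOTATOR = "annotator"
    QA = "QA"
    CUSTOMER = "customer"
    VIEWER = "viewer"


# Roles the article names as unable to upload or download data.
TRANSFER_RESTRICTED = frozenset({Role.ANNOTATOR, Role.QA, Role.PROJECT_ADMIN})


def is_transfer_restricted(role: Role) -> bool:
    """True if the role is listed as blocked from uploading or downloading."""
    return role in TRANSFER_RESTRICTED


def guard_transfer(role: Role) -> None:
    """Raise PermissionError before a data transfer by a restricted role."""
    if is_transfer_restricted(role):
        raise PermissionError(f"Role {role.value!r} may not upload or download data")


guard_transfer(Role.TEAM_ADMIN)  # not in the restricted set, so no error
```

Calling a guard like this on the server side, rather than only hiding buttons in the editor, is what keeps the restriction effective for remote contributors.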
The new reality of a specialized remote workforce, the absence of localized teams, and the exploding demand for higher-quality annotations make classic team and project management impossible for complex annotation tasks at scale. This puts a lot of stress on the quality and delivery time of your annotation project. The speed, quality, and scalability of your annotation projects in the new reality depend on the following factors:
- Automation of the manual work
- Quality assurance
- Data and user management
- Integration into CV pipelines