How to efficiently manage work-from-home annotation service teams?
This article addresses one of the main challenges in obtaining high-quality AI training data. Managed crowdsourcing services are growing in popularity as the ongoing pandemic forces teams to work from home, which makes it difficult for professional service providers to maintain the annotation quality and speed they delivered when employees worked in the office. For computer vision engineers and service companies that manage such teams, we propose several techniques that make remote team management fast and efficient without compromising annotation quality.
- AI development during COVID-19
- Crowdsourcing vs Professional Services
- Managing Professional Services During the Pandemic
AI development during COVID-19
COVID-19 hit unexpectedly over the last few months, bringing substantial changes in human lifestyles, work habits, and environments. Western Europe and the US have already seen the consequences of the first wave of the pandemic, which is now passing to developing countries such as Russia and Brazil. While some nations are slowly recovering from COVID-19, a recent antibody study in Spain notes that only about 5% of its population has actually contracted the virus. The takeaways from this study are rather disturbing, indicating that a second or third wave of the pandemic remains possible even in countries with high death rates such as Spain.
According to many experts, the national lockdowns across the globe may continue for several months and may come and go in a repetitive manner over the next couple of years. In addition to the well-documented economic difficulties caused by the pandemic, these lockdowns have forced employees from offices to their homes, creating significant challenges in the vision-based AI automation space (e.g. autonomous driving, drone infrastructure inspections, manufacturing automation).
Vision-based AI automation industries have massive needs for training data, which is typically prepared by professional annotation service companies. As a result of the pandemic and these lockdowns, employees of these service companies have been forced to work from home, making the once-popular office-based managed annotation work more closely resemble a crowdsourcing business.
Crowdsourcing vs professional services
During the early days of computer vision, image annotation was a simpler process where crowdsourced teams were mainly asked to tag an image or to put bounding boxes around different objects. At the time, Amazon Mechanical Turk became the go-to platform for organizing simple microtasks where annotation quality did not play a crucial role. With the rise of real-world AI applications such as autonomous driving, robotics, and retail, annotation instructions became increasingly complicated, and annotation quality became the number one priority when labeling datasets. This gave rise to specialized annotation companies with dedicated annotators and management teams. While the pros and cons of crowdsourcing and specialized teams mainly come down to annotation quality and turnaround time, the pandemic makes it impossible to maintain dedicated in-house teams in the regions where many annotation services are based (Eastern Europe, Southeast Asia, South Asia, Africa, South America).
These companies will soon (if they haven't already) have to adjust all their operations to a 'work from home' mode, making the management of their teams more reminiscent of the managed crowdsourcing model. Although this model previously failed to deliver high-quality training data, management teams will inevitably be forced to implement this type of hybrid, specialized, semi-crowdsourced workforce.
Hence, managing professional annotation service teams from home without sacrificing the annotation quality and speed is rapidly becoming a challenging problem during the pandemic.
Managing professional services during the pandemic
One of the main reasons Mechanical Turk did not perform well is that the software provided by Amazon lacked the tools needed to manage a large workforce and to cover the entire training-data pipeline (e.g. fast annotation tools, crowd management tools, predictions, active learning, transfer learning).
Below, I would like to discuss all the efforts that we have made so that computer vision engineers and project managers can efficiently manage projects with 100s to 1000s of annotators and millions of images without compromising the quality of the annotations, all while working from home.
I will cover some of the key functionalities that we added in our software (SuperAnnotate) over the last few months in order to make the transition towards working from home as smooth as possible.
Editors and automation
When managing a project, especially from a distance, automation becomes a key advantage in the annotation process. Automation in the annotation field is commonly assumed to rely entirely on AI-based features, leaving a large share of the work tied up in tedious manual corrections. In practice, however, automating the repetitive manual work within the editor (such as changing classes and attributes) is often the main driver of annotation efficiency.
Within SuperAnnotate you can minimize human errors through preset workflows, annotation guidelines, vector templates, automated task distribution, and QA mechanisms, all of which are especially valuable for a remote workforce.
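As a concrete illustration, one such repetitive correction, relabeling a class across many annotation instances, can be scripted in a few lines. The record layout below is hypothetical and invented for this example, not SuperAnnotate's actual export format:

```python
# Sketch: bulk-reassign a class across annotation records.
# The JSON-like record layout here is hypothetical, not an actual format.

def reassign_class(annotations, old_class, new_class):
    """Return a copy of the annotations with every instance of
    old_class relabeled as new_class."""
    updated = []
    for instance in annotations:
        instance = dict(instance)  # copy, to avoid mutating the input
        if instance.get("className") == old_class:
            instance["className"] = new_class
        updated.append(instance)
    return updated

boxes = [
    {"type": "bbox", "className": "car", "points": [10, 10, 50, 40]},
    {"type": "bbox", "className": "truck", "points": [60, 20, 120, 70]},
]
fixed = reassign_class(boxes, "truck", "vehicle")
```

Applied across thousands of images, a correction like this takes seconds instead of a round of manual re-annotation.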
Teams and roles
Level 1: instance-based. Within each image under QA mode, the annotator/manager can see all the classes and attributes directly on the image canvas. This makes it easier to hover over objects and check the class of each one.
Level 2: bulk QA. If your project requires a specialized team of QAs apart from annotators, the system allows you to send your images to QA, after which the QA can either approve them or reject them and send them back for corrections. You can also track the status of those images. This way you can divide your team into annotators and QAs. A centralized and specialized QA process lets you focus your quality-control efforts within a relatively small and expert team.
Level 3: communication. The comment/chat feature within the platform allows you to place a comment directly on an instance. This way the annotators are constantly aware of the project requirements and updates, as well as recurring errors. All communication happens within the platform, making the process more efficient.
Automatic task distribution
One of the most important aspects of managing a crowdsourcing team is task distribution. Once you create a project with your images, our system allows annotators and QAs to request their portion of images, so there is no need to assign or reassign them one by one or in bulk.
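The pull-based distribution described above can be sketched as a simple queue model, where annotators request batches instead of waiting for manual assignment. This is a simplified illustration of the idea, not the platform's actual implementation:

```python
# Sketch of pull-based task distribution: annotators request a batch
# instead of a manager assigning images one by one. A simplified model.
from collections import deque

class TaskQueue:
    def __init__(self, image_ids, batch_size=3):
        self.pending = deque(image_ids)   # images nobody has claimed yet
        self.batch_size = batch_size
        self.assigned = {}                # image_id -> user who claimed it

    def request_batch(self, user):
        """Hand the next batch of unassigned images to the requesting user."""
        batch = []
        while self.pending and len(batch) < self.batch_size:
            image_id = self.pending.popleft()
            self.assigned[image_id] = user
            batch.append(image_id)
        return batch

queue = TaskQueue([f"img_{i}" for i in range(7)])
first = queue.request_batch("annotator_a")   # claims img_0..img_2
second = queue.request_batch("annotator_b")  # claims img_3..img_5
```

Because each image is claimed exactly once, two annotators can never collide on the same task, and the manager never has to touch the assignment step.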
The next major factor in project management is data handling. To help you scale your projects, we introduced six distinct image statuses for image sorting and organization, so you can keep track of project progress no matter how large the task:
- Not started: these are the images that haven’t been edited yet.
- In progress: these are the images that have been saved at least once and are currently in progress.
- Quality check: these are the images that have been sent to the QA.
- Pending: these are the images that the QA has sent back to the annotator for a correction round.
- Completed: These are the images that the QA, the Admin, or the upgraded Annotator completed.
- Skipped: These are the images where there is nothing to annotate.
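Taken together, these statuses form a small lifecycle. The sketch below models the transitions as I read them from the descriptions above; the exact rules inside the platform may differ:

```python
# Image-status lifecycle sketch based on the six statuses above.
# The allowed-transition table is my reading of the described flow,
# not the platform's internal rules.
ALLOWED = {
    "not_started": {"in_progress", "skipped"},
    "in_progress": {"quality_check", "completed", "skipped"},
    "quality_check": {"pending", "completed"},  # QA rejects or approves
    "pending": {"in_progress"},                 # annotator resumes corrections
    "completed": set(),
    "skipped": set(),
}

def advance(status, new_status):
    """Move an image to a new status, rejecting illegal transitions."""
    if new_status not in ALLOWED[status]:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

# A typical path: annotate, fail QA once, correct, pass QA.
status = "not_started"
for step in ("in_progress", "quality_check", "pending",
             "in_progress", "quality_check", "completed"):
    status = advance(status, step)
```

Modeling the statuses as an explicit state machine is what makes it possible to count, filter, and route images automatically at any scale.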
Distributing your project instructions can be a hassle, especially in remote working conditions. In SuperAnnotate we suggest three solutions for this problem:
- Document Attachment: you can attach the full instruction file to your projects, which will automatically be distributed to all project contributors.
- Visual Instructions (Pin Image): you can pin the benchmark annotations, the best examples done during the course of the project or the recurring mistakes done by many users. Those images will be automatically distributed to all project contributors.
- Comment/Chat: by having a live comment/chat system directly in the editor space, you can point out the changes or the errors by being specific down to an instance level. If the mistake is systematic across many annotators, the admin can pin that image and redistribute the wrongly annotated image with his/her comments to everyone.
Another critical aspect of project management in remote work mode is the analytics system. Without an overview of the project, it is hardly possible to objectively evaluate how it is progressing. We integrated a detailed dashboard system to allow full user and data monitoring during the entire course of the project.
- Project Status: it allows you to see the amount of completed and remaining work.
- Image Status: this section gives a full breakdown of the project by image statuses. This way you can see the ratio of completed, in progress, sent back and skipped images at all times.
- Aggregated work progress: this section shows the aggregation of all data within your project (the total number of completed classes and attributes and the total number of hours spent in the scope of the project by the Annotators, QAs, and Admins).
- User statistics: individual results for each user, including speed, velocity, total instance and image count, time spent, and user role.
- Class Status: the total number of instances belonging to a certain class in the scope of the project.
- Attribute Status: the total number of instances with a specific attribute.
The user and data dashboards allow you to not only manage your project deadline and the budget but also provide sufficient information for quality control.
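The kind of aggregation such a dashboard performs can be illustrated with a short sketch that rolls per-image records up into status, user, and class counts. The record layout is invented for the example, not an actual export format:

```python
# Sketch: aggregating dashboard statistics from per-image records.
# The record layout is illustrative, not an actual export format.
from collections import Counter

records = [
    {"user": "anna", "status": "completed",
     "instances": {"car": 4, "person": 2}},
    {"user": "boris", "status": "in_progress",
     "instances": {"car": 1}},
    {"user": "anna", "status": "completed",
     "instances": {"person": 3}},
]

# Image Status breakdown: images per status.
status_counts = Counter(r["status"] for r in records)

# User statistics: images touched per user.
per_user_images = Counter(r["user"] for r in records)

# Class Status: total instances per class across the project.
class_counts = Counter()
for r in records:
    class_counts.update(r["instances"])
```

The same few lines of counting, run continuously over live project data, are what turn raw annotation records into the Project, Image, User, and Class views described above.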
Working remotely makes data security paramount. We implement several data security mechanisms:
- Upload/download restrictions: since there are 7 different roles within a team (team creator, team admin, project admin, annotator, QA, customer, viewer), we restrict annotators, QAs, and project admins from uploading or downloading data. In addition, we disabled the image-download option in the labeling editor, so images cannot be downloaded one by one.
- JWT authentication via an independent authentication server based on the AWS Cognito service, which supports multi-factor authentication and encryption of data at rest and in transit.
- Amazon Cognito is HIPAA eligible and PCI DSS, SOC, ISO/IEC 27001, ISO/IEC 27017, ISO/IEC 27018, and ISO 9001 compliant.
- Anonymization of faces and license plates is provided upon request.
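To make the JWT point concrete, here is a minimal HS256 sign-and-verify round trip built on the Python standard library. This is a generic illustration of how token signatures are checked, not the Cognito-backed flow itself (Cognito typically uses asymmetric RS256 keys):

```python
# Generic JWT HS256 sketch using only the standard library.
# Illustrates signature verification, not the actual Cognito flow.
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> dict:
    header, body, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{body}".encode(),
                        hashlib.sha256).digest()
    sig_bytes = base64.urlsafe_b64decode(sig + "=" * (-len(sig) % 4))
    if not hmac.compare_digest(expected, sig_bytes):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))

token = sign_jwt({"sub": "annotator_42", "role": "annotator"}, b"demo-secret")
claims = verify_jwt(token, b"demo-secret")
```

Because the signature covers the header and payload, any attempt to tamper with the embedded role, or to verify against the wrong key, is rejected before the claims are trusted.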
The new reality of a specialized workforce having to work remotely, and the absence of localized teams, makes classical team and project management impossible for complex annotation tasks at scale. This, in turn, puts significant stress on the quality and timing of your annotation projects. The speed, quality, and scalability of your image annotation projects in the new reality depend on three factors:
- Automation of the manual work
- Quality assurance
- Data and user management
All of this is taken care of within SuperAnnotate.