Active learning for object detection and human pose estimation

Learn Dec 1, 2020
Fig. 1: Sorting images by active learning uncertainty score (“entropy” values) and assigning the images with the highest uncertainty for annotation.

Active learning algorithms help deep learning engineers select a subset of images from a large unlabeled pool so that annotating those images yields the largest possible increase in model accuracy. This is the second article in our series on active learning. In our previous article, we covered two active learning algorithms for classification models. In this article, we cover the results of applying the “Learning Loss for Active Learning” [1] algorithm to object detection and human pose estimation tasks, describe its usage in SuperAnnotate’s platform, and share the code and some benchmarking data. We will also publish another article covering our results on segmentation tasks.
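The selection step shown in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration, not code from our repository: the function names, image names and probabilities are hypothetical, and entropy is used as the uncertainty score only because it is the score named in the figure. With Learning Loss [1], the score would instead be the loss predicted for each image.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` images with the highest entropy.

    `predictions` maps an image id to the model's class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical softmax outputs for three unlabeled images.
predictions = {
    "img_a.jpg": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_b.jpg": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    "img_c.jpg": [0.70, 0.20, 0.10],  # in between
}
selected = select_for_annotation(predictions, budget=2)
print(selected)
```

The two least confident images are returned first, so they are the ones that would be sent for annotation.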

Outline

  • Results on object detection on “Pascal VOC” dataset with SSD model
  • Results on human pose estimation on COCO dataset with “Deep High-Resolution Representation Learning” model
  • Using our code
  • Concluding remarks
  • References

Results on object detection on “Pascal VOC” dataset with SSD model

We evaluated Learning Loss active learning on the object detection task on the Pascal VOC dataset using the SSD model [4]. We used an open-source implementation [2], and our code is available here. We were able to replicate the results presented in the paper [1], with a margin of ~2.2% mAP over random selection.
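For readers unfamiliar with the method, the loss prediction module in [1] is trained to rank samples by their true loss rather than to regress it exactly. The sketch below is a pure-Python illustration of that pairwise ranking objective as we understand it from the paper. The function name and sample values are hypothetical, the margin of 1.0 is an assumed default, and a real implementation would operate on tensors inside the training loop.

```python
def pairwise_ranking_loss(predicted, target, margin=1.0):
    """Mean margin ranking loss over consecutive pairs (0, 1), (2, 3), ...

    `predicted` holds the losses estimated by the loss prediction module,
    `target` holds the true task losses of the same samples.
    """
    if len(predicted) != len(target) or len(predicted) % 2 != 0:
        raise ValueError("expected two equal-length lists with an even number of items")
    total = 0.0
    for i in range(0, len(predicted), 2):
        # +1 if the first sample of the pair has the larger true loss, else -1.
        sign = 1.0 if target[i] > target[i + 1] else -1.0
        # Zero once the predictions are ordered correctly with enough margin.
        total += max(0.0, -sign * (predicted[i] - predicted[i + 1]) + margin)
    return total / (len(predicted) // 2)

true_losses = [2.0, 0.5, 0.3, 1.2]
good_ranking = pairwise_ranking_loss([3.0, 1.0, 0.2, 1.9], true_losses)
bad_ranking = pairwise_ranking_loss([1.0, 3.0, 1.9, 0.2], true_losses)
print(good_ranking, bad_ranking)
```

Predictions that preserve the ordering of the true losses incur no penalty, while inverted predictions are penalized in proportion to how far apart they are.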

Fig. 2: Comparison of the Learning Loss algorithm to random selection for the object detection task.

Results on human pose estimation on COCO dataset with “Deep High-Resolution Representation Learning” model

We applied loss prediction active learning to the human pose estimation task on the COCO dataset using the “Deep High-Resolution Representation Learning” model [3]. We used an open-source implementation [5], and our code is available here. The model consists of 2 convolutional layers followed by 4 sub-networks, and we use the output features of all 6 parts as inputs to our loss prediction model. This results in nine inputs of shapes ([64, 128, 96], [64, 64, 48], [256, 64, 48], [32, 64, 48], [64, 32, 24], [32, 64, 48], [64, 32, 24], [128, 16, 12], [32, 64, 48]) (CHW format).
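To make the role of these shapes concrete, here is a small sketch of the layer sizes a loss prediction head would need if it follows the general design in [1]: global average pooling per feature map, one fully connected layer per feature map, then concatenation into a single scalar output. The embedding width of 128 and all names are assumptions for illustration, and the layout is not read from our released code.

```python
# Feature map shapes in CHW format, as listed above.
FEATURE_SHAPES = [
    (64, 128, 96), (64, 64, 48), (256, 64, 48),
    (32, 64, 48), (64, 32, 24), (32, 64, 48),
    (64, 32, 24), (128, 16, 12), (32, 64, 48),
]
EMBED_DIM = 128  # assumed width of each per-feature fully connected layer

def loss_head_layout(shapes, embed_dim):
    """Describe the layers a loss prediction head would need for these inputs."""
    # Global average pooling collapses H and W, leaving one value per channel.
    pooled_dims = [channels for channels, _height, _width in shapes]
    # One (in_features, out_features) fully connected layer per feature map.
    fc_layers = [(dim, embed_dim) for dim in pooled_dims]
    # The embeddings are concatenated and mapped to a single predicted loss.
    final_layer = (embed_dim * len(shapes), 1)
    return pooled_dims, fc_layers, final_layer

pooled_dims, fc_layers, final_layer = loss_head_layout(FEATURE_SHAPES, EMBED_DIM)
print(pooled_dims)
print(final_layer)
```

Because pooling removes the spatial dimensions, feature maps of different resolutions can share the same head design, and only the channel count of each input matters.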

We trained the network for 6 cycles, starting with 5000 random images and adding another 5000 images on each subsequent cycle. We used the configuration file w32_256x192_adam_lr1e-3.yaml, but reduced the number of epochs from 210 to 100 to save GPU time during the experiments.
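The cycle schedule above can be sketched as a simple loop. This is an illustrative outline only: `run_cycles`, `score_fn` and the pool size of 40000 are hypothetical, training is replaced by a comment, and in a real run the scores would be recomputed by the retrained model at every cycle.

```python
import random

def run_cycles(pool_ids, score_fn, initial_size=5000, step=5000, cycles=6, seed=0):
    """Simulate the labeled-set growth of an active learning experiment."""
    rng = random.Random(seed)
    unlabeled = list(pool_ids)
    rng.shuffle(unlabeled)
    # Cycle 0 starts from a random subset of the pool.
    labeled, unlabeled = unlabeled[:initial_size], unlabeled[initial_size:]
    sizes = []
    for cycle in range(cycles):
        # A real run would train and evaluate the model on `labeled` here.
        sizes.append(len(labeled))
        if cycle == cycles - 1:
            break
        # Move the highest-scoring unlabeled images into the labeled set.
        unlabeled.sort(key=score_fn, reverse=True)
        labeled, unlabeled = labeled + unlabeled[:step], unlabeled[step:]
    return sizes, labeled

# Placeholder score; an active learning method would supply predicted losses.
sizes, labeled = run_cycles(range(40000), score_fn=lambda image_id: image_id % 97)
print(sizes)
```

Replacing `score_fn` with a random score gives the random-selection baseline on the same schedule.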

Fig. 3: Comparison of the Learning Loss algorithm to random selection for the human pose estimation task.

Fig. 3 shows that although active learning slightly outperformed random selection, the difference of ~0.8% is small. We believe it’s possible to achieve better results for human pose estimation tasks, and we’ll update you on our progress later.

Using our code

Our code for all the above-mentioned algorithms and experiments can be found here. The README contains detailed instructions on how to run our code or add your own active learning algorithm. The code can also generate CSV files to be uploaded to SuperAnnotate’s platform.

Concluding remarks

In the current article, we covered active learning for Object Detection and Human Pose Estimation tasks. Future articles will cover semantic segmentation.

We plan to create an open-source codebase with multiple active learning algorithm implementations, all implemented in a similar way and evaluated on the same datasets, tasks and models. We hope to help our users select the right algorithm for their dataset and model based on this code and evaluation results, and easily integrate it in SuperAnnotate’s platform.

References

[1] Donggeun Yoo and In So Kweon. Learning Loss for Active Learning. arXiv:1905.03677 [cs.CV], May 2019.

[2] Object detection PyTorch code using SSD model. https://github.com/amdegroot/ssd.pytorch

[3] Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep High-Resolution Representation Learning for Human Pose Estimation. CVPR, 2019.

[4] Wei Liu, et al. SSD: Single Shot MultiBox Detector. ECCV, 2016.

[5] PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
