
In this article, we present our new building pre-annotation algorithm for aerial imagery, share its code, and explain the motivation behind integrating it into our platform.

Fig. 1: Sample U-Net based building footprint pre-annotation.

Motivation: Why we created a building pre-annotation algorithm

Aerial image annotation is tedious work, and annotating hundreds of thousands of buildings takes considerable effort and funding. Here at SuperAnnotate, we strive to use state-of-the-art computer vision technology to automate and accelerate the creation of pixel-perfect annotations. As part of that effort, we have integrated several smart pre-annotation algorithms into the platform, allowing our users to fix auto-generated annotations instead of starting from scratch. This, in turn, enables users to produce annotations of the same quality with less effort.

Algorithm and code description

Our algorithm is based on the winning solution of the SpaceNet Building Detection challenge. SpaceNet is a corpus of commercial satellite imagery with labeled training data for machine learning research. Its organizers host building and road detection challenges and open-source the best solutions.

The winning solution of the second building detection challenge uses a segmentation network called U-Net and then cuts the segmentation mask into individual building footprints. The U-Net architecture is shown in Figure 2. It can be summarized as an encoder-decoder network with skip connections between the corresponding layers of the encoder and decoder. U-Net is fast to train and performs well even on relatively small datasets. The winner of the SpaceNet challenge used only 4 resolution levels instead of 5, removing the deepest layer with 1024 channels; we verified that adding the layer back does not yield any improvement.
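To make the architecture concrete, here is a minimal PyTorch sketch of the idea: an encoder-decoder with skip connections, truncated to 4 resolution levels so the deepest block has 512 channels rather than 1024. The channel widths follow the original U-Net paper; this is an illustration of the structure, not the winning solution's exact code.

```python
# Minimal U-Net sketch: encoder-decoder with skip connections,
# 4 resolution levels (no 1024-channel bottleneck).
# Illustrative only; channel widths follow the original U-Net paper.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in the original U-Net
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=1):
        super().__init__()
        # Encoder: feature maps shrink spatially, channels grow
        self.enc1 = conv_block(in_ch, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.bottleneck = conv_block(256, 512)  # 4th level; 1024-channel layer removed
        self.pool = nn.MaxPool2d(2)
        # Decoder: upsample and fuse with the matching encoder features
        self.up3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.dec3 = conv_block(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        # Skip connections: concatenate encoder features with upsampled decoder features
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # per-pixel building logits
```

At inference time, thresholding the sigmoid of these logits gives the binary building mask that is later cut into individual footprints.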

We merged the Vegas, Paris, and Shanghai datasets and trained a single network on the combined data. We did not use the Khartoum annotations due to their lower quality. We also added data augmentation, which helped the network generalize to images from cities not present in the training data. Our PyTorch code is open-sourced here with all the necessary instructions. We achieved an IoU of 0.545 on the test set.
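For reference, IoU (intersection over union) measures the overlap between the predicted mask and the ground truth. Below is a minimal sketch of a pixel-level IoU computation, assuming `pred` holds per-pixel probabilities from the network and `target` is a binary ground-truth mask; the `pixel_iou` helper is hypothetical and not code from the open-sourced repository.

```python
# Pixel-level IoU between a predicted building mask and the ground truth.
# `pixel_iou` is a hypothetical helper, not from the open-sourced repository.
import torch

def pixel_iou(pred: torch.Tensor, target: torch.Tensor, threshold: float = 0.5) -> float:
    pred_mask = pred > threshold   # binarize per-pixel probabilities
    target_mask = target > 0.5     # ground-truth building mask (0/1)
    intersection = (pred_mask & target_mask).sum().item()
    union = (pred_mask | target_mask).sum().item()
    return intersection / union if union > 0 else 1.0  # empty-vs-empty counts as a match
```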

Fig. 2: U-Net architecture (example for 32x32 pixels in the lowest resolution). Each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y size is given at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.
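The step of cutting the segmentation mask into individual building footprints, mentioned above, can be approximated with connected-component labeling. The winning solution's post-processing is more elaborate; this sketch only illustrates the idea, and `mask_to_footprints` is a hypothetical helper.

```python
# Split a binary building mask into per-building instance masks via
# connected-component labeling. Hypothetical helper; the winning
# solution's post-processing is more involved.
import numpy as np
from scipy import ndimage

def mask_to_footprints(mask: np.ndarray, min_area: int = 20):
    # Label 8-connected regions of the binary mask
    labeled, num = ndimage.label(mask, structure=np.ones((3, 3)))
    footprints = []
    for i in range(1, num + 1):
        footprint = labeled == i
        if footprint.sum() >= min_area:  # drop tiny noise blobs
            footprints.append(footprint)
    return footprints
```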

Future roadmap

Currently, our model works fairly well on most of the city images we have, yet we believe it will benefit from more data from different cities. We also plan to implement a road detection algorithm to assist our users with road annotation. Stay tuned for more updates on our progress with building and road pre-annotations.
