Self-Supervised Post-Disaster Building Damage Classification Informed by Geospatial Principles

Presenter: Bo Peng

Author(s): Bo Peng | Qunying Huang

Author Affiliation(s): Spatial Computing and Data Mining Lab, University of Wisconsin-Madison | Spatial Computing and Data Mining Lab, University of Wisconsin-Madison

Session: IMAGERY AND GEOSPATIAL ANALYSIS FOR HUMANITARIAN AND EMERGENCY RESPONSE

Large-scale, real-time building damage assessment plays a critical role in natural disaster (e.g., hurricane) response. Recent advances in supervised deep learning have significantly improved remote sensing (RS) image recognition for building damage classification. However, massive amounts of human labels are required to train supervised deep learning models. Additionally, pre-trained models often generalize poorly to new testing sites in future disasters because of domain gaps between historical training data and future testing data. Moreover, traditional machine learning models rely mainly on handcrafted image features (e.g., textures) and are therefore case-by-case efforts with poor generalizability. Hence, these methods are not applicable to near-real-time disaster response.

Self-supervised learning (SSL) has emerged as a new solution for image representation learning without the need for human labels. Among the various SSL frameworks, contrastive learning (CL) of image representations has received wide attention. In the CL framework, each image is augmented in different ways (e.g., cropping and rotation). A pair of augmented views derived from the same original image forms a positive pair, whereas views derived from different original images form a negative pair. The CL framework trains an image encoder (e.g., a convolutional neural network) to extract similar representations for positive pairs and dissimilar representations for negative pairs. CL thus enables image representation learning without human labels. However, traditional CL frameworks require specialized, manually designed data augmentations and neural network architectures.
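The contrastive objective described above can be sketched with the widely used NT-Xent (normalized temperature-scaled cross-entropy) loss from SimCLR; this is a generic illustration of the CL principle, not the authors' specific loss function. Embeddings are arranged so that rows (2i, 2i+1) are the two views of the same original image.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """Generic NT-Xent contrastive loss (SimCLR-style), for illustration.

    z: (2N, d) array of embeddings; rows (2i, 2i+1) are the two
    augmented views of the same original image (a positive pair).
    All other row combinations act as negative pairs.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / temperature                       # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z.shape[0]
    pos = np.arange(n) ^ 1                            # partner index: 0<->1, 2<->3, ...
    logsumexp = np.log(np.exp(sim).sum(axis=1))       # softmax normalizer per row
    # cross-entropy of each view against its positive partner
    loss = -(sim[np.arange(n), pos] - logsumexp)
    return loss.mean()
```

Minimizing this loss pulls positive pairs together and pushes all other pairs apart in the embedding space.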

In response, this study proposes a novel SSL framework, spatiotemporal contrastive representation learning (ST-CRL), for learning representations of building objects from pre- and post-disaster unlabeled RS images. Informed by the First Law of Geography, ST-CRL assumes that temporally adjacent pairs of RS images over the same geographic extent should have similar representations, whereas geographically distant pairs should have dissimilar representations.
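This pairing rule can be sketched as follows. The field names (`pre_patch`, `post_patch`, etc.) and the 1 km negative-pair distance threshold are illustrative assumptions, not details from the study: positive pairs are the pre/post views of the same building, and negative pairs are sampled from buildings beyond the distance threshold.

```python
import math
import random

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def build_pairs(buildings, min_neg_dist_m=1000.0, seed=0):
    """Construct contrastive pairs from building objects (illustrative sketch).

    buildings: list of dicts with keys 'lon', 'lat', 'pre_patch',
    'post_patch' (hypothetical field names). Returns (positives, negatives):
    positives pair each building's pre/post views (temporal adjacency);
    negatives pair buildings farther apart than min_neg_dist_m.
    """
    rng = random.Random(seed)
    positives = [(b['pre_patch'], b['post_patch']) for b in buildings]
    negatives = []
    for b in buildings:
        far = [o for o in buildings
               if haversine_m(b['lon'], b['lat'], o['lon'], o['lat']) > min_neg_dist_m]
        if far:
            o = rng.choice(far)
            negatives.append((b['post_patch'], o['post_patch']))
    return positives, negatives
```

The key design point is that geography replaces manual augmentation: the pre/post revisit over the same extent supplies the "two views," and spatial distance supplies the negatives.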

Building on the xBD satellite image dataset (0.5 m spatial resolution) for building damage classification (no damage, minor damage, major damage, and destroyed), we crop image patches centered at each building with a 15-meter buffer as building objects. In ST-CRL, temporally adjacent pre- and post-disaster building object pairs serve as positive pairs, and geographically distant pairs serve as negative pairs. For building damage classification, we train logistic regression (LR) and multi-layer perceptron (MLP) classifiers on top of the self-supervised ST-CRL image representations with varying numbers of human labels (ranging from 1,000 to 9,000). For comparison, we also train a fully supervised ResNet. Experimental results demonstrate the superior performance of the self-supervised ST-CRL representations for classifying building damage worldwide. With only 1,000 labels, the LR and MLP classifiers built on ST-CRL representations outperform supervised methods (e.g., ResNet) by 30% in F1 score.
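The patch-cropping step above can be sketched as a metric-buffer crop: at 0.5 m ground sample distance, a 15 m buffer on each side of the centroid is 30 pixels, yielding a fixed 60x60 window. This is a minimal sketch under that interpretation of the buffer, with zero-padding at image edges; the authors' actual preprocessing may differ.

```python
import numpy as np

def crop_building_patch(image, row, col, buffer_m=15.0, gsd_m=0.5):
    """Crop a fixed-size patch around a building centroid (illustrative).

    image: (H, W, C) array; (row, col): centroid in pixel coordinates.
    buffer_m / gsd_m pixels on each side (15 m at 0.5 m GSD -> 30 px,
    i.e. a 60x60 window); out-of-bounds regions are zero-padded so every
    building yields a same-sized patch.
    """
    half = int(round(buffer_m / gsd_m))              # 30 px per side
    size = 2 * half                                  # 60 px window
    h, w, c = image.shape
    patch = np.zeros((size, size, c), dtype=image.dtype)
    r0, r1 = row - half, row + half                  # requested window
    c0, c1 = col - half, col + half
    sr0, sr1 = max(r0, 0), min(r1, h)                # clipped to the image
    sc0, sc1 = max(c0, 0), min(c1, w)
    patch[sr0 - r0:sr1 - r0, sc0 - c0:sc1 - c0] = image[sr0:sr1, sc0:sc1]
    return patch
```

Fixing the patch size this way lets one encoder process every building object in a batch regardless of where the building sits in the source tile.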

Main contributions include: (1) ST-CRL learns image representations with no human labels, enabling real-time data processing. (2) ST-CRL incorporates geospatial principles into CL, removing the need for the specialized manual data augmentations of traditional CL frameworks. (3) ST-CRL can be trained on new unlabeled data and thus generalizes well.

March 29 @ 12:00
12:00 — 12:15 (15′)
