A Comparison of Stereo-Matching Cost Between a Convolutional Neural Network and Census for Satellite Images

Rongjun Qin

Xu Huang Department of Civil, Environmental and Geodetic Engineering, The Ohio State University
Rongjun Qin Department of Civil, Environmental and Geodetic Engineering, The Ohio State University
Yilong Han Department of Civil, Environmental and Geodetic Engineering, The Ohio State University
Wei Liu Department of Civil, Environmental and Geodetic Engineering, The Ohio State University

14E

Dense stereo image matching is a critical component in generating 3D point clouds for mapping. Such methods rely heavily on matching costs, which measure the appearance similarity (intensity, gradient, etc.) of corresponding pixels. Over the last decades, the most commonly used matching costs have included intensity difference, gradient difference, normalized correlation, mutual information, and Census. Among these, Census, a binary coding of local image structures, has proven to be one of the most effective and radiometrically invariant ways to compute cost on various image datasets, such as satellite, aerial, terrestrial and indoor images. However, it is generally known that these matching costs (including Census) are sensitive to scene factors such as illumination, texture and surface reflectance, particularly in regions such as water surfaces and snowfields. The resulting uncertainties in matching cost often significantly reduce the accuracy of the final matched point clouds.
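To make the idea of Census as "a binary coding of local image structures" concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The function names `census_transform` and `hamming_cost`, the 5x5 window (`radius=2`) and the edge padding are assumptions chosen for illustration. Each pixel is encoded by comparing its neighbours with the centre value, and the matching cost between two pixels is the Hamming distance between their codes.

```python
import numpy as np


def census_transform(img, radius=2):
    """Encode each pixel as a bit string: bit = 1 where a neighbour is darker than the centre.

    The window size (2 * radius + 1) and edge padding are illustrative assumptions.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    codes = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            neighbour = padded[radius + dy: radius + dy + h,
                               radius + dx: radius + dx + w]
            bit = (neighbour < img).astype(np.uint64)
            codes = (codes << np.uint64(1)) | bit
    return codes


def hamming_cost(codes_a, codes_b):
    """Per-pixel matching cost: number of differing bits between two Census codes."""
    diff = np.bitwise_xor(codes_a, codes_b)
    cost = np.zeros(diff.shape, dtype=np.uint8)
    while np.any(diff):
        cost += (diff & np.uint64(1)).astype(np.uint8)
        diff = diff >> np.uint64(1)
    return cost
```

Because the code depends only on the ordering of intensities inside the window, any strictly increasing brightness change (for example a gain and offset) leaves the Census codes unchanged, which is the sense in which the cost is radiometrically invariant. It also suggests why poorly textured regions are difficult: when neighbouring intensities are nearly equal, the comparisons are dominated by noise.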

In 2016, Zbontar et al. [1] pioneered learned matching-cost computation by training a Convolutional Neural Network (CNN) to compare image patches. They trained the CNN on the street-view and indoor training sets of the KITTI and Middlebury benchmarks and used it to compute accurate and reliable matching costs on the corresponding test sets. Most state-of-the-art stereo matching methods have since used the trained CNN to achieve high ranks on the Middlebury and KITTI benchmarks. However, no critical analysis has yet been done for the case where the training and test sets come from different imaging scenarios, for example when the training sets are indoor or street-view images while the test sets are satellite images.
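As a rough illustration of what "training a CNN to compare image patches" produces, the sketch below scores a pair of patches with a shared-weight (siamese) feature extractor followed by cosine similarity. It is a toy under stated assumptions, not the network of [1]: the names `describe_patch` and `patch_matching_cost` are hypothetical, there is a single convolution layer, the `filters` would come from training (in a quick experiment they could simply be random), and defining the cost as one minus the similarity is a choice made here for readability.

```python
import numpy as np


def describe_patch(patch, filters):
    """Hypothetical siamese branch: one valid convolution + ReLU, then L2 normalisation.

    `filters` has shape (n_filters, k, k); a trained network would supply these weights.
    """
    patch = np.asarray(patch, dtype=np.float64)
    k = filters.shape[1]
    h, w = patch.shape
    responses = []
    for f in filters:
        resp = np.array([[np.sum(patch[y:y + k, x:x + k] * f)
                          for x in range(w - k + 1)]
                         for y in range(h - k + 1)])
        responses.append(np.maximum(resp, 0.0).ravel())  # ReLU
    feat = np.concatenate(responses)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat


def patch_matching_cost(patch_left, patch_right, filters):
    """Cost = 1 - cosine similarity of the two descriptors (same filters on both sides)."""
    d_left = describe_patch(patch_left, filters)
    d_right = describe_patch(patch_right, filters)
    return 1.0 - float(np.dot(d_left, d_right))
```

The point of the sketch is the structure: both patches pass through the same weights, so whatever the filters have learned from the training images (street-view, indoor or satellite) determines how similarity is judged on the test images.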

To evaluate the CNN-based matching cost more broadly, we compared it with the Census cost across different training and test datasets. First, we compared the two costs in problematic regions to test the reliability of the CNN-based cost. In this comparison, we trained on KITTI and tested on Middlebury, computed matching costs with both the trained CNN and Census, and analyzed their accuracy and reliability in problematic regions such as fine structures, poor or repetitive textures, high directional reflectivity, disparity edges and obvious light source changes. Second, because existing pre-trained CNNs (both KITTI-derived and Middlebury-derived) have only been applied to small-scale images, such as indoor and street-view images, we evaluated their performance in large-scale scenarios by comparing them with the Census cost, a widely used non-parametric matching cost, on satellite images with different scene content. Finally, we trained a new CNN on satellite images and evaluated its accuracy and reliability in large-scale scenarios. The comparative results show that the CNN-based cost is generally more accurate and robust than the Census cost in all scenarios (indoor, street view and satellite images), even when the training and test sets differ.

Reference:

[1] Zbontar, J., & LeCun, Y., 2015. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

January 29 @ 14:00
14:00 — 14:15 (15′)

Mineral B
