Geometry-Guided Adaptation for Road Segmentation
Gong Cheng, Yue Wang, Yiming Qian, and James H. Elder
Deep fully convolutional neural networks can perform road segmentation well under favourable weather conditions and when trained and evaluated on disjoint partitions of the same dataset. However, performance drops substantially when training and test data are drawn from distinct datasets, particularly when weather, lighting or viewing geometry shift. We propose a novel domain adaptation algorithm for adapting road segmentation to shifting weather, lighting or viewing geometries. The key idea is to use the road geometry to generate surrogate ground truth in the target domain that can be used as a teaching signal for adaptation.
We use five datasets to develop and evaluate our method: four are existing road or semantic segmentation datasets, and the fifth is a new dataset that we introduce. To study domain adaptation for road segmentation, we conduct a five-fold, leave-one-dataset-out cross-validation experiment. In each fold, four datasets serve as source datasets with ground truth labels, and the fifth serves as a held-out target dataset without labels. We train on the source datasets and evaluate on the test set of the target dataset. Unlabeled images from the target training set are used only for unsupervised adaptation.
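The fold structure described above can be sketched as follows. This is a minimal illustration only: the dataset names are placeholders (the five datasets are not named in this text), and `leave_one_dataset_out` is a hypothetical helper, not part of any released code.

```python
from typing import Dict, List

# Placeholder names; the text does not name the five datasets.
DATASETS = ["dataset_A", "dataset_B", "dataset_C", "dataset_D", "dataset_E"]


def leave_one_dataset_out(datasets: List[str]) -> List[Dict[str, object]]:
    """Build one fold per dataset.

    In each fold the chosen dataset is the unlabeled target and the
    remaining datasets are the labeled sources.
    """
    folds = []
    for target in datasets:
        sources = [d for d in datasets if d != target]
        folds.append({"target": target, "sources": sources})
    return folds


folds = leave_one_dataset_out(DATASETS)
for fold in folds:
    print(fold["target"], "<-", fold["sources"])
```

Each dataset appears exactly once as the target, so every reported result is for a domain whose labels were never seen during training.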
Stage 1. Supervised Training. The availability of ground truth pixel-level labels and vanishing points for source domain images allows us to pretrain both appearance-based and geometry-based road segmentation subsystems. The appearance subsystem is a segmentation network (SegNet) trained on the source domain with ground truth labels. The geometry subsystem consists of a geometric prior over the road region and a vanishing point detection algorithm.
Stage 2. Unsupervised Adaptation. For each target image, we generate a surrogate road prior mask using the source-domain geometric prior and the vanishing point estimated for that image. The key hypothesis of this step is that this geometry-driven estimate embodies knowledge that is to some degree independent of the expertise learned by the network, and can thus serve as a teaching signal to adapt the network to the new domain. We then fine-tune the pretrained segmentation network using the target RGB images and their corresponding surrogate masks.
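One way to picture the surrogate mask generation is sketched below. The text does not specify how the geometric prior is aligned with the estimated vanishing point, so this sketch makes several assumptions: the prior is a 2D road-probability map defined relative to a reference vanishing point, alignment is a pure integer translation, and the result is binarised at 0.5. The function names `shift_prior_to_vp` and `surrogate_mask` are hypothetical.

```python
import numpy as np


def shift_prior_to_vp(prior, prior_vp, target_vp):
    """Translate `prior` so its reference vanishing point lands on `target_vp`.

    prior:     (H, W) float array of road probabilities in [0, 1].
    prior_vp:  (row, col) vanishing point the prior is defined around.
    target_vp: (row, col) vanishing point estimated in the target image.
    Pixels with no source after translation are set to 0 (assumption).
    """
    h, w = prior.shape
    dr = int(target_vp[0]) - int(prior_vp[0])
    dc = int(target_vp[1]) - int(prior_vp[1])
    shifted = np.zeros_like(prior)
    # Source window that stays inside the image after translation.
    r0, r1 = max(0, -dr), min(h, h - dr)
    c0, c1 = max(0, -dc), min(w, w - dc)
    if r0 < r1 and c0 < c1:
        shifted[r0 + dr:r1 + dr, c0 + dc:c1 + dc] = prior[r0:r1, c0:c1]
    return shifted


def surrogate_mask(prior, prior_vp, target_vp, threshold=0.5):
    """Binary surrogate road mask; thresholding is an assumption of this sketch."""
    return (shift_prior_to_vp(prior, prior_vp, target_vp) > threshold).astype(np.uint8)


# Toy example: a prior with a single confident road pixel at its vanishing point.
toy_prior = np.zeros((4, 4), dtype=np.float32)
toy_prior[1, 1] = 1.0
toy_mask = surrogate_mask(toy_prior, prior_vp=(1, 1), target_vp=(2, 3))
```

The resulting masks would then play the role of labels when fine-tuning the pretrained network on target images. A more faithful alignment might also need scaling or perspective warping, which this sketch omits.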
Stage 3. Inference. We evaluate two inference strategies: (1) taking the softmax output of the fine-tuned network directly and thresholding it at 0.5, and (2) fusing the appearance output (the probabilistic masks from the fine-tuned segmentation network) with the geometry-based surrogate road prior.
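The two inference options can be sketched as below. Only the 0.5 threshold comes from the text. The rest is assumed for illustration: the network emits two-class logits of shape (2, H, W) with the road class at index 1, the fusion partner is the geometry-derived surrogate prior from Stage 2, and the fusion rule is a weighted average. All function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def road_probability(logits, road_class=1):
    """Softmax over the class axis, returning the road-class probability map.

    logits: (C, H, W) array; the class layout is an assumption of this sketch.
    """
    z = logits - logits.max(axis=0, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e[road_class] / e.sum(axis=0)


def threshold_inference(road_prob, threshold=0.5):
    """Option (1): threshold the fine-tuned network's softmax output."""
    return road_prob > threshold


def fused_inference(road_prob, geometry_prior, weight=0.5, threshold=0.5):
    """Option (2): fuse appearance and geometry, then threshold.

    The weighted average here is a stand-in; the actual fusion rule is
    not given in the text.
    """
    fused = weight * road_prob + (1.0 - weight) * geometry_prior
    return fused > threshold


# Toy logits: pixel (0, 0) strongly road, pixel (1, 1) strongly non-road.
toy_logits = np.zeros((2, 2, 2))
toy_logits[1, 0, 0] = 5.0
toy_logits[0, 1, 1] = 5.0
toy_prob = road_probability(toy_logits)
mask_direct = threshold_inference(toy_prob)
mask_fused = fused_inference(toy_prob, np.zeros((2, 2)))
```

With an all-zero geometric prior and equal weights, even the confident road pixel falls below the threshold after fusion, which shows how strongly the chosen fusion rule can shape the final mask.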