LS3D: Single-View Gestalt 3D Surface Reconstruction from Manhattan Line Segments

This project page provides code for the LS3D algorithm and dataset. If you use either, please cite:
Qian Y., Ramalingam S., Elder J.H. (2018) LS3D: Single-View Gestalt 3D Surface Reconstruction from Manhattan Line Segments. Proceedings of the Asian Conference on Computer Vision (ACCV), LNCS 11364, 399-416.
For enquiries, please contact Yiming Qian (yimingqian88@gmail.com) or James Elder (jelder@yorku.ca).

The LS3D Algorithm

Recent deep learning algorithms for single-view 3D reconstruction recover rough 3D layout but fail to capture the crisp linear structures that grace our urban landscape. Here we show that for the particular problem of 3D Manhattan building reconstruction, the explicit application of linear perspective and Manhattan constraints within a classical constructive perceptual organization framework allows accurate and meaningful reconstructions to be computed. The proposed Line-Segment-to-3D (LS3D) algorithm computes a hierarchical representation through repeated application of the Gestalt principle of proximity. Edges are first organized into line segments, and the subset that conforms to a Manhattan frame is extracted. Optimal bipartite grouping of orthogonal line segments by proximity minimizes the total gap and generates a set of Manhattan spanning trees, each of which is then lifted to 3D. For each 3D Manhattan tree we identify the complete set of 3D 3-junctions and 3-paths, and show that each defines a unique minimal spanning cuboid. The cuboids generated by each Manhattan tree together define a solid model and the visible surface for that tree. The relative depths of these solid models are determined by an L1 minimization that is again rooted in a principle of proximity in both depth and image dimensions. The method has relatively few parameters and requires no training. For quantitative evaluation, we introduce a new 3D Manhattan Building Dataset (3DBM). We find that the proposed LS3D method generates 3D reconstructions that are both qualitatively and quantitatively superior to reconstructions produced by state-of-the-art deep learning approaches.
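
To make the grouping stage concrete, the sketch below (our own minimal example, not the authors' code) pairs segments from two orthogonal Manhattan families so that the total endpoint gap is minimized, using the Hungarian algorithm; the max_gap threshold is an assumed parameter.

    # Minimal sketch (not the authors' code) of one LS3D grouping step:
    # optimal bipartite pairing of segments from two orthogonal Manhattan
    # families so that the total endpoint gap is minimized.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def endpoint_gap(seg_a, seg_b):
        # Smallest distance between any endpoint of seg_a and any endpoint
        # of seg_b; each segment is a (2, 2) array of its two image endpoints.
        d = np.linalg.norm(seg_a[:, None, :] - seg_b[None, :, :], axis=-1)
        return d.min()

    def group_orthogonal_segments(fam_x, fam_y, max_gap=10.0):
        # Hungarian assignment minimizes the total gap over all pairings;
        # pairs with a gap above max_gap (pixels, an assumed threshold)
        # are discarded.
        cost = np.array([[endpoint_gap(a, b) for b in fam_y] for a in fam_x])
        rows, cols = linear_sum_assignment(cost)
        return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_gap]

    # Toy example: two x-aligned and two y-aligned segments.
    fam_x = [np.array([[0., 0.], [10., 0.]]), np.array([[0., 50.], [10., 50.]])]
    fam_y = [np.array([[10., 1.], [10., 20.]]), np.array([[0., 52.], [0., 80.]])]
    print(group_orthogonal_segments(fam_x, fam_y))  # -> [(0, 0), (1, 1)]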

3D Manhattan Building Dataset (3DBM)

  • 118 images of 57 urban buildings were captured with a Sony NEX-6 camera at 4912 × 3264 pixel resolution.
  • 3D massing models for the buildings were obtained through the City of Toronto Open Data project (www.toronto.ca/city-government/data-research-maps/open-data).
  • Point correspondences between the images and the models were identified manually and used to solve for the pose of the camera relative to the building (a pose-recovery sketch follows this list).
  • The 3DBM dataset can be downloaded here.
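
The pose-recovery step can be sketched with OpenCV's PnP solver. All point values and the focal-length guess below are placeholders, not values from the actual dataset pipeline.

    # Hypothetical sketch of recovering camera pose from manual 2D-3D
    # point correspondences. Values are illustrative only.
    import numpy as np
    import cv2

    # 3D points on the massing model (metres) and their manually identified
    # image locations (pixels) -- placeholder values.
    object_pts = np.array([[0, 0, 0], [20, 0, 0], [20, 0, 30],
                           [0, 0, 30], [0, 15, 0], [20, 15, 0]], dtype=np.float64)
    image_pts = np.array([[812, 2410], [3980, 2395], [3895, 310],
                          [905, 330], [1490, 2600], [3300, 2590]], dtype=np.float64)

    # Assumed intrinsics for a 4912 x 3264 image: the focal length in pixels
    # is a guess, and the principal point is taken at the image centre.
    f = 4000.0
    K = np.array([[f, 0, 4912 / 2],
                  [0, f, 3264 / 2],
                  [0, 0, 1]])

    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)  # rotation taking model coordinates to the camera frame
    print(ok, tvec.ravel())     # translation of the model origin in camera coordinates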

Evaluation

The LS3D method was compared to state-of-the-art deep single-view 3D algorithms by rendering both the ground-truth 3DBM models and the 3D CAD models produced by LS3D to the image plane as range maps, and then computing root mean squared error (RMSE) and root mean squared percentage error (RMSPE) over the pixels where both return a range estimate.
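
A minimal sketch of this overlap-masked error computation, assuming undefined pixels are encoded as NaN (the RMSPE normalization here is one common definition and may differ in detail from the paper's):

    # Masked RMSE / RMSPE between a predicted range map and the rendered
    # ground-truth range map, restricted to pixels where both are defined.
    import numpy as np

    def range_errors(pred, gt):
        # pred, gt: per-pixel range maps in metres, NaN where no estimate exists.
        mask = np.isfinite(pred) & np.isfinite(gt)
        diff = pred[mask] - gt[mask]
        rmse = np.sqrt(np.mean(diff ** 2))                        # metres
        rmspe = 100.0 * np.sqrt(np.mean((diff / gt[mask]) ** 2))  # percent
        return rmse, rmspe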

Method                              RMSE (m)   RMSPE (%)
Make3D                                  25.4        63.3
Eigen                                   11.8        31.2
FCRN (Make3D)                           14.2        35.0
FCRN (NYU)                              11.0        27.8
DORN                                    11.6        28.9
PlaneNet                                9.35        23.4
LS3D (no occlusion constraint)          8.01        19.9
LS3D (with occlusion constraint)        7.05        17.7

In a second evaluation, we complete the LS3D surface estimates by Gaussian diffusion, so that error can be evaluated over all pixels covered by the ground-truth 3DBM models.
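
One way to implement such diffusion-based completion is sketched below; the sigma, iteration count, and mean-depth initialization are assumptions, not the paper's settings.

    # Illustrative diffusion-based surface completion: repeated Gaussian
    # smoothing propagates known depths into undefined pixels while the
    # observed values stay clamped.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def diffuse_complete(depth, sigma=2.0, n_iter=200):
        # depth: 2-D range map with NaN at pixels where LS3D gives no estimate.
        known = np.isfinite(depth)
        filled = np.where(known, depth, np.nanmean(depth))
        for _ in range(n_iter):
            smoothed = gaussian_filter(filled, sigma)
            filled = np.where(known, depth, smoothed)  # re-clamp observed pixels
        return filled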

Method                              RMSE (m)   RMSPE (%)
Make3D                                  27.1        65.8
Eigen                                   13.3        34.2
FCRN (Make3D)                           16.0        38.4
FCRN (NYU)                              12.4        30.5
DORN                                    13.1        31.8
PlaneNet                                10.6        26.1
LS3D (with occlusion constraint)        8.17        19.4

Qualitative Results

3D Models Generated by LS3D

Conclusion

The LS3D algorithm uses geometry-driven techniques with no appearance cues and no learning, yet it outperforms state-of-the-art deep learning methods on the problem of single-view 3D Manhattan building reconstruction.