Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou1
Matthew Brown2
Noah Snavely2
David Lowe2
UC Berkeley1, Google2
In CVPR 2017 (Oral)

We present an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences. We achieve this by simultaneously training depth and camera pose estimation networks using the task of view synthesis as the supervisory signal. The networks are thus coupled via the view synthesis objective during training, but can be applied independently at test time. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach (e.g., monocular depth performing comparably with supervised methods that use either ground-truth pose or depth for training).
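The view-synthesis supervision rests on a differentiable warp: each target pixel is back-projected with the predicted depth, transformed by the predicted relative pose, and re-projected into the source view. A minimal numpy sketch of this projection is below; the intrinsics matrix `K` and the function name are illustrative, not from the paper's released code.

```python
import numpy as np

def project_to_source(pt, depth, K, T):
    """Map a target pixel into the source view via predicted depth and pose.

    pt    : (u, v) pixel coordinates in the target frame
    depth : predicted depth at that pixel
    K     : 3x3 camera intrinsics
    T     : 4x4 relative pose from target to source (pose network output)
    Returns the corresponding (u, v) in the source frame.
    """
    p_h = np.array([pt[0], pt[1], 1.0])     # homogeneous pixel coordinates
    cam = depth * (np.linalg.inv(K) @ p_h)  # back-project to a 3D camera point
    cam_h = np.append(cam, 1.0)             # homogeneous 3D point
    src = K @ (T @ cam_h)[:3]               # rigid transform, then project
    return src[:2] / src[2]                 # perspective divide

# Sanity check with illustrative KITTI-like intrinsics: under the identity
# pose, a pixel must map back to itself regardless of its depth.
K = np.array([[718.0,   0.0, 607.0],
              [  0.0, 718.0, 185.0],
              [  0.0,   0.0,   1.0]])
same = project_to_source((100.0, 50.0), depth=10.0, K=K, T=np.eye(4))
```

During training, the source image is sampled at these warped coordinates (bilinearly, so gradients flow) and compared photometrically against the target frame, which couples the depth and pose networks through a single loss.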

Sample Monocular Depth Results

[Result videos: KITTI Sequences 1 and 2, Cityscapes Sequences 1 and 2]


We thank the anonymous reviewers for their valuable comments. TZ would like to thank Shubham Tulsiani for helpful discussions, and Clement Godard for sharing the evaluation code. This work is partially funded by Intel/NSF VEC award IIS-1539099. This webpage template was borrowed from some colorful folks.