Learning to See by Moving

ICCV 2015


The current dominant paradigm for feature learning in computer vision relies on training neural networks for the task of object recognition using millions of hand-labelled images. Is it also possible to learn features for a diverse set of visual tasks using any other form of supervision? In biology, living organisms developed the ability of visual perception for the purpose of moving and acting in the world. Drawing inspiration from this observation, in this work we investigate if the awareness of egomotion can be used as a supervisory signal for feature learning. As opposed to the knowledge of class labels, information about egomotion is freely available to mobile agents. We show that, using the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class labels as supervision on the tasks of scene recognition, object recognition, visual odometry and keypoint matching.


Coming soon.
Pretrained models: LSM-Models.tar


Learning to See by Moving. P. Agrawal, J. Carreira, and J. Malik. ICCV 2015. [pdf]

Supplementary Material coming soon.

When using our system, please cite this work. The BibTeX entry is provided below for your convenience.

@inproceedings{agrawal2015learning,
 author = {Agrawal, Pulkit and Carreira, Joao and Malik, Jitendra},
 title = {Learning to See by Moving},
 booktitle = {International Conference on Computer Vision (ICCV)},
 year = {2015}}


Egomotion models were pre-trained using the relative translation and rotation vectors. However, the translation vector was mistakenly measured in the global coordinate frame instead of the local frame of the vehicle at each time step. Switching to the local coordinate frame is expected to improve the quality of the learnt features; the arXiv submission will be updated with results from the corrected model by end of 2016/early 2017.
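The fix described above amounts to rotating the inter-frame displacement into the vehicle's own frame. A minimal sketch, assuming global poses are available as a rotation matrix and a position vector per time step (the function name and variables are illustrative, not from the released code):

```python
import numpy as np

def relative_translation_local(R_t, p_t, p_t1):
    """Express the translation between frames t and t+1 in the vehicle's
    local frame at time t, rather than in the global coordinate frame.

    R_t       : 3x3 rotation of the vehicle at time t (local -> global)
    p_t, p_t1 : 3-vectors, global positions at times t and t+1
    """
    delta_global = p_t1 - p_t      # displacement in the global frame
    return R_t.T @ delta_global    # rotate into the local frame at time t

# Example: vehicle heading rotated 90 degrees about z, moving along
# global y (which is the vehicle's local x-axis).
R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
t_local = relative_translation_local(R, np.zeros(3), np.array([0., 1., 0.]))
# -> approximately [1., 0., 0.]: forward motion along the vehicle's own axis
```

In the local frame, the same forward motion always yields the same supervision target regardless of the vehicle's global heading, which is presumably why the correction is expected to help.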