R-CNNs for Pose Estimation and Action Detection

March, 2015


We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images. Our approach involves training an R-CNN detector with loss functions depending on the task being tackled. We evaluate our method on the challenging PASCAL VOC dataset and compare it to previous leading approaches. Our method gives state-of-the-art results for keypoint and action prediction. Additionally, we introduce a new dataset for action detection, the task of simultaneously localizing people and classifying their actions, and present results using our approach.


Deploy Code

Before using the available source code, you need to install Caffe.

Action Classification
Download: Action.tar.gz
Keypoint Prediction
Download: Pose.tar.gz
Person Detection+Action+Pose
Download: All.tar.gz
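
After downloading one of the archives listed above, it can be unpacked with any standard tool. The following is a minimal Python sketch that uses only the standard library. It assumes the archive (for example Action.tar.gz) sits in the current directory. The helper name extract_archive and the destination directory action_code are hypothetical and are not part of the released code.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    archive = Path(archive_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    dest_resolved = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Refuse members whose path would land outside the destination.
        for name in names:
            target = (dest / name).resolve()
            if target != dest_resolved and dest_resolved not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Assumption: Action.tar.gz was downloaded into the current directory.
    # "action_code" is a hypothetical destination directory.
    if Path("Action.tar.gz").exists():
        for member in extract_archive("Action.tar.gz", "action_code"):
            print(member)
```

The same call should work for Pose.tar.gz and All.tar.gz. The layout of the extracted files is not described on this page, so inspect the returned member names before building against Caffe.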


You can download the action dataset, as used in the paper. The dataset contains the PASCAL VOC Action 2012 images, with complete annotations of all the people and their action labels.

Dataset download: action_dataset.tar.gz

How to cite

When citing our system, please cite this work. The bibtex entry is provided below for your convenience.

 Author = {G. Gkioxari and B. Hariharan and R. Girshick and J. Malik},
 Title = {R-CNNs for Pose Estimation and Action Detection},
 ArchivePrefix = {arXiv},
 Eprint = {1406.5212},
 PrimaryClass = {cs.CV},
 Year = {2014}}


For any questions regarding the work or the implementation, contact the author at gkioxari@eecs.berkeley.edu