This page outlines how to replicate the activity recognition experiments in the paper Long-term Recurrent Convolutional Networks for Visual Recognition and Description.

Code: The code to train the activity recognition models is on the "lstm_video_deploy" branch of Lisa Anne Hendricks's Caffe fork; everything needed to replicate the experiments is in "examples/LRCN_activity_recognition".

Data: The model was trained on the UCF-101 dataset. Optical flow was computed using [1].

Models: Single frame and LRCN models can be found here.

**NOTE** Some people have had difficulty reproducing my results by extracting their own frames. I am almost positive the issue is in the "extract_frames.sh" script, but have not had time to track it down yet. You can find the RGB and flow frames I extracted here.

Steps to retrain the LRCN activity recognition models:
1. Extract RGB frames: The script "extract_frames.sh" converts the UCF-101 .avi files to .jpg images. I extracted frames at 30 frames/second. (A hedged Python sketch of an equivalent extraction appears after this list.)
2. Compute flow frames: After downloading the code from [1], you can use "create_flow_images_LRCN.m" to compute flow frames. Example flow images for the video "YoYo_g25_c03" are here.
3. Train single frame models: Finetune the hybrid model (found here) with video frames to train a single frame model. Use "run_singleFrame_RGB.sh" and "run_singleFrame_flow.sh" to train the RGB and flow models, respectively. Make sure to change the "root_folder" parameter in "train_test_singleFrame_RGB.prototxt" and "train_test_singleFrame_flow.prototxt" as needed. The single frame models I trained can be found here. (A pycaffe finetuning sketch appears after this list.)
4. Train LRCN models: Using the single frame models as a starting point, train the LRCN models by running "run_lstm_RGB.sh" and "run_lstm_flow.sh". The data layer for the LRCN model is a python layer ("sequence_input_layer.py"). Make sure to set "WITH_PYTHON_LAYER := 1" in Makefile.config. Change the paths "flow_frames" and "RGB_frames" in "sequence_input_layer.py" as needed. The models I trained can be found here.
5. Evaluate the models: "classify.py" shows how to classify a video using the single frame and LRCN models. Make sure to adjust the pathnames "RGB_video_path" and "flow_video_path" as needed. You can also evaluate the LSTM model by running the code in "LRCN_evaluate" (added 1/12/16). (An illustrative classification sketch appears after this list.)
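Because "extract_frames.sh" has caused reproducibility issues (see the note above), here is a minimal Python sketch of what step 1 does, assuming ffmpeg is on your PATH. The output directory layout and file-name pattern are assumptions; match them to whatever "sequence_input_layer.py" expects on your setup.

```python
# Minimal frame-extraction sketch (not extract_frames.sh itself).
# Paths and the frames/<video_name>/<video_name>_%04d.jpg layout are assumptions.
import glob
import os
import subprocess

UCF_ROOT = "UCF-101"    # assumed location of the UCF-101 .avi files
FRAME_ROOT = "frames"   # assumed output root for RGB frames

for avi in glob.glob(os.path.join(UCF_ROOT, "*", "*.avi")):
    name = os.path.splitext(os.path.basename(avi))[0]
    out_dir = os.path.join(FRAME_ROOT, name)
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # "-r 30" resamples the video to 30 frames/second before writing .jpg frames.
    subprocess.check_call([
        "ffmpeg", "-i", avi, "-r", "30",
        os.path.join(out_dir, name + "_%04d.jpg"),
    ])
```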
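For steps 3 and 4, the "run_*" scripts wrap standard Caffe training; the same finetuning can be expressed in pycaffe as below. The solver and weights file names here are placeholders, not the actual files in "examples/LRCN_activity_recognition", so substitute the ones from the run script you are reproducing.

```python
# Pycaffe equivalent of finetuning from a pretrained model.
# File names are assumptions; use the solver/weights referenced by the run_* scripts.
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

solver = caffe.SGDSolver("singleFrame_solver_RGB.prototxt")  # assumed solver name
solver.net.copy_from("hybrid_model.caffemodel")              # initialize from the hybrid model
solver.solve()                                               # train until the solver's max_iter
```

For the LRCN models (step 4) the idea is the same, except the net uses the Python data layer, so Caffe must be built with "WITH_PYTHON_LAYER := 1" and "sequence_input_layer.py" must be importable (e.g. on PYTHONPATH) with its "flow_frames" and "RGB_frames" paths set.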
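"classify.py" is the authoritative evaluation code; the sketch below only illustrates the overall flow for the single frame models: per-frame scores are averaged over a video and the RGB and flow streams are late-fused. The deploy/model file names, the simplified preprocessing, and the equal fusion weights are assumptions, not the values used in the paper.

```python
# Illustrative late-fusion evaluation; see classify.py for the real pipeline.
# Deploy/model file names and the 0.5/0.5 fusion weights are placeholders.
import glob
import numpy as np
import caffe

caffe.set_mode_gpu()

def video_scores(net, frame_dir):
    """Average per-frame class scores over all frames of one video."""
    scores = []
    for frame in sorted(glob.glob(frame_dir + "/*.jpg")):
        im = caffe.io.load_image(frame)                # RGB, float in [0, 1]
        im = caffe.io.resize_image(im, (227, 227))
        # Reorder to (channel, height, width) and flip RGB -> BGR; classify.py
        # additionally scales to [0, 255] and subtracts the mean, omitted here.
        net.blobs[net.inputs[0]].data[0] = im.transpose(2, 0, 1)[::-1]
        out = net.forward()
        scores.append(out[net.outputs[0]][0].copy())
    return np.mean(scores, axis=0)

rgb_net = caffe.Net("deploy_singleFrame_RGB.prototxt",
                    "single_frame_RGB.caffemodel", caffe.TEST)
flow_net = caffe.Net("deploy_singleFrame_flow.prototxt",
                     "single_frame_flow.caffemodel", caffe.TEST)

rgb = video_scores(rgb_net, "frames/YoYo_g25_c03")
flow = video_scores(flow_net, "flow_images/YoYo_g25_c03")

# Late fusion of the two streams; 0.5/0.5 is a placeholder weighting.
fused = 0.5 * rgb + 0.5 * flow
print("predicted class:", int(np.argmax(fused)))
```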

Please contact lisa_anne@berkeley.edu if you have any questions. Happy classifying!

[1] Brox, Thomas, et al. "High accuracy optical flow estimation based on a theory for warping." Computer Vision-ECCV 2004. Springer Berlin Heidelberg, 2004. 25-36.