Active Capture: Goal-Directed System Direction of Human Action

"Active Capture" is a paradigm in multimedia computing and applications that brings together capture, interaction, and processing and exists in the intersection of these three capabilities. Most current human-computer interfaces largely exclude media capture and exist at the intersection of interaction and processing. In order to incorporate media capture into an interaction without requiring signal processing that would be beyond current capabilities, the interaction must be designed to leverage context from the interaction. For example, if the system wants to take a picture of the user smiling, it can interact with the user to get them to face the camera and smile and use simple, robust parsers (such as an eye finder and mouth motion detector) to aid in the contextualized capture, interaction, and processing. From the computer vision and audition side, Active Capture applications are high context multimedia recognizers. They augment computer vision and audition parsers and recognizers with context from the interaction with the user.


Publications

Ana Ramírez and Marc Davis. "Active Capture and Folk Computing." In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2004), Special Session on Folk Information Access Through Media, Taipei, Taiwan, IEEE Computer Society Press, 2004. (paper, presentation)
The domains of folk computing applications touch on areas of interest to people around the world but are of pressing need to those in the developing world who often lack access to basic services and rights: especially health care, education, nutrition, and protection of human rights. In this paper we describe how a new paradigm for media capture, called Active Capture, and toolkit support for creating applications of this type work toward supporting the development of multimedia applications and interfaces for folk computing.
Jeffrey Heer, Nathaniel S. Good, Ana Ramírez, Marc Davis, and Jennifer Mankoff. "Presiding Over Accidents: System Direction of Human Action." In Proceedings of CHI 2004, Conference on Human Factors in Computing Systems, 2004. (paper, video)
As human-computer interaction becomes more closely modeled on human-human interaction, new techniques and strategies for human-computer interaction are required. In response to the inevitable shortcomings of recognition technologies, researchers have studied mediation: interaction techniques by which users can resolve system ambiguity and error. In this paper we approach the human-computer dialogue from the other side, examining system-initiated direction and mediation of human action. We conducted contextual interviews with a variety of experts in fields involving human-human direction, including a film director, photographer, golf instructor, and 911 operator. Informed by these interviews and a review of prior work, we present strategies for directing physical human action and an associated design space for systems that perform such direction. We illustrate these concepts with excerpts from our interviews and with our implemented system for automated media capture or “Active Capture,” in which an unaided computer system uses techniques identified in our design space to act as a photographer, film director, and cinematographer.
Marc Davis, Jeffrey Heer, and Ana Ramírez. "Active Capture: Automatic Direction for Automatic Movies (Demonstration Description)." In Proceedings of the 11th Annual ACM International Conference on Multimedia, Berkeley, California, ACM Press, 88-89, 2003. (description, video)
The Active Capture demonstration is part of a new computational media production paradigm that transforms media production from a manual mechanical process into an automated computational one that can produce mass customized and personalized media integrating video of non-actors. Active Capture leverages media production knowledge, computer vision and audition, and user interaction design to automate direction and cinematography and thus enables the automatic production of annotated, high quality, reusable media assets. The implemented system automates the process of capturing a non-actor performing two simple reusable actions (“screaming” and “turning her head to look at the camera”) and automatically integrates those shots into various commercials and movie trailers.

Example Active Capture Applications

Implemented Active Capture Applications

Kiosk Demo (video)

The Kiosk Demo is similar to a photo kiosk in a mall, but instead of taking the user's picture, it takes a few videos of the user and automatically creates a personalized commercial or movie trailer starring the user. The Kiosk Demo has two parts. The Active Capture part works with the user to capture a shot of her looking at the camera and screaming, and a shot of her turning her head to look at the camera. The second part uses the Adaptive Media technology described in [Dav03c, DL96]: the shots of the user screaming and turning her head are automatically edited into a variety of commercials and movie trailers, including a 7up commercial, an MCI commercial, and the Terminator 2 movie trailer.
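As a rough illustration of the two-stage flow (not the actual Kiosk implementation, which relied on the Adaptive Media technology cited above), the sketch below models a trailer as a template of stock segments interleaved with named slots and fills those slots with the shots the Active Capture dialogue has just recorded. All file names, the template structure, and the assemble_edit_list helper are hypothetical.

# Illustrative-only sketch of the Kiosk Demo's two stages; the real system
# used the Adaptive Media technology of [Dav03c, DL96], not this code.

# Stage 1 output: shots produced by the Active Capture dialogue, keyed by the
# reusable action they depict (hypothetical file names).
captured_shots = {
    "scream": "shots/user_scream.mov",
    "head_turn": "shots/user_head_turn.mov",
}

# A trailer/commercial template: fixed stock footage interleaved with slots
# to be filled by the non-actor's shots.
trailer_template = [
    ("stock", "stock/opening_titles.mov"),
    ("slot", "head_turn"),
    ("stock", "stock/chase_sequence.mov"),
    ("slot", "scream"),
    ("stock", "stock/closing_logo.mov"),
]

def assemble_edit_list(template, shots):
    """Stage 2: resolve template slots against captured shots, yielding an
    ordered edit decision list (just file paths in this sketch)."""
    edit_list = []
    for kind, value in template:
        edit_list.append(shots[value] if kind == "slot" else value)
    return edit_list

print(assemble_edit_list(trailer_template, captured_shots))

Because the captured shots are annotated with the action they depict, the same two shots can be slotted into any number of templates, which is what lets one capture session yield several different commercials and trailers.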

SIMS Faces (video)

The SIMS Faces application works with the user to achieve two goals: taking her picture and recording her saying her name.