Active Capture

“Active Capture” is a paradigm in multimedia computing and applications that unifies capture, interaction, and processing, operating at the intersection of these three capabilities. Most current human-computer interfaces largely exclude media capture, operating only at the intersection of interaction and processing. To incorporate media capture into an interaction without requiring signal processing beyond current capabilities, the interaction must be designed to supply context that simplifies the processing task. For example, if the system wants to take a picture of the user smiling, it can direct the user to face the camera and smile, and then use simple, robust parsers (such as an eye finder and a mouth motion detector) to verify the performance. From the computer vision and audition side, Active Capture applications are high-context multimedia recognizers: they augment computer vision and audition parsers with context drawn from the interaction with the user.
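The direct-then-verify loop described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `direct_capture` function, the mocked frame stream, and the prompts are all hypothetical stand-ins; a real system would connect the detector to actual vision parsers (e.g. an eye finder or mouth motion detector) and deliver prompts as spoken or on-screen direction.

```python
def direct_capture(detector, prompt, retry_hint, frames, max_attempts=3):
    """Direct the user, then scan frames with a simple parser.

    Context from the interaction (the system just asked the user to
    perform the action) lets a weak, robust detector suffice: any
    positive frame within the attempt window is accepted as the shot.
    """
    for attempt in range(max_attempts):
        # Direct the user; on retries, give a corrective hint instead.
        print(prompt if attempt == 0 else retry_hint)
        for frame in frames:
            if detector(frame):   # simple, robust parser
                return frame      # contextualized capture succeeded
    return None                   # hand off to mediation/escalation


# Hypothetical usage with mocked string "frames" and a trivial detector.
frames = ["blink", "look_left", "smile_wide"]
shot = direct_capture(lambda f: "smile" in f,
                      "Please face the camera and smile!",
                      "Almost! Try a bigger smile.",
                      frames)
```

Here `shot` is `"smile_wide"`: the detector alone could not distinguish a smile from any other expression in general footage, but because the system just asked for a smile, a simple substring (or in practice, mouth-motion) test is enough.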

Publications

Ana Ramírez and Marc Davis. “Active Capture and Folk Computing.” In Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2004) Special Session on Folk Information Access Through Media in Taipei, Taiwan, IEEE Computer Society Press, 2004.
  The domains of folk computing applications touch on areas of interest to people around the world but are of pressing need to those in the developing world who often lack access to basic services and rights: especially health care, education, nutrition, and protection of human rights. In this paper we describe how a new paradigm for media capture, called Active Capture [3-5], and toolkit support for creating applications of this type work toward supporting the development of multimedia applications and interfaces for folk computing.
 
Jeffrey Heer, Nathaniel S. Good, Ana Ramírez, Marc Davis, and Jennifer Mankoff. “Presiding Over Accidents: System Direction of Human Action.” In Proceedings of CHI 2004, Conference on Human Factors in Computing Systems, 2004.
  As human-computer interaction becomes more closely modeled on human-human interaction, new techniques and strategies for human-computer interaction are required. In response to the inevitable shortcomings of recognition technologies, researchers have studied mediation: interaction techniques by which users can resolve system ambiguity and error. In this paper we approach the human-computer dialogue from the other side, examining system-initiated direction and mediation of human action. We conducted contextual interviews with a variety of experts in fields involving human-human direction, including a film director, photographer, golf instructor, and 911 operator. Informed by these interviews and a review of prior work, we present strategies for directing physical human action and an associated design space for systems that perform such direction. We illustrate these concepts with excerpts from our interviews and with our implemented system for automated media capture or “Active Capture,” in which an unaided computer system uses techniques identified in our design space to act as a photographer, film director, and cinematographer.
 
Marc Davis, Jeffrey Heer, and Ana Ramírez. “Active Capture: Automatic Direction for Automatic Movies (Demonstration Description).” In Proceedings of the 11th Annual ACM International Conference on Multimedia in Berkeley, California, ACM Press, 88-89, 2003.
  The Active Capture demonstration is part of a new computational media production paradigm that transforms media production from a manual mechanical process into an automated computational one that can produce mass customized and personalized media integrating video of non-actors. Active Capture leverages media production knowledge, computer vision and audition, and user interaction design to automate direction and cinematography and thus enables the automatic production of annotated, high quality, reusable media assets. The implemented system automates the process of capturing a non-actor performing two simple reusable actions (“screaming” and “turning her head to look at the camera”) and automatically integrates those shots into various commercials and movie trailers.