CSCW Final Report: Avatar Gesturing Experiment

Francesca Barrientos
April 3, 1998


Abstract

We performed an experiment to determine whether a one-jointed avatar arm or a two-jointed avatar arm is better for performing emblematic gestures. As part of our experiment, we built two avatars. The movement of the avatar arms could be controlled via a graphical user interface. We asked participants both to perform emblematic gestures on the avatars and to decode emblematic gestures that were performed on the avatars. We then asked the participants how effective they felt the gestures were in communicating the desired messages. We found that the responses did not vary depending on the avatar design. We also consider which aspects of avatar design may have changed the outcome of the experiment.


Introduction

Avatars provide virtual bodies for visitors to virtual worlds and graphical chat rooms. Presumably, virtual worlds were built so that chat-type communications over the internet could be more like chatting in real life. When we meet with people face to face, we are constantly using our bodies to communicate with others. In fact, communicating using our bodies is so integral to communication that we hardly give it a thought. We stand near the people we are talking to. In the physical world, there are many reasons for doing this: you can hear people better when you face them; you watch people you talk to in order to determine whether they are paying attention to you; you find clues to their emotional or physical state if you notice them blushing or blanching. Certainly, not all of these non-verbal cues can be transmitted through the medium of computer graphics, yet we have the sense that something about the way we use our real bodies can be transmitted via virtual bodies.

A ubiquitous form of avatar body nonverbal communication is proxemics, or the use of distance. When several virtual visitors are engaged in conversation, they will place their avatars near each other. Moving an avatar closer to another avatar indicates intent to begin an interaction. Using distance is one of the easiest ways to apply real world interaction behaviors to avatar body interactions. We say that this form of interaction is easy because 3D navigation techniques---the techniques used to move avatars around a virtual world---are fairly well understood and standardized. (Here, we refer only to navigation in a small space in a virtual world, and not to navigation around virtual worlds in general.)

Another type of non-verbal communication which seems as if it could be performed by avatars is gesture. Gesture, or gesticulation, in general refers to body movements which accompany articulated verbal communication. The function of gesture is wide-ranging. It can reinforce, contradict, complement, replace, augment, and regulate verbal interaction.

In this work, we are interested in how avatars can be used as communications interface devices, and in particular in how gesture can be performed on avatar bodies. To begin to understand how to use avatars in communication, we have built a system which allows a user to control the arm movements of a simple avatar.

Project objectives

One factor that we feel will affect the effectiveness of avatar gesture for transmitting communication is the kinematic design of the avatar. Kinematic design refers to how the avatar moves, where it is hinged together, and how those hinges work. Ideally, one would have an avatar that could exactly mimic the movements of the user, but controlling such an avatar with anything less than a fully instrumented body suit would be a user interface nightmare. Consider that exactly specifying just the joint angles of the fingers and thumb on one hand requires fourteen values. However, we are unsure how well humans can communicate using avatar bodies with far simpler kinematic designs than their actual bodies.

The objective of this project is to study how avatar design affects the ability of users to communicate using gesture. The gestures that we look at are what are often referred to as emblems in the psychological literature [1]. Emblems are movements which usually have a direct and well known verbal translation. For instance, waving as a form of greeting is a kind of emblem. Emblems can often be used to replace words in a conversation. We chose emblems because they are a kind of gesture that is used intentionally and consciously by the person performing it, and is usually easily decoded by the person seeing it. In our experiment we try to evaluate how readily human users are able to design gestures that can be performed by the avatar, and how easily those gestures can be decoded by an observer.

Experiment design

Hypothesis

Our experimental hypothesis is that avatar kinematic design will affect the ease and effectiveness with which users generate and decode emblematic gestures.

Avatar design

We designed and built two controllable avatars for this project. The avatars are modeled as three-dimensional objects with mass properties. A user controls the motion of the avatar's left arm using the mouse. As the mouse is moved inside a circular GUI widget, the user can see the avatar's arm moving. What the user actually sees is a rendering of the avatar in a graphics window. In the window, the avatar faces the user, that is, it faces out of the computer screen.

The first avatar has only one arm joint, the shoulder joint. As the user moves the mouse around, the arm will move in a full circle in the plane defined by the body and shoulders of the avatar.
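For illustration, the mapping from cursor position to arm angle can be sketched as follows. This is a minimal sketch in Python; the function and variable names are our own illustrative choices, not the actual experiment software.

  import math

  def shoulder_angle(mouse_x, mouse_y, center_x, center_y):
      # Map a cursor position inside the circular widget to a shoulder
      # joint angle for the one-jointed avatar. The angle is measured
      # counterclockwise from the positive x axis.
      return math.atan2(mouse_y - center_y, mouse_x - center_x)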

The second avatar has two joints, one at the shoulder and one at the elbow. Unlike the single-jointed avatar, where the position of the mouse defines the angle of the arm, the two-jointed avatar is controlled by specifying the position of the endpoint (or hand) of the avatar arm. As with the first avatar, the arm can only move around in the plane defined by the body and the shoulders. However, the area of the plane that the arm can sweep out is limited to a half circle. In other words, the arm cannot move towards the body past an imaginary vertical line passing through the shoulder joint. This restriction is a consequence of the particular controller that was implemented for this project, and not a design decision meant to facilitate control of the avatar.
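Recovering shoulder and elbow angles from a desired hand position is an inverse kinematics problem. The following sketch shows the standard law-of-cosines solution for a planar two-link arm; it is illustrative only and is not the controller that was actually implemented for this project.

  import math

  def two_link_ik(x, y, l1, l2):
      # Joint angles (shoulder, elbow) that place the hand of a planar
      # two-link arm with link lengths l1 and l2 at the point (x, y),
      # measured relative to the shoulder. Returns None if the point
      # lies outside the reachable workspace.
      c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
      if not -1.0 <= c2 <= 1.0:
          return None  # target unreachable
      elbow = math.acos(c2)  # "elbow down" branch of the solution
      shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                               l1 + l2 * math.cos(elbow))
      return shoulder, elbow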

The actual animation of the arm is carried out by a physical simulator, Impulse [2], built by Brian Mirtich as part of his Ph.D. work at UC Berkeley. As the user specifies the trajectory for the arm movement, the trajectory is passed on to the simulator, which manages the dynamic state of the avatar. A behavior was written for each avatar that takes trajectory input from the interface and then applies torques to the joints to move them along the desired trajectory.
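Although we do not reproduce the behaviors here, the basic idea of driving a joint along a trajectory by applying torques can be sketched with a generic proportional-derivative (PD) control law. The gains below are arbitrary illustrative values, not those used in our behaviors.

  def pd_torque(theta_desired, theta, theta_dot, kp=50.0, kd=5.0):
      # Torque that pushes a joint toward the desired angle while
      # damping its velocity. kp and kd are illustrative gains.
      return kp * (theta_desired - theta) - kd * theta_dot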

Tasks overview

Our experiment was divided into two tasks, one to measure how effective users felt the avatar gestures were in communicating intended messages, and the other to measure how well messages were communicated to an observer of the avatar. Because most people are not familiar with the concept of avatars as that term is used by the virtual reality community, we gave all participants some background information (in written form) on avatars. Our background material also suggested possible uses for avatars: "One day avatars might be used as part of the user interface for a telecommunications device."

Twenty-two people participated in the experiment. Eleven people were asked to invent gestures for the avatars; the other eleven were asked to decode the gestures that had been invented. The first set of people were drawn from the graduate students of the computer science department. The second set of people were predominantly non-computer scientists. None of the participants had any experience using avatars in chat room environments. Participants were mainly in their late twenties and early thirties, and included both genders.

Task I procedure

In the first task, each participant was shown both avatars, in a randomized order. At the start of each experiment, we explained that the avatars could be used as a kind of non-verbal communications device. We described to them a particular context in which avatar bodies could replace actual participants' bodies, and in which it would be desirable to communicate non-verbally. The context we chose was a business meeting. Below is an excerpt from the written instructions provided to the participants:

Think about the problem of communication protocols during meetings. Usually during meetings only one speaker has the floor at a time. But it is useful for other participants to be able to communicate to the speaker and to the rest of the participants how they are reacting to what the speaker has to say. For instance, during a meeting you may want to let people know that you agree with the speaker, but you don’t have anything new to say; or during a debate you may feel you have new relevant information that should entitle you to speak next.

For the purposes of a lab experiment, we wanted to find a context which would restrict the kinds of communications that participants would find appropriate. This restriction, we believed, would limit the kinds of gestures that might be invented. More importantly, we wanted to use a context that would aid in decoding the gestures, part of the second experimental task.

Once the participants had practiced controlling the avatar for a few minutes, we asked them to design several emblematic gestures and to perform these on the avatar. The messages we asked them to encode were the following:

  1. I have a question
  2. I would like to speak next
  3. I need to interrupt now
  4. I agree with the speaker
  5. I disagree with the speaker
  6. I don’t understand what is being said

Though the participants saw all of the messages in this list at once, they were asked to design the gestures one at a time in the order given above.

Task I measurements

During this task we recorded the following measurements:

  1. The length of time it took for participants to invent each gesture,
  2. The participant's subjective level of satisfaction that the gesture could communicate the message, and
  3. The participant's subjective sense of how distinctive the gesture was from the other gestures designed on the particular avatar.

We measured the length of time to invent the gesture by noting the time between when we asked the participant to invent a gesture for a particular message, and the time when the participant informed us that they had completed their design. We used this measurement as a way of gauging the difficulty of inventing the gesture.

We measured the degree of satisfaction of the participant and their sense of gesture distinctiveness by asking the participant to rate on a scale of 1 to 5 their subjective feelings on these matters. We measured degree of satisfaction because we felt this rating would give us an indication of how effective the avatar gestures were in allowing the participant to feel that they could express themselves using this medium. Asking about the level of distinctiveness of the gesture was a way to determine the expressive power of the medium. The idea is that the more distinctive the gestures that a person could invent on an avatar, the greater the expressive power of the avatar.

We also recorded the actual gestures that the users invented. Our recording method was to draw the path taken by the arm. Since the arm only moves around in a small, well-defined workspace, drawing the path was not difficult. Further, in most cases, the gesture was just a static pose of the arm, so one only needed to record the pose. We did not feel the need to record the speed of the movement, since the moving gestures tended to move at about the same speed, and the design characteristics of the gestures did not seem to depend on fine details of the speed of the motion.

Task II procedure

The second task took place over two meetings. Each participant in the second task was shown a video of the avatars performing the gestures that were invented by one of the participants from the first task. The purpose of the first meeting was to measure how well gestures could be decoded at first sight, and the second meeting was used to measure recall of gesture meanings.

The video of the avatars gesturing was created after the Task I experiments were completed by all eleven of its participants. To make the video, the experimenters referred to the notes on each of the gestures, and then recreated the gestures on the avatars. The graphical output from the avatar system was then recorded onto a video tape. We used video tape because showing video is easier than setting up the avatar software for every experiment. Having a video tape also meant that we could conduct the experiment at any location that was convenient for the participant, as long as a VCR was available.

At the first meeting, participants were shown video of the two avatars, with each avatar performing the six gestures invented by one of the participants from the first task. The avatars were presented in a randomized order, though the gestures themselves were shown in the same order in which they were invented. We explained to the participants that the gestures were meant to communicate messages in a business meeting context. We asked the participants to match the messages to the gestures. The messages were written on six index cards that were handed to the participant at the beginning of the experiment. After the participant had labeled all of the gestures, they were shown all of the gestures one more time and told the meaning of each gesture.

The second meeting with the participants took place from seven to fourteen days after the initial meeting. At the second meeting, the participants were shown videos of the avatars, each performing the same six gestures. This time, however, the order in which the gestures were presented had been randomized. Again, the participant was asked to label each gesture for the two different avatars. In addition, the participant was asked to rate how effective they felt each gesture was in communicating its intended message, and how distinctive each gesture was relative to the other gestures performed on the same avatar.

Task II measurements

During the initial meeting we recorded the number of correct matches given by the participant.

At the second meeting, we recorded the number of correct matches, and the participant's ratings of the effectiveness and the distinctiveness of the gestures. As with the first task, the participants were asked to give their ratings on a scale of 1 to 5, with 5 being best or most and 1 being worst or least.

Results Summary

Task I summary

For the first task, we found that the time to invent a gesture was significantly longer for the two-jointed avatar (Avatar 2) than it was for the single-jointed avatar (Avatar 1). The satisfaction ratings and the distinctiveness ratings did not differ significantly between the two avatars. In both cases, the satisfaction ratings and the distinctiveness ratings averaged to a little over 3.
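For readers wishing to reproduce this kind of comparison: since every participant used both avatars, a paired test is a natural choice. The sketch below uses a paired t-test on made-up invention times; these are not our actual data, and we do not claim this is the exact test we ran.

  from scipy import stats

  # Hypothetical per-participant invention times in seconds, one value
  # per participant for each avatar; these are NOT the real data.
  times_avatar1 = [35, 42, 28, 51, 30, 44, 38, 25, 47, 33, 40]
  times_avatar2 = [60, 55, 49, 72, 58, 66, 61, 45, 70, 52, 63]

  # Paired t-test: each participant invented gestures on both avatars.
  t, p = stats.ttest_rel(times_avatar1, times_avatar2)
  print("t = %.2f, p = %.4f" % (t, p))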

Task II summary

For the results from the initial meeting, that is, the first time that the participants tried to match the messages with the gestures, we found that on average the number of correct responses did not differ significantly from chance. The number of correct matches during the second meeting also did not differ significantly from chance. In both cases, we found that on average the number of correct answers did not differ significantly between the two avatars' gesture sets.
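To make the chance baseline concrete: assuming a participant assigns each of the six message cards to a distinct gesture uniformly at random, the baseline is the expected number of fixed points of a random permutation, which is exactly one regardless of the number of items. The brute-force check below illustrates this.

  import itertools

  # Expected number of correct matches when six cards are matched to
  # six gestures uniformly at random, each card to a distinct gesture.
  # This is the expected number of fixed points of a random
  # permutation, which equals 1 for any n.
  n = 6
  perms = list(itertools.permutations(range(n)))
  expected = sum(sum(p[i] == i for i in range(n)) for p in perms) / len(perms)
  print(expected)  # prints 1.0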

The effectiveness ratings were the same for the two avatars. When compared to the effectiveness ratings given by the gesture inventors, we found that the observers gave slightly lower ratings. This same trend was true for the distinctiveness ratings.

Discussion of results

Factors other than avatar jointedness

When we initially designed our experiments, we assumed that the most significant factor would be the kinematic design of the avatars. Our initial intuitions told us that people would like using the two-jointed avatar more because it is more anthropomorphic. The results of our experiment suggest, however, that the jointedness of the avatar does not make a difference to users' or observers' feelings about the effectiveness or distinctiveness of the avatar gestures. Nor does this aspect of the design seem to affect the ability of observers to decode or even retain the meanings of the messages. The only significant difference we saw was in the time it took to invent the gestures, which was longer for Avatar 2.

In reviewing our results, we have to admit that design features other than jointedness differed between the two avatars, and that perhaps these features affected the scores. Further, these other design characteristics may be more significant to users in their acceptance of communication via avatars. Though we did not design the experiment to tell us about these design factors, we did make some interesting observations while conducting these experiments that might be of value in future avatar designs.

Control interface

Though the GUI widget used to control the two avatars was the same, the mapping between the widget manipulation and the resulting motion of the avatars was different. In fact, the design of the widget was more suited to the kinematic properties of Avatar 1. As the user moved their cursor inside of the circle, they could see the avatar's arm moving in a circle. The widget was more difficult to use for controlling Avatar 2's motion because that avatar's arm cannot move in a full circle. The area that the avatar's hand can reach is actually disjoint, consisting of two concentric semi-annuli centered at the shoulder joint. Thus, the interface was confusing on two points: the shape of the widget did not correspond to the shape of the avatar's workspace; and though the cursor could be moved around continuously in the circle drawn on the widget, there was a discontinuity in the space in which the avatar's hand could be controlled. On the whole, control was far more difficult for the two-jointed avatar than for the single-jointed avatar.
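The basic reachability constraint is easy to state: a planar two-link arm can place its hand only at distances between |l1 - l2| and l1 + l2 from the shoulder. The sketch below checks this bound together with the half-plane restriction described earlier; it is illustrative only, assumes the restricted half plane is x >= 0, and does not model the further discontinuity introduced by our controller.

  import math

  def hand_can_reach(x, y, l1, l2):
      # Whether a planar two-link arm with link lengths l1 and l2 can
      # place its hand at (x, y), relative to the shoulder. Reachable
      # distances form an annulus between |l1 - l2| and l1 + l2; the
      # half-plane limit (assumed here to be x >= 0) models the
      # controller restriction described in the text.
      r = math.hypot(x, y)
      return abs(l1 - l2) <= r <= l1 + l2 and x >= 0.0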

Another consequence of the two-jointed avatar control scheme was that it was difficult to execute gestures at all. As the experimenters produced examples of the gestures to record onto video, we found that it was difficult to control the two-jointed avatar well enough to accurately re-create the desired gesture. (During Task I, we asked the participants to perform the gestures on the avatars, but we did not require that they do so perfectly.) This made showing gestures to the Task II participants more difficult. Though the video was edited to show each gesture in its own labeled clip, we found that many of the recorded gestures had extraneous movements that were not part of the desired gesture. Certainly, this made the Task II participants' task more difficult than it would have been if the gestures could have been executed cleanly.

Finally, the two avatars differed in the ways that the arms could be moved with respect to the rest of the body. We found that a popular gesture was to point the avatar hand at the avatar's head. Some participants described this as pointing to the head, scratching the head, or hitting the head with the hand. Unfortunately, the two-jointed avatar's hand could not actually touch its head. So, at times the gesture inventor would intend for the avatar to look as if it were touching its head, but the avatar would only be waving its hand near its head. This observation also suggests that the design of the whole body is important to gesturing, even if only parts of the body can move. Many gestures were designed with reference points on the body in mind. Though the body of the avatar consisted of only a stick, participants often referred to movement in relation to the avatar's stomach or its shoulder, as well as the aforementioned head.

Other observations and user comments

Though the experimenters did not ask, the participants often offered their opinions about which avatar was better. These comments tended to reinforce the data: as many participants found the single-jointed avatar more expressive as found the two-jointed avatar more expressive. Among the reasons cited for preferring the single-jointed avatar were that it was easier to control, and that the simple abstract movement of the arm, reminiscent of the hands of a clock [participant's description], made the gestures more distinctive. Participants who preferred the two-jointed arm thought that it was more expressive, especially because it looked more human. On the other hand, there were participants who thought that the motion of the two-jointed arm was definitely unlike a human's movements.

There also seemed to be a difference in the way that users conceptualized the use of gestures. Some participants seemed to think it was possible to invent avatar gestures that would be easily understood by others because of some sort of universal appeal. For instance, the gesture of raising a hand to ask a question seems well understood in our culture. These users were more likely to say that a gesture is very expressive if it seemed to "look" like what it was supposed to mean. Other users did not seem to be able to conceptualize gestures as "looking" like anything in particular. These participants would argue that all gestures are arbitrary and are equally expressive assuming that some group has agreed ahead of time to use the gestures to mean particular things. The fact that some participants designed gestures that were not meant to look like anything may have made the decoding job more difficult for some participants of Task II.

Experimental design flaws

The first major flaw is that each inventor's gestures were evaluated by only one observer. Thus, it is unclear what it means to compare the Task II participants' responses to each other, since each participant was rating gestures designed by a different person. Perhaps the results would be more meaningful if several observers were to rate the gestures from a single inventor.

Another major design flaw was that the inventors designed all of the gestures in the same order. The particular order the gestures were invented in may have made a difference in how difficult it was to invent them. The order of the gestures should have been randomized in order to remove this factor from the analysis.

Another problem we found as we ran the experiment is that participants found the three messages "I have a question", "I want to speak next", and "I need to interrupt now" very similar. In some cases, inventors would use the same gesture to mean two different messages because they felt that the end result, pragmatically, should be the same. In looking back on the experiment, we feel that we should have given more thought to choosing messages that themselves are more distinctive.

Finally, the biggest problem may have been with our hypothesis. Because we are interested in how human behaviors might be applied to avatar bodies, perhaps we should have asked how well people can express themselves using avatar bodies as compared to using their own bodies. Had we asked this question, we could have designed an experiment that also involved inventing and decoding gestures performed by humans.

Conclusion

We found that evaluating gestures is a difficult task, perhaps whether or not it involves avatars. Though we tried to pin our results on the kinematic design of the avatars, we found as we used them that other design features are as important, if not more important, than jointedness. More thought should have gone into being able to accurately control the avatar movements, and into understanding what makes movement meaningful at all.

Bibliography

  1. Harper, R. G., A. N. Wiens, and J. D. Matarazzo, Nonverbal Communication: The State of the Art. 1978, New York: John Wiley & Sons.
  2. Mirtich, B. and J. Canny, Impulse-based Dynamic Simulation. 1994, UC Berkeley Computer Science Division (EECS): Berkeley, California.