CS 160: Lecture 23
Review
- Adaptive help - user modeling - what are some ways of representing knowledge about the user?
- What is a mixed initiative interface?
Multimodal Interfaces
- Multimodal refers to interfaces that support more than one mode of interaction, typically beyond the standard GUI.
- Speech and pen input are two common examples, and they are complementary.
Speech+pen Interfaces
- Speech is the preferred medium for subject, verb, object expression.
- Writing or gesture provide locative information (pointing etc).
Speech+pen Interfaces
- Speech+pen for visual-spatial tasks (compared to speech only)
- 10% faster.
- 36% fewer task-critical errors.
- Shorter and simpler linguistic constructions.
- 90-100% of users prefer to interact this way.
Multimodal advantages
- Advantages for error recovery:
- Users intuitively pick the mode that is less error-prone.
- Language is often simplified.
- Users intuitively switch modes after an error, so the same problem is not repeated.
Multimodal advantages
- Other situations where mode choice helps:
- Users with disabilities.
- People with a strong accent or a cold.
- People with RSI.
- Young children or non-literate users.
Multimodal advantages
- For collaborative work, multimodal interfaces can communicate a lot more than text:
- Speech contains prosodic information.
- Gesture communicates emotion.
- Writing has several expressive dimensions.
Multimodal challenges
- Using multimodal input generally requires advanced recognition methods:
- For each mode.
- For combining redundant information.
- For combining non-redundant information: “open this file (pointing)”
- Information is combined at two levels:
- Feature level (early fusion).
- Semantic level (late fusion).
Early fusion
- Early fusion applies to tightly coupled combinations like speech+lip movement. It is difficult because:
- Multimodal training data are needed.
- The data streams must be closely synchronized.
- Computational and training costs are high.
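A minimal sketch of what feature-level fusion means in code: per-frame audio and lip-movement feature vectors are synchronized to a common clock and concatenated into one joint vector before any recognition happens. The feature values and dimensions below are invented for illustration.

```python
# Hypothetical sketch of feature-level (early) fusion: the two streams are
# joined frame by frame into one feature vector that a single recognizer
# would consume. Feature values here are illustrative, not from a real system.

def early_fuse(audio_frames, lip_frames):
    """Concatenate time-aligned feature vectors frame by frame.

    Both streams must already be resampled to the same frame rate;
    this tight synchronization requirement is one reason early fusion
    is hard and data-hungry.
    """
    if len(audio_frames) != len(lip_frames):
        raise ValueError("streams must be frame-synchronized before fusion")
    return [a + l for a, l in zip(audio_frames, lip_frames)]

# Two toy frames: 3 audio features + 2 lip features -> 5-dim joint vectors.
audio = [[0.1, 0.5, 0.2], [0.2, 0.4, 0.1]]
lips = [[0.9, 0.3], [0.8, 0.2]]
joint = early_fuse(audio, lips)
```

Note that a single model must then be trained on the joint vectors, which is why early fusion needs multimodal (rather than unimodal) training data.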
Late fusion
- Late fusion is appropriate for combinations of complementary information, like pen+speech.
- Recognizers are trained and used separately.
- Unimodal recognizers are available off-the-shelf.
- It's still important to accurately time-stamp all inputs: the typical delays between, e.g., gesture and speech are known.
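The time-stamping point above can be sketched as follows: each recognizer runs separately, and its outputs are paired up afterwards using time stamps and a known delay window. The 1.5-second window and the event data are assumptions for illustration.

```python
# Hypothetical sketch of semantic-level (late) fusion: deictic speech events
# are matched to the nearest pen gesture within an assumed delay window.

MAX_DELAY = 1.5  # assumed typical lag (seconds) between speech and gesture

def late_fuse(speech_events, pen_events):
    """Pair each (word, time) speech event with the closest pen gesture."""
    commands = []
    for word, t_speech in speech_events:
        candidates = [(abs(t_speech - t_pen), target)
                      for target, t_pen in pen_events
                      if abs(t_speech - t_pen) <= MAX_DELAY]
        if candidates:
            _, target = min(candidates)  # nearest gesture in time wins
            commands.append((word, target))
    return commands

speech = [("open", 2.0), ("delete", 9.0)]
pen = [("file_A", 2.4), ("file_B", 8.6)]
commands = late_fuse(speech, pen)  # [("open", "file_A"), ("delete", "file_B")]
```

Because the recognizers never share features, each can be an off-the-shelf unimodal component, as the slide notes.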
Contrast between MM and GUIs
- GUI interfaces often restrict input to single, non-overlapping events, while MM interfaces handle continuous, simultaneous input from several modes.
- GUI events are unambiguous; MM inputs are based on recognition and require a probabilistic approach.
- MM interfaces are often distributed on a network.
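The contrast between unambiguous GUI events and recognized input can be made concrete: a recognizer delivers an n-best list of scored hypotheses rather than a single event. The hypotheses and scores below are made up for illustration.

```python
# Illustrative contrast: a GUI click is a single unambiguous event, while a
# recognized input arrives as a ranked list of hypotheses with probabilities.

def best_interpretation(n_best):
    """Pick the most probable hypothesis, keeping the rest for error repair."""
    return max(n_best, key=lambda h: h[1])

gui_event = ("click", "file_A")  # unambiguous: no alternatives exist

mm_input = [("open file_A", 0.62),  # recognizer's ranked hypotheses
            ("open file_8", 0.25),
            ("hope a file", 0.13)]

hypothesis, score = best_interpretation(mm_input)  # "open file_A"
```

Keeping the lower-ranked hypotheses around is what allows the error-recovery behavior described earlier: after a misrecognition, the system can fall back to an alternative instead of repeating the same mistake.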
Agent architectures
- Allow parts of an MM system to be written separately, in the most appropriate language, and integrated easily.
- OAA: Open-Agent Architecture (Cohen et al) supports MM interfaces.
- Blackboards and message queues are often used to simplify inter-agent communication.
- Jini, Javaspaces, Tspaces, JXTA, JMS, MSMQ...
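A minimal blackboard sketch, to show the decoupling these architectures provide: agents communicate only by posting facts to and reading facts from a shared store, never by calling each other directly. The API here is invented for illustration and is not OAA's actual interface.

```python
from collections import defaultdict

class Blackboard:
    """Toy shared store: agents subscribe to topics and react to posts."""

    def __init__(self):
        self.facts = defaultdict(list)        # topic -> posted facts
        self.subscribers = defaultdict(list)  # topic -> interested agents

    def subscribe(self, topic, agent):
        self.subscribers[topic].append(agent)

    def post(self, topic, fact):
        self.facts[topic].append(fact)
        for agent in self.subscribers[topic]:
            agent(topic, fact)                # notify interested agents

# A toy "fusion agent" that reacts whenever a speech result is posted.
log = []
bb = Blackboard()
bb.subscribe("speech", lambda topic, fact: log.append(("fused", fact)))
bb.post("speech", "open this")  # log now holds [("fused", "open this")]
```

Because agents only depend on the blackboard's topics, each one can be written separately and in the most appropriate language, as the slide claims.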
Administrative
- Final project presentations are next week (Dec 4 and 6).
- Presentations go by group number. Groups 1-5 on Tuesday, groups 6-10 on Thursday.
- Final reports are due on Friday the 7th.
Symbolic/statistical approaches
- Allow symbolic operations like unification (binding of terms like “this”) + probabilistic reasoning (possible interpretations of “this”).
- The MTC (Members-Teams-Committee) system is an example:
- Members are recognizers.
- Teams cluster data from recognizers.
- The committee weights results from various teams.
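The combination of symbolic unification and probabilistic reasoning can be sketched in a few lines: the deictic term "this" can bind to any object near the pen gesture, with each binding weighted by recognizer confidence. The objects and scores are invented for illustration.

```python
# Hypothetical sketch: symbolic binding of a deictic term plus probabilistic
# ranking of the candidate referents it could unify with.

def resolve_deictic(term, candidates):
    """Return possible bindings of a term, ranked by normalized probability."""
    if term != "this":
        return [(term, 1.0)]  # non-deictic terms simply denote themselves
    total = sum(p for _, p in candidates)
    return sorted(((obj, p / total) for obj, p in candidates),
                  key=lambda b: -b[1])

# Two objects near the pen tip, scored by the gesture recognizer.
bindings = resolve_deictic("this", [("file_A", 0.6), ("map_B", 0.2)])
# top binding: ("file_A", 0.75)
```

Unification supplies the candidate interpretations; the probabilities decide among them, which is the division of labor the slide describes.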
MTC architecture
MM systems
- Designers Outpost (Berkeley)
MM systems: Quickset (OGI)
MM systems: Crossweaver (Berkeley)