CS 160: Lecture 23
Review
- Adaptive help - user modeling - what are some ways of representing knowledge about the user?
- What is a mixed initiative interface?
Multimodal Interfaces
- Multimodal refers to interfaces that support more than one mode of interaction, typically beyond the standard GUI.
- Speech and pen input are two common examples, and they are complementary.
Speech+pen Interfaces
- Speech is the preferred medium for subject, verb, object expression.
- Writing or gesture provide locative information (pointing etc).
Speech+pen Interfaces
- Speech+pen for visual-spatial tasks (compared to speech only)
- 10% faster.
- 36% fewer task-critical errors.
- Shorter and simpler linguistic constructions.
- 90-100% of users prefer to interact this way.
Multimodal advantages
- Advantages for error recovery:
- Users intuitively pick the mode that is less error-prone.
- Language is often simplified.
- Users intuitively switch modes after an error, so the same problem is not repeated.
Multimodal advantages
- Other situations where mode choice helps:
- Users with disabilities.
- People with a strong accent or a cold.
- People with RSI.
- Young children or non-literate users.
Multimodal advantages
- For collaborative work, multimodal interfaces can communicate a lot more than text:
- Speech contains prosodic information.
- Gesture communicates emotion.
- Writing has several expressive dimensions.
Multimodal challenges
- Using multimodal input generally requires advanced recognition methods:
- For each mode.
- For combining redundant information.
- For combining non-redundant information: “open this file (pointing)”
- Information is combined at two levels:
- Feature level (early fusion).
- Semantic level (late fusion).
Early fusion
- Early fusion applies to tightly coupled combinations like speech+lip movement. It is difficult because:
- Multimodal training data are needed.
- The data streams must be closely synchronized.
- Computational and training costs are high.
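A minimal sketch of what feature-level fusion means in code: per-frame audio and lip-movement feature vectors are synchronized to a common clock and concatenated into one joint vector before any recognition happens. The feature values and dimensions below are invented for illustration.

```python
# Hypothetical sketch of feature-level (early) fusion: the two streams are
# joined frame by frame into one feature vector that a single recognizer
# would consume. Feature values here are illustrative, not from a real system.

def early_fuse(audio_frames, lip_frames):
    """Concatenate time-aligned feature vectors frame by frame.

    Both streams must already be resampled to the same frame rate;
    this tight synchronization requirement is one reason early fusion
    is hard and data-hungry.
    """
    if len(audio_frames) != len(lip_frames):
        raise ValueError("streams must be frame-synchronized before fusion")
    return [a + l for a, l in zip(audio_frames, lip_frames)]

# Two toy frames: 3 audio features + 2 lip features -> 5-dim joint vectors.
audio = [[0.1, 0.5, 0.2], [0.2, 0.4, 0.1]]
lips = [[0.9, 0.3], [0.8, 0.2]]
joint = early_fuse(audio, lips)
```

Note that a single model must then be trained on the joint vectors, which is why early fusion needs multimodal (rather than unimodal) training data.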
Late fusion
- Late fusion is appropriate for combinations of complementary information, like pen+speech.
- Recognizers are trained and used separately.
- Unimodal recognizers are available off-the-shelf.
- It's still important to accurately time-stamp all inputs: the typical delays between, e.g., gesture and speech are known.
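The time-stamping point above can be sketched as follows: each recognizer runs separately, and its outputs are paired up afterwards using time stamps and a known delay window. The 1.5-second window and the event data are assumptions for illustration.

```python
# Hypothetical sketch of semantic-level (late) fusion: deictic speech events
# are matched to the nearest pen gesture within an assumed delay window.

MAX_DELAY = 1.5  # assumed typical lag (seconds) between speech and gesture

def late_fuse(speech_events, pen_events):
    """Pair each (word, time) speech event with the closest pen gesture."""
    commands = []
    for word, t_speech in speech_events:
        candidates = [(abs(t_speech - t_pen), target)
                      for target, t_pen in pen_events
                      if abs(t_speech - t_pen) <= MAX_DELAY]
        if candidates:
            _, target = min(candidates)  # nearest gesture in time wins
            commands.append((word, target))
    return commands

speech = [("open", 2.0), ("delete", 9.0)]
pen = [("file_A", 2.4), ("file_B", 8.6)]
commands = late_fuse(speech, pen)  # [("open", "file_A"), ("delete", "file_B")]
```

Because the recognizers never share features, each can be an off-the-shelf unimodal component, as the slide notes.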
Contrast between MM and GUIs
- GUI interfaces often restrict input to single, non-overlapping events, while MM interfaces handle continuous, simultaneous input from several modes.
- GUI events are unambiguous; MM inputs are based on recognition and require a probabilistic approach.
- MM interfaces are often distributed on a network.
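The contrast between unambiguous GUI events and recognized input can be made concrete: a recognizer delivers an n-best list of scored hypotheses rather than a single event. The hypotheses and scores below are made up for illustration.

```python
# Illustrative contrast: a GUI click is a single unambiguous event, while a
# recognized input arrives as a ranked list of hypotheses with probabilities.

def best_interpretation(n_best):
    """Pick the most probable hypothesis, keeping the rest for error repair."""
    return max(n_best, key=lambda h: h[1])

gui_event = ("click", "file_A")  # unambiguous: no alternatives exist

mm_input = [("open file_A", 0.62),  # recognizer's ranked hypotheses
            ("open file_8", 0.25),
            ("hope a file", 0.13)]

hypothesis, score = best_interpretation(mm_input)  # "open file_A"
```

Keeping the lower-ranked hypotheses around is what allows the error-recovery behavior described earlier: after a misrecognition, the system can fall back to an alternative instead of repeating the same mistake.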
Agent architectures
- Allow parts of an MM system to be written separately, in the most appropriate language, and integrated easily.
- OAA: Open-Agent Architecture (Cohen et al) supports MM interfaces.
- Blackboards and message queues are often used to simplify inter-agent communication.
- Jini, Javaspaces, Tspaces, JXTA, JMS, MSMQ...
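A minimal blackboard sketch, to show the decoupling these architectures provide: agents communicate only by posting facts to and reading facts from a shared store, never by calling each other directly. The API here is invented for illustration and is not OAA's actual interface.

```python
from collections import defaultdict

class Blackboard:
    """Toy shared store: agents subscribe to topics and react to posts."""

    def __init__(self):
        self.facts = defaultdict(list)        # topic -> posted facts
        self.subscribers = defaultdict(list)  # topic -> interested agents

    def subscribe(self, topic, agent):
        self.subscribers[topic].append(agent)

    def post(self, topic, fact):
        self.facts[topic].append(fact)
        for agent in self.subscribers[topic]:
            agent(topic, fact)                # notify interested agents

# A toy "fusion agent" that reacts whenever a speech result is posted.
log = []
bb = Blackboard()
bb.subscribe("speech", lambda topic, fact: log.append(("fused", fact)))
bb.post("speech", "open this")  # log now holds [("fused", "open this")]
```

Because agents only depend on the blackboard's topics, each one can be written separately and in the most appropriate language, as the slide claims.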
Administrative
- Final project presentations are next week (Dec 4 and 6).
- Presentations go by group number. Groups 1-5 on Tuesday, groups 6-10 on Thursday.
- Final reports are due on Friday the 7th.
Symbolic/statistical approaches
- Allow symbolic operations like unification (binding of terms like “this”) + probabilistic reasoning (possible interpretations of “this”).
- The MTC (Members-Teams-Committee) system is an example:
- Members are recognizers.
- Teams cluster data from recognizers.
- The committee weights results from various teams.
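The combination of symbolic unification and probabilistic reasoning can be sketched in a few lines: the deictic term "this" can bind to any object near the pen gesture, with each binding weighted by recognizer confidence. The objects and scores are invented for illustration.

```python
# Hypothetical sketch: symbolic binding of a deictic term plus probabilistic
# ranking of the candidate referents it could unify with.

def resolve_deictic(term, candidates):
    """Return possible bindings of a term, ranked by normalized probability."""
    if term != "this":
        return [(term, 1.0)]  # non-deictic terms simply denote themselves
    total = sum(p for _, p in candidates)
    return sorted(((obj, p / total) for obj, p in candidates),
                  key=lambda b: -b[1])

# Two objects near the pen tip, scored by the gesture recognizer.
bindings = resolve_deictic("this", [("file_A", 0.6), ("map_B", 0.2)])
# top binding: ("file_A", 0.75)
```

Unification supplies the candidate interpretations; the probabilities decide among them, which is the division of labor the slide describes.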
MTC architecture
MM systems
- Designers Outpost (Berkeley)
MM systems: Quickset (OGI)
MM systems: Crossweaver (Berkeley)