Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Abstract

Randomly masking sub-portions of sentences has been a highly successful approach to training natural language processing models for a variety of tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many traditional tasks such as behavior cloning, offline RL, inverse dynamics, and planning correspond to different sequence maskings. We introduce the FlexiBiT framework, which makes it possible to flexibly specify models that can be trained on many different sequential decision making tasks. Experimentally, we show that a single FlexiBiT model can be trained to perform all of these tasks with performance similar to or better than specialized models, and that its performance can be further improved by fine-tuning the general model on the task of interest.
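To make the "tasks as maskings" idea concrete, here is a minimal sketch in Python. It assumes trajectories are flattened into an interleaved sequence of state and action tokens; the `task_mask` helper, the token layout, and the exact mask patterns are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Illustrative token layout (an assumption, not the paper's code):
# a length-T trajectory is flattened into the interleaved sequence
#   [s_0, a_0, s_1, a_1, ..., s_{T-1}, a_{T-1}]
T = 4
tokens = np.array([f"{kind}{t}" for t in range(T) for kind in ("s", "a")])
rng = np.random.default_rng(0)

def task_mask(task: str, T: int) -> np.ndarray:
    """Boolean mask over the 2T tokens: True = masked (to be predicted),
    False = visible to the bidirectional model."""
    mask = np.zeros(2 * T, dtype=bool)
    state_idx = np.arange(0, 2 * T, 2)   # positions of s_t
    action_idx = np.arange(1, 2 * T, 2)  # positions of a_t
    if task == "behavior_cloning":
        mask[action_idx[-1]] = True      # predict the next action from the past
    elif task == "inverse_dynamics":
        mask[action_idx] = True          # infer actions from the state sequence
    elif task == "forward_dynamics":
        mask[state_idx[1:]] = True       # predict future states from s_0 and actions
    elif task == "random":
        mask[rng.random(2 * T) < 0.5] = True  # BERT-style random pre-training mask
    return mask

for task in ("behavior_cloning", "inverse_dynamics", "forward_dynamics", "random"):
    print(f"{task:18s} hides: {list(tokens[task_mask(task, T)])}")
```

Under this view, training with random masks exposes the model to a superset of the task-specific masking patterns, which is why one model can then serve many inference tasks.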

Publication
First Workshop on Generalizable Policy Learning in the Physical World at the Tenth International Conference on Learning Representations (ICLR 2022)