Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Abstract

Randomly masking sub-portions of sentences has been a highly successful approach to training natural language processing models for a variety of tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many traditional tasks such as behavior cloning, offline RL, inverse dynamics, and planning correspond to different sequence maskings. We introduce the FlexiBiT framework, which makes it possible to flexibly specify models that can be trained on many different sequential decision making tasks. Experimentally, we show that a single FlexiBiT model can be trained to perform all of these tasks with performance similar to or better than specialized models, and that its performance can be further improved by fine-tuning the general model on the task of interest.
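To make the "tasks as maskings" idea concrete, here is a minimal sketch in Python. It assumes trajectories are flattened into an interleaved sequence of state and action tokens; the `task_mask` helper, the token layout, and the exact mask patterns are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Illustrative token layout (an assumption, not the paper's code):
# a length-T trajectory is flattened into the interleaved sequence
#   [s_0, a_0, s_1, a_1, ..., s_{T-1}, a_{T-1}]
T = 4
tokens = np.array([f"{kind}{t}" for t in range(T) for kind in ("s", "a")])
rng = np.random.default_rng(0)

def task_mask(task: str, T: int) -> np.ndarray:
    """Boolean mask over the 2T tokens: True = masked (to be predicted),
    False = visible to the bidirectional model."""
    mask = np.zeros(2 * T, dtype=bool)
    state_idx = np.arange(0, 2 * T, 2)   # positions of s_t
    action_idx = np.arange(1, 2 * T, 2)  # positions of a_t
    if task == "behavior_cloning":
        mask[action_idx[-1]] = True      # predict the next action from the past
    elif task == "inverse_dynamics":
        mask[action_idx] = True          # infer actions from the state sequence
    elif task == "forward_dynamics":
        mask[state_idx[1:]] = True       # predict future states from s_0 and actions
    elif task == "random":
        mask[rng.random(2 * T) < 0.5] = True  # BERT-style random pre-training mask
    return mask

for task in ("behavior_cloning", "inverse_dynamics", "forward_dynamics", "random"):
    print(f"{task:18s} hides: {list(tokens[task_mask(task, T)])}")
```

Under this view, training with random masks exposes the model to a superset of the task-specific masking patterns, which is why one model can then serve many inference tasks.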

Publication
First Workshop on Generalizable Policy Learning in the Physical World at the Tenth International Conference on Learning Representations (ICLR 2022)