Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
Niklas Lauffer, Beyazit Yalcinkaya, Marcell Vazquez-Chanlatte, and Sanjit A. Seshia. Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning. In Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
Abstract
Goal-conditioned reinforcement learning is a powerful way to control an AI agent's behavior at runtime. That said, popular goal representations, e.g., target states or natural language, are either limited to Markovian tasks or rely on ambiguous task semantics. We propose representing temporal goals using compositions of deterministic finite automata (cDFAs). cDFAs balance the need for formal temporal semantics with ease of interpretation: if one can understand a flow chart, one can understand a cDFA. On the other hand, cDFAs form a countably infinite concept class with Boolean semantics, and subtle changes to the automata can result in very different agent behavior. To address this, we observe that all paths through a DFA correspond to a series of reach-avoid tasks. Based on this, we propose pre-training graph neural network embeddings on “reach-avoid derived” DFAs. Empirically, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization.
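The observation that a path through a DFA reads as a series of reach-avoid tasks can be illustrated with a small sketch. This is not the authors' implementation. The `DFA` class, the helpers `cdfa_accepts` and `reach_avoid_steps`, the toy alphabet, and the reading of a cDFA as a conjunction of DFAs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """Toy DFA (hypothetical encoding, not taken from the paper)."""
    start: str
    accepting: frozenset
    transitions: dict  # (state, symbol) -> next state

    def step(self, state, symbol):
        # Assumption: any (state, symbol) pair not listed is a self-loop.
        return self.transitions.get((state, symbol), state)

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.step(state, symbol)
        return state in self.accepting


def cdfa_accepts(dfas, word):
    # Assumption: the composition is a conjunction, so every DFA must accept.
    return all(dfa.accepts(word) for dfa in dfas)


def reach_avoid_steps(dfa, path, alphabet):
    """For each consecutive pair of states on `path`, return
    (reach, avoid): the symbols that move to the next state on the path,
    and the symbols that leave the path altogether."""
    steps = []
    for here, nxt in zip(path, path[1:]):
        reach = {a for a in alphabet if dfa.step(here, a) == nxt}
        avoid = {a for a in alphabet if dfa.step(here, a) not in (here, nxt)}
        steps.append((reach, avoid))
    return steps


ALPHABET = {"key", "door", "red"}

# "Pick up the key, then reach the door, and never touch red."
task = DFA(
    start="q0",
    accepting=frozenset({"ok"}),
    transitions={
        ("q0", "key"): "q1",
        ("q0", "red"): "fail",
        ("q1", "door"): "ok",
        ("q1", "red"): "fail",
    },
)

# "Eventually reach the door."
seen_door = DFA(
    start="p0",
    accepting=frozenset({"p1"}),
    transitions={("p0", "door"): "p1"},
)

# The accepting path q0 -> q1 -> ok read as two reach-avoid steps.
steps = reach_avoid_steps(task, ["q0", "q1", "ok"], ALPHABET)
```

Here `steps` holds one `(reach, avoid)` pair per transition along the path: first reach `key` while avoiding `red`, then reach `door` while avoiding `red`. Under the conjunctive reading assumed above, `cdfa_accepts([task, seen_door], ["key", "door"])` holds, while a word that touches `red` before the door is rejected.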
BibTeX
@inproceedings{gcrl-neurips24,
  author    = {Niklas Lauffer and Beyazit Yalcinkaya and Marcell Vazquez-Chanlatte and Sanjit A. Seshia},
  title     = {Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning},
  booktitle = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  OPTpages  = {331--351},
  year      = {2024},
  abstract  = {Goal-conditioned reinforcement learning is a powerful way to control an AI agent's behavior at runtime. That said, popular goal representations, e.g., target states or natural language, are either limited to Markovian tasks or rely on ambiguous task semantics. We propose representing temporal goals using compositions of deterministic finite automata (cDFA). cDFAs balance the need for formal temporal semantics with ease of interpretation-- if one can understand a flow chart, one can understand a cDFA. On the other hand, cDFA form a countably infinite concept class with Boolean semantics, and subtle changes to the automata can result in very different agent behavior. To address this, we observe that all paths through a DFA correspond to a series of reach-avoid tasks. Based on this, we propose pre-training graph neural network embedding on ``reach-avoid derived'' DFAs. Empirically, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization.},
}