Formal Policy Learning from Demonstrations for Reachability

Hadi Ravanbakhsh, Sriram Sankaranarayanan, and Sanjit A. Seshia. Formal Policy Learning from Demonstrations for Reachability. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2019.

Download

[pdf] 

Abstract

We consider the problem of learning structured, closed-loop policies (feedback laws) from demonstrations in order to control underactuated robotic systems, so that formal behavioral specifications such as reaching a target set of states are satisfied. Our approach uses a “counterexample-guided” iterative loop that involves the interaction between a policy learner, a demonstrator, and a verifier. The learner is responsible for querying the demonstrator in order to obtain the training data that guide the construction of a policy candidate. This candidate is analyzed by the verifier and either accepted as correct or rejected with a counterexample. In the latter case, the counterexample is used to update the training data and further refine the policy. The approach is instantiated using receding-horizon model-predictive controllers (MPCs) as demonstrators. Rather than using regression to fit a policy to the demonstrator's actions, we extend the MPC formulation with the gradient of the cost-to-go function evaluated at sample states in order to constrain the set of policies compatible with the behavior of the demonstrator. We demonstrate the successful application of the resulting policy learning schemes on two case studies, and we show how simple, formally verified policies can be inferred starting from complex and unverified nonlinear MPC implementations. As a further benefit, the policies are many orders of magnitude faster to execute than the original MPCs.
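
The loop described in the abstract follows a counterexample-guided inductive synthesis pattern. The Python sketch below illustrates its structure only; the three callables (demonstrate, fit, verify) are hypothetical placeholders standing in for the paper's MPC demonstrator, gradient-constrained policy fitting, and formal verifier, not the authors' implementation.

# Sketch of the counterexample-guided learner/demonstrator/verifier loop.
# The three callables are placeholders for the paper's components:
#   demonstrate(x) -> (u, grad_v): MPC action and cost-to-go gradient at state x
#   fit(data)      -> policy:      structured policy candidate consistent with the data
#   verify(policy) -> (ok, cex):   reachability certificate, or a counterexample state
def learn_policy(demonstrate, fit, verify, init_samples, max_iters=50):
    samples = list(init_samples)
    data = []
    for _ in range(max_iters):
        # Learner queries the demonstrator at the current sample states.
        for x in samples:
            u, grad_v = demonstrate(x)
            data.append((x, u, grad_v))
        # Fit a candidate constrained by the demonstrator's cost-to-go
        # gradients, rather than by plain regression on the actions.
        policy = fit(data)
        # Verifier accepts the candidate or rejects it with a counterexample.
        ok, cex = verify(policy)
        if ok:
            return policy      # candidate certified for the reach specification
        samples = [cex]        # counterexample refines the training data
    raise RuntimeError("iteration budget exhausted without a verified policy")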

BibTeX

@inproceedings{ravanbakhsh-icra19,
  author    = {Hadi Ravanbakhsh and
               Sriram Sankaranarayanan and
               Sanjit A. Seshia},
  title     = {Formal Policy Learning from Demonstrations for Reachability},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  month     = {May},
  year      = {2019},
  abstract  = {We consider the problem of learning structured, closed-loop policies (feedback laws) from demonstrations in order to control underactuated robotic systems, so that formal behavioral specifications such as reaching a target set of states are satisfied. Our approach uses a “counterexample-guided” iterative loop that involves the interaction between a policy learner, a demonstrator, and a verifier. The learner is responsible for querying the demonstrator in order to obtain the training data that guide the construction of a policy candidate. This candidate is analyzed by the verifier and either accepted as correct or rejected with a counterexample. In the latter case, the counterexample is used to update the training data and further refine the policy.
               The approach is instantiated using receding-horizon model-predictive controllers (MPCs) as demonstrators. Rather than using regression to fit a policy to the demonstrator's actions, we extend the MPC formulation with the gradient of the cost-to-go function evaluated at sample states in order to constrain the set of policies compatible with the behavior of the demonstrator. We demonstrate the successful application of the resulting policy learning schemes on two case studies, and we show how simple, formally verified policies can be inferred starting from complex and unverified nonlinear MPC implementations. As a further benefit, the policies are many orders of magnitude faster to execute than the original MPCs.},
}
