Sanjit Seshia's Publications

Querying Labelled Data with Scenario Programs for Sim-to-Real Validation

Edward Kim, Jay Shenoy, Sebastian Junges, Daniel J. Fremont, Alberto L. Sangiovanni-Vincentelli, and Sanjit A. Seshia. Querying Labelled Data with Scenario Programs for Sim-to-Real Validation. In Proceedings of the International Conference on Cyber-Physical Systems (ICCPS), pp. 34–45, April 2022.

Abstract

Simulation-based testing of autonomous vehicles (AVs) has become an essential complement to road testing to ensure safety. Consequently, substantial research has focused on searching for failure scenarios in simulation. However, a fundamental question remains: are AV failure scenarios identified in simulation meaningful in reality, i.e., are they reproducible on the real system? Due to the sim-to-real gap arising from discrepancies between simulated and real sensor data, a failure scenario identified in simulation can be either a spurious artifact of the synthetic sensor data or an actual failure that persists with real sensor data. An approach to validate simulated failure scenarios is to identify instances of the scenario in a corpus of real data, and check if the failure persists on the real data. To this end, we propose a formal definition of what it means for a labelled data item to match an abstract scenario, encoded as a scenario program using the Scenic probabilistic programming language. Using this definition, we develop a querying algorithm which, given a scenario program and a labelled dataset, finds the subset of data matching the scenario. Experiments demonstrate that our algorithm is accurate and efficient on a variety of realistic traffic scenarios, and scales to a reasonable number of agents.

BibTeX

@inproceedings{kim-iccps22,
author = {Edward Kim and
Jay Shenoy and
Sebastian Junges and
Daniel J. Fremont and
Alberto L. Sangiovanni{-}Vincentelli and
Sanjit A. Seshia},
title = {Querying Labelled Data with Scenario Programs for Sim-to-Real Validation},
booktitle = {Proceedings of the International Conference on Cyber-Physical Systems (ICCPS)},
pages = {34--45},
month = {April},
year = {2022},
abstract = {Simulation-based testing of autonomous vehicles (AVs) has become an essential complement to road testing to ensure safety. Consequently, substantial research has focused on searching for failure scenarios in simulation. However, a fundamental question remains: are AV failure scenarios identified in simulation meaningful in reality, i.e., are they reproducible on the real system? Due to the sim-to-real gap arising from discrepancies between simulated and real sensor data, a failure scenario identified in simulation can be either a spurious artifact of the synthetic sensor data or an actual failure that persists with real sensor data. An approach to validate simulated failure scenarios is to identify instances of the scenario in a corpus of real data, and check if the failure persists on the real data. To this end, we propose a formal definition of what it means for a labelled data item to match an abstract scenario, encoded as a scenario program using the Scenic probabilistic programming language. Using this definition, we develop a querying algorithm which, given a scenario program and a labelled dataset, finds the subset of data matching the scenario. Experiments demonstrate that our algorithm is accurate and efficient on a variety of realistic traffic scenarios, and scales to a reasonable number of agents.},
}