We present a new multi-objective optimization approach for synthesizing interpretations that ''explain'' the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.
Autonomous systems are increasingly deployed in safety-critical applications and rely more on high-performance AI/ML-based components. Runtime monitors play an important role in raising the level of assurance in AI/ML-based autonomous systems by ensuring that the autonomous system stays safe within its operating environment. In this tutorial, we present VerifAI, an open-source toolkit for the formal design and analysis of systems that include AI/ML components. VerifAI provides features supporting a variety of use cases including formal modeling of the autonomous system and its environment, automatic falsification of system-level specifications as well as other simulation-based verification and testing methods, automated diagnosis of errors, and automatic specification-driven parameter and component synthesis. In particular, we describe the use of VerifAI for generating runtime monitors that capture the safe operational environment of systems with AI/ML components. We illustrate the advantages and applicability of VerifAI in real-life applications using a case study from the domain of autonomous aviation.
We investigate the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics. In such systems, every state may be associated with a risk, e.g., the probability of an imminent crash. During runtime, we obtain partial information about the system state in form of observations. The monitor uses this information to estimate the risk of the (unobservable) current system state. Our results are threefold.First, we show that extensions of state estimation approaches do not scale due the combination of nondeterminism and probabilities. While exploiting a geometric interpretation of the state estimates improves the practical runtime, this cannot prevent an exponential memory blowup. Second, we present a tractable algorithm based on model checking conditional reachability probabilities. Third, we provide prototypical implementations and manifest the applicability of our algorithms to a range of benchmarks. The results highlight the possibilities and boundaries of our novel algorithms.
We present an implementation of SOTER, a run-time assurance framework for building safe distributed mobile robotic (DMR) systems, on top of the Robot Operating System (ROS). The safety of DMR systems cannot always be guaranteed at design time, especially when complex, off-the-shelf components are used that cannot be verified easily. SOTER addresses this by providing a language-based approach for run-time assurance for DMR systems. SOTER implements the reactive robotic software using the language P, a domain-specific language designed for implementing asynchronous event-driven systems, along with an integrated run-time assurance system that allows programers to use uncertified components but still provide safety guarantees. We describe an implementation of SOTER for ROS and demonstrated its efficacy using a multi-robot surveillance case study, with multiple run-time assurance modules and show, through rigorous simulation, that SOTER enabled systems ensure safety, even when using unknown and untrusted components.
Hyperproperties are properties that describe the correctness of a system as a relation between multiple executions. Hyperproperties generalize trace properties and include information-flow security requirements, like noninterference, as well as requirements like symmetry, partial observation, robustness, and fault tolerance. We initiate the study of the specification and verification of hyperproperties of Markov decision processes (MDPs). We introduce the temporal logic PHL (Probabilistic Hyper Logic), which extends classic probabilistic logics with quantification over schedulers and traces. PHL can express a wide range of hyperproperties for probabilistic systems, including both classical applications, such as probabilistic noninterference, and novel applications in areas such as robotics and planning. While the model checking problem for PHL is in general undecidable, we provide methods both for proving and for refuting formulas from a fragment of the logic. The fragment includes many probabilistic hyperproperties of interest
Reactive synthesis transforms a specification of a reactive system, given in a temporal logic, into an implementation. The main advantage of synthesis is that it is automatic. The main disadvantage is that the implementation is usually very difficult to understand. In this paper, we present a new synthesis process that explains the synthesized implementation to the user. The process starts with a simple version of the specification and a corresponding simple implementation. Then, desired properties are added one by one, and the corresponding transformations, repairing the implementation, are explained in terms of counterexample traces. We present SAT-based algorithms for the synthesis of repairs and explanations. The algorithms are evaluated on a range of examples including benchmarks taken from the SYNTCOMP competition.
Automata over infinite words, also known as omega-automata, play a key role in the verification and synthesis of reactive systems. The spectrum of omega-automata is defined by two characteristics: the acceptance condition (e.g. Büchi or parity) and the determinism (e.g., deterministic or nondeterministic) of an automaton. These characteristics play a crucial role in applications of automata theory. For example, certain acceptance conditions can be handled more efficiently than others by dedicated tools and algorithms. Furthermore, some applications, such as synthesis and probabilistic model checking, require that properties are represented as some type of deterministic omega-automata. However, properties cannot always be represented by automata with the desired acceptance condition and determinism. In this paper we study the problem of approximating linear-time properties by automata in a given class. Our approximation is based on preserving the language up to a user-defined precision given in terms of the size of the finite lasso representation of infinite executions that are preserved. We study the state complexity of different types of approximating automata, and provide constructions for the approximation within different automata classes, for example, for approximating a given automaton by one with a simpler acceptance condition.
In stream-based runtime monitoring, streams of data, called input streams, which involve data collected from the system at runtime, are translated into new streams of data, called output streams, which define statistical measures and verdicts on the system based on the input data. The advantage of this setup is an easy-to-use and modular way for specifying monitors with rich verdicts, provided with formal guarantees on the complexity of the monitor.
In this tutorial, we give an overview of the different classes of stream specification languages, in particular those with real-time features. With the help of the real-time stream specification language RTLola, we illustrate which features are necessary for the definition of the various types of real-time properties and we discuss how these features need to be implemented in order to guarantee memory efficient and reliable monitors. To demonstrate the expressive power of the different classes of stream specification languages and the complexity of the different features, we use a series of examples based on our experience with monitoring problems from the areas of unmanned aerial systems and telecommunication networks.
An essential part of cyber-physical systems is the online evaluation of real-time data streams. Especially in systems that are intrinsically safety-critical, a dedicated monitoring component inspecting data streams to detect problems at runtime greatly increases the confidence in a safe execution. Such a monitor needs to be based on a specification language capable of expressing complex, high-level properties using only the accessible low-level signals. Moreover, tight constraints on computational resources exacerbate the requirements on the monitor. Thus, several existing approaches to monitoring are not applicable due to their dependence on an operating system. We present an FPGA-based monitoring approach by compiling an RTLola specification into synthesizable VHDL code. RTLola is a stream-based specification language capable of expressing complex real-time properties while providing an upper bound on the execution time and memory requirements. The statically determined memory bound allows for a compilation to an FPGA with a fixed size. An advantage of FPGAs is a simple integration process in existing systems and superb executing time. The compilation results in a highly parallel implementation thanks to the modular nature of RTLola specifications. This further increases the maximal event rate the monitor can handle.
The unrealizability of a specification is often due to the assumption that the behavior of the environment is unrestricted. In this paper, we present algorithms for synthesis in bounded environments, where the environment can only generate input sequences that are ultimately periodic words (lassos) with finite representations of bounded size. We provide automata-theoretic and symbolic approaches for solving this synthesis problem, and also study the synthesis of approximative implementations from unrealizable specifications. Such implementations may violate the specification in general, but are guaranteed to satisfy the specification on at least a specified portion of the bounded-size lassos. We evaluate the algorithms on different arbiter specifications.
With ever increasing autonomy of cyber-physical systems, monitoring becomes an integral part for ensuring the safety of the system at runtime. StreamLAB is a monitoring framework with high degree of expressibility and strong correctness guarantees. Specifications are written in RTLola, a stream-based specification language with formal semantics. StreamLAB provides an extensive analysis of the specification, including the computation of memory consumption and run-time guarantees. We demonstrate the applicability of StreamLAB on typical monitoring tasks for cyber-physical systems, such as sensor validation and system health checks.
Hyperproperties elevate the traditional view of trace properties form sets of traces to sets of sets of traces and provide a formalism for expressing information-flow policies. For trace properties, algorithms for verification, monitoring, and synthesis are typically based on a representation of the properties as omega-automata. For hyperproperties, a similar, canonical automata-theoretic representation is, so far, missing. This is a serious obstacle for the development of algorithms, because basic constructions, such as learning algorithms, cannot be applied. In this paper, we present a canonical representation for the widely used class of regular k-safety hyperproperties, which includes important polices such as noninterference. We show that a regular k-safety hyperproperty S can be represented by a finite automaton, where each word accepted by the automaton represents a violation of S. The representation provides an automata-theoretic approach to regular k-safety hyperproperties and allows us to compare regular k-safety hyperproperties, simplify them, and learn such hyperproperties. We investigate the problem of constructing automata for regular k-safety hyperproperties in general and from formulas in HyperLTL, and provide complexity bounds for the different translations. We also present a learning algorithm for regular k-safety hyperproperties based on the L* learning algorithm for deterministic finite automata.
Hyperproperties are properties of sets of computation traces. In this paper, we study quantitative hyperproperties, which we define as hyperproperties that express a bound on the number of traces that may appear in a certain relation. For example, quantitative non-interference limits the amount of information about certain secret inputs that is leaked through the observable outputs of a system. Quantitative non-interference thus bounds the number of traces that have the same observable input but different observable output. We study quantitative hyperproperties in the setting of HyperLTL, a temporal logic for hyperproperties. We show that, while quantitative hyperproperties can be expressed in HyperLTL, the running time of the HyperLTL model checking algorithm is, depending on the type of property, exponential or even doubly exponential in the quantitative bound. We improve this complexity with a new model checking algorithm based on model-counting. The new algorithm needs only logarithmic space in the bound and therefore improves, depending on the property, exponentially or even doubly exponentially over the model checking algorithm of HyperLTL. In the worst case, the new algorithm needs polynomial space in the size of the system. Our Max#Sat-based prototype implementation demonstrates, however, that the counting approach is viable on systems with nontrivial quantitative information flow requirements such as a passcode checker.
In formal synthesis of reactive systems an implementation of a system is automatically constructed from its formal specification. The great advantage of synthesis is that the resulting implementation is correct by construction; therefore there is no need for manual programming and tedious debugging tasks. Developers remain, nevertheless, hesitant to using automatic synthesis tools and still favor manually writing code. A common argument against synthesis is that the resulting implementation does not always give a clear picture on what decisions were made during the synthesis process. The outcome of synthesis tools is mostly unreadable and hinders the developer from understanding the functionality of the resulting implementation. Many attempts have been made in the last years to make the synthesis process more transparent to users. Either by structuring the outcome of synthesis tools or by providing additional automated support to help users with the specification process. In this paper we discuss the challenges in writing specifications for reactive systems and give a survey on what tools have been developed to guide users in specifying reactive systems and understanding the outcome of synthesis tools.
We determine the complexity of counting models of bounded size of specifications expressed in linear-time temporal logic. Counting word-models is #P-complete, if the bound is given in unary, and as hard as counting accepting runs of nondeterministic polynomial space Turing machines, if the bound is given in binary. Counting tree-models is as hard as counting accepting runs of nondeterministic exponential time Turing machines, if the bound is given in unary. For a binary encoding of the bound, the problem is at least as hard as counting accepting runs of nondeterministic exponential space Turing machines, and not harder than counting accepting runs of nondeterministic doubly-exponential time Turing machines. Finally, counting arbitrary transition systems satisfying a formula is #P-hard and not harder than counting accepting runs of nondeterministic polynomial time Turing machines with a PSPACE oracle, if the bound is given in unary. If the bound is given in binary, then counting arbitrary models is as hard as counting accepting runs of nondeterministic exponential time Turing machines.
Finding models for linear-time properties is a central problem in verification and planning. We study the distribution of linear-time models by investigating the density of linear-time properties over the space of ultimately periodic words. The density of a property over a bound n is the ratio of the number of lasso-shaped words of length n, that satisfy the property, to the total number of lasso-shaped words of length n. We investigate the problem of computing the density for both linear-time properties in general and for the important special case of omega-regular properties. For general linear-time properties, the density is not necessarily convergent and can oscillates indefinitely for certain properties. However, we show that the oscillation is bounded by the growth of the sets of bad- and good-prefix of the property. For omega-regular properties, we show that the density is always convergent and provide a general algorithm for computing the density of omega-regular properties as well as more specialized algorithms for certain sub-classes and their combinations.
We introduce Lola2.0, a stream-based specification language for the precise description of complex security properties in network traffic. The language extends the specification language Lola with two new features: template stream expressions, which allow input data to be carried along the stream, and dynamic stream generation, where new monitors can be invoked during the monitoring process for the monitoring of new subtasks on their own time scale. Lola 2.0 is simple and expressive: it combines the ease-of-use of rule-based specification languages like Snort with the expressiveness of heavy-weight scripting languages or temporal logics previously needed for the description of complex stateful dependencies and statistical measures. Lola 2.0 specifications are monitored by incrementally constructing output streams from input streams, while maintaining a store of partially evaluated expressions. We demonstrate the flexibility and expressivity of Lola 2.0 using a prototype implementation on several practical examples.
We present an analysis technique for temporal specifications of reactive systems that identifies, on the level of individual system outputs over time, which parts of the implementation are determined by the specification, and which parts are still open. This information is represented in the form of a labeled transition system, which we call skeleton. Each state of the skeleton is labeled with a three-valued assignment to the output variables: each output can be true, false, or open, where true or false means that the value must be true or false, respectively, and open means that either value is still possible. We present algorithms for the verification of skeletons and for the learning-based synthesis of skeletons from specifications in linear-time temporal logic (LTL). The algorithm returns a skeleton that satisfies the given LTL specification in time polynomial in the size of the minimal skeleton. Our new analysis technique can be used to recognize and repair specifications that underspecify critical situations. The technique thus complements existing methods for the recognition and repair of overspecifications via the identification of unrealizable cores.
We determine the complexity of counting models of bounded size of specifications expressed in Linear-time Temporal Logic. Counting word-models is #P-complete, if the bound is given in unary, and as hard as counting accepting runs of nondeterministic polynomial space Turing machines, if the bound is given in binary. Counting tree-models is as hard as counting accepting runs of nondeterministic exponential time Turing machines, if the bound is given in unary. For a binary encoding of the bound, the problem is at least as hard as counting accepting runs of nondeterministic exponential space Turing machines. On the other hand, it is not harder than counting accepting runs of nondeterministic doubly-exponential time Turing machines.
We investigate the model counting problem for safety specifications expressed in linear-time temporal logic (LTL). Model counting has previously been studied for propositional logic; in planning, for example, propositional model counting is used to compute the plan’s robustness in an incomplete domain. Counting the models of an LTL formula opens up new applications in verification and synthesis. We distinguish word and tree models of an LTL formula. Word models are labeled sequences that satisfy the formula. Counting the number of word models can be used in model checking to determine the number of errors in a system. Tree models are labeled trees where every branch satisfies the formula. Counting the number of tree models can be used in synthesis to determine the number of implementations that satisfy a given formula. We present algorithms for the word and tree model counting problems, and compare these direct constructions to an indirect approach based on encodings into propositional logic.