Ben's Blog

December 21, 2018

reple: "Replay-Based" REPLs for Compiled Languages

This blog post is about the details of reple, a tool for creating "replay-based" REPLs ("interpreters") for compiled languages. With reple, each time the user enters a line of code, the whole program is compiled and executed again ("replayed"), and any new output is printed. This is a remarkably simple technique, but it works surprisingly well. I'll first cover 1) why this technique is appealing, even when we have true interpreters for some compiled languages, 2) how it works and why it generalizes to multiple languages, and then 3) do some performance analysis and discuss limitations.

Motivation

I work on high-level programming models, and one of the most important things to do when evaluating a programming model is to look at the code it allows the user to write. Nothing does this better, either for a live presentation or particularly for someone who wants to explore the programming model firsthand, than a REPL, or "read-eval-print-loop." One way to get a REPL is by writing an interpreter that maintains an internal state and actually evaluates each line of input incrementally as the user iterates through the read-eval-print-loop. This is what interpreted languages, like Python or JavaScript, do by default, but you can also go and write an interpreter for compiled languages, like the ROOT Project at CERN has done for C++ with Cling [1].

This blog post is about a different technique for building REPLs that is based on "replay." The idea is that you simply re-run the program with each new REPL input and print any new lines of output. This has obvious performance disadvantages if you're actually doing computation inside your REPL instance—for each new line, you'll have to redo all old computation. However, this technique has other distinct advantages that make it very desirable for certain use cases. One is development time: with reple, you can easily create a REPL for a new compiled language by writing a ~20-line JSON configuration file. Beyond development, it might actually be preferrable to for your REPL to be backed by a compiler: if you're interested in checking the specific behavior of a language standard or compiler (e.g. can GCC 8 handle a specific use of Concepts?), you really want to use the compiler, not an interpreter that's emulating it. In addition, some parallel programming languages and runtimes would require significant work to integrate into an interpreter, and we can easily add a few extensions to our replay technique that make it easy to make REPLs for these special languages and runtimes.

Building Programs

To create valid, fully-formed programs out of REPL lines entered by the user, reple uses a template. The template provides language-specific boilerplate, such as basic includes and a main function in C++, and tells reple where to insert different kinds of REPL lines to build a program. Here's what the template looks like for C++.

  #include <iostream>
  #include <cstdio>
  #include <cstdlib>

  {prolog_lines}

  int main(int argc, char** argv) {
    {repl_lines}
    return 0;
  }

reple splits up REPL lines into two categories: repl_lines, which are normal expressions to be executed, and prolog_lines, which are special expression to be put in the global scope, like function or class definitions. Inputted lines are repl lines by default, and a special delimiter tells reple when input should be interpreted as a prolog line. We usually choose '$' as the prolog delimiter, although this is configurable.

Compiling and Executing Programs

A reple config also provides a template for how to compile and execute a program. These are pretty simple, so I'll just directly show you the JSON from the C++ config file.

  "run": "{bin_fname}",
  "compile" : "{compiler} {cflags} -o {bin_fname} {code_fname} {user_cargs}",
  "compile_args": {
    "compiler": "g++",
    "code_suffix": ".cpp",
    "cflags": "-w -O3"
  },

bin_fname and code_fname are filled in by reple itself with the names for the binary and the code file in the current iteration of the REPL. user_cargs comes from a flag the user gives reple at runtime. This lets the user pick custom compilation options for each REPL session.

The REPL Loop

Once reple gets a new line of input from the user, it inserts all of the current REPL lines into a new source file using the code template from the config file. It then compiles the new program and runs it. We use the /tmp directory to store all these files, which is convenient because it's temporary and also because it's usually mounted as a tmpfs, meaning the files will be resident in RAM and never actually have to be written to disk, significantly speeding things up. Once reple gets the new program output, it compares it to the previous output and prints out any new lines.

Handling Compiler Errors

All users will make errors, and it's important for a replay-based REPL to recognize when a new line introduces a fatal program error. Hopefully the REPL can print out the error so the user knows what to fix, but more importantly than that, we need to avoid polluting the REPL state by permanently inserting an error-prone line in the REPL. This would be disastrous, since the user would be stuck forever with a broken program. Luckily, all compilers have a well-defined protocol for fatal errors: 1) they don't produce a binary when there's a fatal compilation error, and 2) they generally print out an error message on standard error.

reple handles errors by detecting when a compiler fails to build a binary, recognizing this as a fatal compilation error. When this occurs, reple will refrain from inserting the new line into the REPL state, thus avoiding polluting the REPL state with an invalid program line. It then prints out any output from the compiler straight to the user, so that he can see the native compiler error. This is simple and general, but also works quite well in practice.

  > int x = ;
  /tmp/repl/repl0.cpp:8:9: error: expected expression
  int x = ;
          ^
  1 error generated.
  > int x = 12;
  >

Parallel Programming Models

For parallel programming runtimes like MPI, we have the special concern that programs can be executed out of order. This results in lines of output being printed out of order, which is a big problem for reple, since it might print out a mix of stale lines and new lines instead of the output corresponding to the new line a user just entered. There's an easy fix for this, though, which is to force lines to be printed in order. reple allows this with a line epilogue, which is a special set of lines specified in the config file that will be appended to every repl line that the user enters. For MPI, we can specify a line epilogue that will force every line to print out in order.

  MPI_Barrier(MPI_COMM_WORLD);
  fflush(stdout);
  fflush(stderr);
  MPI_Barrier(MPI_COMM_WORLD);

This transparently enforces the ordering you would expect from a real interpreter: the user enters a line, then that line is executed and the output printed to the user. An alternate strategy might be to identify new lines by keeping around a set of all printed lines, then only printing out lines in new program output that do not appear in the set, but I haven't experimented with this yet. The line epilogue technique has the added benefit that it does not allow race conditions between REPL lines, which would be confusing, as it disobeys the implicit contract that REPL lines are executed in order.

Performance

Evaluating if performance tools are helpful for users or whether issues like REPL latency negatively impact usability is difficult. Ideally, we would run a double-blind study to determine whether reple is a helpful tool for users while programming and how often users encounter reple slowdowns in practice. Since I don't have the resources or experience to run a study like that, I've settled for building a collection of example REPL programs, which you can find in the /tests directory of the reple GitHub repo. Admittedly, the collection is currently pretty tiny, and I'd love to expand it. If you'd like to submit some REPL session examples or know where I might find a good database to harvest examples from, please submit a pull request or otherwise get in touch. The larger the collection of example programs, the better we can tune configuration files to get optimal REPL performance.

I ran all the example REPL sessions through reple on my 2015 MacBook Pro and found that the average latency of each example program was always less than half a second, which I view as small enough to be practical for live demos and individual exploration. After all, it will often take a human a few seconds to consider and type out the next line of code. This matches my experience in practice, which is that reple sessions tend to be pretty responsive for my REPL workloads, with output coming back at worst within a few seconds.

Limitations

While reple is a useful tool for many types of programs, there are some fundamental limitations to the technique. One obvious limitation is that users who want to use a REPL for long-running computations will find reple to be slow, since they'll replay the whole computation with each added line. It may be that some simple extensions to reple will fix this. For example, we could keep track of all program variables and, instead of inserting the original code at each iteration, only insert code that assigns variables to maintain state. While this might be simple enough to achieve for a particular programming language, it does require knowledge of the syntax and semantics of a specific language. At the very least, it requires that reple know how to recognize the declaration of a variable and how to assign a variable. I'm open to new ideas and suggestions about how this might be generalized.

Another problem is programs that output a nondeterministic number of lines, which confuse reple and can result in lines that output multiple times or not at all. It's possible this could also be resolved using the fix I mention for parallel programs above. Another possibility would be to somehow annotate each line so that its output is sent through a unique pipe, allowing us to uniquely identify the output generated by each line.

Perhaps the most serious limitation is lack of handling for runtime errors and exceptions. If the user introduces a seg fault to their REPL, this will usually result in a trashed REPL state, since the line that causes the seg fault will remain in the program until the user clears their history (and thus their state). Fixing this will require recognizing the return value received from running the REPL program and possibly catching and recognizing signals like SIGSEGV, printing an error for the user, and removing the most recent line from the history.

Footnotes

[1] It's also worth mentioning techniques like Viktor Kirilov's Interactive C++ Compilation Tool rcrl, which achieves true incremental execution without implementing a true interpreter. These techniques all require language-specific implementations, though.