CS261 Scribe Notes, 9/16/08
 Matthias Vallentin <vallentin@berkeley.edu>

============
 Sandboxing
============

motivations for sandboxing
* browser plug-ins
  - restrictions on plug-ins
  => prevent plug-in from corrupting browser data structures
* untrusted code execution
* BPF filter expressions
  - installed in the kernel
  - efficient, but dangerous: untrusted filter code runs inside the kernel

"reference monitor"
 - access control checker that acts on behalf of the untrusted code
 - stateful
 - the policy is enforced in the reference monitor (separates policy from mechanism)
 => this decomposition allows
    1) sandboxing the untrusted code
    2) invocation of ref-mon entry points from the sandbox
 - advantages over separate processes & IPC
   . efficiency (avoids sys-calls, context switches, etc.)
   . untrusted code can be prevented from issuing sys-calls directly


=====
 SFI
=====

0x00....0  ----------------------------------
                   unused space
0x10....0  ----------------------------------
                   sandboxed code
0x20....0  ----------------------------------
           ---------------------------------- guard page
                   sandboxed data
0x30....0  ----------------------------------
           ---------------------------------- guard page

                   unrestricted 
                     code/data

0xFF....F  ----------------------------------


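Rewriting
1) store (write) instruction:
   mov  %eax, (%ebx)
          |
          v
   and  $0x2F...F, %ebx   ; a valid data address in %ebx is left unchanged;
                          ; an invalid one is redirected either into the
                          ; "unused space" region or to some other address
                          ; inside the sandboxed data (harmless outside the sandbox)
   mov  %eax, (%ebx)
2) load (read) instruction: not rewritten (for fault isolation only writes
   and jumps must be confined; loads can be sandboxed the same way at extra cost)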
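3) branch instruction
   * direct: jmp 0x12345678     ; check statically that the target is in the code region
   * indirect: jmp *%ebx
                 |
                 v
               and  $0x1F...F0, %ebx   ; target forced into sandboxed code (or
                                       ; unused space) and 16-byte aligned
               jmp  *%ebx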

SFI issues:
* sandboxed code cannot directly call the reference monitor, which lies
  outside the sandbox
  -> provide static ref-mon entry points in unrestricted code/data
* how to secure the ref-mon?
  -> save the environment on entry (almost like a context switch / mini-OS)
* how to enable communication between multiple sandboxes?
  -> indirection through the ref-mon

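=> performance: a trade-off in when the expensive checks are performed
   - OS: the MMU checks every memory access in hardware, but crossing
     protection domains requires expensive context switches
   - SFI: checks each memory access in software, which in turn makes
     cross-domain communication cheap (no context switch)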


for (i = 0; i < n; i++)
{
    *p++ = 5;
}
        |
        |
        v

for (i = 0; i < n; i++)
{
    p &= 0x2FF..F;
    *p++ = 5;
}
        |
        | compiler optimization: hoist the check out of the loop
        v

p &= 0x2F...F
if (p >= 0x30...0 - n) error();
for (i = 0; i < n; i++)
{
    *p++ = 5;
}
    
at the assembly level, the loop consists of multiple basic blocks (BBs):

               +----------
               | p &= ...
               | if (p >= ...
               +---------- ------------------> error()
                    |
                    v
               +----------
         +->   | if (i < n)
         |     +----------
         |          |
         |          v
         |     +----------
         |     | *p++ = 5
         |     | i++
         |     +----------
         |           |
         +-----------+


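This control flow graph cannot be reliably recovered from the binary.
=> problem: an indirect jump can go anywhere in the code region, e.g. straight
   into the loop body, bypassing the hoisted check!
=> this limits SFI performance: a rewriter working on binaries does not know
   the control flow and therefore cannot perform such optimizations.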

Idea behind SFI:
- using static analysis to prove that an arbitrary program is safe is way too hard.
- thus identify an "easy subset" of x86 and use rewriting to map arbitrary x86
  code into that subset.
- lastly, statically verify the rewritten code
- if it is not obviously safe, reject it
-> design problem: find an easy subset that enables a useful verifier

Conclusion: SFI reduces the size of the trusted computing base (the rewriter
need not be trusted, since its output is checked by the verifier)