The mobile code problem

Example implementation:
(picture of secure "work" machine + insecure "play" machine (for mobile code)
  + two connections to internet)
(if you want to use your "work" machine's modem line connection to the net
  to download the dancing dinosaurs, you can install a one-way connection
  from work->play with a "data diode", but you have to be _very_ careful
  not to let sensitive data out across the data diode.  not recommended)
(ask: how to implement a diode?)
(but now mobile code can attack other mobile code; no protection between
  applets.  solution: add a secure "Reset" button to the "play" machine
  which erases all state, wipes the hard disk, re-installs the OS, and brings it
  back to the factory defaults.  works as long as you don't want to run
  two mobile applets concurrently)
(advantages: secure.  disadvantages: expensive.)
(but you don't want to have to buy _two_ 21" monitors, keyboards, mice, etc.,
  so you want to use "work" machine as a display for "play" machine.
  solution: stick a firewall betw. work and play that only allows window
  bitmaps in play->work direction and keypress/mouse events in work->play dir.)
(ask class for problems)
  (keypresses inadvertently going to play machine,
   mobile code spoofing trusted windows)
(defenses: *be sure to give the user an indication of when the keys he types are going
  to the play machine, and which windows are coming from the play machine*)
(but now you want to amortize cost of "play" CPU across many users.
  so you make it a server that anyone can use.  stick it on the DMZ
  (not in your intranet).)
Ok, so that gave you some flavor of the problem.

Alternative approach:
- Find a software solution, so you don't need two different machines
  - Download code, verify it won't do anything dangerous, then execute it safely

Goals:
- Isolation
  - Integrity, confidentiality, availability (?) of local machine:
  - Local data & code should not be affected
  - aka "Sandboxing"
    - Confine untrusted applet to a `sandbox', within which it can do whatever
      it wants but from which it can't escape -- so applet bugs can harm the
      applet itself, but can't harm the rest of the system
    - In combination with safe languages, protects against malice
    - Alone, it only guarantees that the worst undefined behaviour can do
        is get the job killed
- Controlled sharing
  - Partial access to non-security-critical parts of machine
    (screen real estate, keyboard, temporary storage etc.)
(last category is slightly non-trivial; ask class for ideas)
- ``Good neighborliness''
  - Don't let this be used as a way to spread viruses, or as a stepping
    stone for attacks on others
  - If others trust this machine, don't let mobile code
    "abuse the good name of the local machine":
    e.g., address-based authentication (rlogin), firewalls



Overview of mobile code paradigms:
  * Distinction between safety and security: safety is protecting the
    program against itself, security means protecting against malice
  o Trust
    o "YOYO" You're On Your Own (mash-tcl)
      * Are You Out Of Your F* Mind?!
      * Main problem is how recipient can verify he should trust the code
        (assuming you don't trust everything)
    o Shrinkwrapped code (...)
    o Code signing (ActiveX, SPIN)
      * (SPIN used trusted compiler with safe language; compiler signed code;
        kernel accepted only code signed by the trusted compiler)
      * Have different sandbox policies depending on who wrote the code
      * beware that a signature on a piece of code indicates only accountability
        not authorship (because a third party can always strip the signature off
        of some signed code and sign it themselves)
      * Eric's "parking lot" analogy
        "It is analogous to saying that license plates prevent accidents
        or malicious damage," said Eric Brewer, a computer science
        professor at U.C. Berkeley.  "Just as you can find your car
        dented in the parking lot, you may find your machine or privacy
        damaged after the fact without being able to tie it to a
        particular control, certified or not."
      * It doesn't scale well...
  o Safe languages (Java, BPF, Modula-2, agent-tcl, taintperl, Penguin)
    o Interpreted languages vs. compiled languages
    * eliminate "undefined" behaviours in the language
      -  a = (a++) + (++a)
      -  f += 4; (*f)();
      -  ((char *)f)[17] = 0xE7; (*f)();
    * Make sure the code, when executed, does not do anything "unintended"
    * Protects mostly against mistakes (coding errors)
    * _very_ limited languages (BPF) can be made quite safe this way,
      but there's not much you can do in them
    * Typically either interpreted or run in a "virtual machine";
      otherwise, you have to trust the compiler
    * Typically used in conjunction with runtime checking
  o Code verification (Java/JVML, BPF, ASH)
    * Where you prove that specific programs won't do bad things
    * Almost impossible with general code, typically want to only verify
      code that's been designed for verification (either written in a
      language designed for verification, or output by a compiler designed
      to make it easy to verify its output)
    * Some results having to do with having the compiler insert invariants
      that can be easily checked ("Proof-carrying code")
    * With safe languages, much easier
    * If the code is compiled into a "non-safe" form, you probably _need_
      to do this
    * ASH = event handlers you can download into the kernel; like
      assembly language; kernel checks to make sure they have bounded
      runtime and don't access memory they're not supposed to;
      like BPF, but used for more than just packet sniffing (e.g.
      general user-controlled de-multiplexing in Exokernel)
  o OS-enforced sandboxes (chroot, Janus, etc.)
    * Might be implemented with an integrated reference monitor in the OS
      (chroot)
    * Might be implemented by an extension to the OS,
      e.g., syscall interposition (Janus)
  o Virtual machines (VMware)
  o Physical isolation (separate machines)



Talk about BPF



Sandboxing legacy software

Basic model:
- You start with some software, and you want to subject it to a security policy
- So, pick a set of security-relevant events, and design a
  security automaton (reference monitor) that checks these events
  and enforces the security policy by restricting them
- So policy can be specified as a "language": a set of traces
  (over the alphabet of events)
- Q: What kinds of properties can be enforced in this way?
  _Safety_ properties: a language L \subset E^* is a safety property if
     t \not\in L, u \in E^* => t.u \not\in L
    
Ways of getting access to events:
- Passive observation
- Active interposition
- Code transformation (modifying the application program)
- Q for class: Give examples of the above for network security
  (packet filters, firewalls, no analog to code transformation)

Janus:
- (Active) interposition
- Goal: security, not correctness
  (app can be broken, but can't hurt anything as a result)
- Use debug facility (ptrace()) to intercept syscalls
  - Check arguments, squelch some syscalls accordingly
    (even modify args and return values when necessary)
  - Recursively contain all fork()ed child processes
  - App placed in a subtree: full access within
    (can send signals to parent app and its children),
    but can't get out
  - Optimizations: Check only open(), since read(),write()
    can only use fd's obtained from open(); thus read(),write()
    go at full speed
- Configurable access to filesystem, network, IPC, etc.
  - _Default_ is to deny all access
  - Then you specify things which are allowed
  - Much like specifying a firewall policy
  - Network access only to an X proxy, DNS proxy, etc.
  - Nested X system in a window, to prevent using X to hack other apps
- Note analogy to firewalls, and to active reference monitors


SFI:
- Code transformation
- Inject security monitor, inlined directly into the application
  - Options: can inject it early (by hand; or compiler can do it),
    combined with a verification just before it is run; or, can inject
    it very late in the game (then no need for verification)
  - Q: Advantages, disadvantages of each?
- SFI policy: Fault domains are subsets of an address space;
  code is confined to a single fault domain (e.g., can only write to
  memory in your own fault domain)
- Enforcement
  - Simple idea: Bounds-check each load/store address
    (copy to dedicated register + two comparisons + two conditional traps)
  - Q: Why do you need dedicated register?
  - Optimization: Fault domains live on power-of-two address spaces;
    upper bits of address specifies fault domain
    (copy + shift + comparison + conditional trap)
  - Optimization: Rather than testing to make sure upper bits of address
    are valid, just set them to the desired value (2 logical ops)
    - Changes security guarantee: Out-of-bounds pointers will now corrupt
      your own data, but won't hurt anyone else
  - Optimization: Guard zones before and after each segment;
    guard zones are unmapped and thus accesses to them will fault
  - Optimization: Treat stack pointer specially (usually it's changed
    only by a small amount, up or down, and thus is safe as
    long as you have large enough guard zones)
- RPC
  - All calls across domains require copying arguments
    (or use VM to fake out copying)
  - Syscalls to OS done by RPC to an arbiter fault domain
    (the arbiter looks very similar to Janus; interposition)
  - Shared data pages can be mapped into multiple fault domains

Generalizes to arbitrary injection of enforcement mechanism

Q: Who injects the enforcement mechanism?
- One possibility: Code consumer (e.g., web client) does the injection;
  but then work might be repeated many times, and also the consumer
  may have no access to types and other high-level information about the code
- Another possibility: Code producer (e.g., compiler) does injection;
  but then the consumer must verify that injection was done properly

Proof-carrying code
- Code producer injects enforcement mechanism, and then supplies
  a proof to consumer that the result satisfies the consumer's policy
      Consumer->Producer: policy
      Producer->Consumer: safe code, proof

Policies
- control flow safety
- memory safety
- stack safety
- type safety
* Q: Are these sufficient for mobile code security?



Safe languages

Reminders on Java
   -- applets and applications
   -- object-oriented, garbage-collected, type-safe language
   -- no pointers!!
   -- classes, objects, packages
   -- threads
   -- native code
   -- type-safety
   -- all casts are explicit
   -- private, protected, package scope, public, synchronized,
      final (methods & classes), try{}finally{}
   -- subclassing, interfaces, rules for constructors

Java bytecode (JVML)
-- a little different from the Java source language
-- expansion of syntactic sugar
-- opcodes for an imaginary ("virtual") machine with a stack & some registers
-- subroutines for try/finally and nested exception handlers

Architecture for type-safety
-- Compiler totally untrusted
-- A mixture of static and dynamic type-checking
-- TCB: Verifier, JVM, SecurityManager, ClassLoader, libraries
-- Bytecode verifier
   -- simple checks
   -- dataflow analysis verifies:
      -- at any given point in code,
            -- stack is always same size & has same type
            -- registers not accessed until loaded with correct type
            -- methods typecheck
            -- field accesses typecheck
            -- opcodes have appropriate type arguments on the stack
         -- instructions are used atomically (i.e. control flow instructions
            point to the start of an instruction)
         -- at any "join" in control-flow, merge the type of the world
            (if possible; otherwise fail)
            -- note that at any instruction, every registered exception
               handler is a possible successor (except that you push an
               exception object on the stack before the jump to the
               exception handler)
            -- i.e. registers are not polymorphic
         -- e.g. the following code will fail, due to static type-checking:
                Object obj;
                if (flag == 0) obj = new File(); else obj = new String();
                if (flag == 0) println(obj.getPath()); else println(obj);
         -- Q: would the above technique have worked for x86 assembly?
        A: no: variable-length instructions, registers are untyped,
               stack discipline not followed, untyped load/stores
-- JVM
   -- interpreter for JVML
   -- Dynamic type-checking on first access to every class, method, field
   -- array bounds-checking and storage type-safety
      (give an example of why the latter has to be done dynamically --
       array covariance: String[] s; Object[] o = s; o[0] = new Integer(1);
       must throw ArrayStoreException at runtime)
   -- Garbage collector has to be secure
      -- if it deallocates memory while there's still a pointer to it,
         can subvert type system
      -- if it doesn't zero out memory, you have object re-use problems
-- ClassLoader
   -- dynamic loading of classes

The remainder of the security architecture: how to use type-safety
-- SecurityManager
-- Libraries