WOSS'04 workshop notes

Our talk on the RADS position (SLT + cheap recovery, with an allusion at the end to CT) was well received.  (slides, paper)

In fact, we should consider ICSE (Intl Conf on SW Eng), ASE (Automated SW Eng) and FSE (Foundations of SW Eng) as possible venues for ROC/RADS papers in the future.  I'm adding the info to the SWIG conferences page.

This workshop was aimed at software engineering researchers; a lot of papers were about abstract software modeling with UML and about expressing "architectural adaptation" (e.g., inserting a component into a SW pipeline or DAG) in terms of such models.

Control-based framework for optimizing power consumption vs. performance in a web server tier: a control loop optimizes a cost function that includes both a penalty for violating the SLA response-time target and an energy cost (the servers support dynamic voltage scaling, trading performance for power).  (A 100-server datacenter consumes ~5MW of power!)  Specifically, minimize the weighted combination of the two, subject to constraints on the arrival rate.  (A sketch of the whole scheme follows the next paragraph.)

Controllable vars: # of servers, and the CPU freq of each.  Challenge: the "dead time" of switching on a computer is long, so they use model-predictive (receding-horizon) control, as opposed to plain feedback control, whose corrections would arrive too late.  This seems to be an intermediate point between feedforward and traditional feedback models.  You use a behavioral model to estimate future system state over a "prediction horizon" of length n, in response to a candidate sequence of n control inputs (n timesteps into the future); then you pick the sequence that is predicted to give the lowest cost, throw out the rest, apply the first input of that sequence, and repeat at the next timestep.  Basically it is a tree search of the state space.  In this case, the lookahead has to be at least as long as the time it takes to turn a new machine on.
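To make sure I understood the receding-horizon idea, here is a minimal Python sketch.  The queueing and power models, constants, weights, and function names are all invented placeholders (not the authors' formulation), and a real controller would use a proper optimizer rather than brute-force enumeration:

    import itertools

    HORIZON = 4            # prediction horizon n; must exceed the boot dead time
    BOOT_DELAY = 2         # timesteps before a server-count change takes effect
    FREQS = [0.6, 1.0]     # normalized CPU frequencies (dynamic voltage scaling)
    MAX_SERVERS = 6
    SLA_RESP = 0.5         # SLA response-time target, in seconds
    W_SLA, W_PWR = 1000.0, 1.0   # weights in the cost function (made-up values)

    def response_time(servers, freq, arrivals):
        """Toy queueing model: latency blows up as utilization approaches 1."""
        capacity = servers * freq * 100.0       # requests/s the tier can absorb
        util = min(arrivals / capacity, 0.999)
        return 0.01 / (1.0 - util)

    def power(servers, freq):
        """Toy power model: idle cost plus dynamic power ~ freq^3 per server."""
        return servers * (50.0 + 100.0 * freq ** 3)

    def step_cost(servers, freq, arrivals):
        sla_penalty = max(0.0, response_time(servers, freq, arrivals) - SLA_RESP)
        return W_SLA * sla_penalty + W_PWR * power(servers, freq)

    def next_action(current_servers, forecast):
        """One receding-horizon step: enumerate every control sequence over
        the horizon (the tree search), cost each against the forecast arrival
        rates, and return only the first input of the cheapest sequence."""
        choices = [(s, f) for s in range(1, MAX_SERVERS + 1) for f in FREQS]
        best = min(
            itertools.product(choices, repeat=HORIZON),
            key=lambda seq: sum(
                step_cost(
                    # server-count commands take BOOT_DELAY steps to take
                    # effect (the dead time); frequency changes are instant
                    current_servers if t < BOOT_DELAY else seq[t - BOOT_DELAY][0],
                    seq[t][1],
                    forecast[t])
                for t in range(HORIZON)))
        return best[0]

    # e.g.: next_action(3, forecast=[900, 1200, 1500, 1500]) -> (servers, freq)

Note how the server-count input only shows up in the cost once t >= BOOT_DELAY: if HORIZON were not longer than the dead time, the search could never see the benefit of booting a machine, which is exactly why the lookahead must cover the turn-on time.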

This has been simulated but not prototyped on a real system.

A property claimed for this controller is feasibility (vs. stability): does it reach the desired operating region and keep the system there?  Not sure how this differs from stability.  (Presumably feasibility asks only that the state stay inside the constraint region, while stability asks for convergence to an equilibrium.)

The underlying problem is a traditional continuous-time formulation, and something about the receding-horizon technique makes it possible to solve the optimization in discrete time (?)

Recovering from kernel-level compromises/rootkits.  Sandy Ring, Sytex Inc (a gov't contractor, kind of like SAIC; the algorithms described are patent pending).  Detection is the hardest part: rootkits can conceal malicious processes, files, and network connections by mucking around with kernel-level data structures (similar to what Yi-Min told us about for Windows spyware).

Basis of the approach: if the user-space and kernel-space views of the world don't match, something is wrong.  This is similar to Yi-Min's strategy of booting Windows from a trusted volume (e.g., CD-ROM) so you can do "real" directory listings, Registry audits, etc.  Their tool does detection, forensics (collecting the evidence), and recovery (fixing up the badness to close the holes introduced by intruders).  It cannot recover from deleted/modified files, compromised data, or logged keystrokes.

Works after the fact: you don't need to "baseline" the system first, and it doesn't have to witness the attack in progress.  Implemented as a loadable kernel module for Linux.  First, look at the syscall table and repair any bad entries.  (The kernel maintains a kallsyms data structure with the known-good addresses; current rootkits don't patch that, but now that this paper is published, they probably will.  You could also checksum the table right after boot, if you know the boot device is trusted, or compare against the kernel image on disk, if you're sure it hasn't been compromised.)  Second, look for "hidden" processes and kill them (including PID 0, which some hackers use to hide from getdents()).  Third, look for "hidden" files (compare ls -R in both domains; takes about 15 minutes for an 80GB disk) and delete them.  Last, look for attack traffic on hidden sockets and close the sockets; Linux lets you add "packet handlers" to the stack, so they add one that looks for packets destined for processes that don't appear in the normal process table.  (A user-space sketch of the cross-view idea is below.)
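The cross-view principle is easy to illustrate in user space (their tool is a kernel module, so this is emphatically not their algorithm): enumerate PIDs by listing /proc, which a rootkit's hooked getdents() can filter, and again by probing every PID directly with kill(pid, 0), which a typical rootkit doesn't intercept; anything visible in one view but not the other is suspect.

    import os

    def pids_via_readdir():
        """The 'normal' view: what ls and ps see when they list /proc."""
        return {int(d) for d in os.listdir('/proc') if d.isdigit()}

    def pids_via_probe(max_pid=65536):
        """The cross-check view: ask the kernel about every PID directly."""
        alive = set()
        for pid in range(1, max_pid + 1):
            try:
                os.kill(pid, 0)       # signal 0: existence check, sends nothing
                alive.add(pid)
            except ProcessLookupError:
                pass                  # no such process
            except PermissionError:
                alive.add(pid)        # exists, but owned by someone else
        return alive

    # A process that exits between the two scans is a false positive, so a
    # real tool would rescan before raising an alarm.
    hidden = pids_via_probe() - pids_via_readdir()
    if hidden:
        print("processes hidden from directory listings:", sorted(hidden))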

Most rootkit hackers modify stuff in /etc/init to load a kernel module that patches syscalls, etc.  (I guess they won't after this!)

Architectural differencing: compare system behavior with a simulated behavioral model; if they differ, the system may be hosed.  This is weird because it assumes you can start with a system and somehow build a model that you can then trust as being representative of the system (in the sense that if the model does X in response to input Y, the system should do the same).  Others and I asked about this assumption, and the response was "that's a problem of modeling [as opposed to a problem of this approach]".  A number of people jumped on this ("The model is not reality", "There's not necessarily any correspondence between an 'abstract state' of the model and an 'implementation state' of the system", etc.)
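As far as I can tell, the mechanics reduce to replaying the observed event stream through the behavioral model and flagging the first divergence.  A toy Python sketch, with a made-up pipeline model (the states and events are mine, not theirs):

    # Expected transitions of the behavioral model: (state, event) -> state.
    MODEL = {
        ('idle', 'request'):  'dispatch',
        ('dispatch', 'exec'): 'respond',
        ('respond', 'done'):  'idle',
    }

    def check(observed_events, state='idle'):
        """Replay observed events through the model; report any divergence."""
        for i, event in enumerate(observed_events):
            nxt = MODEL.get((state, event))
            if nxt is None:
                return "divergence at event %d: %r illegal in state %r" % (i, event, state)
            state = nxt
        return "trace consistent with model"

    print(check(['request', 'exec', 'done']))   # consistent
    print(check(['request', 'done']))           # flags the divergence

Of course, this just restates the criticism above: the checker is only as good as the correspondence between the model's states and the system's actual states.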

David Garlan et al., task-based self-adaptation.  "Proper adaptation decision [if there are multiple choices] requires knowledge of user's intent."  (Tie-in to the human operator?)  A task-based approach could be useful for operators, since we want to focus on what the operator wants to do (and then present possible ways the system can help).

There is interest in using SLT-based models for understanding as well as for guiding recovery.  In particular, this community has been doing "top-down" modeling for years, and they are interested in what kinds of models you get from "bottom-up" modeling and how they differ from top-down models.  (This is a way of saying that if humans have info about the system, they should use it, despite the ability of SLT to extract useful info by itself.  The question is how the human's info, which is usually expressed in terms of high-level models and states, can be mapped down to the implementation.)  I'm trying to get them to think about applying their techniques to existing frameworks like J2EE.  (All this modeling work seems to use UML as the modeling language; then they provide a "framework" or "development environment" or "runtime platform" for you to build/run the apps.  All of these are "open platforms", and most have no users.)

The SW modeling community also has trouble persuading companies to adopt "modeling-based" methodologies, because they can't convince them of the payoff.  (The model can be used for reasoning about the system, for maintenance, ...)  One can imagine how models like pipe-and-filter ones could be useful in reasoning about dependability (e.g., whether a component is safe to replicate or not).  I suggested that maybe SLT could be used to extract the models (or help do so) from existing SW artifacts.  Then the work of creating the models would be reduced, making it less critical to identify a large payoff entirely in advance.  David Garlan was interested in this line of reasoning; they have been teaching a "software modeling 101" class that is struggling to figure out which parts are useful to teach practitioners.  I'll talk to Jeannette Wing about this on Thursday.

Network configuration management via model finding, Sanjai Narain, Telcordia.  He is a logician by training but now works in networking.  Large complex distributed systems are created via integration and configuration; each component has a number of config parameters.  The difficulty of hand configuration is the major contributor to the cost of networking infrastructure development and maintenance.  E.g., one client has 240 sites on a dedicated network that is being migrated to VPN-over-Internet; this will take 3 years because only a small team can do it ("too many cooks").  His approach expresses requirements in the Alloy modeling language and lets a constraint solver (first-order logic, constraint satisfaction) propose configurations, but his talk made it impossible to understand what has actually been done (and currently it appears to be tested only on toy problems).  (A toy sketch of the model-finding idea is below.)
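Here is model finding in miniature, as I understand it: state the requirements declaratively as constraints over the configuration variables and let a solver search for a satisfying assignment.  Alloy hands this to a real constraint solver; the brute-force search and the VPN-flavored variables below are my own toy stand-ins:

    from itertools import product

    SITES = ['hq', 'branch1', 'branch2']
    SUBNET_IDS = range(1, 8)           # candidate /24 subnet numbers

    def requirements(cfg):
        """Declarative requirements; note they never say HOW to configure."""
        subnets = [cfg[s]['subnet'] for s in SITES]
        return (len(set(subnets)) == len(subnets)        # no address collisions
                and all(cfg[s]['tunnel_peer'] == 'hq'    # hub-and-spoke VPN:
                        for s in SITES if s != 'hq')     #   branches tunnel to hq
                and cfg['hq']['tunnel_peer'] is None)    # hub has no upstream peer

    def find_model():
        """Enumerate the config space; return the first assignment that
        satisfies every requirement (a 'model' in the logician's sense)."""
        for subnets in product(SUBNET_IDS, repeat=len(SITES)):
            cfg = {site: {'subnet': n,
                          'tunnel_peer': None if site == 'hq' else 'hq'}
                   for site, n in zip(SITES, subnets)}
            if requirements(cfg):
                return cfg
        return None                    # requirements unsatisfiable

    print(find_model())

The point of the approach is that the 240-site migration would be expressed once as requirements like these, and the solver, not the small team of experts, grinds out each site's parameters.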

People really liked the "probabilistic view of correctness" flavor of this work (especially since they come from a "formal correctness" background).  They also liked the practical approach to evaluation.  (I was somewhat confused as to how software modeling work is evaluated, since you can only evaluate a framework in terms of the success of things built with it.)