The Case for Application Specific Benchmarking

Intrusion detection using sequences of system calls

Steven A Hofmeyr, Stephanie Forrest, Anil Somayaji
[PDF]

Summary by
AF

One-line summary:

Build up a profile of "normal" fixed-length seqs of syscalls done by Unix privileged programs (lpr, sendmail, etc); detect anomalies as the number of sequences seen during a run that are not in the "normal" database. Good results on sendmail, lpr, others (with synthetic DB creation and some live workloads) with seq length=10; some voodoo in how the seq length is chosen and how the anomalous thresholds are chosen.

Overview/Main Points

Idea: capture temporal sequences (not args or anything else) of system calls, of length k. e.g. if k=3, then the sequence open, mmap, mmap, read, write, close will generate 4 such sequences. Call these the "normal" sequences and the database that stores them the normalDB. It's stored as a forest where each tree is rooted at a systemcall; you can then lookup whether a given sequence is in the database by doing 1 hashtable lookup (initial systemcall in seq) plus at most k-1 comparisons.

For each sequence seen at runtime, determine its "pseudo-Hamming-distance" from the normal sequence most similar to it. Not a true Hamming distance since sequences aren't binary strings, and there's no inherent measure of "distance" that makes sense since there's no metric for how "close" one syscall is to another, but it's a crude approximation. "Since we can't defend it theoretically, we evaluate its effectiveness empirically"

If an observed seq is not in the normalDB, it will take a long time to determine which sequence is "closest" (since all have to be examined); but since anomalies are expected to be rare, the common case of verifying a legal (non-anomalous) seq is fast. "Anomaly signal" is defined as the max of the minimal-Hamming-distances of all anomalous sequences in a trace. This is divided by k so that the "signal" can be compared across runs with different values of k.

Eval metrics: false negs & false pos. False negs are preferable, since they can be reduced by adding layers of defense, whereas false positives may be compounded by additional layers.

Evaluation: first created synthetic traces by running sendmail, lpr, other progs; then tested these against known intrusion attacks as well as in "live" environments with no intrusions (to analyze false positives). They varied the program's operating parmaetres (email message length, etc) and picked the workload(s) that generated the highest variation in number of distinct syscall sequences. They experimented with different values of k and picked k=10 "based on empirical observation".

(If k too small, can't reliably discriminate anomalous from non-anomlaous; if too large, computation too slow since time complexity of detection is linear with k and with the number of sequences, which is exponential in k in theory but not in practice)
If k too large, besides complexity, we have the problem that sequences all start looking the same, because if an intrusion is "contained" as a subsequence, longer sequences become more similar to normal sequences.
Although number of possible sequences of length k is exponential, the number of observed sequences during "normal" program runs was 10^-13 percent of that number!

One issue: can we determine behavior of one program from that of another? Trial-and-error showed that k=10 is enough to distinguish sendmail from other test programs (ie with k=10, sendmail's behavior "appears 100% anomalous" when run against another program's normalDB).

Their approach detected "anomalies" that correspond to buggy behavior, ie lpr unable to print a file because it's out of disk space! So this would suggest that the technique can be used for bug detection as well. (They argue these aren't true false positives, since you want to detect such behaviors as "abnormal" even if not intrusions)

Relevance

Challenge: to apply similar techniques to other problems, have to first determine whether a simple definition (analogous to seq of syscalls) gives a clear and compact "signature" of normal behavior.
Takes the position that system behavior is a complex property that we must understand through observation; just because we can design a system doesn't mean we can predict its behavior.
Confirms empirically that for this approach, the number of "observed" ("normal") sequences of syscalls is much less than the possible permutations, which is a key property that makes these approaches work. Pinpoint uses this assumption as well (number of "legal" codepaths thru EJB's is << number of possible permutations of EJB sequences).

Flaws

Back to index

Summaries may be used for non-commercial purposes only, provided the summary's author and origin are acknowledged. For all other uses, please contact us.