Steven A Hofmeyr, Stephanie
Forrest, Anil Somayaji
[PDF] |
Summary by
AF |
One-line summary:
Build up a profile of "normal" fixed-length seqs of
syscalls done by Unix privileged programs (lpr, sendmail, etc); detect
anomalies as the number of sequences seen during a run that are not in
the "normal" database. Good results on sendmail, lpr,
others (with synthetic DB creation and some live workloads) with seq
length=10; some voodoo in how the seq length is chosen and how the
anomalous thresholds are chosen.
Overview/Main Points
- Idea: capture temporal sequences (not args or anything else) of
system calls, of length k. e.g. if k=3, then the
sequence open, mmap, mmap, read, write, close will generate 4
such sequences. Call these the "normal" sequences
and the database that stores them the normalDB. It's stored
as a forest where each tree is rooted at a systemcall; you can
then lookup whether a given sequence is in the database by doing 1
hashtable lookup (initial systemcall in seq) plus at most k-1
comparisons.
- For each sequence seen at runtime, determine its
"pseudo-Hamming-distance" from the normal sequence most
similar to it. Not a true Hamming distance since sequences
aren't binary strings, and there's no inherent measure of
"distance" that makes sense since there's no metric for
how "close" one syscall is to another, but it's a crude
approximation. "Since we can't defend it theoretically,
we evaluate its effectiveness empirically"
- If an observed seq is not in the normalDB, it will take a
long time to determine which sequence is "closest" (since
all have to be examined); but since anomalies are expected to be
rare, the common case of verifying a legal (non-anomalous) seq is
fast. "Anomaly signal" is defined as the max of the
minimal-Hamming-distances of all anomalous sequences in a
trace. This is divided by k so that the
"signal" can be compared across runs with different values
of k.
- Eval metrics: false negs & false pos. False negs are
preferable, since they can be reduced by adding layers of defense,
whereas false positives may be compounded by additional
layers.
- Evaluation: first created synthetic traces by running sendmail,
lpr, other progs; then tested these against known intrusion attacks
as well as in "live" environments with no intrusions (to
analyze false positives). They varied the program's operating
parmaetres (email message length, etc) and picked the workload(s)
that generated the highest variation in number of distinct syscall
sequences. They experimented with different values of k and
picked k=10 "based on empirical observation".
- (If k too small, can't reliably discriminate
anomalous from non-anomlaous; if too large, computation too slow
since time complexity of detection is linear with k and with the
number of sequences, which is exponential in k in theory but not
in practice)
- If k too large, besides complexity, we have the problem that
sequences all start looking the same, because if an intrusion is
"contained" as a subsequence, longer sequences become
more similar to normal sequences.
- Although number of possible sequences of length k is
exponential, the number of observed sequences during
"normal" program runs was 10-13 percent of
that number!
- One issue: can we determine behavior of one program from that of
another? Trial-and-error showed that k=10 is enough to
distinguish sendmail from other test programs (ie with k=10,
sendmail's behavior "appears 100% anomalous" when run
against another program's normalDB).
- Their approach detected "anomalies" that correspond to
buggy behavior, ie lpr unable to print a file because it's out of
disk space! So this would suggest that the technique can be
used for bug detection as well. (They argue these aren't true
false positives, since you want to detect such behaviors as
"abnormal" even if not intrusions)
Relevance
- Challenge: to apply similar techniques to other problems, have to
first determine whether a simple definition (analogous to seq of
syscalls) gives a clear and compact "signature" of normal
behavior.
- Takes the position that system behavior is a complex property that
we must understand through observation; just because we can design a
system doesn't mean we can predict its behavior.
- Confirms empirically that for this approach, the number of
"observed" ("normal") sequences of syscalls is
much less than the possible permutations, which is a key property
that makes these approaches work. Pinpoint uses this assumption as
well (number of "legal" codepaths thru EJB's is <<
number of possible permutations of EJB sequences).
Flaws
|