CS 141 "Internet Technologies and Systems" Proposal
This is the working proposal for a new undergraduate-level class,
whose only prereqs would be CS 106 (AB or X), possibly 107, possibly
108, that introduces common Internet/Web technologies (HTTP, HTML, SSL,
email protocols, etc.) in context. Currently, a lot of
Internet-enthusiastic CS majors learn these things "by
osmosis" and with no context -- either practical, historical, or
theoretical -- and as a result end up spinning their wheels when
implementing nontrivial projects, or proposing designs that lack
sophistication. The real problem is that while the protocols and
technologies themselves are easy to pick up from a programming/hacking
standpoint, doing so does not give the student the benefit of seeing the
evolutionary design decisions (both good and bad) that influenced the
development of each technology. As a result, when implementing
anything nontrivial for which the "baseline" technologies
aren't a great fit, students have no starting point for determining what the
hard parts of the problem really are and whether past research might
yield some insight into solving them.
By the end of the course, students should:
- Possess technical familiarity, both conceptual and hands-on, with common Internet
technologies & software
- Be able to place each technology in a larger context: Where did it come from? How
did it evolve? What are the pros and cons of using it for a particular
task/application?
- Be able to identify how current projects that extend the Internet in various ways
(mobile/pervasive computing, distributed security, wireless access, etc.) map onto the
basic technologies -- what challenges not envisioned by the original designers of the
Internet are being addressed by such projects?
The starting point for the course will be Nick Parlante's CS 193I. The
course will be a logical prereq to CS 444I, Large-Scale Internet
Services, which will be renumbered CS 241.
Why is this course needed?
- From a scholarly perspective, the study of "the Internet" has substantial
components from Systems, Networking, and Applications (and nontrivial contributions from
theory and programming languages). Currently these are all separate course areas, and
most students don't get the chance to experience more than one or two of them in any
depth, and then only very late in their undergrad careers.
- From a practical perspective, many students already have Grand Visions for building
Internet systems only a couple of years into their undergrad careers. While we
should encourage them to channel their energy into good projects, we would prefer that they
embark on that journey with at least some context for, and knowledge of, what they are getting
into. I've seen many proposals for novel applications whose technical content is
weak or naive because it is not informed by significant technical background.
- Even if students get to take all the above separate courses, there is no course that
"ties together" the important components. For example, understanding the
pros and cons of even a simple protocol such as HTTP requires both systems and networking
background and an understanding of how the two interact. (In fact, until the recent
introduction of USITS, there wasn't even a good conference forum for discussing Internet
technologies and applications, as opposed to discussing purely systems or purely
networking or purely security.)
I believe this would be a first-of-its-kind class that would become widely emulated.
I hope to convince Mary Baker, Dan Boneh, and Dawson Engler to do this with me--if
we split it up four ways, the workload shouldn't be too bad.
Scope and purpose
The basic plan is to divide the course by topic, based on Internet technologies (see below
for a first cut). For each topic, we would present:
- Overview of what is involved, from a programmer's/application designer's point of view.
- Historical perspective: what is the larger body of work that this protocol, language,
etc. is germane to? How does it fit into that larger body? Why did this
technology, and not some other one, get adopted as the de facto standard? What are
the relevant academic fora in which this topic is regularly discussed (conferences, web
sites, etc.)?
- Pointers to research pages, seminars, colloquia, etc. at Stanford and elsewhere that
students can attend to get more depth on the topic
- Commercial perspective on the technology
- Programming-oriented assignment to get some hands-on experience; this would be a
mixture of using off-the-shelf Internet software components (Apache, etc.) and writing
one's own code (sockets in C/Perl, Java applets, etc.) to get insight into some specific
aspect of the technology (expose its limitations, contrast it with an alternative
technology to determine which is better for a specific task, etc.)
- [?] Occasional readings of some of the more accessible papers from the directly-relevant
literature.
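As one illustration of the kind of hands-on exercise the programming-oriented item
above has in mind, here is a minimal sketch of speaking HTTP directly over a socket.
It is written in Python purely for brevity (the list above mentions C/Perl and Java);
the function names build_request, parse_status_line, and fetch are hypothetical and
not taken from any existing course material. It assumes a plain HTTP/1.0 server that
closes the connection after sending its response, and it ignores redirects, chunked
encoding, and TLS.

```python
import socket
from typing import Tuple


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request as raw ASCII bytes.

    The request is just text lines separated by CRLF and terminated
    by an empty line, which is what makes HTTP easy to inspect by hand.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first line of a response into (version, code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


def fetch(host: str, path: str = "/", port: int = 80,
          timeout: float = 5.0) -> bytes:
    """Send one request over a fresh TCP connection and read until EOF.

    Assumption: the server closes the connection when the response is
    complete (HTTP/1.0 behavior), so each request pays for a new TCP
    connection and no state survives between requests.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

An exercise along these lines could have students call fetch against a server they
run themselves, then compare the cost of one connection per request with a protocol
that keeps connection state, which ties the hands-on work back to the design
questions raised for each topic.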
Some fun "extras" for the course (depending on time constraints) would
include:
- Special/extended topics (I included a few ideas in the list below); either as
independent guest lectures, or integrated with the other material
- "Field trips" to some sites of interest; e.g. the machine room at Exodus,
which hosts various large Internet services
- Guest speakers as the opportunity and need arise
First cut at topic list
The following is my wishlist of what to cover, and specifically what
"contextual" material to include for each topic. The next task is to identify existing
overlaps between this and CS 193I.
- HTTP: Transport/networking issues. Stateful vs. stateless
protocols, layered model of networks, binary vs. ASCII; TCP, UDP, IP, multicast; physical
and link layer networking. Extensions: wireless, heterogeneous networks, various
flavors of mobile computing.
- HTML: media formats and data representation. SGML, other markup
languages, extensibility, XML. Overview of other formats; format transcoding;
encoding structure vs. layout. Hypertext and hyperlinking. Overview of other
"static" media formats (GIF, JPG, extensions). Architecture of a Web
browser client, extensibility, plug-ins.
- Email protocols. SMTP, POP, IMAP and email consistency, email
security.
- Java & JavaScript: Internet programming. Interpreted vs.
compiled extension languages; mobile code pitfalls; language design issues/portability;
client vs. server execution; other extension technologies (Shockwave, ActiveX).
- Server architecture. Processes vs. threads, and architecture of a
typical Web server; intro to server workloads and performance issues; intro to replication
and load balancing strategies.
- Dynamic content generation. Extending the server with CGI,
FastCGI, and servlets; process models, scalability and state management concerns; session
synthesis using cookies and Fat URLs.
- Security. SSL; public-key vs. symmetric-key cryptosystems; crypto
algorithm basics; key distribution problem; how certificates work and how secure channels
are bootstrapped; e-commerce protocols and challenges; attacking the protocol vs. the
algorithm.
- Internet evolution. How has the Internet evolved beyond its
original intent? What are some examples of how its radical departure from its
original operating point has stressed its protocols, software, infrastructure, etc.?
(Examples: how HTTP stresses TCP in ways its designers didn't anticipate; why firewalls are needed and
why it's tricky to get security right in an open network; TCP and similar protocols over
wireless; etc.)
- Other topics of interest. [Each of these can be, e.g., the
subject of a single-lecture overview that shows where they fit into the larger scheme of
things.] Web-based portals (putting a Web front-end on every service); multi-tier
architectures (caching, databases-on-Web, active proxies); extending the Internet
(wireless access, mobile computing and non-PC devices); streaming media challenges.
fox@cs.stanford.edu