CS294-91 Distributed Computing
CCN: 27309
Instructor: Professor Scott Shenker (shenker@icsi)
Guest-lecturer: Ali Ghodsi (alig@cs)
W 10:30-12:00pm 405 Soda Hall (Starting 30 January 2013)
In the past decade, ever more applications and services that
previously ran on local PCs have moved to data centers on the
Internet, accessible through the Web. This puts distributed systems
at the center of many of today's application architectures. Distributed
systems (or distributed computing) concerns systems in which many nodes
(machines) solve a common problem, using message passing over a network
that connects those nodes. The aim of this course is to establish
familiarity with the basic theoretical and practical foundations of
distributed systems.
Distributed computing is challenging due to two fundamental
problems: (i) partial failures and (ii) asynchrony. Partial failure
means that parts of the system (network or machines) can be faulty while
the rest of the system is still expected to function correctly.
Asynchrony stems from variance in the time it takes to send messages
between computers and in the operating speeds of different computers.
The system must therefore function correctly even though events
happen asynchronously.
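To make the interplay of the two problems concrete, here is a minimal sketch in Python of a timeout-based failure check. It is an illustration only, not course material: the `Node` class, the `ping` and `suspects` functions, and the delay values are all hypothetical. It shows that a node that is merely slow cannot be told apart from a crashed one by a timeout alone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node: it either crashed or replies after some delay."""
    name: str
    crashed: bool = False
    reply_delay: float = 0.0  # simulated seconds until a reply arrives


def ping(node: Node, timeout: float) -> Optional[float]:
    """Return the reply latency, or None if no reply arrives within timeout."""
    if node.crashed or node.reply_delay > timeout:
        return None
    return node.reply_delay


def suspects(nodes: List[Node], timeout: float) -> List[str]:
    """Names of nodes an observer would suspect of having failed."""
    return [n.name for n in nodes if ping(n, timeout) is None]


nodes = [
    Node("a", reply_delay=0.01),  # healthy and fast
    Node("b", crashed=True),      # partial failure: only this node is down
    Node("c", reply_delay=2.5),   # asynchrony: alive, but slow
]

print(suspects(nodes, timeout=1.0))  # ['b', 'c']  (c is wrongly suspected)
print(suspects(nodes, timeout=5.0))  # ['b']
```

A short timeout wrongly suspects the slow node, while a long timeout delays detection of the real crash. No fixed timeout is correct when message delays are unbounded.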
Over the years, many recurring problems have been studied with
respect to the two aforementioned challenges. Furthermore, many
abstractions have been proposed that simplify dealing with these two
challenges when building distributed systems. In this course we will
study many of these problems and abstractions, including the following:
- Models of distributed systems
- Safety and liveness of distributed protocols
- Different failure models for distributed systems (fail-stop,
fail-noisy, Byzantine)
- Reliable group communication abstractions (reliable, atomic,
etc.)
- Shared memory and consistency models (linearizable, regular,
etc.)
- Failure detectors, their relationships, and their implementation in
real systems
- Impossibility of Consensus
- Consensus and Paxos
- Replicated State Machines and Reconfiguration
- Byzantine Fault Tolerance
The class is 2 credits and consists of one lecture/seminar per
week. Each student also presents one research paper related to
distributed computing in class and hands in a two-page
summary of the papers. Classes are 1.5 hours long and are scheduled
every Wednesday at 10:30 in 405 Soda Hall. The course starts on Wednesday,
the 30th of January.
Grading
2/3 Research paper summary
1/3 Seminar Participation
Homework
Reading list of papers [link]
Instructions for homework [link]
Course Textbook
We will loosely follow the textbook below, with additional lectures based on research papers:
Introduction to Reliable and Secure Distributed Programming, C. Cachin, R. Guerraoui, L. Rodrigues, Springer, 2011.
Lectures