Advanced Topics in Computer Systems
Lecture 2
Edited by Eric Brewer based on notes by Joe Hellerstein

System R & DBMS Overview
DBMS History
- late 60's: network (CODASYL) & hierarchical (IMS) DBMS. Charles
Bachman: father of CODASYL predecessor IDS (at GE in early 1960's).
Turing award #8 (1973, between Dijkstra and Knuth).
- IMS Example: Suppliers record
type and Parts record type.
One is parent, one is child. Problems include redundancy and the requirement
that every child have a parent (deletion anomalies).
- Low-level ``record-at-a-time'' data
manipulation language (DML), i.e. physical data structures
reflected in DML (no data independence).
- 1970: Codd's paper. The most influential paper in DB research.
Set-at-a-time DML with the key idea of "data independence". Allows for
schema and physical storage structures to change under the covers.
Papadimitriou: "as clear a paradigm shift as we can hope to find in computer
science". Edgar F. Codd: Turing award #18 (1981, between Hoare and
Cook).
- Data Independence, both logical and physical.
- What physical tricks could you play under the covers? Think about modern
HW!
- "Hellerstein's Inequality":
- Need data independence when d(app)/dt
<< d(environment)/dt, i.e. when the application changes much more
slowly than the environment around it
- Other scenarios where this holds?
- This is an early, powerful instance of two themes: levels of
indirection and adaptivity
- mid 70's: wholesale adoption of Codd's vision in 2 full-function (sort of)
prototypes. Ancestors of essentially all today's commercial systems
- Ingres : UCB 1974-77
- a ``pickup team'', including Stonebraker & Wong. early and
pioneering. Begat Ingres Corp (CA), CA-Universe, Britton-Lee, Sybase, MS
SQL Server, Wang's PACE, Tandem Non-Stop SQL.
- System R : IBM San Jose
(now Almaden)
- 15 PhDs. Begat IBM's SQL/DS & DB2, Oracle, HP's Allbase, Tandem
Non-Stop SQL. System R arguably got more stuff ``right'', though there was
lots of information passing between both groups
- Jim Gray: Turing Award #22 (1998, between Engelbart and Brooks)
- Lots of Berkeley folks on the System R team, including Gray (1st CS
PhD @ Berkeley), Bruce Lindsay, Irv Traiger, Paul McJones, Mike Blasgen,
Mario Schkolnick, Bob Selinger, Bob Yost. See http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Prehisto.html#Index71.
- Both were viable starting points, proved practicality of relational
approach. Direct example of theory -> practice!
- ACM Software Systems award #6 shared by both
- Stated goal of both systems was to take Codd's theory and turn it into
a workable system as fast as CODASYL but much easier to use and
maintain
- Interestingly, Stonebraker received ACM SIGMOD Innovations Award #1
(1991), Gray #2 (1992), whereas Gray got the Turing first.
- early 80's: commercialization of relational systems
- Ellison's Oracle beats IBM to market by building from IBM's published
System R papers.
- IBM releases multiple RDBMSs, settles down to DB2. Gray (System
R), Jerry Held (Ingres) and others join Tandem (Non-Stop SQL), Kapali
Eswaran starts EsVal, which begets HP Allbase and Cullinet
- Relational Technology Inc (Ingres Corp), Britton-Lee/Sybase, Wang PACE
grow out of Ingres group
- CA releases CA-Universe, a commercialization of Ingres
- Informix started by Cal alum Roger Sippl (no pedigree to research).
- Teradata started by some Caltech alums, based on proprietary networking
technology (no pedigree to software research, though see parallel DBMS
discussion later in semester!)
- mid 80's: SQL becomes the "intergalactic standard".
- DB2 becomes IBM's flagship product.
- IMS "sunsetted"
- today: network & hierarchical are legacy systems (though commonly in
use!)
- IMS still widely used in banking, airline reservations, etc. A cash cow
for IBM
- Relational commoditized -- Microsoft, Oracle and IBM fighting over bulk
of market. NCR Teradata, Sybase, HP Nonstop and a few others vying to
survive on the fringes. OpenSource coming of age, including MySQL,
PostgreSQL, Ingres (reborn). BerkeleyDB is an embedded transactional store
that is widely used as well, but now owned by Oracle.
- XML and object-oriented features have pervaded the relational products
as both interfaces and data types, further complicating the "purity" of
Codd's vision.
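As a toy illustration of the physical data independence idea above (all names
and data here are invented; this is not from System R or Ingres): the same
logical query runs unchanged whether the relation is stored row-wise or
column-wise under the covers.

```python
def scan_rows(rows):
    """Row store: each tuple stored contiguously as one dict."""
    for r in rows:
        yield r

def scan_columns(cols):
    """Column store: one list per attribute; reassemble tuples on scan."""
    names = list(cols)
    for vals in zip(*(cols[n] for n in names)):
        yield dict(zip(names, vals))

def query(scan):
    """Logical query: names of parts costing more than 10.
    It never sees (or cares about) the physical layout."""
    return sorted(r["name"] for r in scan if r["cost"] > 10)

# Same relation, two physical layouts:
row_store = [{"name": "bolt", "cost": 5}, {"name": "cam", "cost": 20}]
col_store = {"name": ["bolt", "cam"], "cost": [5, 20]}

assert query(scan_rows(row_store)) == query(scan_columns(col_store)) == ["cam"]
```

The storage layout can change (a "physical trick under the covers") without
touching the query, which is the whole point of the inequality above.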
Database View of Applications
Big, complex record-keeping applications
like SAP and PeopleSoft, which run over a DBMS. "Enterprise
applications" to keep businesses humming. A smattering:
- ERP: Enterprise Resource Planning (SAP, Baan, PeopleSoft, Oracle, IBM,
etc.)
- CRM: Customer Relationship Management (E.piphany, Siebel, Oracle, IBM,
etc.)
- SCM: Supply Chain Management (Trilogy, i2, Oracle, IBM, etc.)
- Human Resources, Direct Marketing, Call Center, Sales Force Automation,
Help Desk, Catalog Management, etc.
Typically client-server (a Sybase
"invention") with a form-based API. For these applications, resource
management is secondary to data management.
Traditionally, a main job of a DBMS is to make
these kinds of apps easy to write.
Relational System Architecture
See the article in
Foundations and Trends in Databases for an in-depth discussion of many of the
issues below. Databases are BIG pieces of software. Typically somewhat hard to
modularize. Lots of system design decisions at the macro and micro scale. We
will focus mostly on micro decisions -- and hence ideas reusable outside DBMSs
-- in subsequent lectures. Here we focus on macro design.
Disk management choices:
- file per relation
- big file in file system
- raw device
Process Model:
- process per user
- server
- multi-server
Hardware Model:
- shared nothing
- shared memory
- shared disk
Basic modules:
- parser
- query rewrite
- optimizer
- query executor
- access methods
- buffer manager
- lock manager
- log/recovery manager
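A hypothetical sketch of how a query flows through the modules above
(parser -> query rewrite -> optimizer -> executor -> access method). Every
name and the toy "SQL" grammar here are invented for illustration; real
systems are vastly more involved.

```python
def parse(sql):
    # Toy parser: handles only "SELECT col FROM table".
    _, col, _, table = sql.split()
    return {"project": col, "scan": table}

def rewrite(plan):
    # Query rewrite would expand views, fold constants, etc. (no-op here).
    return plan

def optimize(plan):
    # The optimizer would choose among access paths; here the only
    # available access method is a full heap scan.
    plan["access_method"] = "heap_scan"
    return plan

def execute(plan, catalog):
    rows = catalog[plan["scan"]]               # access method: heap scan
    return [r[plan["project"]] for r in rows]  # projection

# The catalog itself is just another relation-like structure.
catalog = {"parts": [{"pno": 1, "pname": "bolt"}, {"pno": 2, "pname": "cam"}]}
result = execute(optimize(rewrite(parse("SELECT pname FROM parts"))), catalog)
assert result == ["bolt", "cam"]
```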
Notes on System R
See the System R reunion notes
for fun background and gossip.
Some "systems chestnuts" seen in this
paper:
- Expect to throw out the 1st version of the system
- Expose internals via standard external interfaces whenever possible
(e.g. catalogs as tables, the /proc filesystem, etc.)
- Optimize the fast path
- Interpretation vs. compilation vs. intermediate "opcode" representations
- Component failure as a common case to consider
- Problems arising from interactions between replicated functionality (in
this case, scheduling)
Some important points of discussion
- Flexibility of storage mechanisms: domains/inversions vs.
heap-files/indexes. Use of TID-lists common in modern DBMS. Why be
doctrinaire? What about Data Independence? One answer: you have to
get transactions right for each "access method".
- System R was often CPU bound (though that's a coarse-grained assertion --
really means NOT disk-bound). This is common today in well-provisioned
DBMSs as well. Why?
- DBMSs are not monolithic designs, really. The RSS stuff does
intertwine locking and logging into disk access, indexing and buffer
management. But RDS/RSS boundary is clean, and RDS is decomposable.
- Access control via views: a deep application of data independence?!
- Transactional contribution of System R (both conceptual and
implementation) as important as relational model, and in fact should be
decoupled from relational model.
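On the TID-list point above: a common modern use is index AND-ing, where the
results of two index lookups are combined by intersecting sorted lists of
tuple identifiers before fetching any tuples. A minimal sketch with invented
data:

```python
def intersect_tids(a, b):
    """Merge-intersect two sorted TID lists (tuple identifiers)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# TIDs from two hypothetical indexes: color = 'red' and weight > 10.
red = [3, 7, 9, 12, 40]
heavy = [2, 7, 12, 13, 40, 41]
assert intersect_tids(red, heavy) == [7, 12, 40]
```

Only the surviving TIDs are fetched from the heap file, which is why the
heap-file/index organization wins so often in practice.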
The "Convoy Problem":
- A classic cross-level scheduling interaction. We will see this
again!
- Poorly explained in the paper.
I have always found this
presentation confusing. A number of issues are going on. The first two have
to do with interactions between OS and DB scheduling:
- the OS can preempt a database "process" even when that process is
holding a high-traffic DB lock
- DB processes sitting in DB lock queues use up their OS scheduling
quanta while waiting (this is poorly explained in the text). Once they use
up all their quanta, they get removed from the "multiprogramming set" and
go to "sleep" -- and an expensive OS dispatch is required to run them
again.
The last issue is that
- the DBMS uses a FCFS wait queue for the lock.
For a
high-traffic DB lock, DB processes will request it on average every T
timesteps. If the OS preempts a DB process holding that high-traffic DB
lock, the queue behind the lock grows to include almost all DB processes.
Moreover, the queue is too long to be drained in T timesteps, so it's
"stable" -- every DB process queues back up before the queue drains, and
they burn up their quanta pointlessly waiting in line, after which they are
sent to sleep. Hence each DB process is awake for only one grant of the lock
and the subsequent T timesteps of useful work, after which they queue for
the lock again, waste their quanta in the queue, and are put back to sleep.
The result is that the useful work per OS waking period is about T
timesteps, which is shorter than the overhead of scheduling -- hence the
system is thrashing.
- Note that the solution attacks the only issue in the previous comment
that can be handled without interacting with the OS: the third one, the FCFS
DB lock queue. The explanation here is confusing, I think. The point is to always
allow any one of the DB processes currently in the "multiprogramming set" to
immediately get the lock without burning a quantum waiting on the lock --
hence no quanta are wasted on waiting, so each process spends almost all of
its allotted quanta on "real work". Note that the proposed policy achieves
this without needing to know which processes are in the OS' multiprogramming
set.
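The stability argument above boils down to a one-line condition: the FCFS
queue is self-sustaining whenever draining it takes longer than the T
timesteps of work a process does between lock requests. A toy model (the
numbers below are invented):

```python
def convoy_is_stable(queue_len, hold_time, work_time):
    """After a preemption leaves queue_len processes queued on the lock:
    the lock is granted once per hold_time steps, so draining the queue
    takes queue_len * hold_time steps. Each process re-requests the lock
    work_time steps after its grant; if the drain takes longer than that,
    processes re-queue before the queue empties and the convoy persists."""
    return queue_len * hold_time > work_time

# 50 processes queued, lock held ~1 step, 10 steps of work between requests:
# the convoy sustains itself.
assert convoy_is_stable(50, 1, 10)
# With only 5 queued processes, the queue drains before anyone returns.
assert not convoy_is_stable(5, 1, 10)
```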
System R and INGRES are the prototypes that all current systems are based
on. Basic architecture is the same, and many of the ideas remain in
today's systems:
- optimizer remains, largely unchanged
- RSS/RDS divide remains in many systems
- SQL, cursors, duplicates, NULLs, etc.
- the pros and cons of duplicates. Alternatives?
- pros and cons of NULLs. Alternatives?
- grouping and aggregation
- updatable single-table views
- begin/end xact at user level
- savepoints and restore
- catalogs as relations
- flexible security (GRANT/REVOKE)
- integrity constraints
- triggers (!!)
- clustering
- compiled queries
- B-trees
- Nested-loop & sort-merge join, all joins 2-way
- dual logs to support log failure
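The two join algorithms named in the list above can be sketched in a few
lines. This is an illustrative toy (dict tuples, and the sort-merge variant
assumes the join key is unique in each input), not System R's implementation:

```python
def nested_loop_join(r, s, key):
    """O(|R|*|S|): for each outer tuple, scan the entire inner relation."""
    return [(a, b) for a in r for b in s if a[key] == b[key]]

def sort_merge_join(r, s, key):
    """Sort both inputs on the join key, then merge. Simplified: assumes
    the key is unique in each input, so no backtracking is needed."""
    r = sorted(r, key=lambda t: t[key])
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] == s[j][key]:
            out.append((r[i], s[j])); i += 1; j += 1
        elif r[i][key] < s[j][key]:
            i += 1
        else:
            j += 1
    return out

suppliers = [{"sno": 1, "city": "SJ"}, {"sno": 2, "city": "SF"}]
supplies  = [{"sno": 2, "pno": 10}, {"sno": 3, "pno": 11}]
assert nested_loop_join(suppliers, supplies, "sno") == \
       sort_merge_join(suppliers, supplies, "sno")
```

Composing such 2-way joins is exactly how System R (and most successors)
handle multi-way queries: the optimizer picks an order and an algorithm for
each pair.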
Stuff they got wrong:
- shadow paging
- predicate locking
- SQL language
- duplicate semantics
- subqueries vs. joins
- outer join
- rejected hashing
OS and DBMS: Philosophical Similarities & Differences
- UNIX paper: "The most important job of UNIX is to provide a file system".
- UNIX and System R are both "information management" systems!
- both also provide programming APIs for code
- Difference in focus: Bottom-Up (elegance of system) vs. Top-Down (elegance
of semantics)
- main goal of UNIX was to provide a small elegant set of
mechanisms, and have programmers (i.e. C programmers) build on top of
it. As an example, they are proud that "No large 'access method'
routines are required to insulate the programmer from system calls".
After all, OS viewed its role as presenting hardware to computer
programmers.
- main goal of System R and Ingres was to provide a complete system that
insulated programmers (i.e. SQL + scripting) from the system, while
guaranteeing clearly defined semantics of data and queries.
After all, DBMS views its role as managing data for application
programmers.
- Affects where the complexity goes!
- to the system, or the end-programmer?
- question: which is better? in what environments?
- follow-on question: are internet systems more like enterprise apps
(traditionally built on DBMSs) or scientific/end-user apps (traditionally
built over OSes and files)? Why?
- Achilles' heel of RDBMSs: a closed box
- Cannot leverage technology without going through the full SQL stack
- One solution: make the system extensible, convince the world to download
code into the DBMS
- Another solution: componentize the system (hard? RSS is hard to bust up,
due to transaction semantics)
- Achilles' heel of OSes: hard to decide on the "right" level of abstraction
- As we'll read, many UNIX abstractions (e.g. virtual memory) hide *too*
much detail, messing up semantics. On the other hand, too low a level
can cause too much programmer burden, and messes up the elegance of the
system
- One solution: make the system extensible, convince the fancy apps to
download code into the OS
- Another solution: componentize the system (hard, due to protection
issues)
- Traditionally separate communities, despite subsequently clear need to
integrate
- UNIX paper: "We take the view that locks are neither necessary nor
sufficient, in our environment, to prevent interference between users of the
same file. They are unnecessary because we are not faced with large,
single-file data bases maintained by independent processes."
- System R: "has illustrated the feasibility of compiling a very
high-level data sublanguage, SQL, into machine-level code".
So, a main goal of this class is to work from both of these
directions, cull the lessons from each, and ask how to use these lessons today
both within and OUTSIDE the context of these historically separate
systems.