Advanced Topics in Computer Systems
Lecture 2
Edited by Eric Brewer based on notes by Joe Hellerstein

System R & DBMS Overview
DBMS History
- late 60's: network (CODASYL) & hierarchical (IMS) DBMS. Charles
Bachman: father of CODASYL predecessor IDS (at GE in early 1960's).
Turing award #8 (1973, between Dijkstra and Knuth).
- IMS Example: Suppliers record
type and Parts record type.
One is parent, one is child. Problems include redundancy and the requirement
that every child have a parent (deletion anomalies).
- Low-level ``record-at-a-time'' data
manipulation language (DML), i.e. physical data structures
reflected in DML (no data independence).
- 1970: Codd's paper. The most influential paper in DB research.
Set-at-a-time DML with the key idea of "data independence". Allows for
schema and physical storage structures to change under the covers.
Papadimitriou: "as clear a paradigm shift as we can hope to find in computer
science". Edgar F. Codd: Turing award #18 (1981, between Hoare and
Cook).
- Data Independence, both logical and physical.
- What physical tricks could you play under the covers? Think about modern
HW!
- "Hellerstein's Inequality":
- Need data independence when d(app)/dt
<< d(environment)/dt, i.e. when the application changes much more
slowly than the environment around it
- Other scenarios where this holds?
- This is an early, powerful instance of two themes: levels of
indirection and adaptivity
- mid 70's: wholesale adoption of Codd's vision in 2 full-function (sort of)
prototypes. Ancestors of essentially all today's commercial systems
- Ingres : UCB 1974-77
- a ``pickup team'', including Stonebraker & Wong. early and
pioneering. Begat Ingres Corp (CA), CA-Universe, Britton-Lee, Sybase, MS
SQL Server, Wang's PACE, Tandem Non-Stop SQL.
- System R : IBM San Jose
(now Almaden)
- 15 PhDs. Begat IBM's SQL/DS & DB2, Oracle, HP's Allbase, Tandem
Non-Stop SQL. System R arguably got more stuff ``right'', though there was
lots of information passing between both groups
- Jim Gray: Turing Award #22 (1998, between Engelbart and Brooks)
- Lots of Berkeley folks on the System R team, including Gray (1st CS
PhD @ Berkeley), Bruce Lindsay, Irv Traiger, Paul McJones, Mike Blasgen,
Mario Schkolnick, Bob Selinger, Bob Yost. See http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Prehisto.html#Index71.
- Both were viable starting points, proved practicality of relational
approach. Direct example of theory -> practice!
- ACM Software Systems award #6 shared by both
- Stated goal of both systems was to take Codd's theory and turn it into
a workable system as fast as CODASYL but much easier to use and
maintain
- Interestingly, Stonebraker received ACM SIGMOD Innovations Award #1
(1991), Gray #2 (1992), whereas Gray got the Turing first.
- early 80's: commercialization of relational systems
- Ellison's Oracle beats IBM to market by building from IBM's published
System R papers.
- IBM releases multiple RDBMSs, settles down to DB2. Gray (System
R), Jerry Held (Ingres) and others join Tandem (Non-Stop SQL), Kapali
Eswaran starts EsVal, which begets HP Allbase and Cullinet
- Relational Technology Inc (Ingres Corp), Britton-Lee/Sybase, Wang PACE
grow out of Ingres group
- CA releases CA-Universe, a commercialization of Ingres
- Informix started by Cal alum Roger Sippl (no pedigree to research).
- Teradata started by some Caltech alums, based on proprietary networking
technology (no pedigree to software research, though see parallel DBMS
discussion later in semester!)
- mid 80's: SQL becomes the "intergalactic standard".
- DB2 becomes IBM's flagship product.
- IMS "sunsetted"
- today: network & hierarchical are legacy systems (though commonly in
use!)
- IMS still widely used in banking, airline reservations, etc. A cash cow
for IBM
- Relational commoditized -- Microsoft, Oracle and IBM fighting over bulk
of market. NCR Teradata, Sybase, HP Nonstop and a few others vying to
survive on the fringes. OpenSource coming of age, including MySQL,
PostgreSQL, Ingres (reborn). BerkeleyDB is an embedded transactional store
that is widely used as well, but now owned by Oracle.
- XML and object-oriented features have pervaded the relational products
as both interfaces and data types, further complicating the "purity" of
Codd's vision.
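As a toy illustration of the physical data independence idea above (all names
and data here are invented; this is not from System R or Ingres): the same
logical query runs unchanged whether the relation is stored row-wise or
column-wise under the covers.

```python
def scan_rows(rows):
    """Row store: each tuple stored contiguously as one dict."""
    for r in rows:
        yield r

def scan_columns(cols):
    """Column store: one list per attribute; reassemble tuples on scan."""
    names = list(cols)
    for vals in zip(*(cols[n] for n in names)):
        yield dict(zip(names, vals))

def query(scan):
    """Logical query: names of parts costing more than 10.
    It never sees (or cares about) the physical layout."""
    return sorted(r["name"] for r in scan if r["cost"] > 10)

# Same relation, two physical layouts:
row_store = [{"name": "bolt", "cost": 5}, {"name": "cam", "cost": 20}]
col_store = {"name": ["bolt", "cam"], "cost": [5, 20]}

assert query(scan_rows(row_store)) == query(scan_columns(col_store)) == ["cam"]
```

The storage layout can change (a "physical trick under the covers") without
touching the query, which is the whole point of the inequality above.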
Database View of Applications
Big, complex record-keeping applications
like SAP and PeopleSoft, which run over a DBMS. "Enterprise
applications" to keep businesses humming. A smattering:
- ERP: Enterprise Resource Planning (SAP, Baan, PeopleSoft, Oracle, IBM,
etc.)
- CRM: Customer Relationship Management (E.piphany, Siebel, Oracle, IBM,
etc.)
- SCM: Supply Chain Management (Trilogy, i2, Oracle, IBM, etc.)
- Human Resources, Direct Marketing, Call Center, Sales Force Automation,
Help Desk, Catalog Management, etc.
Typically client-server (a Sybase
"invention") with a form-based API. For these applications, resource
management is secondary to data management.
Traditionally, a main job of a DBMS is to make
these kinds of apps easy to write.
Relational System Architecture
See the article in
Foundations and Trends in Databases for an in-depth discussion of many of the
issues below. Databases are BIG pieces of software. Typically somewhat hard to
modularize. Lots of system design decisions at the macro and micro scale. We
will focus mostly on micro decisions -- and hence ideas reusable outside DBMSs
-- in subsequent lectures. Here we focus on macro design.
Disk management choices:
- file per relation
- big file in file system
- raw device
Process Model:
- process per user
- server
- multi-server
Hardware Model:
- shared nothing
- shared memory
- shared disk
Basic modules:
- parser
- query rewrite
- optimizer
- query executor
- access methods
- buffer manager
- lock manager
- log/recovery manager
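A hypothetical sketch of how a query flows through the modules above
(parser -> query rewrite -> optimizer -> executor -> access method). Every
name and the toy "SQL" grammar here are invented for illustration; real
systems are vastly more involved.

```python
def parse(sql):
    # Toy parser: handles only "SELECT col FROM table".
    _, col, _, table = sql.split()
    return {"project": col, "scan": table}

def rewrite(plan):
    # Query rewrite would expand views, fold constants, etc. (no-op here).
    return plan

def optimize(plan):
    # The optimizer would choose among access paths; here the only
    # available access method is a full heap scan.
    plan["access_method"] = "heap_scan"
    return plan

def execute(plan, catalog):
    rows = catalog[plan["scan"]]               # access method: heap scan
    return [r[plan["project"]] for r in rows]  # projection

# The catalog itself is just another relation-like structure.
catalog = {"parts": [{"pno": 1, "pname": "bolt"}, {"pno": 2, "pname": "cam"}]}
result = execute(optimize(rewrite(parse("SELECT pname FROM parts"))), catalog)
assert result == ["bolt", "cam"]
```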
Notes on System R
See the System R reunion notes
for fun background and gossip.
Some "systems chestnuts" seen in this
paper:
- Expect to throw out the 1st version of the system
- Expose internals via standard external interfaces whenever possible
(e.g. catalogs as tables, the /proc filesystem, etc.)
- Optimize the fast path
- Interpretation vs. compilation vs. intermediate "opcode" representations
- Component failure as a common case to consider
- Problems arising from interactions between replicated functionality (in
this case, scheduling)
Some important points of discussion
- Flexibility of storage mechanisms: domains/inversions vs.
heap-files/indexes. Use of TID-lists common in modern DBMS. Why be
doctrinaire? What about Data Independence? One answer: you have to
get transactions right for each "access method".
- System R was often CPU bound (though that's a coarse-grained assertion --
really means NOT disk-bound). This is common today in well-provisioned
DBMSs as well. Why?
- DBMSs are not monolithic designs, really. The RSS stuff does
intertwine locking and logging into disk access, indexing and buffer
management. But RDS/RSS boundary is clean, and RDS is decomposable.
- Access control via views: a deep application of data independence?!
- Transactional contribution of System R (both conceptual and
implementation) as important as relational model, and in fact should be
decoupled from relational model.
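On the TID-list point above: a common modern use is index AND-ing, where the
results of two index lookups are combined by intersecting sorted lists of
tuple identifiers before fetching any tuples. A minimal sketch with invented
data:

```python
def intersect_tids(a, b):
    """Merge-intersect two sorted TID lists (tuple identifiers)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# TIDs from two hypothetical indexes: color = 'red' and weight > 10.
red = [3, 7, 9, 12, 40]
heavy = [2, 7, 12, 13, 40, 41]
assert intersect_tids(red, heavy) == [7, 12, 40]
```

Only the surviving TIDs are fetched from the heap file, which is why the
heap-file/index organization wins so often in practice.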
The "Convoy Problem":
- A classic cross-level scheduling interaction. We will see this
again!
- Poorly explained in the paper.
I have always found this
presentation confusing. A number of issues are going on. The first two have
to do with interactions between OS and DB scheduling:
- the OS can preempt a database "process" even when that process is
holding a high-traffic DB lock
- DB processes sitting in DB lock queues use up their OS scheduling
quanta while waiting (this is poorly explained in the text). Once they use
up all their quanta, they get removed from the "multiprogramming set" and
go to "sleep" -- and an expensive OS dispatch is required to run them
again.
The last issue is that
- the DBMS uses a FCFS wait queue for the lock.
For a
high-traffic DB lock, DB processes will request it on average every T
timesteps. If the OS preempts a DB process holding that high-traffic DB
lock, the queue behind the lock grows to include almost all DB processes.
Moreover, the queue is too long to be drained in T timesteps, so it's
"stable" -- every DB process queues back up before the queue drains, and
they burn up their quanta pointlessly waiting in line, after which they are
sent to sleep. Hence each DB process is awake for only one grant of the lock
and the subsequent T timesteps of useful work, after which they queue for
the lock again, waste their quanta in the queue, and are put back to sleep.
The result is that the useful work per OS waking period is about T
timesteps, which is shorter than the overhead of scheduling -- hence the
system is thrashing.
- Note that the solution attacks the only issue in the previous comment
that can be handled without interacting with the OS: the third one, the FCFS
DB lock queue. The explanation here is confusing, I think. The point is to always
allow any one of the DB processes currently in the "multiprogramming set" to
immediately get the lock without burning a quantum waiting on the lock --
hence no quanta are wasted on waiting, so each process spends almost all of
its allotted quanta on "real work". Note that the proposed policy achieves
this without needing to know which processes are in the OS' multiprogramming
set.
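The stability argument above boils down to a one-line condition: the FCFS
queue is self-sustaining whenever draining it takes longer than the T
timesteps of work a process does between lock requests. A toy model (the
numbers below are invented):

```python
def convoy_is_stable(queue_len, hold_time, work_time):
    """After a preemption leaves queue_len processes queued on the lock:
    the lock is granted once per hold_time steps, so draining the queue
    takes queue_len * hold_time steps. Each process re-requests the lock
    work_time steps after its grant; if the drain takes longer than that,
    processes re-queue before the queue empties and the convoy persists."""
    return queue_len * hold_time > work_time

# 50 processes queued, lock held ~1 step, 10 steps of work between requests:
# the convoy sustains itself.
assert convoy_is_stable(50, 1, 10)
# With only 5 queued processes, the queue drains before anyone returns.
assert not convoy_is_stable(5, 1, 10)
```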
System R and INGRES are the prototypes that all current systems are based
on. Basic architecture is the same, and many of the ideas remain in
today's systems:
- optimizer remains, largely unchanged
- RSS/RDS divide remains in many systems
- SQL, cursors, duplicates, NULLs, etc.
- the pros and cons of duplicates. Alternatives?
- pros and cons of NULLs. Alternatives?
- grouping and aggregation
- updatable single-table views
- begin/end xact at user level
- savepoints and restore
- catalogs as relations
- flexible security (GRANT/REVOKE)
- integrity constraints
- triggers (!!)
- clustering
- compiled queries
- B-trees
- Nested-loop & sort-merge join, all joins 2-way
- dual logs to support log failure
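The two join algorithms named in the list above can be sketched in a few
lines. This is an illustrative toy (dict tuples, and the sort-merge variant
assumes the join key is unique in each input), not System R's implementation:

```python
def nested_loop_join(r, s, key):
    """O(|R|*|S|): for each outer tuple, scan the entire inner relation."""
    return [(a, b) for a in r for b in s if a[key] == b[key]]

def sort_merge_join(r, s, key):
    """Sort both inputs on the join key, then merge. Simplified: assumes
    the key is unique in each input, so no backtracking is needed."""
    r = sorted(r, key=lambda t: t[key])
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] == s[j][key]:
            out.append((r[i], s[j])); i += 1; j += 1
        elif r[i][key] < s[j][key]:
            i += 1
        else:
            j += 1
    return out

suppliers = [{"sno": 1, "city": "SJ"}, {"sno": 2, "city": "SF"}]
supplies  = [{"sno": 2, "pno": 10}, {"sno": 3, "pno": 11}]
assert nested_loop_join(suppliers, supplies, "sno") == \
       sort_merge_join(suppliers, supplies, "sno")
```

Composing such 2-way joins is exactly how System R (and most successors)
handle multi-way queries: the optimizer picks an order and an algorithm for
each pair.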
Stuff they got wrong:
- shadow paging
- predicate locking
- SQL language
- duplicate semantics
- subqueries vs. joins
- outer join
- rejected hashing
OS and DBMS: Philosophical Similarities & Differences
- UNIX paper: "The most important job of UNIX is to provide a file system".
- UNIX and System R are both "information management" systems!
- both also provide programming APIs for code
- Difference in focus: Bottom-Up (elegance of system) vs. Top-Down (elegance
of semantics)
- main goal of UNIX was to provide a small elegant set of
mechanisms, and have programmers (i.e. C programmers) build on top of
it. As an example, they are proud that "No large 'access method'
routines are required to insulate the programmer from system calls".
After all, OS viewed its role as presenting hardware to computer
programmers.
- main goal of System R and Ingres was to provide a complete system that
insulated programmers (i.e. SQL + scripting) from the system, while
guaranteeing clearly defined semantics of data and queries.
After all, DBMS views its role as managing data for application
programmers.
- Affects where the complexity goes!
- to the system, or the end-programmer?
- question: which is better? in what environments?
- follow-on question: are internet systems more like enterprise apps
(traditionally built on DBMSs) or scientific/end-user apps (traditionally
built over OSes and files)? Why?
- Achilles' heel of RDBMSs: a closed box
- Cannot leverage technology without going through the full SQL stack
- One solution: make the system extensible, convince the world to download
code into the DBMS
- Another solution: componentize the system (hard? RSS is hard to bust up,
due to transaction semantics)
- Achilles' heel of OSes: hard to decide on the "right" level of abstraction
- As we'll read, many UNIX abstractions (e.g. virtual memory) hide *too*
much detail, messing up semantics. On the other hand, too low a level
can cause too much programmer burden, and messes up the elegance of the
system
- One solution: make the system extensible, convince the fancy apps to
download code into the OS
- Another solution: componentize the system (hard, due to protection
issues)
- Traditionally separate communities, despite subsequently clear need to
integrate
- UNIX paper: "We take the view that locks are neither necessary nor
sufficient, in our environment, to prevent interference between users of the
same file. They are unnecessary because we are not faced with large,
single-file data bases maintained by independent processes."
- System R: "has illustrated the feasibility of compiling a very
high-level data sublanguage, SQL, into machine-level code".
So, a main goal of this class is to work from both of these
directions, cull the lessons from each, and ask how to use these lessons today
both within and OUTSIDE the context of these historically separate
systems.