Programming Assignments

  1. IRAM vs. conventional caches on a database/OS trace. I have a CD containing Dick Sites's trace of Microsoft SQL Server running under Windows NT on a DEC Alpha computer [Sit96]. The first step would be to recreate the results he claimed in his paper and his course at Berkeley. The second step would be to see how well IRAM would work for his workload. You would start from a straightforward cache design, just with very wide blocks that are loaded quickly, e.g., from 1024 bits to 16384 bits in 50 ns, and vary the number of sense amps/buffers. The question is whether the benefits (if any) come from reuse, spatial locality, or simply the wide bandwidth. It's possible that there is no benefit, as each access might look like a random 32-bit load or store. If IRAM does have a performance advantage, estimate how much slower an IRAM processor could be and still be as fast as the Alpha with a conventional memory system. (A minimal simulator sketch appears after this list.)
    (Windsor Hsu and Min Zhou)
    (Also, Remzi Arpaci is doing a similar but independent assignment.)
  2. Vector vs. Superscalar/conventional Cache on SPEC95. It's possible that vector computing [Joh78] would be a very good match to IRAM. SPEC95 includes a set of integer programs in C and floating-point programs in Fortran, and you would expect that some of the floating-point programs would do very well on a Cray. I expect to have the SPEC95 CD this week. This project would run programs with and without vectorization and report the results. The paper would include a comparison with some superscalar RISC machines on each program (whose results can probably be found via the Computer Architecture Home Page), so that we can see where vector works well and where it works poorly. I can probably get you an account on a Cray if you don't already have access to one. Since there are many SPEC95 programs, it probably makes sense just to do a subset, so several groups could take a crack at this. If time permits, it would be interesting to see why the results are good for vector vs. superscalar/cache. See if you can characterize which SPEC programs are a good match (e.g., tomcatv) versus a poor match (e.g., gcc) for IRAMs; the loop sketch after this list illustrates the distinction. (To the best of my knowledge, I've never seen SPEC ratings on Cray Research computers, so this would be a first.)
    (Cedric Krumbein and Richard P. Martin)
  3. Instruction/Data Correlation. Instead of moving data to the processor, move the processor to the data. For a given piece of data, what is the size of the code that accesses it, and what other pieces of data does that code access? (See the trace-analysis sketch after this list.)
    (Trevor Pering)
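
Sketch for assignment 1: a minimal trace-driven, direct-mapped cache simulator in Python. The trace format (one "R addr" or "W addr" per line, hex addresses), the hit time, and the 50 ns fill time are illustrative assumptions, not values taken from Sites's trace.

    # Direct-mapped cache with very wide blocks, as in assignment 1.
    # Sweeping block_bits and num_blocks helps separate spatial locality
    # from raw bandwidth. Timings and trace format are assumptions.
    def simulate(trace_lines, block_bits, num_blocks, fill_ns=50, hit_ns=5):
        block_bytes = block_bits // 8
        tags = [None] * num_blocks        # one tag per sense amp/row buffer
        hits = misses = 0
        total_ns = 0
        for line in trace_lines:
            op, addr = line.split()       # e.g. "R 1000" or "W 2000"
            block = int(addr, 16) // block_bytes
            index = block % num_blocks
            if tags[index] == block:
                hits += 1
                total_ns += hit_ns
            else:
                misses += 1
                total_ns += hit_ns + fill_ns  # whole wide block loads in fill_ns
                tags[index] = block
        return hits, misses, total_ns

    if __name__ == "__main__":
        trace = ["R 1000", "R 1008", "W 2000", "R 1010"]  # stand-in trace
        for bits in (1024, 4096, 16384):
            for nbuf in (4, 16, 64):
                print(bits, nbuf, simulate(trace, bits, nbuf))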
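
Sketch for assignment 2: an illustration of why a tomcatv-style loop vectorizes and a gcc-style computation does not, using NumPy as a stand-in for a vectorizing compiler. This is only an analogy for the Cray comparison, not a SPEC measurement; the array size and the random-permutation "linked list" are assumptions.

    import time
    import numpy as np

    n = 1_000_000
    a = np.random.rand(n)
    b = np.random.rand(n)

    # tomcatv-like: independent element-wise arithmetic over long arrays;
    # a vectorizing compiler turns this into vector loads/adds/stores.
    t0 = time.time()
    c = 2.0 * a + b
    t_vec = time.time() - t0

    # gcc-like: data-dependent pointer chasing; each step needs the
    # previous result, so there is no vector parallelism to exploit.
    nxt = np.random.permutation(n)    # random "linked list" of indices
    t0 = time.time()
    i, total = 0, 0.0
    for _ in range(n):
        total += a[i]
        i = int(nxt[i])
    t_chase = time.time() - t0

    print("vectorizable: %.3fs   pointer chasing: %.3fs" % (t_vec, t_chase))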
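
Sketch for assignment 3: given a trace of (pc, data address) pairs, compute for each piece of data how much code touches it and what other data that code reaches. The trace format and the 64-byte code/data granularity are illustrative assumptions.

    from collections import defaultdict

    def correlate(trace, gran=64):
        """trace: iterable of (pc, addr) pairs, one per executed load/store."""
        code_for_data = defaultdict(set)  # data block -> code blocks touching it
        data_for_code = defaultdict(set)  # code block -> data blocks it touches
        for pc, addr in trace:
            cb, db = pc // gran, addr // gran
            code_for_data[db].add(cb)
            data_for_code[cb].add(db)
        return code_for_data, data_for_code

    def footprint(code_for_data, data_for_code, db):
        """Size of the code touching data block db, and all the other data
        that code reaches -- the 'move the processor to the data' question."""
        code = code_for_data[db]
        reachable = set().union(*(data_for_code[cb] for cb in code))
        return len(code), len(reachable)

    trace = [(0x400, 0x1000), (0x404, 0x1008), (0x520, 0x1000), (0x404, 0x2000)]
    cfd, dfc = correlate(trace)
    print(footprint(cfd, dfc, 0x1000 // 64))   # -> (2, 2)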

Literature Search Assignments

Using the following resources, find on-line and regular references to architecture studies that describe proposed or real computers where the processing is next to the memory. Your task is to summarize this work by including the proper citation, a short summary of the claimed results along with the pros and cons, and a single on-line table that summarizes all of the projects. Items to include would be year, style of machine (uniprocessor, SIMD, MIMD, DSP, ...), application area, performance claims, status of hardware (if any), citation, and so on. Especially important will be finding on-going projects!
  1. History and State-of-the-Art of Logic in Memory Chips. Similar to the assignment above, find examples of proposals or chips that are dominated by memory but include logic on the chip. In addition to the categories mentioned above, see if you can find estimates of logic size, power, and speed if it is a DRAM chip or memory size, power, and speed if it is a logic chip. Also list as many process parameters as available.
    (Bruce McGaughy and Xinhui Niu)
    (Also, Lloyd Y. Huang is doing a similar but independent assignment.)
  2. History and State-of-the-Art of Compiler-Controlled Memory Hierarchy. Similar to the assignments above, review the status of the effectiveness of compilers in improving the performance of memory accesses by explicit optimization of memory transfers. Examples should include vector architectures, cache-based optimizations, and anything else along these lines. In addition to the categories mentioned above, summarize the types of programs for which the optimizations work well and those for which they work poorly.
    (Joe Darcy and Manuel Fahndrich)
  3. Program Size and Page Fault Optimization Survey. This is a survey report on optimizations for program space, as well as historical work on reducing page faults in programs. Since IRAM will be limited to the memory of a single DRAM, code and data space are important considerations.
    (Nick Weaver)
  4. History and State-of-the-Art of Circuits and Architecture in DRAM Chips. This survey will summarize various DRAM designs, with perspectives on their circuit techniques and architecture, in order to reveal the potential and limitations of launching an IRAM chip. We will evaluate claimed performance results, circuit design techniques, and the pros and cons of matching DRAMs with different types of systems. Based on the survey we want to extract some plausible strategies, especially for the physical design of the memory part of an IRAM, in terms of area, power, timing, noise, etc.
    (Hui Zhang)
  5. DRAM Architecture Tradeoffs. DRAM designs are typically optimized for operation with the traditional RAS/CAS memory interface. In an IRAM processor, the processor and memory reside together on a single die, so the DRAM does not need to deliver its data off-chip. Hence many design choices exist as to how to interface the DRAM to the processor(s). I will survey the impact of several factors such as block size, column decoding, and address decoding on the overall performance of the DRAM. An accurate characterization of the DRAM will enable sound architectural decisions to be made as to how best to interface the memory to the IRAM processor core.
    (James Young)
  6. History and State-of-the-Art of DRAM Testability Issues. A major portion of the cost of a DRAM is testing time. It may be possible to utilize the processing power present in an IRAM to reduce this cost. However, the additional complexity of testing the processor logic could also increase this cost. Before delving into the issues of how an IRAM might affect DRAM test costs, it is useful to first understand the history and state of the art of DRAM testability issues, including both traditional testing and more novel techniques such as Built-In Self-Test (BIST). It would also be helpful to investigate how existing chips that merge logic within a DRAM process perform testing of the logic. This project is a literature search to explore these areas.
    (Rich Fromm)
  7. History and State-of-the-Art of Digital Signal Processors and Memory Bandwidth. DSP designers have also been pursuing the concept of DSPs built on a DRAM chip. This literature survey will provide a brief history of conventional memory accesses in DSP applications, and will then focus on recent industry developments in overcoming the memory bandwidth issue.
    (Heather Bowers)
  8. History and State-of-the-Art of Code and Data Compression. Given the speed of off-chip accesses, it may be important to reduce the size of instructions and data so as to fit more on-chip. Look at the papers from the 1970s on instruction set encoding, as well as Huffman encoding. See if you can find any schemes that looked at using less space for data. Also, review standard compression technology to see if there are any schemes that might let you use standard instruction sets and data but decompress on the fly from memory and compress on the fly to memory. The assignment also includes a survey of instruction set support for direct memory management in existing architectures.
    (Craig Teuscher)
  • Various thoughts and comments about IRAM and the lectures.
    (Seth Copen Goldstein)