Programming Assignments
-
IRAM vs. conventional caches on database/OS trace.
I have a CD containing Dick Sites' trace of the Microsoft SQL server running
under Windows NT on a DEC Alpha computer [Sit96].
The first step would be to recreate the results he claimed in his paper and
his course at Berkeley. The second step would be to see how well IRAM would
work for his workload. You would start from a straightforward cache design,
just with very wide blocks that are loaded quickly, e.g., from 1024 bits to
16384 bits in 50 ns, and vary the number of Sense Amps/Buffers. The question
is whether the benefits (if any) come from reuse, spatial locality, or simply
the wide bandwidth. It's possible that there is no benefit, as each access
might look like a random 32-bit load or store.
If IRAM does have a performance advantage, estimate how much slower an
IRAM processor could be and still be as fast as the Alpha with a
conventional memory system.
(Windsor Hsu and Min Zhou)
(Also, Remzi Arpaci is doing a similar but independent assignment.)
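As a starting point, here is a minimal sketch (in Python) of the kind of
trace-driven simulation this assignment calls for; the trace format, cache
size, and miss handling below are assumptions for illustration, not the
actual format of Sites' trace:

    # Direct-mapped cache with very wide blocks, driven by an address trace.
    # Assumed trace format (not from [Sit96]): one "R addr" or "W addr" per
    # line, addresses in hex. Each miss loads an entire wide block (~50 ns).
    import sys

    def simulate(trace_lines, cache_bits=2**23, block_bits=16384):
        block_bytes = block_bits // 8
        num_blocks = (cache_bits // 8) // block_bytes
        tags = [None] * num_blocks      # one tag per set (direct-mapped)
        hits = misses = 0
        for line in trace_lines:
            if not line.strip():
                continue
            op, addr = line.split()
            block = int(addr, 16) // block_bytes
            index = block % num_blocks
            if tags[index] == block:
                hits += 1
            else:
                misses += 1
                tags[index] = block     # refill the whole wide row
        return hits, misses

    if __name__ == "__main__":
        for bits in (1024, 4096, 16384):   # vary block width / sense-amp row
            h, m = simulate(open(sys.argv[1]), block_bits=bits)
            print(bits, "bit blocks: miss rate", m / (h + m))

Separating hits on the same word (reuse) from hits on a different word in the
same block (spatial locality) would answer where the benefit, if any, comes
from.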
-
Vector vs. Superscalar/conventional Cache on SPEC95.
It's possible that vector computing [Joh78] would be a very good match
to IRAM. SPEC 95 includes a set of integer programs in C and floating point
programs in Fortran, and you would expect that some of the floating point
programs would do very well on a Cray. I expect to have the SPEC95 CD this
week. This project would run programs with
and without vectorization, and report the results. Included in the paper
would be comparison with some superscalar RISC machines on each program
(whose results can probably be found via the Computer Architecture Home Page),
so that we can see where vector works well and where it works poorly.
I can probably get you an account on a Cray if you don't already have access
to one. Since there are many SPEC95 programs, it probably makes sense just to
do a subset, so several groups could take a crack at this. If time permits,
it would be interesting to see why the results are good for vector vs.
superscalar/cache. See if you can characterize which SPEC programs are a
good match
(e.g., tomcatv) versus a poor match (e.g., gcc) for IRAMs.
(To the best of my knowledge, I've never seen SPEC ratings for Cray Research
computers, so this would be a first.)
(Cedric Krumbein and Richard P. Martin)
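One rough way to frame the good-match/poor-match question is Amdahl's law
applied to vectorization; the fractions and speedups in this sketch are
illustrative assumptions, not measurements:

    # Amdahl's-law estimate of vector speedup: f is the fraction of run time
    # that vectorizes, r is the speedup of vector code over scalar code.
    def vector_speedup(f, r):
        return 1.0 / ((1.0 - f) + f / r)

    # tomcatv-like: long stride-1 loops, almost fully vectorizable
    print(vector_speedup(f=0.95, r=10))   # ~6.9x
    # gcc-like: pointer chasing and short loops, little to vectorize
    print(vector_speedup(f=0.10, r=10))   # ~1.1x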
-
Instruction/Data Correlation
Instead of moving data to the processor, move the processor to the data.
For a given piece of data, what is the size of the code that accesses it,
and what other pieces of data does that code access?
(Trevor Pering)
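One way to phrase the measurement, sketched in Python under an assumed trace
format of "pc op data-address" lines:

    # For each piece of data, collect the set of instruction addresses (PCs)
    # that touch it; the size of that set approximates the code that
    # accesses the data. Trace format is an assumption for illustration.
    from collections import defaultdict

    def code_per_datum(trace_lines, block_bytes=32):
        touched_by = defaultdict(set)
        for line in trace_lines:
            pc, op, addr = line.split()
            datum = int(addr, 16) // block_bytes   # group data into blocks
            touched_by[datum].add(int(pc, 16))
        return touched_by

The inverse map, from PC to the set of data blocks it touches, answers the
second question: what other data the same code accesses.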
Literature Search Assignments
Using the following resources:
- WWW search engines (such as Inktomi or Alta Vista)
- The University of California on-line library database, Melvyl
- Good old chasing references in papers and going to the library
find on-line and regular references to architecture studies that describe
proposed or real computers where the processing is next to the memory.
Your task is to summarize this work by including the proper citation,
a short summary of the claimed results along with the pros and cons,
and a single on-line table that summarizes all of the projects. Items to
include would be year, style of machine (uniprocessor, SIMD, MIMD, DSP, ...),
application area, performance claims, status of hardware (if any), citation,
and so on. Especially important will be finding on-going projects!
-
History and State-of-the-Art of Logic in Memory Chips.
Similar to the assignment above, find examples of proposals or chips that
are dominated by memory but include logic on the chip. In addition to
the categories mentioned above,
see if you can find estimates of logic size, power, and
speed if it is a DRAM chip or memory size, power, and speed if it is a
logic chip. Also list as many process parameters as available.
(Bruce McGaughy and Xinhui Niu)
(Also, Lloyd Y. Huang is doing a similar but independent assignment.)
-
History and State-of-the-Art of Compiler Controlled Memory Hierarchy.
Similar to the assignments above, review the effectiveness of compilers in
improving the performance of memory accesses through explicit optimization
of memory transfers. Examples should include vector architectures,
cache-based optimizations, and anything else along these lines.
In addition to the categories mentioned above,
summarize the types of programs for which the optimizations work well and
those for which they work poorly.
(Joe Darcy and Manuel Fahndrich)
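For concreteness, the canonical example of such an optimization is loop
blocking (tiling) for matrix multiply, sketched here in Python; the tile
size B is a tuning parameter chosen so the working set fits in cache:

    # Blocked (tiled) matrix multiply: the classic compiler-controlled
    # cache optimization. B is picked so three B x B tiles fit in cache.
    def blocked_matmul(A, Bmat, C, n, B=32):
        for ii in range(0, n, B):
            for jj in range(0, n, B):
                for kk in range(0, n, B):
                    # work on one tile while it is cache-resident
                    for i in range(ii, min(ii + B, n)):
                        for j in range(jj, min(jj + B, n)):
                            s = C[i][j]
                            for k in range(kk, min(kk + B, n)):
                                s += A[i][k] * Bmat[k][j]
                            C[i][j] = s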
-
Program Size and Page Fault Optimization Survey.
This is a survey report on optimizations for program space, as well
as historical work on reducing page-faults in programs. Since IRAM
will be limited to the memory of a single DRAM, code and data space
are important considerations.
(Nick Weaver)
-
History and State-of-the-Art of Circuits and Architecture in DRAM Chips.
This survey will summarize various DRAM designs with perspectives on their
circuit techniques and architecture, in order to reveal the potentials and
limitations in launching an IRAM chip. We will evaluate claimed performance
results, circuit design techniques, and the pros and cons of matching DRAMs
with different types of systems. Based on the survey we want to extract
some plausible strategies, especially strategies for the physical design of
the memory part of an IRAM, in terms of area, power, timing, noise, etc.
(Hui Zhang)
-
DRAM Architecture Tradeoffs.
DRAM designs are typically optimized for operation with the traditional
RAS/CAS memory interface. In an IRAM processor, the processor and memory
reside together on a single die, so the DRAM does not need to deliver its
data off-chip. Hence many design choices exist as to how to interface the
DRAM to the processor(s). I will survey the impact of several factors such
as block size, column decoding, and address decoding on the overall
performance of the DRAM. An accurate characterization of the DRAM will
enable sound architectural decisions to be made as to how best to interface
the memory to the IRAM processor core.
(James Young)
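A first-order model of the tradeoff, with assumed round-number timing
parameters rather than measured DRAM datasheet values, might look like:

    # Time to deliver one block to the on-chip processor. All parameters
    # (row/column latencies, bus width) are illustrative assumptions.
    def block_access_ns(block_bits, row_bits=16384,
                        t_row=30.0, t_col=10.0, bus_bits=1024, t_xfer=5.0):
        rows = max(1, block_bits // row_bits)       # row activations needed
        transfers = max(1, block_bits // bus_bits)  # column selects/transfers
        return rows * t_row + transfers * (t_col + t_xfer)

    for bits in (1024, 4096, 16384):
        print(bits, "bits:", block_access_ns(bits), "ns")

Even this crude model shows the shape of the tradeoff: once a row is open,
wider blocks cost only extra column cycles, not extra row activations.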
-
History and State-of-the-Art of DRAM Testability Issues.
A major portion of the cost of a DRAM is testing time. It may be possible
to utilize the processing power present in an IRAM to reduce this cost.
However, the additional complexities of testing the processor logic could
also increase this cost. Before delving into the issues of how an IRAM
might affect the DRAM test costs, it is useful to first understand the
history and state of the art of DRAM testability issues, including both
traditional testing and more novel techniques such as Built-In Self-Test
(BIST). It would also be helpful to investigate how existing chips that
merge logic within a DRAM process perform testing of the logic. This
project is a literature search to explore these areas.
(Rich Fromm)
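For reference, the flavor of algorithm a BIST engine (or the IRAM's own
processor) could run is a March test; this is a sketch of March C- over a
simulated array (a real test would march over physical, not logical,
addresses):

    # March C-: up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0);
    # up(r0). Here "mem" is a plain list standing in for the DRAM array;
    # a mismatch raises AssertionError, signaling a fault.
    def march_c_minus(mem):
        n = len(mem)
        up, down = range(n), range(n - 1, -1, -1)
        for i in up:   mem[i] = 0                         # w0
        for i in up:   assert mem[i] == 0; mem[i] = 1     # r0, w1
        for i in up:   assert mem[i] == 1; mem[i] = 0     # r1, w0
        for i in down: assert mem[i] == 0; mem[i] = 1     # r0, w1
        for i in down: assert mem[i] == 1; mem[i] = 0     # r1, w0
        for i in up:   assert mem[i] == 0                 # r0

    march_c_minus([0] * 1024)   # passes on a fault-free "array"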
-
History and State-of-the-Art of Digital Signal Processors and Memory
Bandwidth.
DSP designers have also been pursuing the concept of DSPs built on a DRAM
chip. This literature survey will provide a brief history of conventional
memory accesses in DSP applications, and will then focus on recent industry
developments in overcoming the memory bandwidth issue.
(Heather Bowers)
-
History and State-of-the-Art of Code and Data Compression.
Given the speed of off-chip accesses, it may be important to reduce the
size of instructions and data so as to fit more on chip.
Look at the papers from the 1970s on instruction set encoding, as well
as Huffman encoding. See if you can find any schemes that looked at
using less space for data. Also, review standard compression
technology to see if there were any schemes that might let you use
standard instruction sets and data but decompress on-the-fly from
memory and compress on-the-fly to memory. This project also includes a
survey of instruction set support for direct memory management in existing
architectures.
(Craig Teuscher)
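As a concrete starting point, this sketch builds a Huffman code over a byte
stream; a real on-the-fly scheme would additionally need block-aligned,
randomly accessible compressed storage, which plain Huffman does not give:

    # Minimal Huffman coder over bytes. The sample input is an arbitrary
    # stand-in for an instruction stream, not a real encoding.
    import heapq
    from collections import Counter

    def huffman_code(data):
        # heap entries: (count, tiebreak, symbol, left, right)
        heap = [(n, i, sym, None, None)
                for i, (sym, n) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            a, b = heapq.heappop(heap), heapq.heappop(heap)
            heapq.heappush(heap, (a[0] + b[0], next_id, None, a, b))
            next_id += 1
        codes = {}
        def walk(node, prefix):
            if node[2] is not None:            # leaf: record its code
                codes[node[2]] = prefix or "0"
            else:
                walk(node[3], prefix + "0")
                walk(node[4], prefix + "1")
        walk(heap[0], "")
        return codes

    sample = b"add r1,r2,r3; add r1,r2,r4"
    codes = huffman_code(sample)
    print(sum(len(codes[b]) for b in sample), "bits vs.",
          8 * len(sample), "uncompressed")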
-
Various thoughts and comments about IRAM and the lectures.
(Seth Copen Goldstein)