Vector IRAM
Dave Patterson
March 2, 1996
Problems in IRAM design:
Logic is slower in a DRAM process
Adding a processor means an instruction set, which customizes the part and
limits software
How can you really use the phenomenal bandwidth?
Observations
The vector processing units on different brands of vector computer are
largely the same, with the same operations and registers. There is basically
widespread agreement on the design of vector units.
Vector units can trade off clock rate and amount of hardware: you can
build a vector processor with the same peak bandwidth in a slower technology
simply by replicating the function units so that they do, say, 4 elements per
clock cycle.
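To make the tradeoff concrete, here is a minimal C sketch of a toy cycle
model; the lane count, vector length, and the name vector_cycles are all
assumptions made for illustration, not a description of any real machine.

#include <stdio.h>

/* Clocks to execute one vector instruction of length n when the function
   units are replicated 'lanes' ways: roughly one group of 'lanes' elements
   completes per clock. */
static long vector_cycles(long n, long lanes)
{
    return (n + lanes - 1) / lanes;
}

int main(void)
{
    long n = 128;
    /* A 1-lane unit at clock f and a 4-lane unit at clock f/4 deliver the
       same peak bandwidth; the 4-lane unit pays roughly 4x the
       function-unit hardware. */
    printf("1 lane : %ld clocks\n", vector_cycles(n, 1));   /* 128 */
    printf("4 lanes: %ld clocks\n", vector_cycles(n, 4));   /* 32  */
    return 0;
}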
The large cost of vector systems is the network that connects the
memory banks to the vector units; it is essentially a crossbar or fat tree.
There is a very clear dividing line between a vector processor and a
scalar processor: vector instructions and operations to load the vector length
and vector mask registers are the primary items that cross the line.
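To show how little has to cross that line, here is a C sketch of a
strip-mined DAXPY against a toy software model of a vector unit; the maximum
vector length MVL, the register file, and the helpers set_vl, vload, vfma,
and vstore are assumptions for the sketch, not an existing instruction set.

#include <stddef.h>

#define MVL 64                         /* assumed maximum vector length */

static size_t vl;                      /* vector-length register        */
static double vx[MVL], vy[MVL];        /* two vector registers          */

static void set_vl(size_t n) { vl = n; }

static void vload(double *vr, const double *p)      /* vector load  */
{
    for (size_t i = 0; i < vl; i++) vr[i] = p[i];
}

static void vstore(double *p, const double *vr)     /* vector store */
{
    for (size_t i = 0; i < vl; i++) p[i] = vr[i];
}

static void vfma(double *vd, double a, const double *va, const double *vb)
{
    for (size_t i = 0; i < vl; i++) vd[i] = a * va[i] + vb[i];
}

/* Strip-mined DAXPY: the scalar side only sets the vector length and issues
   vector instructions; all the element work stays inside the vector unit. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i += MVL) {
        set_vl(n - i < MVL ? n - i : MVL);
        vload(vx, &x[i]);
        vload(vy, &y[i]);
        vfma(vy, a, vx, vy);           /* vy = a * vx + vy */
        vstore(&y[i], vy);
    }
}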
Proposal:
Instead of putting a full processor in a DRAM, put a vector unit in a DRAM
and provide a port between a traditional processor and the vector IRAM.
Across this port go vector instructions and possibly scalar values, which
can specify a lot of work in a few bits. Thus a conventional processor-cache
complex handles the things that work well on caches, while anything
that needs lots of bandwidth is done inside the memory, using the standard load-
store interface to communicate between the two worlds.
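To suggest how few bits the port has to carry per unit of work, here is one
hypothetical command layout in C; the field names and sizes are guesses for
illustration, not a proposed encoding.

#include <stdint.h>

/* One command sent from the scalar processor to the vector IRAM. */
typedef struct {
    uint8_t  opcode;     /* vector load, store, add, multiply, ...       */
    uint8_t  dst_vreg;   /* destination vector register                  */
    uint8_t  src_vreg;   /* source vector register                       */
    uint8_t  flags;      /* use mask? unit stride? scalar operand valid? */
    uint16_t vlen;       /* vector length for this instruction           */
    uint16_t stride;     /* element stride for memory operations         */
    uint64_t scalar;     /* scalar operand or base address, when needed  */
} viram_cmd;             /* ~16 bytes describing up to vlen element
                            operations and memory references             */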
Details:
The width of this port depends on how radical you want the IRAM to be.
A conservative model would use, say, Synchronous DRAM to send
instructions in pieces into an instruction queue at whatever rate the processor
can generate them. By reserving a portion of the address space for commands,
you can get information from the address lines as well as the data lines.
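A minimal sketch of that conservative interface, assuming a reserved address
window that the IRAM decodes as its command queue; the address constant and
the helper name viram_issue are invented for illustration.

#include <stdint.h>

/* Assumed window of the address space that the vector IRAM treats as
   "command" rather than ordinary memory. */
#define VIRAM_CMD_QUEUE  ((volatile uint64_t *)0xF0000000u)

/* The scalar processor sends a piece of a vector instruction with an
   ordinary synchronous-DRAM write: the address lines mark it as a command,
   the data lines carry the instruction bits, and the IRAM enqueues it. */
static inline void viram_issue(uint64_t insn_piece)
{
    *VIRAM_CMD_QUEUE = insn_piece;
}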
A more radical model might use the Rambus interface to ship
instructions in 8-bit chunks. There is no need to send a single instruction at a
time.
Of course, you can make the port as wide as you want.
You can have multiple vector IRAMs if you need more memory or more
processing; communication between IRAMs could be done by
chip-to-chip transfers over the memory bus, assuming an appropriate
controller
through the processor via a block move instruction using the normal
memory interface (a sketch of this option appears after this list)
via a network connection between IRAMs
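A sketch of the second option, assuming dst and src are addresses that map to
two different IRAM chips and that an ordinary copy stands in for the block
move instruction.

#include <stddef.h>
#include <string.h>

/* Copy 'bytes' from one vector IRAM to another through the processor over
   the normal memory interface; every word crosses the memory bus twice
   (source IRAM -> processor -> destination IRAM), so this is the simplest
   but lowest-bandwidth path. */
void iram_block_move(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}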
By adding some instructions to manipulate the vector control registers
(moves, possibly simple arithmetic), you may be able to reduce the number of
scalar moves.
Comments:
* The single, central set of vector registers means that memory on chip is
uniformly accessible, unlike some of the SIMD approaches to IRAM.
* The speed of the logic on a DRAM process simply determines the width of
the vector units. If logic is 1/2 speed, we can get the same peak performance
by doubling the number of vector elements processed per clock (costing twice
as much hardware). The cost is a larger vector startup time, making for a larger
N1/2 value.
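A rough model of that effect, assuming one vector operation of length n takes
startup + n/lanes clocks; the startup value of 20 clocks is illustrative.
Under this model N1/2 (the length that reaches half of peak rate) equals
lanes x startup, so doubling the lanes doubles N1/2.

#include <stdio.h>

/* Sustained rate, in elements per clock, for a length-n vector operation. */
static double rate(double n, double startup, double lanes)
{
    return n / (startup + n / lanes);
}

int main(void)
{
    double startup = 20.0;                  /* assumed startup clocks    */
    for (double lanes = 1; lanes <= 4; lanes *= 2) {
        double n_half = lanes * startup;    /* rate(n_half) == lanes / 2 */
        printf("lanes=%g  N1/2=%g  rate=%g (peak %g)\n",
               lanes, n_half, rate(n_half, startup, lanes), lanes);
    }
    return 0;
}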
* Key to the design is the interconnect: how much area does it take to provide
a potent interconnect?
Questions:
How much data and how many instructions really cross the vector-scalar interface?
How expensive is the hardware to fully connect to the memory
modules?
How much area would it take for, say, 16 or 32 vector registers, each
with 64 or 128 vector elements? What about the functional units?
Can the scalar registers remain in the vector unit, or must they be
transmitted across that interface as well? (Since it's only for reads, this may
not be too bad.)
How good a match are vectors to visualization instructions?
Can the idea become popular as a graphics accelerator?
Is the overhead of communication so high that it's better to perform
length-1 vector operations than to perform operations in the scalar unit?
Can software handle all the synchronization between scalar and vector
accesses? (e.g., read/write conflicts to the same word)
Do any such machines exist so that we could look at the code?
How many applications will run well with vector assist?