Serially Multiported DRAM (SMDRAM) based Computer Architectures

D. Litaize, A. Mzoughi, P. Sainrat, C. Rochange
IRIT, Université Paul Sabatier, Toulouse, France
email: {litaize, mzoughi, sainrat, rochange}@irit.fr

Problem Description

The discrepancy between processor speed and DRAM access time is increasing. All DRAM enhancements (EDO, BEDO, SDRAM, EDRAM, CDRAM, MDRAM, Rambus, SDRAM-DDR, Direct Rambus, SLDRAM) are masked by the memory controller and leave the memory interface nearly unchanged. DRAM chips will soon reach the gigabit, and most memory systems will then be composed of only a few chips. DRAM chips are internally multibanked, leading to a huge unexploited internal bandwidth (more than half a Tbit/s per chip, considering 32 banks each able to read 1 Kbit every 50 ns). SMDRAM is a new architecture that could help to avoid hitting the "memory wall".

The SMDRAM Solution

Nearly all memory systems are used to feed, mainly on a data-block basis, the last-level cache of a memory hierarchy and the I/Os. In an SMDRAM chip, each internal bank has its own register, able to hold a DRAM page (typically 1 Kbit), which is also the size of a data block. These registers are built as shift registers, and data are shifted in and out at high speed, at least 1 Gbit/s, through multiple independent I/O pins. An internal crossbar can connect each shift register to any I/O port.

Main Features of an SMDRAM-based Computer Architecture

Last-level caches, graphics processors, I/O bridges and basic interface chips (disks, network) are connected through multiple serial point-to-point busses directly to the SMDRAM memory system. This direct connection ensures the lowest latency (no chip set to cross). Point-to-point connections achieve the highest possible data transfer rate, using perfectly electrically-adapted lines (the multipoint Direct Rambus data bus currently runs at 800 MHz; a point-to-point one can achieve higher frequencies).
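The internal-bandwidth figure quoted in the problem description can be checked with back-of-envelope arithmetic. This is a minimal sketch, assuming the 32 banks, 1 Kbit pages and 50 ns access time given above:

```python
# Back-of-envelope check of the internal bandwidth claim:
# 32 banks, each able to read a 1 Kbit page every 50 ns.
banks = 32
page_bits = 1024          # 1 Kbit page per bank
cycle_s = 50e-9           # 50 ns per page read

per_bank_bps = page_bits / cycle_s   # bits/s delivered by one bank
total_bps = banks * per_bank_bps     # aggregate internal bandwidth

print(f"per bank : {per_bank_bps / 1e9:.2f} Gbit/s")
print(f"total    : {total_bps / 1e12:.3f} Tbit/s")
```

This yields about 0.655 Tbit/s per chip, consistent with the "more than half a Tbit/s" estimate.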
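The per-bank shift registers and the internal crossbar described above can be sketched as a toy functional model. Class and method names, and the default sizes, are illustrative assumptions, not taken from a real datasheet:

```python
# Toy functional model of an SMDRAM chip: each internal bank owns a
# page-wide shift register, and a crossbar can route any register to
# any serial I/O port. Sizes and names are illustrative only.

class SMDRAMChip:
    def __init__(self, n_banks=32, page_bits=1024, n_ports=4):
        self.banks = [[0] * page_bits for _ in range(n_banks)]  # DRAM pages
        self.regs = [None] * n_banks   # one shift register per bank
        self.xbar = {}                 # crossbar state: port -> bank
        self.n_ports = n_ports

    def load_page(self, bank):
        # One DRAM access: copy the bank's page into its shift register.
        self.regs[bank] = list(self.banks[bank])

    def connect(self, port, bank):
        # Program the crossbar to route this bank's register to a port.
        assert 0 <= port < self.n_ports
        self.xbar[port] = bank

    def shift_out(self, port, n_bits):
        # Serially shift n_bits out of the register routed to `port`.
        reg = self.regs[self.xbar[port]]
        return [reg.pop(0) for _ in range(n_bits)]
```

A block transfer is then a load_page, a crossbar connect, and a stream of shift_out calls; since each port has its own register path, several ports can stream from different banks concurrently.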
A serial bus here means a bus one or a few bits wide: its key feature is to set up a data path between shift registers and to transfer data continuously, without any stalls, at the highest possible rate. The requested word is therefore also recorded in a separate word-wide shift register, so that it can be received first and bypassed to the processor. Multiplexing techniques inside the SMDRAM chip lead to lower internal running frequencies, the high speed being confined to the area around the I/O pins.

The major feature is thus that multiple transfers can be performed in parallel on different serial busses. For a cache miss, the access time to the first word of a block is almost the same as with a wider multipoint data bus (data transfer time is a small part of the overall access time), but, as several data blocks can be transferred simultaneously, the access time to the first word of the pending blocks issued by a non-blocking cache is shorter, provided the cache has several serial busses. These serial busses can also be used for block address transfers by units that are not involved in data cache coherency, thus relaxing the load on the snoopy address bus.

The overall design of the system is simplified, and the number of chip pins is reduced. Changes at the hardware level are confined to the memory interface, and, as only data blocks are transferred on the serial busses, chips are controlled through a separate bus (IEEE P1795, for example), leading to some software changes in the drivers. Explaining why an SMDRAM memory system, with its necessary set of narrow (in the limit, serial) busses, outperforms a classical DRAM memory system with one parallel bus is not trivial; the gain comes from a better use of the internal parallelism of the memory chip and from higher transfer rates. Up to a few SMDRAM chips, scalability is easy to implement, and clocking problems can be solved with simple phase-recovery logic. SMDRAM could therefore represent the next generation of DRAM chip architecture.
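The argument about pending blocks from a non-blocking cache can be illustrated with a back-of-envelope latency model under an equal pin budget. All figures below (access time, bus widths, clock rates, number of busses) are illustrative assumptions, not measurements from this work:

```python
# Back-of-envelope model: k pending block misses, equal pin budget.
# One shared 64-bit multipoint bus at 800 MHz, versus 16 point-to-point
# serial busses of 4 pins each at 2 Gbit/s per pin. Illustrative only.

ACCESS_NS  = 50.0     # DRAM access before a block can stream out
BLOCK_BITS = 1024     # one data block = one DRAM page

def wide_bus_done(k, pins=64, hz=0.8e9):
    # Single shared bus: pending blocks are transferred one after another.
    xfer_ns = BLOCK_BITS / (pins * hz) * 1e9
    return ACCESS_NS + k * xfer_ns

def serial_buses_done(k, n_buses=16, pins=4, bps_per_pin=2e9):
    # Independent busses: up to n_buses blocks stream concurrently,
    # so k blocks complete in ceil(k / n_buses) waves.
    xfer_ns = BLOCK_BITS / (pins * bps_per_pin) * 1e9
    waves = -(-k // n_buses)
    return ACCESS_NS + waves * xfer_ns

for k in (1, 4, 16):
    print(k, round(wide_bus_done(k), 1), round(serial_buses_done(k), 1))
```

Under these assumptions a single block favors the wide bus, but with many pending misses the serial busses finish all blocks sooner, because the transfers overlap; meanwhile the first requested word of each block is bypassed early through the word-wide shift register, so first-word latency stays close in both cases.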
More than a simple memory enhancement, SMDRAM is a technological and architectural step that opens up new possibilities.