Serially Multiported DRAM (SMDRAM) based Computer Architectures

D. Litaize, A. Mzoughi, P. Sainrat, C. Rochange
IRIT, Université Paul Sabatier, Toulouse, France
email: {litaize, mzoughi, sainrat, rochange}@irit.fr

Problem Description

The discrepancy between processor speed and DRAM access time is increasing. All DRAM enhancements (EDO, BEDO, SDRAM, EDRAM, CDRAM, MDRAM, Rambus, SDRAM-DDR, Direct Rambus, SLDRAM) are masked by the memory controller and leave the memory interface nearly unchanged. DRAM chips will soon reach the gigabit, and most memory systems will then be composed of only a few chips. DRAM chips are internally multibanked, leading to a huge unexploited internal bandwidth (more than half a Tbit/s per chip, considering 32 banks each able to read 1 Kbit every 50 ns). SMDRAM is a new architecture that could help to avoid hitting the "memory wall".

The SMDRAM Solution

Nearly all memory systems are used to feed, mainly on a data-block basis, the last-level cache of a memory hierarchy and the I/Os. In an SMDRAM chip, each internal bank has its own register, able to hold a DRAM page (typically 1 Kbit), which is also the size of a data block. These registers are built as shift registers, and data are shifted in and out at high speed, at least 1 Gbit/s, through multiple independent I/O pins. An internal crossbar can connect each shift register to any I/O port.

Main Features of an SMDRAM-based Computer Architecture

Last-level caches, graphics processors, I/O bridges and basic interface chips (disks, network) are connected through multiple serial point-to-point busses directly to the SMDRAM memory system. This direct connection ensures the lowest latency (no chip set to cross). Point-to-point connections achieve the highest possible data transfer rate, using perfectly electrically-adapted lines (the multipoint Direct Rambus data bus currently runs at 800 MHz; a point-to-point one can achieve higher frequencies).
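The internal-bandwidth figure quoted in the problem description can be checked with back-of-envelope arithmetic. This is a minimal sketch, assuming the 32 banks, 1 Kbit pages and 50 ns access time given above:

```python
# Back-of-envelope check of the internal bandwidth claim:
# 32 banks, each able to read a 1 Kbit page every 50 ns.
banks = 32
page_bits = 1024          # 1 Kbit page per bank
cycle_s = 50e-9           # 50 ns per page read

per_bank_bps = page_bits / cycle_s   # bits/s delivered by one bank
total_bps = banks * per_bank_bps     # aggregate internal bandwidth

print(f"per bank : {per_bank_bps / 1e9:.2f} Gbit/s")
print(f"total    : {total_bps / 1e12:.3f} Tbit/s")
```

This yields about 0.655 Tbit/s per chip, consistent with the "more than half a Tbit/s" estimate.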
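The per-bank shift registers and the internal crossbar described above can be sketched as a toy functional model. Class and method names, and the default sizes, are illustrative assumptions, not taken from a real datasheet:

```python
# Toy functional model of an SMDRAM chip: each internal bank owns a
# page-wide shift register, and a crossbar can route any register to
# any serial I/O port. Sizes and names are illustrative only.

class SMDRAMChip:
    def __init__(self, n_banks=32, page_bits=1024, n_ports=4):
        self.banks = [[0] * page_bits for _ in range(n_banks)]  # DRAM pages
        self.regs = [None] * n_banks   # one shift register per bank
        self.xbar = {}                 # crossbar state: port -> bank
        self.n_ports = n_ports

    def load_page(self, bank):
        # One DRAM access: copy the bank's page into its shift register.
        self.regs[bank] = list(self.banks[bank])

    def connect(self, port, bank):
        # Program the crossbar to route this bank's register to a port.
        assert 0 <= port < self.n_ports
        self.xbar[port] = bank

    def shift_out(self, port, n_bits):
        # Serially shift n_bits out of the register routed to `port`.
        reg = self.regs[self.xbar[port]]
        return [reg.pop(0) for _ in range(n_bits)]
```

A block transfer is then a load_page, a crossbar connect, and a stream of shift_out calls; since each port has its own register path, several ports can stream from different banks concurrently.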
A serial bus here means a bus one or a few bits wide: its key feature is to set up a data path between shift registers and to transfer data continuously, without any stalls, at the highest possible rate. The requested word is therefore also recorded in a separate word-wide shift register, so that it can be received first and bypassed to the processor. Multiplexing techniques inside the SMDRAM chip lead to lower internal running frequencies, the high speed being confined to the area around the I/O pins.

The major feature is thus that multiple transfers can be performed in parallel on different serial busses. For a cache miss, the access time to the first word of a block is almost the same as with a wider multipoint data bus (data transfer time is a small part of the overall access time), but, as several data blocks can be transferred simultaneously, the access time to the first word of the pending blocks issued by a non-blocking cache is shorter, provided the cache has several serial busses. These serial busses can also be used for block address transfers by units that are not involved in data cache coherency, thus relaxing the load on the snoopy address bus.

The overall design of the system is simplified, and the number of chip pins is reduced. Changes at the hardware level are confined to the memory interface, and, as only data blocks are transferred on the serial busses, chips are controlled through a separate bus (IEEE P1795, for example), leading to some software changes in the drivers. Explaining why an SMDRAM memory system, with its necessary set of narrow (in the limit, serial) busses, outperforms a classical DRAM memory system with one parallel bus is not trivial; the gain comes from a better use of the internal parallelism of the memory chip and from higher transfer rates. Up to a few SMDRAM chips, scalability is easy to implement, and clocking problems can be solved with simple phase-recovery logic. SMDRAM could therefore represent the next generation of DRAM chip architecture.
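The argument about pending blocks from a non-blocking cache can be illustrated with a back-of-envelope latency model under an equal pin budget. All figures below (access time, bus widths, clock rates, number of busses) are illustrative assumptions, not measurements from this work:

```python
# Back-of-envelope model: k pending block misses, equal pin budget.
# One shared 64-bit multipoint bus at 800 MHz, versus 16 point-to-point
# serial busses of 4 pins each at 2 Gbit/s per pin. Illustrative only.

ACCESS_NS  = 50.0     # DRAM access before a block can stream out
BLOCK_BITS = 1024     # one data block = one DRAM page

def wide_bus_done(k, pins=64, hz=0.8e9):
    # Single shared bus: pending blocks are transferred one after another.
    xfer_ns = BLOCK_BITS / (pins * hz) * 1e9
    return ACCESS_NS + k * xfer_ns

def serial_buses_done(k, n_buses=16, pins=4, bps_per_pin=2e9):
    # Independent busses: up to n_buses blocks stream concurrently,
    # so k blocks complete in ceil(k / n_buses) waves.
    xfer_ns = BLOCK_BITS / (pins * bps_per_pin) * 1e9
    waves = -(-k // n_buses)
    return ACCESS_NS + waves * xfer_ns

for k in (1, 4, 16):
    print(k, round(wide_bus_done(k), 1), round(serial_buses_done(k), 1))
```

Under these assumptions a single block favors the wide bus, but with many pending misses the serial busses finish all blocks sooner, because the transfers overlap; meanwhile the first requested word of each block is bypassed early through the word-wide shift register, so first-word latency stays close in both cases.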
More than a simple memory enhancement, SMDRAM is a technological and architectural step that opens up new possibilities.