The following comments, regarding
Lecture 4 and
Lecture 5, are from Olivet Chou, of
Mitsubishi Electronics America, Inc.
- Conceptually, the description for Figure 1 is correct, but all other
3D rendering controller designers will tell you that the data transfers
across the "bus" are always bursty, to achieve bandwidth efficiency.
The burst transfer itself is a tradeoff between too much data (wasted
burst cycles) and not enough data (needing another burst transfer).
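The tradeoff above can be sketched numerically. The setup cost, burst lengths, and useful-word counts below are illustrative assumptions, not figures from the text:

```python
# Sketch of the burst-length tradeoff: each burst pays a fixed setup cost in
# cycles, so longer bursts amortize it better, but transferred words the
# controller does not need are wasted cycles.

def bus_efficiency(burst_len, setup_cycles, useful_words):
    """Fraction of bus cycles that carry data the controller actually uses."""
    total_cycles = setup_cycles + burst_len   # setup + one cycle per word
    useful = min(useful_words, burst_len)     # words beyond what is needed are wasted
    return useful / total_cycles

# Too short a burst: setup overhead dominates (and another burst is needed).
# Too long a burst: wasted words drag efficiency back down.
for burst in (2, 8, 32):
    print(burst, round(bus_efficiency(burst, setup_cycles=4, useful_words=8), 3))
```

With these assumed numbers, the 8-word burst is the sweet spot between the two failure modes the text describes.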
- The two levels of caches in 3D-RAM not only solve the speed mismatch
and balance the bandwidth, but also enable the on-chip ALU to
realize its benefit. This is so because the only way to get the ALU
working as desired is to have a dual-port memory. It is VERY expensive
to build a dual-port DRAM at 10 Mbits density. On the other hand, a
2 Kbits triple-ported SRAM is an economical solution. Mitsubishi is
leading the way in implementing SRAM in a DRAM process technology in
production, in addition to the much-hyped DRAM + logic on the DRAM die.
- As mentioned in one of the lectures in the course, "a new technology
needs a roadmap." 3D-RAM is a family of products. Mitsubishi currently
has three generations of 3D-RAM products:
3D-RAM(1) P/N M5M410092 0.50um proc, 2metal, 4poly, in production
3D-RAM(2) P/N M5M410092A 0.45um proc, 2metal, 4poly, in production
3D-RAM(3) P/N M5M410092B 0.40um proc, 2metal, 4poly, production soon
The lecture notes incorrectly state the process to be 0.55 um, 1-level
metal. The point is that Mitsubishi is very much in sync with state-of-
the-art CMOS process technology.
- The performance cited in the lecture notes corresponds to Figure 4.
This means that the rendering controller has (32 + 32) x 4 = 256 bits
data interface with the 3D-RAM based frame buffer. This is a
workstation-class controller delivering super-workstation-class 3D graphics.
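The interface width cited above can be checked with a quick calculation. The (32 + 32)-bit split per chip and the count of 4 chips come from the text; the 100 MHz clock below is a hypothetical figure added only to give a rough bandwidth feel:

```python
# Check of the Figure 4 interface width: (32 + 32) data bits per 3D-RAM
# times 4 chips gives a 256-bit path to the frame buffer.

pins_per_chip = 32 + 32          # per-chip data bits, as stated in the text
chips = 4
width_bits = pins_per_chip * chips
assert width_bits == 256

clock_mhz = 100                  # assumed clock, not from the text
peak_mb_per_s = width_bits / 8 * clock_mhz
print(width_bits, peak_mb_per_s)
```

At the assumed clock, a 256-bit interface moves 3.2 GB/s peak, which is why such a controller lands in super-workstation territory.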
- Your notes from discussion on Jan 26, 1996 state that "a lesson from
the FBRAM chip seems that we can use the sense amps as a second level
cache ... this cache wouldn't be very big, namely 128KB for a 1GB DRAM".
Today, Synchronous DRAM (SDRAM) and Synchronous Graphics RAM (SGRAM)
have 8192 bits for a 4 Mbits DRAM bank at density of 16 Mbits or 8 Mbits
per DRAM chip. Even at 64 Mbits per chip density, it is conceivable to
have 8192 bits for sense amps for an 8 Mbits DRAM bank. In this latter
system, you would get 1 MByte of sense amps in a 1 GByte DRAM, or
128 KBytes of sense amps in a 1 Gbit DRAM, depending on how you look at it.
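The arithmetic above checks out; reproducing it with the figures from the text (8192 bits of sense amps per 8 Mbit DRAM bank):

```python
# Sense-amp capacity scaling, using the 64 Mbit-generation figures above:
# 8192 bits of sense amps for each 8 Mbit DRAM bank.

SENSE_BITS_PER_BANK = 8192
BANK_BITS = 8 * 2**20            # 8 Mbit per bank

def sense_amp_bytes(total_dram_bits):
    """Total sense-amp bytes across all banks of a DRAM system."""
    banks = total_dram_bits // BANK_BITS
    return banks * SENSE_BITS_PER_BANK // 8

gbyte = 8 * 2**30                # 1 GByte of DRAM, expressed in bits
gbit = 2**30                     # 1 Gbit of DRAM

print(sense_amp_bytes(gbyte) // 2**20, "MByte")   # sense amps in 1 GByte DRAM
print(sense_amp_bytes(gbit) // 2**10, "KByte")    # sense amps in 1 Gbit DRAM
```

A 1 GByte system holds 1024 such banks, hence 1 MByte of sense amps; a 1 Gbit system holds 128 banks, hence 128 KBytes.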
But the point is totally missed. The sense amps are not suitable to
function as the first-line cache between the processor (or the on-
processor cache) and the DRAM cells, because the speed mismatch
between the two is still too significant. Mitsubishi believes the
approach of integrating SRAM on a DRAM chip is the best way to
solve this speed mismatch problem, even if there is NO on-chip ALU.
Mitsubishi pioneered Cache DRAM (CDRAM) and introduced a Multi-port
Cache DRAM (MP-RAM) at the quarterly semiconductor memory supplier
meeting (JEDEC) at Portland, OR in June 1996. An on-chip SRAM
matches 100% with the controller's appetite for data, while the
internal global bus between the DRAM and the on-chip SRAM compensates
for the slower speed of the sense amps with an 8x width. Note again
that this principle holds even if there is NO on-chip ALU. The first
step to a truly intelligent RAM is to get this IRAM to provide the RIGHT
data to the controller cycle by cycle without missing a beat!
For the latest analysis on CDRAM (or DRAM + SRAM) advantage, please
see the July 22, 1996 issue of the EE Times newspaper, p.57.
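The width-for-speed compensation described above can be sketched numerically. The 8x width ratio is from the text; the port width and clock rates below are assumptions chosen only to make the ratio concrete:

```python
# Bandwidth matching between the fast, narrow SRAM port and the slow, wide
# internal global bus: an 8x wider bus at 1/8 the rate delivers the same
# bandwidth, so the SRAM cache never starves.

sram_width_bits = 32                     # assumed external SRAM port width
sram_clock_mhz = 200                     # assumed controller-side clock
dram_clock_mhz = sram_clock_mhz // 8     # assumed slower sense-amp rate
dram_width_bits = sram_width_bits * 8    # 8x wider internal bus (from the text)

sram_bw = sram_width_bits * sram_clock_mhz   # Mbit/s at the SRAM port
dram_bw = dram_width_bits * dram_clock_mhz   # Mbit/s on the internal bus
print(sram_bw, dram_bw)                      # equal: width offsets slower cycle
```

The design choice is exactly the MATCHING argument the next comment makes: neither stage is over- or under-provisioned.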
- Not to belabor the point too much, but it is not really the absolute
size of the on-chip SRAM or the number of sense amps per DRAM bank
that sets the performance bottleneck; rather, it is the
MATCHING of the bandwidth and speed at every stage. It is OK to have
a cache miss, as long as the IRAM can be informed several cycles (e.g.
4) in advance (which is entirely possible from the controller's design
perspective). The SRAM cache can then perform a line flush and line
update, and still present the requested data in time.
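The advance-notice argument can be sketched as a tiny timing model. The 4-cycle notice matches the example above; the flush and update cycle counts are illustrative assumptions:

```python
# A miss costs nothing visible if the controller warns the RAM early enough:
# the line flush and line update overlap the advance-notice interval.

def stall_cycles(advance_notice, flush_cycles, update_cycles):
    """Cycles the controller must wait beyond the notice it already gave."""
    miss_work = flush_cycles + update_cycles
    return max(0, miss_work - advance_notice)

# With 4 cycles of notice and a 2-cycle flush + 2-cycle update (assumed),
# the requested data still arrives on time:
print(stall_cycles(advance_notice=4, flush_cycles=2, update_cycles=2))  # 0
# Without notice, the same miss costs the full 4 cycles:
print(stall_cycles(advance_notice=0, flush_cycles=2, update_cycles=2))  # 4
```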