EE290A Homework #3, part 1
March 11th, 1999
David Chinnery, Rhett Davis, Chris Taylor, Ning Zhang
Chris Taylor is in the process of obtaining a Viterbi core that was
originally designed by Teresa Meng and has since been scaled.
Description:
The basic hardware layout of the Viterbi decoder blocks is shown below.
The ACS cells and SRAM will be custom designed, and standard cells will be used
for the other blocks. The estimates assume a 1.0 V supply voltage and 0.25 um
technology. Power estimates are for a target speed of 100 kbit/s.
Initial estimates for power, area and speed are based on the ACSU, the
survivor memory unit (SMU) and the SRAM.
SRAM and SMU:
The SRAM is generated by a script that automatically generates SRAM cells [Burstein]. A Viterbi traceback depth of 100 decisions is assumed; this is rounded up to give 128 words of 64 bits, where each bit encodes the decision at one of the 64 states as determined by the add-compare-select unit (ACSU). Three SRAM blocks of 128 words are used because traceback, decoding and writing the next set of decisions are all performed simultaneously (the control, address and data lines are multiplexed between the blocks).
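As a rough check of the memory sizing above, the following Python sketch rounds the assumed traceback depth up and totals the bits across the three blocks. The function name and the power-of-two rounding rule are illustrative assumptions; the text states only that 100 is rounded up to 128.

```python
def sram_sizing(traceback_depth=100, num_states=64, num_blocks=3):
    """Return (words per block, bits per word, total bits) for the survivor memory."""
    # Round the depth up to the next power of two (assumed rounding rule).
    words = 1
    while words < traceback_depth:
        words *= 2
    bits_per_word = num_states  # one decision bit per trellis state
    total_bits = num_blocks * words * bits_per_word
    return words, bits_per_word, total_bits


words, bits_per_word, total_bits = sram_sizing()
print(words, bits_per_word, total_bits)  # 128 64 24576
```

Under these assumptions the three blocks together hold 24576 decision bits.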
To route the control, address and data lines to and from the appropriate SRAM block, 384 4-input multiplexors are required. Each multiplexor occupies 153 um^2 and consumes approximately 17nJ per cycle (based on CV^2 estimates). The remaining SMU logic is assumed to take about four times the area and power of the multiplexors.
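The multiplexor area budget can be tallied with a short Python sketch. It covers area only, and it reads "remaining SMU logic" as logic in addition to the multiplexors, so the SMU logic total (excluding the SRAM itself) is five times the multiplexor area. That reading, and the constant and function names, are assumptions.

```python
NUM_MUXES = 384            # 4-input multiplexors, from the text
MUX_AREA_UM2 = 153.0       # area per multiplexor in um^2, from the text
OTHER_LOGIC_FACTOR = 4.0   # remaining SMU logic relative to the muxes


def smu_logic_area_mm2():
    """Return (mux area, total SMU logic area) in mm^2, excluding the SRAM."""
    mux_area_um2 = NUM_MUXES * MUX_AREA_UM2
    other_area_um2 = OTHER_LOGIC_FACTOR * mux_area_um2
    return mux_area_um2 / 1e6, (mux_area_um2 + other_area_um2) / 1e6


mux_mm2, total_mm2 = smu_logic_area_mm2()
print(f"mux area: {mux_mm2:.3f} mm^2, SMU logic total: {total_mm2:.3f} mm^2")
```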
ACSU:
The recursive state metric update results in unbounded word growth due to the repeated addition of branch metrics. To prevent arithmetic overflow and to keep the power and delay of the add and compare operations in the ACS unit as small as possible, a modulo arithmetic approach is used [Black and Meng]. Modulo arithmetic exploits the fact that the Viterbi algorithm inherently bounds the maximum dynamic range of the state metrics. The required state metric precision is twice the maximum dynamic range of the updated state metrics just before the compare stage; 12 bits are chosen to ensure correct operation under our specification. The simulated SNR degradation is about 0.05 dB.
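The idea behind the modulo approach can be illustrated with a minimal Python sketch. Metrics are allowed to wrap around at 12 bits, and the comparison looks only at the wrapped difference, which stays correct as long as the true difference is smaller than half the number range. The function names and the choice to select the smaller metric are assumptions made for illustration; this is not the gate-level design.

```python
K = 12                    # state metric word length in bits, from the text
MASK = (1 << K) - 1       # keeps values in the range 0 .. 2^K - 1
HALF = 1 << (K - 1)       # half of the modulo range


def mod_add(metric, branch):
    """Add a branch metric and let the result wrap around modulo 2^K."""
    return (metric + branch) & MASK


def mod_greater(a, b):
    """True if metric a exceeds b, valid while the true |a - b| < 2^(K-1)."""
    diff = (a - b) & MASK
    return 0 < diff < HALF


def acs(m0, b0, m1, b1):
    """Add-compare-select for one state: return (survivor metric, decision bit).

    Assumes the smaller path metric survives (distance-style metrics).
    """
    c0 = mod_add(m0, b0)
    c1 = mod_add(m1, b1)
    if mod_greater(c0, c1):
        return c1, 1
    return c0, 0


# The first candidate wraps (4090 + 10 = 4100 -> 4), the second does not
# (4080 + 5 = 4085). The wrapped compare still picks the truly smaller one.
print(acs(4090, 10, 4080, 5))  # (4085, 1)
```

In hardware the same test reduces to inspecting the most significant bit of a 12-bit subtraction, which is why no metric rescaling is needed.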
Assuming a fully parallel structure, each of the 64 states has two 12-bit adders to update the state metrics, one 12-bit adder used as a comparator, one 12-bit 2:1 mux, and one 12-bit register to hold the state metric. The critical path consists of two 12-bit ripple adders in series, giving a delay of 2*8.66 = 17.3 ns.
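The delay figure can be put next to the target speed with a small Python sketch. It assumes that the fully parallel structure produces one decoded bit per ACS iteration and that the critical path alone limits the clock; both are simplifying assumptions, and the names are illustrative.

```python
ADDER_DELAY_NS = 8.66      # 12-bit ripple adder delay, from the text
ADDERS_IN_SERIES = 2       # metric update adder followed by the compare adder
TARGET_RATE_BPS = 100e3    # target speed from the description


def critical_path_ns():
    """Delay of the ACS critical path in nanoseconds."""
    return ADDERS_IN_SERIES * ADDER_DELAY_NS


def max_rate_bps():
    """Upper bound on throughput, assuming one decoded bit per ACS iteration."""
    return 1.0 / (critical_path_ns() * 1e-9)


print(f"critical path: {critical_path_ns():.1f} ns")
print(f"max rate: {max_rate_bps() / 1e6:.1f} Mbit/s")
print(f"margin over target: {max_rate_bps() / TARGET_RATE_BPS:.0f}x")
```

Under these assumptions the ACSU is several hundred times faster than the 100 kbit/s target requires.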
The area of the ACSU is calculated from the 3*64 = 192 12-bit adders used in parallel (12*10.6 um*22.7 um each), 64 12-bit registers (12*10.6 um*11.0 um each) and 64 12-bit 2:1 muxes. The total area excluding routing is 0.64 mm^2.
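The area total can be reproduced with the following Python sketch. Only the adder and register dimensions are given in the text, so the 2:1 mux area is left out here; that omission is an assumption, and the two listed cell types alone already round to the quoted 0.64 figure.

```python
BITS = 12
CELL_WIDTH_UM = 10.6         # per-bit cell width, from the text
ADDER_HEIGHT_UM = 22.7       # per-bit adder cell height, from the text
REGISTER_HEIGHT_UM = 11.0    # per-bit register cell height, from the text
NUM_STATES = 64
ADDERS_PER_STATE = 3         # two metric adders plus one comparator adder


def acsu_area_mm2():
    """Adder plus register area of the ACSU in mm^2 (2:1 muxes not included)."""
    adder_um2 = BITS * CELL_WIDTH_UM * ADDER_HEIGHT_UM
    register_um2 = BITS * CELL_WIDTH_UM * REGISTER_HEIGHT_UM
    total_um2 = (NUM_STATES * ADDERS_PER_STATE * adder_um2
                 + NUM_STATES * register_um2)
    return total_um2 / 1e6


print(f"ACSU area: {acsu_area_mm2():.2f} mm^2")  # ACSU area: 0.64 mm^2
```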
Each 12-bit ripple adder consumes 0.215 pJ per operation (based on Powermill simulations with random inputs). At 100 kbit/s there are 10^5*192 additions per second, giving 4.1 uW.
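The adder power estimate follows from energy per operation times operations per second. The Python sketch below reproduces it, assuming each of the 192 adders performs one addition per decoded bit, that is 10^5 additions per second per adder at 100 kbit/s; the names are illustrative.

```python
ENERGY_PER_ADD_J = 0.215e-12   # energy per 12-bit addition, from the text
NUM_ADDERS = 192               # 3 adders for each of the 64 states
BIT_RATE_BPS = 1e5             # 100 kbit/s target


def acsu_adder_power_w():
    """Average power of the ACSU adders in watts."""
    additions_per_second = BIT_RATE_BPS * NUM_ADDERS
    return ENERGY_PER_ADD_J * additions_per_second


print(f"adder power: {acsu_adder_power_w() * 1e6:.1f} uW")  # adder power: 4.1 uW
```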
ASIC Total Power, Area and Delay Estimates:
An additional area of 0.5 mm^2 is assumed for routing and the transition metric unit.
References:
Andy Burstein, "SRAM," from the Low Power Cell Library.
Peter J. Black and Teresa Meng, "A 140 Mb/s, 32 state, radix-4 Viterbi
decoder," IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1877-1885, 1992.
"Viterbi Decoders: High Performance Algorithms and Architectures,"
chapter 17 of Heinrich Meyr's book.
Marlene Wan, Energy Estimates for the Maia chip.