Viterbi Implementation Study for a TI TMS320VC5402 DSP
Homework #3, EE 290A
March 11, 1999
Paul Husted

Approach

We will investigate the costs and benefits of implementing a Viterbi decoder using a TI TMS320C54x DSP chip. In particular, we plan to estimate the impact of a variety of design parameters on the following objective functions (as stated in class):

C54x Description

The TI TMS320C54x DSP is a new, low power line of DSPs unveiled by TI in June 1998. These chips use 16-bit fixed-point words and can draw as little as 0.45 mW, or 120 mW when running at 200 MIPS. The latest in the series is the VC5420, which has two parallel 100 MIPS DSP cores and can operate at 1.8 V.

For our implementation, we choose to use the VC5402, which has the following features:

Hardware capabilities of the entire C54x series include:

Viterbi Implementation

Texas Instruments provides an application report describing how to implement a Viterbi decoder on a C54x DSP. It can be downloaded at:

 http://www-s.ti.com/sc/psheets/spra071/spra071.pdf
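The application report implements the decoder in C54x assembly. As a language-neutral illustration of what the DSP must compute, the following is a minimal hard-decision Viterbi decoder sketch in Python. It assumes a constraint length K = 7, rate 1/2 code with generator polynomials 171 and 133 (octal); these parameters and all function names are illustrative assumptions, not taken from the TI report. Each trellis step runs #states/2 add-compare-select butterflies (states 2i and 2i+1 feed states i and i + 32) and records one decision bit per state, loosely analogous to the decision bits the C54x collects in its TRN register. A traceback then recovers the input bits last-to-first, and a final reversal puts them in order.

```python
from __future__ import annotations

import random

K = 7                        # constraint length (assumed)
NUM_STATES = 1 << (K - 1)    # 64 trellis states
G = (0o171, 0o133)           # assumed rate-1/2 generator polynomials (octal)
BIG = 10 ** 9                # effectively infinite metric for unreachable states


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def branch_output(reg: int) -> tuple[int, int]:
    """Coded bit pair for a K-bit register (newest input bit at position K-1)."""
    return parity(reg & G[0]), parity(reg & G[1])


def encode(bits: list[int]) -> list[tuple[int, int]]:
    """Rate-1/2 convolutional encoder starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.append(branch_output(reg))
        state = reg >> 1                  # newest bit becomes the state's MSB
    return out


def branch_metric(reg: int, r0: int, r1: int) -> int:
    """Hamming distance between expected and received bit pairs."""
    e0, e1 = branch_output(reg)
    return (e0 ^ r0) + (e1 ^ r1)


def viterbi_decode(symbols: list[tuple[int, int]]) -> list[int]:
    pm = [0] + [BIG] * (NUM_STATES - 1)   # encoder starts in state 0
    decisions = []                        # one NUM_STATES-bit word per step
    for r0, r1 in symbols:
        new_pm = [0] * NUM_STATES
        word = 0
        for i in range(NUM_STATES // 2):  # one butterfly per predecessor pair
            s0, s1 = 2 * i, 2 * i + 1
            for b in (0, 1):              # successors i and i + NUM_STATES/2
                dest = (b << (K - 2)) | i
                m0 = pm[s0] + branch_metric((b << (K - 1)) | s0, r0, r1)
                m1 = pm[s1] + branch_metric((b << (K - 1)) | s1, r0, r1)
                if m1 < m0:               # compare-select
                    new_pm[dest] = m1
                    word |= 1 << dest     # survivor came from the odd state
                else:
                    new_pm[dest] = m0
        pm = new_pm
        decisions.append(word)

    # Traceback from the best final state; bits come out last-to-first.
    state = min(range(NUM_STATES), key=pm.__getitem__)
    bits = []
    for word in reversed(decisions):
        bits.append(state >> (K - 2))     # input bit is the state's MSB
        d = (word >> state) & 1           # which predecessor survived
        state = ((state << 1) & (NUM_STATES - 1)) | d
    bits.reverse()                        # data reversal step
    return bits


if __name__ == "__main__":
    msg = [random.randint(0, 1) for _ in range(64)] + [0] * (K - 1)  # zero tail
    rx = encode(msg)
    rx[10] = (rx[10][0] ^ 1, rx[10][1])   # flip one channel bit
    print(viterbi_decode(rx) == msg)
```

The zero tail returns the encoder to state 0, so the traceback starts from the correct state; a DSP implementation would typically process fixed-size frames the same way.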

The C54x is highly optimized for Viterbi decoding due to the following features of its ALU:

Specifications for our design include:

Analysis

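From the TI Application Report, we have the following benchmarks for the required cycles per frame:
(FS = frame size in bits, FR = frame rate in frames/s, K = constraint length, n = number of coded bits per input bit, i.e. code rate 1/n)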

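Metric update: cycles/frame = (#states/2 butterflies × butterfly calculation + TRN store + local distance calculation) × #bits
$$C_{\mathrm{metric}} = \left(2^{K-2} \times 5 + 2^{K-5} + 1 + n \times 2^{n-1}\right) \times FS$$
Traceback: cycles/frame = (loop overhead and data storage + loop × 16) × #bits/16
$$C_{\mathrm{traceback}} = (9 + 12 \times 16) \times \frac{FS}{16} = 201 \times \frac{FS}{16}$$
Data reversal:
$$C_{\mathrm{reversal}} = 43 \times \frac{FS}{16}$$
Total load in cycles/s (divide by 10^6 for MIPS) = frame rate × (metric update + traceback + data reversal) cycles/frame:
$$\begin{aligned}
FR \times (C_{\mathrm{metric}} + C_{\mathrm{traceback}} + C_{\mathrm{reversal}})
&= FR \times \left[\left(2^{K-2} \times 5 + 2^{K-5} + 1 + n \times 2^{n-1}\right) \times FS + \tfrac{201}{16} \times FS + \tfrac{43}{16} \times FS\right] \\
&= FR \times FS \times \left(2^{K-2} \times 5 + 2^{K-5} + 1 + n \times 2^{n-1} + \tfrac{201 + 43}{16}\right) \\
&= FR \times FS \times \left(2^{K-2} \times 5 + 2^{K-5} + n \times 2^{n-1} + 16.25\right)
\end{aligned}$$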

So, for a frame size of 100K bits and a frame rate of 1 Hz (a 100 Kbps stream), we require 18.425 MIPS; this figure corresponds to K = 7 and n = 2, i.e. 184.25 cycles per decoded bit. That is well below the 100 MIPS provided by our processor, which leaves capacity for operations other than decoding. Alternatively, if the processor performs only decoding, it can handle up to about 543 Kbps (100 × 10^6 / 184.25 bits/s), provided that the memory it accesses can be loaded quickly enough.
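The cycle model above can be checked numerically. The Python sketch below encodes the application-report benchmarks as functions of K and n; the function names are hypothetical, the formula assumes K ≥ 5 (so that 2^(K-5) is a whole number of stores), and it counts one cycle as one instruction, as the analysis does. K = 7, n = 2 is the configuration that reproduces the 18.425 MIPS figure.

```python
def cycles_per_bit(K: int, n: int) -> float:
    """Decoder cycles per decoded bit (valid for K >= 5)."""
    metric_update = 2 ** (K - 2) * 5 + 2 ** (K - 5) + 1 + n * 2 ** (n - 1)
    traceback = (9 + 12 * 16) / 16        # 201 cycles per 16 decoded bits
    data_reversal = 43 / 16               # 43 cycles per 16 decoded bits
    return metric_update + traceback + data_reversal


def required_mips(K: int, n: int, frame_size: int, frame_rate: float) -> float:
    """Millions of cycles per second needed to decode FS bits FR times a second."""
    return frame_rate * frame_size * cycles_per_bit(K, n) / 1e6


def max_bit_rate(K: int, n: int, mips_available: float = 100.0) -> float:
    """Highest decoded bit rate (bits/s) if the DSP does nothing but decode."""
    return mips_available * 1e6 / cycles_per_bit(K, n)


if __name__ == "__main__":
    print(cycles_per_bit(7, 2))                # 184.25
    print(required_mips(7, 2, 100_000, 1.0))   # 18.425
    print(max_bit_rate(7, 2))                  # about 542.7e3 bits/s
```

Because the 2^(K-2) butterfly term dominates, each increase of K by one roughly doubles the load; at K = 9, n = 2 the same 100 Kbps stream would already need about 68 MIPS.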

Power:

Mode                                    Voltage   Current    Power
Core CPU                                1.8 V     33 mA      59.4 mW
External pins                           3.3 V     30 mA      99 mW
Total (active)                                               158.4 mW
IDLE2 (CPU and peripherals shut down)   2.5 V     2 mA       5 mW
IDLE3 (processor shut down entirely)    2.5 V     0.005 mA   0.0125 mW
Note: Power data for the external RAM is not yet available, since TI does not yet list any approved vendors of external DRAM for the C54x series.
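Each power figure in the table is simply voltage × current (V × mA = mW). The Python sketch below reproduces that arithmetic and adds one illustrative estimate under a stated assumption: that the chip draws the full active power only while decoding (18.425% of the time for the 100 Kbps case) and sits in IDLE2 otherwise, ignoring wake-up overhead and external RAM. That duty-cycling policy and the names used are assumptions for illustration, not part of the measured data.

```python
# (volts, milliamps) for each row of the power table; V x mA = mW
CORE_CPU = (1.8, 33.0)
EXTERNAL_PINS = (3.3, 30.0)
IDLE2 = (2.5, 2.0)
IDLE3 = (2.5, 0.005)


def power_mw(volts: float, milliamps: float) -> float:
    """Power in mW from supply voltage (V) and current (mA)."""
    return volts * milliamps


def average_power_mw(active_mw: float, idle_mw: float, duty: float) -> float:
    """Time-weighted power if active for a fraction `duty` of the time, idle otherwise."""
    return duty * active_mw + (1.0 - duty) * idle_mw


active = power_mw(*CORE_CPU) + power_mw(*EXTERNAL_PINS)  # about 158.4 mW
idle2 = power_mw(*IDLE2)                                  # about 5 mW
# Assumed duty cycle: 18.425 MIPS of decoding on a 100 MIPS part.
print(average_power_mw(active, idle2, 18.425 / 100))    # roughly 33.3 mW
```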

Cost:

Item                                      Price
Chip (50K-unit quantities)                $5 to $75 each
Emulator Development Kit                  $2,995.00
Code Composer Development Environment     $3,156.45
Simulator                                 $1,578.22
C Compiler/Assembler/Linker               $2,367.33
Debugger                                  $3,156.45

Development Tools:

TI offers a fairly sophisticated, integrated set of development tools. The Code Composer Development Environment claims to seamlessly integrate the compiler/assembler/linker, debugger, simulator and emulator. It includes a signal probe, profiler, multiprocessor debugging, data visualization, interactive compiling, and a development environment similar to Microsoft Visual C++. Run-time libraries are available to speed code development and help produce optimized code, and TI claims that compiled code comes close to the efficiency of hand-written assembly.

Development Time:

This project could be completed in approximately one engineer-week from start to finish using the latest tools. This estimate rests on the (comparatively) sophisticated development tools available for the TI line of DSPs, the available run-time libraries, the Viterbi application report and its code snippets, and configurable emulator boards with JTAG pins.

Code Size:
The code size should be less than 500 lines of assembly, which will easily fit into a ROM with 4K words.

Physical Chip Size:
The 144-pin ball grid array (BGA) package measures 12 mm per side (144 mm^2).

Conclusion

This chip is an excellent choice for a low data rate implementation of the Viterbi decoder. It is fast, low power, and has specialized instructions that greatly accelerate decoding. The TI application report will speed development, and with less than 20% of the processor's cycles used by a 100 Kbit/s decoder, there is plenty of processing power remaining to implement other blocks in the receiver chain.

References