SPRA921
TMS320C6713 Digital Signal Processor Optimized for High Performance Multichannel Audio Systems
2 C67x CPU and Instruction Set
The TMS320C6713 floating-point digital signal processor uses the C67x VelociTI advanced
very-long instruction word (VLIW) CPU. The CPU fetches 256-bit-wide instruction packets to
supply up to eight 32-bit instructions to the eight functional units during every clock cycle.
The VelociTI VLIW architecture also features variable-length execute packets, a key
memory-saving feature that distinguishes the C67x CPU from other VLIW architectures.
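
The grouping of one 256-bit fetch into variable-length execute packets can be sketched as follows. This is an illustration only, not TI code: it assumes the parallel-bit convention described in the C6000 instruction set documentation (bit 0 of each 32-bit instruction is the p-bit, and p = 1 means the next instruction issues in the same cycle). The function name `split_execute_packets` is hypothetical.

```python
def split_execute_packets(fetch_packet):
    """Group the eight 32-bit words of one fetch packet into execute packets.

    Assumption (from the C6000 instruction set documentation, not this note):
    bit 0 of each word is the p-bit; p = 1 means the following instruction
    issues in the same cycle as this one.
    """
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds eight 32-bit instructions")
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:
            # p-bit clear: this instruction closes the current execute packet.
            packets.append(current)
            current = []
    if current:
        # Trailing p-bit set: keep the leftover words together as one packet.
        packets.append(current)
    return packets


# Hypothetical instruction words; only bit 0 (the p-bit) matters here.
words = [0x1, 0x3, 0x2, 0x0, 0x5, 0x4, 0x6, 0x8]
print([len(p) for p in split_execute_packets(words)])  # [3, 1, 2, 1, 1]
```

Because a packet can hold anywhere from one to eight instructions, serial code does not have to be padded out to eight slots per cycle, which is where the memory saving comes from.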
Operating at 225 MHz, the TMS320C6713 delivers up to 1350 million floating-point operations
per second (MFLOPS), 1800 million instructions per second (MIPS), and, with its dual
fixed-/floating-point multipliers, up to 450 million multiply-accumulate operations per second
(MMACS).
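
These peak figures are consistent with simple arithmetic: the clock rate multiplied by the number of units that can issue one operation per cycle. The sketch below assumes exactly that (one operation per unit per cycle, every unit busy) and takes the unit counts from this section and Section 2.2; the names are illustrative.

```python
CLOCK_MHZ = 225            # C6713 clock rate quoted in this note
FUNCTIONAL_UNITS = 8       # all units can issue an instruction each cycle
FLOATING_POINT_UNITS = 6   # units that execute floating-point instructions
MULTIPLIERS = 2            # fixed-/floating-point multipliers


def peak_rate(clock_mhz, units):
    """Peak millions of operations per second, assuming one op per unit per cycle."""
    return clock_mhz * units


print(peak_rate(CLOCK_MHZ, FUNCTIONAL_UNITS))      # 1800 (MIPS)
print(peak_rate(CLOCK_MHZ, FLOATING_POINT_UNITS))  # 1350 (MFLOPS)
print(peak_rate(CLOCK_MHZ, MULTIPLIERS))           # 450 (MMACS)
```

These are upper bounds; sustained rates depend on how well a given algorithm keeps the units busy.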
2.1 Functional Units
The CPU features eight functional units supported by thirty-two 32-bit general-purpose
registers. The data path is divided into two symmetric sides, each consisting of 16 registers
and 4 functional units. Additionally, each side has a data bus connected to all the registers
on the other side, through which its functional units can access data from the opposite
register file.
2.2 Fixed and Floating Point Instruction Set
The C67x CPU executes the C62x integer instruction set. In addition, the C67x CPU natively
supports IEEE 32-bit single-precision and 64-bit double-precision floating point. Six of the
eight functional units execute floating-point instructions as well as C62x fixed-point
instructions: two multipliers, two ALUs, and two auxiliary floating-point units. The remaining
two functional units support floating point by providing address generation for the 64-bit
loads that the C67x CPU adds to the C62x instruction set. With both units loading in the same
cycle, this provides 128 bits of data bandwidth per cycle. This double-word load capability
allows multiple operands to be loaded into the register file for 32-bit floating-point
instructions. Unlike other floating-point architectures, the C67x has independent control of
its two floating-point multipliers and its two floating-point ALUs. This enables the CPU to
operate on a broader mix of floating-point algorithms rather than being tied to the typical
multiply-accumulate-oriented functions.
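
The 128-bit figure can be checked with a short calculation. The sketch below assumes both address-generation units issue a 64-bit load in the same cycle, and uses the 225 MHz clock quoted earlier; the function names are illustrative.

```python
LOAD_WIDTH_BITS = 64   # double-word load added by the C67x
ADDRESS_UNITS = 2      # units providing address generation
CLOCK_HZ = 225 * 10**6


def load_bits_per_cycle(load_width_bits=LOAD_WIDTH_BITS, units=ADDRESS_UNITS):
    """Peak load bandwidth in bits per cycle, assuming every unit loads each cycle."""
    return load_width_bits * units


def operands_per_cycle(operand_bits):
    """How many operands of the given width fit in one cycle's worth of loads."""
    return load_bits_per_cycle() // operand_bits


print(load_bits_per_cycle())                    # 128 bits per cycle
print(operands_per_cycle(32))                   # 4 single-precision operands
print(operands_per_cycle(64))                   # 2 double-precision operands
print(load_bits_per_cycle() // 8 * CLOCK_HZ)    # 3600000000 bytes per second peak
```

Four single-precision operands per cycle is enough to feed both multipliers with two inputs each, which is what lets the load path keep pace with the floating-point units.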
2.3 Load/Store Architecture
Another key feature of the C67x CPU is the load/store architecture, in which all instructions
operate on registers (as opposed to directly on data in memory). Two sets of data-addressing
units (the .D units) are responsible for all data transfers between the register files and the
memory. The data address paths driven by the .D units allow data addresses generated from one
register file to be used to load or store data to or from the other register file.
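
The load/store discipline can be illustrated with a toy model in which arithmetic only ever touches registers and memory is reached solely through explicit loads and stores. This is a conceptual sketch, not a model of C67x instruction semantics or timing; the class and method names are hypothetical, and only the two 16-register files come from Section 2.1.

```python
class ToyLoadStoreCPU:
    """Toy machine: arithmetic uses registers only; memory needs load/store."""

    def __init__(self, memory):
        self.memory = list(memory)
        # Two register files of 16 registers each (sides A and B).
        self.regs = {"A": [0.0] * 16, "B": [0.0] * 16}

    def load(self, side, reg, address):
        self.regs[side][reg] = self.memory[address]

    def store(self, side, reg, address):
        self.memory[address] = self.regs[side][reg]

    def multiply(self, side, dst, src1, src2):
        rf = self.regs[side]
        rf[dst] = rf[src1] * rf[src2]

    def add(self, side, dst, src1, src2):
        rf = self.regs[side]
        rf[dst] = rf[src1] + rf[src2]


def scale_and_offset(cpu):
    """Compute y = a * x + b with a, x, b at addresses 0..2 and y at address 3."""
    cpu.load("A", 0, 0)         # a -> A0
    cpu.load("A", 1, 1)         # x -> A1
    cpu.load("A", 2, 2)         # b -> A2
    cpu.multiply("A", 3, 0, 1)  # A3 = a * x
    cpu.add("A", 3, 3, 2)       # A3 = A3 + b
    cpu.store("A", 3, 3)        # A3 -> memory[3]
    return cpu.memory[3]


print(scale_and_offset(ToyLoadStoreCPU([2.0, 3.0, 1.0, 0.0])))  # 7.0
```

Every operand has to be staged in a register before it is used, which is why load bandwidth and register count matter as much as the number of arithmetic units.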
2.4 Benchmark Performance
Table 1 shows the TMS320C67x CPU floating-point benchmark performance of some algorithms
commonly used in audio applications. The times for each benchmark are listed for a 225 MHz
C6713 CPU.