SPRA921

 TMS320C6713 Digital Signal Processor Optimized for High Performance Multichannel Audio Systems

2 C67x CPU and Instruction Set

The TMS320C6713 floating-point digital signal processor uses the C67x VelociTI advanced
very-long instruction word (VLIW) CPU. The CPU fetches very-long instruction words (256 bits
wide) to supply up to eight 32-bit instructions to the eight functional units during every clock
cycle. The VelociTI VLIW architecture also features variable-length execute packets, a key
memory-saving feature that distinguishes the C67x CPU from other VLIW architectures.
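
The effect of variable-length execute packets can be illustrated with a small sketch. It assumes
the convention described in the TMS320C6000 CPU and Instruction Set Reference Guide, where bit 0
of each 32-bit instruction (the p-bit) marks whether the next instruction issues in the same
cycle. The name split_execute_packets and the instruction words are hypothetical; only bit 0 of
each word is meaningful here.

```python
FETCH_PACKET_WORDS = 8  # one 256-bit fetch = eight 32-bit instructions


def split_execute_packets(fetch_packet):
    """Group the eight words of one 256-bit fetch into execute packets.

    Assumption: bit 0 of each word is the p-bit. A value of 1 means the
    next instruction issues in the same cycle as this one.
    """
    if len(fetch_packet) != FETCH_PACKET_WORDS:
        raise ValueError("a 256-bit fetch holds exactly eight words")
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:  # p-bit clear: this word closes the execute packet
            packets.append(current)
            current = []
    if current:  # defensive: keep words whose parallel chain was left open
        packets.append(current)
    return packets


# Hypothetical instruction words; only the p-bit (bit 0) is set or cleared.
fully_parallel = [1, 1, 1, 1, 1, 1, 1, 0]  # all eight issue in one cycle
fully_serial = [0] * 8                     # one instruction per cycle
mixed = [1, 1, 0, 0, 1, 0, 1, 0]

mixed_sizes = [len(p) for p in split_execute_packets(mixed)]
print(mixed_sizes)  # [3, 1, 2, 2]
```

In the mixed case, four cycles of work occupy a single 256-bit fetch. A fixed-width VLIW
encoding would need four full-width words, padded with no-ops, to express the same schedule.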

Operating at 225 MHz, the TMS320C6713 delivers up to 1350 million floating-point operations
per second (MFLOPS), 1800 million instructions per second (MIPS), and, with dual
fixed-/floating-point multipliers, up to 450 million multiply-accumulate operations per second
(MMACS).
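
These peak figures follow from the clock rate and the number of units that can issue each cycle.
The short sketch below reproduces the arithmetic, assuming one operation per unit per cycle:
eight functional units for MIPS, the six floating-point-capable units described in Section 2.2
for MFLOPS, and the two multipliers for MMACS. The function name peak_rate is illustrative only.

```python
CLOCK_MHZ = 225  # CPU clock stated in the text

# Per-cycle issue widths taken from the text.
UNITS_TOTAL = 8   # all functional units
UNITS_FLOAT = 6   # units that also execute floating-point instructions
MULTIPLIERS = 2   # fixed-/floating-point multipliers


def peak_rate(clock_mhz, ops_per_cycle):
    """Peak rate in millions of operations per second."""
    return clock_mhz * ops_per_cycle


mips = peak_rate(CLOCK_MHZ, UNITS_TOTAL)
mflops = peak_rate(CLOCK_MHZ, UNITS_FLOAT)
mmacs = peak_rate(CLOCK_MHZ, MULTIPLIERS)
print(mips, mflops, mmacs)  # 1800 1350 450
```

These are upper bounds. Sustained rates depend on how many units the scheduled code keeps busy
in each cycle.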

2.1 Functional Units

The CPU features eight functional units supported by thirty-two 32-bit general-purpose registers.
This data path is divided into two symmetric sides, each consisting of 16 registers and 4
functional units. Additionally, each side features a data bus connected to all the registers on
the other side, through which its functional units can access data from the register file on the
opposite side.

2.2 Fixed and Floating Point Instruction Set

The C67x CPU executes the C62x integer instruction set and natively supports IEEE 32-bit
single-precision and 64-bit double-precision floating point. In addition to C62x fixed-point
instructions, six of the eight functional units also execute floating-point instructions: two
multipliers, two ALUs, and two auxiliary floating-point units. The remaining two functional units
support floating point by providing address generation for the 64-bit loads that the C67x CPU
adds to the C62x instruction set. Together, these two loads provide 128 bits of data bandwidth
per cycle. This double-word load capability allows multiple operands to be loaded into the
register file for 32-bit floating-point instructions. Unlike other floating-point architectures,
the C67x has independent control of its two floating-point multipliers and its two floating-point
ALUs. This enables the CPU to operate on a broader mix of floating-point algorithms rather than
being tied to the typical multiply-accumulate oriented functions.
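
The 128-bit figure can be checked with a few lines of arithmetic. The sketch below assumes both
address-generating units issue a 64-bit load in the same cycle at the 225 MHz clock quoted
earlier. The resulting byte rate is a derived peak value, not a number taken from this report.

```python
LOAD_UNITS = 2            # the two address-generating units, one per side
LOAD_WIDTH_BITS = 64      # double-word load added by the C67x
CLOCK_HZ = 225_000_000    # 225 MHz CPU clock

bits_per_cycle = LOAD_UNITS * LOAD_WIDTH_BITS
floats_per_cycle = bits_per_cycle // 32           # 32-bit single-precision operands
peak_bytes_per_second = (bits_per_cycle // 8) * CLOCK_HZ

print(bits_per_cycle)          # 128
print(floats_per_cycle)        # 4
print(peak_bytes_per_second)   # 3600000000
```

Four single-precision operands per cycle is enough to feed two floating-point multiplies, each
with two fresh inputs, in every cycle.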

2.3 Load/Store Architecture

Another key feature of the C67x CPU is the load/store architecture, in which all instructions
operate on registers (as opposed to directly on data in memory). Two sets of data-addressing
units are responsible for all data transfers between the register files and memory. The data
address paths driven by the .D units allow an address generated from one register file to be
used to load data into, or store data from, the other register file.

2.4 Benchmark Performance

Table 1 shows the TMS320C67x CPU floating-point benchmark performance of some algorithms
commonly used in audio applications. The times for each benchmark are listed for a 225 MHz
C6713 CPU.
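
Benchmark times at a given clock follow directly from cycle counts. The helper below is a
minimal sketch of that conversion. The function name and the 2250-cycle input are hypothetical
and are not taken from Table 1.

```python
def cycles_to_microseconds(cycles, clock_mhz=225.0):
    """Convert a CPU cycle count to microseconds at the given clock in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return cycles / clock_mhz


# Hypothetical cycle count, chosen only to illustrate the conversion.
example_us = cycles_to_microseconds(2250)
print(example_us)  # 10.0
```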
