Enhanced Multiply-Accumulate Unit (EMAC)
MCF52235 ColdFire® Integrated Microcontroller Reference Manual, Rev. 6
4-10
Freescale Semiconductor
Although the multiplier array is implemented in a four-stage pipeline, all arithmetic MAC instructions
have an effective issue rate of 1 cycle, regardless of input operand size or type.
All arithmetic operations use register-based input operands, and summed values are stored in an
accumulator. Therefore, an additional MOVE instruction is needed to transfer a result to a general-purpose
register. One new feature in EMAC instructions is the ability to choose the upper or lower word of a
register as a 16-bit input operand. This is useful in filtering operations if one data register is loaded with
input data and another is loaded with coefficients. Two 16-bit multiply-accumulates can then be
performed without fetching additional operands between instructions by alternating word choice during
calculations.
The EMAC has four accumulator registers versus the MAC’s single accumulator. The additional registers
improve the performance of some algorithms by minimizing pipeline stalls needed to store an accumulator
value back to general-purpose registers. Many algorithms require multiple calculations on a given data set.
By applying different accumulators to these calculations, it is often possible to store one accumulator
without any stalls while performing operations involving a different destination accumulator.
The need to move large amounts of data presents an obstacle to obtaining high throughput rates in DSP
engines. Existing ColdFire instructions can accommodate these requirements. A MOVEM instruction can
efficiently move large data blocks by generating line-sized burst references. The ability to load an operand
simultaneously from memory into a register and execute a MAC instruction makes some DSP operations
such as filtering and convolution more manageable.
The programming model includes a mask register (MASK), which can optionally be used to generate an
operand address during MAC + MOVE instructions. Using this register together with the auto-increment
addressing mode supports efficient implementation of circular data queues for memory operands.
4.3.1 Fractional Operation Mode
This section describes behavior when the fractional mode is used (MACSR[F/I] is set).
4.3.1.1 Rounding
When the processor is in fractional mode, there are two operations during which rounding can occur:
1. Execution of a store accumulator instruction (move.l ACCx,Rx). The lsbs of the 48-bit accumulator
are used to round the resulting 16- or 32-bit value. If MACSR[S/U] is cleared, the low-order
8 bits round the resulting 32-bit fraction. If MACSR[S/U] is set, the low-order 24 bits are used to
round the resulting 16-bit fraction.
2. Execution of a MAC (or MSAC) instruction with 32-bit operands. If MACSR[R/T] is zero,
multiplying two 32-bit numbers creates a 64-bit product that is truncated to the upper 40 bits; otherwise,
the product is rounded to 40 bits using the round-to-nearest-even method.
To understand the round-to-nearest-even method, consider the following example involving the rounding
of a 32-bit number, R0, to a 16-bit number. Using this method, the 32-bit number is rounded to the closest
16-bit number possible. Let the high-order 16 bits of R0 be named R0.U and the low-order 16 bits be R0.L.
• If R0.L is less than 0x8000, the result is truncated to the value of R0.U.
• If R0.L is greater than 0x8000, the upper word is incremented (rounded up).
Because of an order from the United States International Trade Commission, BGA-packaged product lines and part numbers indicated here currently are not available from Freescale for import or sale in the United States prior to September 2010: MCF52234CVM60, MCF52235CVM60