...the world's most energy friendly microcontrollers
2014-07-02 - Tiny Gecko Family - d0034_Rev1.20
34
www.silabs.com
7.3.4.4 Cortex-M3 If-Then Block Folding
The Cortex-M3 offers a mechanism known as if-then block folding. This is a form of speculative
prefetching where small if-then blocks are collapsed in the prefetch buffer if the condition evaluates to
false. The instructions in the block then appear to execute in zero cycles. With this scheme, performance
is optimized at the cost of higher energy consumption as the processor fetches more instructions from
memory than it actually executes. To disable the mode, write a 1 to the DISFOLD bit in the NVIC Auxiliary
Control Register; see the Cortex-M3 Technical Reference Manual for details. Normally, it is expected
that this feature is most efficient at core frequencies above 16 MHz. Folding is enabled by default.
7.3.4.5 Instruction Cache
The MSC includes an instruction cache. The instruction cache for the internal flash memory is enabled
by default, but can be disabled by setting IFCDIS in MSC_READCTRL. When enabled, the instruction
cache typically reduces the number of flash reads significantly, thus saving energy. In most cases a
cache hit-rate of more than 70 % is achievable. When a 32-bit instruction fetch hits in the cache the data
is returned to the processor in one clock cycle. Thus, performance is also improved when wait-states
are used (i.e. running at frequencies above 16 MHz).
The instruction cache is connected directly to the Cortex-M3 and functions as a memory access filter
between the processor and the memory system, as illustrated in Figure 7.1 (p. 34) . The cache
consists of an access filter, lookup logic, a 128x32 SRAM (512 bytes) and two performance counters.
The access filter checks that the address for the access is of an instruction in the code space (instructions
in RAM outside the code space are not cached). If the address matches, the cache lookup logic and
SRAM is enabled. Otherwise, the cache is bypassed and the access is forwarded to the memory system.
The cache is then updated when the memory access completes. The access filter also disables cache
updates for interrupt context accesses if caching in interrupt context is disabled. The performance
counters, when enabled, keep track of the number of cache hits and misses. The cache consists of 16
8-word cachelines organized as 4 sets with 4 ways. The cachelines are filled up continuously one word
at a time as the individual words are requested by the processor. Thus, not all words of a cacheline
might be valid at a given time.
Figure 7.1. Instruction Cache
Cort ex
128x 32
SRAM
Access
Filt er
Cache
Look- up Logic
ICODE
AHB- Lit e Bus
ICODE
AHB- Lit e Bus
CODE
Mem ory Space
Inst ruct ion Cache
Perform ance Count ers
DCODE
AHB- Lit e Bus
IDCODE
AHB- Lit e Bus
IDCODE
MUX
By default, the instruction cache is automatically invalidated when the contents of the flash is changed
(i.e. written or erased). In many cases, however, the application only makes changes to data in the
flash, not code. In this case, the automatic invalidate feature can be disabled by setting AIDIS in
MSC_READCTRL. The cache can (independent of the AIDIS setting) be manually invalidated by writing
1 to INVCACHE in MSC_CMD.
In general it is highly recommended to keep the cache enabled all the time. However, for some sections
of code with very low cache hit-rate more energy-efficient execution can be achieved by disabling the
cache temporarily. To measure the hit-rate of a code-section, the built-in performance counters can
be used. Before the section, start the performance counters by writing 1 to STARTPC in MSC_CMD.
This starts the performance counters, counting from 0. At the end of the section, stop the performance
counters by writing 1 to STOPPC in MSC_CMD. The number of cache hits and cache misses for
that section can then be read from MSC_CACHEHITS and MSC_CACHEMISSES respectively. The
Summary of Contents for EFM32TG
Page 543: ......