Introduction
1-16
Copyright © 2005-2008 ARM Limited. All rights reserved.
ARM DDI 0337G
Non-Confidential
Unrestricted Access
The following scenarios show how you can use branch forwarding and the BRCHSTAT
control to get the best performance from your memory system. The scenarios focus on
the ideal Harvard setup, where instructions execute from ICODE, literals are read from
DCODE (unified to ICODE), and stack, heap, and application data are accessed from SYSTEM.
• Zero waitstate
• Zero waitstate, registered fetch interface (ICODE)
• One wait state flash
• One wait state flash, registered fetch interface (ICODE)
• Two wait states flash on page 1-17.
1.5.1 Zero waitstate
Branch prediction provides an approximately 10% performance gain over not having the
feature. Except in extreme cases, the processor has all the benefits of 100% branch
prediction with no penalty from branch speculation.
1.5.2 Zero waitstate, registered fetch interface (ICODE)
Branch forwarding results in more aggressive timing on the ICODE interface. If this bus
is a critical path in the system, the ICODE interface might be registered. To avoid the
approximately 25% penalty of adding a wait state, you can add a circuit that acts as a
single-entry prefetcher.
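
The effect of a single-entry prefetcher can be sketched with a small behavioral model. This is an illustration only, not a description of any specific circuit. The function name fetch_cycles, the 2-cycle cost of a registered fetch, the 1-cycle cost of a buffer hit, and the rule that the buffer always holds the word after the last fetch are all assumptions made for this sketch.

```python
def fetch_cycles(addresses, prefetch=False, word_bytes=4):
    """Count cycles to serve a sequence of fetch addresses.

    Assumed (hypothetical) timing model:
      - A fetch through the registered interface costs 2 cycles
        (the access plus one wait state).
      - With the single-entry prefetcher enabled, the buffer holds the
        word that follows the previous fetch. A fetch that matches the
        buffered address costs 1 cycle, any other fetch costs 2.
    """
    cycles = 0
    buffered = None  # address currently held by the single entry
    for addr in addresses:
        if prefetch and addr == buffered:
            cycles += 1  # hit: served from the prefetch entry
        else:
            cycles += 2  # miss: pay the registered-interface wait state
        buffered = addr + word_bytes  # speculatively fetch the next word
    return cycles


sequential = [0x0, 0x4, 0x8, 0xC]       # straight-line code
with_branch = [0x0, 0x4, 0x100, 0x104]  # one taken branch to 0x100

print(fetch_cycles(sequential))                  # no prefetcher
print(fetch_cycles(sequential, prefetch=True))   # prefetcher hides wait states
print(fetch_cycles(with_branch, prefetch=True))  # the branch target misses once
```

Under these assumptions, straight-line code pays the wait state only on the first fetch, and each taken branch pays it once more at the branch target.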
1.5.3 One wait state flash
Adding wait states to the flash impacts the performance of any core. You can use a cache to
lessen this penalty, but this has a dramatic effect on determinism and silicon area. A line
prefetcher with two line entries can provide comparable performance to a cache using
far fewer gates. A prefetch width of 128 bits is common for ARM7 targets because of the
32-bit instruction set. The processor has the benefit of Thumb 32-bit instructions, giving
a mixed 16/32-bit instruction set. This means that a 64-bit prefetch width provides
comparable benefits to a 128-bit interface.
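
The prefetch width comparison above can be checked with simple arithmetic. The sketch below estimates how many instructions one prefetch line holds for a given mix of 16-bit and 32-bit encodings. The function name average_instructions_per_line and the parameter fraction_16bit are illustrative, and the model ignores instructions that straddle a line boundary.

```python
def average_instructions_per_line(line_bits, fraction_16bit):
    """Average number of instructions held by one prefetch line.

    fraction_16bit is the share of instructions encoded in 16 bits;
    the remainder are 32-bit encodings. Alignment effects at line
    boundaries are ignored in this simplified model.
    """
    if not 0.0 <= fraction_16bit <= 1.0:
        raise ValueError("fraction_16bit must be between 0 and 1")
    average_bits = 16 * fraction_16bit + 32 * (1.0 - fraction_16bit)
    return line_bits / average_bits


# 128-bit line, 32-bit-only instruction set
arm_128 = average_instructions_per_line(128, 0.0)

# 64-bit line, mixed 16/32-bit instruction set: worst and best case
mixed_64_all_32 = average_instructions_per_line(64, 0.0)
mixed_64_all_16 = average_instructions_per_line(64, 1.0)

print(arm_128, mixed_64_all_32, mixed_64_all_16)
```

A 64-bit line holds between two and four instructions depending on the mix, so code that is mostly 16-bit approaches the four instructions per line that a 128-bit line gives a 32-bit-only instruction set.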
1.5.4 One wait state flash, registered fetch interface (ICODE)
If the ICODE interface must be registered, you can confine the cost of mispredictions to
the slave side of the prefetch controller. The core still loses the opportunity of the
fetch queue request on the ICODE interface, as in the zero wait state case. However, the
trailing registered BRCHSTAT[3] status of the conditional execution can mask the
external mispredict on the output of the controller's registered system interface,
so that it appears as an idle cycle.