Instruction Timing
ARM DDI 0337G
Copyright © 2005-2008 ARM Limited. All rights reserved.
Unrestricted Access
Non-Confidential
18.3 Load-store timings
This section describes how best to pair load and store instructions to
achieve further reductions in timing.
• STR Rx,[Ry,#imm] is always one cycle. This is because the address generation is
  performed in the initial cycle, and the data store is performed at the same time as
  the next instruction is executing. If the store is to the store buffer, and the store
  buffer is full, the next instruction is delayed until the store can complete. If the
  store is not to the store buffer, such as to the Code segment, and that transaction
  stalls, the impact on timing is only felt if another load or store operation is
  executed before completion.
• LDR Rx!,[any] is not normally pipelined. That is, a base-update load is generally at
  least a two-cycle operation (more if stalled). However, if the next instruction does
  not need to write to a register, the load is reduced to one cycle. Instructions that
  do not write to a register include CMP, TST, NOP, and non-taken IT-controlled
  instructions.
• LDR PC,[any] is always a blocking operation. It takes a minimum of two cycles
  for the load and three cycles for the pipeline reload, so at least five cycles in
  total (more if stalled on the load or the fetch).
• LDR Rx,[PC,#imm] might add a cycle because of contention with the fetch unit.
• TBB and TBH are also blocking operations. They take a minimum of two cycles for
  the load, one cycle for the add, and three cycles for the pipeline reload, so at
  least six cycles in total (more if stalled on the load or the fetch).
• LDR [any] instructions are pipelined when possible. This means that if the next
  instruction is an LDR or a non-base-updating STR, and the destination of the first
  LDR is not used to compute the address for the next instruction, then one cycle is
  removed from the cost of the next instruction. So, an LDR might be followed by an
  STR, so that the STR writes out what the LDR loaded. Multiple LDRs can also be
  pipelined together. Some optimized examples:
  — LDR R0,[R1]; LDR R1,[R2] - normally three cycles total
  — LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total
  — LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total
  — LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.
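
The minimum cycle counts quoted in the list above can be checked with a little arithmetic. The Python sketch below is only an illustration, not part of the processor documentation. The function and constant names are hypothetical. It assumes no stalls, that an unpipelined LDR costs two cycles, and that every later instruction in a fully pipelined run costs one cycle once the one-cycle reduction is applied. Under those assumptions it reproduces the five-cycle, six-cycle, three-cycle, and four-cycle figures given above.

```python
# Illustrative arithmetic only; names are hypothetical and stalls are ignored.
LOAD_CYCLES = 2             # assumed cost of an unpipelined load
ADD_CYCLES = 1              # add step of a table branch (TBB/TBH)
PIPELINE_RELOAD_CYCLES = 3  # pipeline reload after the PC changes


def min_cycles_ldr_pc():
    """Minimum cycles for LDR PC,[any]: load plus pipeline reload."""
    return LOAD_CYCLES + PIPELINE_RELOAD_CYCLES


def min_cycles_table_branch():
    """Minimum cycles for TBB or TBH: load, add, then pipeline reload."""
    return LOAD_CYCLES + ADD_CYCLES + PIPELINE_RELOAD_CYCLES


def pipelined_run_cycles(count):
    """Cycles for `count` back-to-back loads/stores that all pipeline.

    Assumption: the first load costs LOAD_CYCLES and each following
    instruction costs one cycle after the one-cycle reduction.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return LOAD_CYCLES + (count - 1)


if __name__ == "__main__":
    print("LDR PC,[any]:", min_cycles_ldr_pc())
    print("TBB/TBH:", min_cycles_table_branch())
    print("two pipelined accesses:", pipelined_run_cycles(2))
    print("three pipelined accesses:", pipelined_run_cycles(3))
```

The model does not cover base-update loads, PC-relative loads that contend with the fetch unit, or a full store buffer, all of which can add cycles as described above.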