Developer’s Manual
January, 2004
Intel XScale® Core
Developer’s Manual
Optimization Guide
A.2.2 Instruction Flow Through the Pipeline
The Intel XScale® core pipeline issues a single instruction per clock cycle. Instruction execution
begins at the F1 pipestage and completes at the WB pipestage.
Although a single instruction may be issued per clock cycle, all three pipelines (MAC, memory,
and main execution) may be processing instructions simultaneously. If there are no data hazards,
then each instruction may complete independently of the others.
With the exception of the MAC unit, each pipestage takes a single clock cycle (machine cycle) to
perform its subtask.
A.2.2.1. ARM* V5TE Instruction Execution
Figure A-1 uses arrows to show the possible flow of instructions in the pipeline. Instruction
execution flows from the F1 pipestage to the RF pipestage. The RF pipestage may issue a single
instruction to either the X1 pipestage or the MAC unit (multiply instructions go to the MAC, while
all others continue to X1). Because only one instruction is issued per cycle, either M1 or X1 will
be idle in any given cycle.
All load/store instructions are routed to the memory pipeline after the effective addresses have been
calculated in X1.
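The routing rules above can be summarized in a small model. This is only an illustrative sketch, not part of the core's documentation: the function names `route` and `idle_issue_target` are hypothetical, and the mnemonic sets are a small, non-exhaustive sample that ignores condition-code suffixes.

```python
# Illustrative model only: the mnemonic sets below are a small, non-exhaustive
# sample chosen for this sketch, not a list taken from the manual.
MULTIPLY = {"mul", "mla", "smull", "umull", "smlal", "umlal"}
LOAD_STORE = {"ldr", "ldrb", "ldrh", "str", "strb", "strh", "ldm", "stm"}


def route(mnemonic):
    """Return the units an instruction visits from the RF pipestage onward."""
    op = mnemonic.lower()
    if op in MULTIPLY:
        # Multiply instructions are issued from RF to the MAC unit.
        return ["RF", "MAC"]
    if op in LOAD_STORE:
        # The effective address is calculated in X1, then the access
        # continues in the memory pipeline.
        return ["RF", "X1", "memory pipeline"]
    # All other instructions continue down the main execution pipeline.
    return ["RF", "X1"]


def idle_issue_target(mnemonic):
    """RF issues one instruction per cycle, so the other issue target is idle."""
    return "X1" if route(mnemonic)[1] == "MAC" else "M1"
```

For example, `route("ldr")` reports that a load passes through X1 before entering the memory pipeline, and `idle_issue_target("mla")` reports that X1 receives nothing in the cycle a multiply is issued.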
The ARM V5TE bx (branch and exchange) instruction, which is used to branch between ARM and
Thumb code, causes the entire pipeline to be flushed (the bx instruction is not dynamically
predicted by the BTB). If the processor is in Thumb mode, then the ID pipestage dynamically
expands each Thumb instruction into a normal ARM V5TE RISC instruction and execution
resumes as usual.
A.2.2.2. Pipeline Stalls
The progress of an instruction can stall anywhere in the pipeline. Several pipestages may stall for
various reasons. It is important to understand when and how hazards occur in the core pipeline.
Performance degradation can be significant if care is not taken to minimize pipeline stalls.