3.2
Instruction Fetch Unit
The E31 instruction fetch unit contains branch prediction hardware to improve performance of
the processor core. The branch predictor comprises a 40-entry branch target buffer (BTB) which
predicts the target of taken branches, a 128-entry branch history table (BHT), which predicts the
direction of conditional branches, and a 2-entry return-address stack (RAS) which predicts the
target of procedure returns. The branch predictor has a one-cycle latency, so that correctly pre-
dicted control-flow instructions result in no penalty. Mispredicted control-flow instructions incur a
three-cycle penalty.
The E31 implements the standard Compressed (C) extension to the RISC‑V architecture, which
allows for 16-bit RISC‑V instructions.
3.3
Execution Pipeline
The E31 execution unit is a single-issue, in-order pipeline. The pipeline comprises five stages:
instruction fetch, instruction decode and register fetch, execute, data memory access, and regis-
ter writeback.
The pipeline has a peak execution rate of one instruction per clock cycle, and is fully bypassed
so that most instructions have a one-cycle result latency. There are several exceptions:
• LW has a two-cycle result latency, assuming a cache hit.
• LH, LHU, LB, and LBU have a three-cycle result latency, assuming a cache hit.
• CSR reads have a three-cycle result latency.
• MUL, MULH, MULHU, and MULHSU have a 5-cycle result latency.
• DIV, DIVU, REM, and REMU have between a 2-cycle and 33-cycle result latency, depending
on the operand values.
The pipeline only interlocks on read-after-write and write-after-write hazards, so instructions
may be scheduled to avoid stalls.
The E31 implements the standard Multiply (M) extension to the RISC‑V architecture for integer
multiplication and division. The E31 has a 8-bit per cycle hardware multiply and a 1-bit per cycle
hardware divide. The multiplier can only execute one operation at a time and will block until the
previous operation completes.
The hart will not abandon a Divide instruction in flight. This means if an interrupt handler tries to
use a register that is the destination register of a divide instruction the pipeline stalls until the
divide is complete.
Branch and jump instructions transfer control from the memory access pipeline stage. Correctly-
predicted branches and jumps incur no penalty, whereas mispredicted branches and jumps
incur a three-cycle penalty.
Chapter 3 E31 RISC-V Core
SiFive FE310-G000 Manual: v3p2
© SiFive, Inc.
Page 16