A6.5 About data prefetching
This section describes the software and hardware data prefetching behavior for the processor.
Preload instructions
PLD instructions in AArch32, and PRFM instructions of type PLD in AArch64, look up in the cache and start a linefill if they miss and are to a cacheable address. These instructions retire as soon as their linefill has started; they do not wait for data to be returned. This enables other instructions to execute while the linefill continues in the background.
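The following minimal C sketch shows the non-blocking behavior described above: the next list node is prefetched, and the current node is processed while that linefill proceeds. It assumes GCC or Clang. __builtin_prefetch(addr, 0, 3) is a compiler builtin, and the statement that it typically lowers to PRFM PLDL1KEEP on AArch64 describes usual compiler behavior, not something this manual guarantees. The node type and list_sum function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical singly linked list node. */
struct node {
    struct node *next;
    int64_t value;
};

/* Walk the list, prefetching the next node before working on the current
 * one. The prefetch is a read hint (rw = 0) with maximum locality (3). Because
 * a PLD-type prefetch retires once its linefill starts, the per-node work
 * below can overlap with that fill. */
int64_t list_sum(const struct node *n)
{
    int64_t total = 0;
    while (n != NULL) {
        if (n->next != NULL) {
            __builtin_prefetch(n->next, 0, 3);
        }
        total += n->value * n->value; /* stand-in for real per-node work */
        n = n->next;
    }
    return total;
}
```

Prefetching only one node ahead gives limited lead time; a real workload may need to run further ahead.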
PLDW instructions in AArch32, and PRFM instructions of type PST in AArch64, are similar to PLD, except that if they miss, the linefill causes data to be invalidated in other cores and masters so that the line is ready for writing.
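A minimal sketch of prefetching with write intent, assuming GCC or Clang. Passing rw = 1 to __builtin_prefetch requests a prefetch for writing; on AArch64 compilers typically emit PRFM PSTL1KEEP, the AArch64 counterpart of PLDW. The function name and the prefetch distance of 64 elements are hypothetical, untuned choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply every element in place, prefetching with write intent ahead of
 * the stores so that lines arrive ready for writing. */
void scale_in_place(int32_t *data, size_t n, int32_t factor)
{
    const size_t ahead = 64; /* hypothetical prefetch distance, in elements */
    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            __builtin_prefetch(&data[i + ahead], 1, 3); /* rw = 1: write */
        }
        data[i] *= factor;
    }
}
```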
PRFM instructions can also target a prefetch at the L2 cache, for example PRFM PLDL2KEEP. In this case, a request is sent to the L2 memory system to start a linefill, and the instruction then retires without any data being returned to the L1 memory system.
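A sketch of an L2-targeted prefetch. On AArch64 it issues PRFM PLDL2KEEP directly through inline assembly. On other targets it falls back to __builtin_prefetch with locality 2, which is only a rough analogue. The helper names and the 64-byte step are assumptions for illustration.

```c
#include <stddef.h>

/* Hint that the line holding addr should be brought into L2 only. */
static inline void prefetch_to_l2(const void *addr)
{
#if defined(__aarch64__)
    __asm__ volatile("prfm pldl2keep, [%0]" : : "r"(addr));
#else
    __builtin_prefetch(addr, 0, 2); /* portable fallback, not exact */
#endif
}

/* Step through [buf, buf + len) in 64-byte increments (assumed line size),
 * issuing one L2 prefetch per step. Returns the number of prefetches issued. */
size_t warm_l2(const void *buf, size_t len)
{
    const size_t line = 64;
    const char *p = (const char *)buf;
    size_t issued = 0;
    for (size_t off = 0; off < len; off += line) {
        prefetch_to_l2(p + off);
        issued++;
    }
    return issued;
}
```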
PLI instructions in AArch32, and PRFM instructions of type PLI in AArch64, are treated as NOPs.
Automatic data prefetching and monitoring
The L1 data-side memory system implements an automatic prefetcher that monitors cache misses in the core. When a pattern is detected, the automatic prefetcher starts linefills in the background. The prefetcher recognizes a sequence of data cache misses at a fixed stride of up to four cache lines in either direction (plus or minus). Any intervening loads or stores that hit in the data cache do not interfere with the recognition of the cache miss pattern.
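To make the stride rule concrete, here is a toy software model of a miss-stream stride detector. This is not the Cortex-A35 hardware algorithm, which this section does not specify. It assumes 64-byte lines, a stride window of plus or minus four lines, and a hypothetical trigger_len parameter standing in for the configurable sequence length. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SHIFT 6       /* assumed 64-byte cache lines */
#define MAX_STRIDE_LINES 4 /* stride window of plus or minus four lines */

typedef struct {
    int64_t last_line;    /* line index of the previous miss */
    int64_t stride;       /* candidate stride, in lines (0 = none) */
    unsigned confirmed;   /* consecutive misses matching the stride */
    unsigned trigger_len; /* confirmations needed before prefetching */
    bool have_last;
} stride_detector;

void sd_init(stride_detector *sd, unsigned trigger_len)
{
    sd->last_line = 0;
    sd->stride = 0;
    sd->confirmed = 0;
    sd->trigger_len = trigger_len;
    sd->have_last = false;
}

/* Feed one data cache miss. Hits are never passed in, so intervening hits
 * cannot disturb the pattern. Returns true, and writes the address of the
 * line the model would fetch to *prefetch_addr, once the stride is confirmed. */
bool sd_on_miss(stride_detector *sd, uint64_t addr, uint64_t *prefetch_addr)
{
    int64_t line = (int64_t)(addr >> LINE_SHIFT);
    if (!sd->have_last) {
        sd->last_line = line;
        sd->have_last = true;
        return false;
    }
    int64_t delta = line - sd->last_line;
    if (delta == 0) {
        return false; /* another miss to the same line adds no information */
    }
    int64_t mag = delta < 0 ? -delta : delta;
    if (mag > MAX_STRIDE_LINES) {
        sd->stride = 0; /* outside the window: start over */
        sd->confirmed = 0;
    } else if (delta == sd->stride) {
        sd->confirmed++;
    } else {
        sd->stride = delta; /* new candidate stride */
        sd->confirmed = 1;
    }
    sd->last_line = line;
    if (sd->stride != 0 && sd->confirmed >= sd->trigger_len) {
        *prefetch_addr = (uint64_t)(line + sd->stride) << LINE_SHIFT;
        return true;
    }
    return false;
}
```

With trigger_len = 2, misses at 0, 128, and 256 (a stride of two lines) cause the third miss to predict the line at 384. Misses ten lines apart never trigger.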
The CPUACTLR enables you to:
• Deactivate the prefetcher.
• Alter the sequence length required to trigger the prefetcher.
• Alter the number of outstanding requests that the prefetcher can make.
Use PLD or PRFM instructions for data prefetching where the accesses form short sequences or irregular patterns that the automatic prefetcher cannot detect.
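An indexed gather is a typical irregular pattern: the addresses depend on data, so no fixed stride exists for the automatic prefetcher to lock onto. This sketch assumes GCC or Clang and prefetches the entry needed a few iterations later. The function name and the distance of 8 are hypothetical, untuned choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum table entries selected by an index array, prefetching the entry that
 * will be needed `ahead` iterations from now. */
int64_t gather_sum(const int32_t *table, const uint32_t *idx, size_t n)
{
    const size_t ahead = 8; /* hypothetical prefetch distance */
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            /* Read hint; typically PRFM PLDL1KEEP on AArch64. */
            __builtin_prefetch(&table[idx[i + ahead]], 0, 3);
        }
        total += table[idx[i]];
    }
    return total;
}
```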
Non-temporal loads
Cache requests made by a non-temporal load instruction (LDNP) are allocated to the L2 cache only. The allocation policy makes it likely that the line is replaced sooner than other lines.
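Compilers do not generally emit LDNP from plain C, so this sketch uses inline assembly on AArch64 and ordinary loads elsewhere. It assumes GCC or Clang and 8-byte-aligned data. The helper names are hypothetical. The intent is to stream through data that is read once, so it is less likely to displace lines that will be reused.

```c
#include <stddef.h>
#include <stdint.h>

/* Load two consecutive 64-bit words with a non-temporal hint. */
static inline void load_pair_nontemporal(const uint64_t *p, uint64_t *a, uint64_t *b)
{
#if defined(__aarch64__)
    uint64_t x, y;
    /* The "memory" clobber tells the compiler the asm reads memory at p. */
    __asm__ volatile("ldnp %0, %1, [%2]" : "=r"(x), "=r"(y) : "r"(p) : "memory");
    *a = x;
    *b = y;
#else
    *a = p[0]; /* fallback: ordinary loads */
    *b = p[1];
#endif
}

/* Sum a buffer that is read once, using non-temporal pair loads. */
uint64_t stream_sum(const uint64_t *buf, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        uint64_t a, b;
        load_pair_nontemporal(&buf[i], &a, &b);
        total += a + b;
    }
    if (i < n) {
        total += buf[i]; /* odd tail element: ordinary load */
    }
    return total;
}
```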
Data Cache Zero
The Data Cache Zero by Virtual Address (DC ZVA) instruction sets a 64-byte block of memory, aligned to 64 bytes, to zero. If the DC ZVA instruction misses in the cache, it clears main memory without causing an L1 or L2 cache allocation.
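Because DC ZVA always zeroes the whole aligned block that contains its address, software must issue it only for blocks lying entirely inside the target range. This sketch reads the block size from DCZID_EL0 rather than hard-coding 64 bytes. It assumes AArch64, GCC or Clang, and an environment where DC ZVA is permitted at the current exception level (DCZID_EL0.DZP clear). Otherwise, or on other architectures, it falls back to memset. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero [buf, buf + len), using DC ZVA for whole aligned blocks when allowed. */
void zero_buffer(void *buf, size_t len)
{
#if defined(__aarch64__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if ((dczid & 0x10) == 0) { /* DZP (bit 4) clear: DC ZVA is permitted */
        /* BS (bits 3:0) is log2 of the block size in 4-byte words. */
        uintptr_t block = (uintptr_t)4 << (dczid & 0xf);
        uintptr_t start = (uintptr_t)buf;
        uintptr_t stop = start + len;
        uintptr_t first = (start + block - 1) & ~(block - 1); /* round up */
        uintptr_t last = stop & ~(block - 1);                 /* round down */
        if (first < last) {
            memset(buf, 0, first - start); /* unaligned head */
            for (uintptr_t a = first; a < last; a += block) {
                __asm__ volatile("dc zva, %0" : : "r"(a) : "memory");
            }
            memset((void *)last, 0, stop - last); /* unaligned tail */
            return;
        }
    }
#endif
    memset(buf, 0, len); /* short range, DC ZVA prohibited, or non-AArch64 */
}
```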
Related information
B2.36 CPU Auxiliary Control Register, EL1 on page B2-412
100236_0100_00_en
Copyright © 2015–2017, 2019 Arm Limited or its affiliates. All rights reserved.
Non-Confidential