A6.4
L1 data memory system
The L1 data cache is organized as a Virtually Indexed, Physically Tagged (VIPT) cache with four ways.
Data cache invalidate on reset
The Armv8-A architecture does not provide an operation to invalidate the entire data cache. If software requires this, it must iterate over the cache geometry and execute a series of individual invalidate by set/way (DC ISW) instructions.
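The iteration described above can be sketched in C. This is a minimal sketch, not Arm reference code: it only builds the `DC ISW` operands using the Armv8-A set/way operand format (Level in bits[3:1], Set starting at bit log2(line length), Way in the top bits of [31:0]). The helper names are hypothetical, and the geometry passed in is an assumption. Real code would read the geometry from CCSIDR_EL1 after selecting the cache level in CSSELR_EL1, execute the instruction at EL1 or higher, and follow the loop with a `DSB`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest b such that 2^b >= n, i.e. ceil(log2(n)) for n >= 1. */
static unsigned ceil_log2_u32(uint32_t n) {
    unsigned b = 0;
    while (b < 32 && ((uint64_t)1 << b) < n)
        b++;
    return b;
}

/* Encode the Xt operand of DC ISW (invalidate by set/way).
 * level:      1-based cache level (1 = L1).
 * assoc:      number of ways (four for the L1 data cache described here).
 * line_bytes: cache line length in bytes (an assumed input). */
static uint64_t dc_isw_operand(unsigned level, uint32_t way, uint32_t set,
                               uint32_t assoc, uint32_t line_bytes) {
    assert(level >= 1 && level <= 7);
    unsigned a = ceil_log2_u32(assoc);        /* width of the Way field */
    unsigned l = ceil_log2_u32(line_bytes);   /* Set field starts at log2(line length) */
    uint64_t op = (uint64_t)(level - 1) << 1; /* Level field, bits[3:1] */
    op |= (uint64_t)set << l;                 /* Set field */
    if (a != 0)
        op |= (uint64_t)way << (32 - a);      /* Way field, top bits of [31:0] */
    return op;
}

/* Hypothetical callback that would issue the maintenance instruction.
 * At EL1 or higher it could run: __asm__ volatile("dc isw, %0" :: "r"(op)); */
typedef void (*set_way_op_fn)(uint64_t op, void *ctx);

/* Visit every set/way of one cache level and pass each operand to `op`
 * (which may be NULL). Returns the number of operands generated.
 * On real hardware a DSB is still required after the loop completes. */
static size_t for_each_set_way(unsigned level, uint32_t assoc, uint32_t num_sets,
                               uint32_t line_bytes, set_way_op_fn op, void *ctx) {
    size_t count = 0;
    for (uint32_t way = 0; way < assoc; way++) {
        for (uint32_t set = 0; set < num_sets; set++) {
            if (op != NULL)
                op(dc_isw_operand(level, way, set, assoc, line_bytes), ctx);
            count++;
        }
    }
    return count;
}
```

For example, under an assumed geometry of 64-byte lines and 256 sets, a four-way L1 walk visits 4 × 256 = 1024 set/way pairs, and way 3, set 255 encodes as `0xC0003FC0`.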
A6.4.1
Memory system implementation
This section describes the implementation of the L1 memory system.
Limited Ordering Regions
The core supports four Limited Ordering Region (LORegion) descriptors, as introduced by the Armv8.1 Limited Ordering Regions extension.
Atomic instructions
The Cortex-A76 core supports the atomic instructions added in the Armv8.1 architecture.
Atomic instructions to cacheable memory can be performed as either near atomics or far atomics,
depending on where the cache line containing the data resides.
When an atomic instruction hits in the L1 data cache and the line is in a unique state, it is performed as a near atomic in the L1 memory system. If the atomic operation misses in the L1 cache, or the line is shared with another core, the atomic is sent as a far atomic on the core CHI interface.
If the operation misses everywhere within the cluster, and the interconnect supports far atomics, then the
atomic is passed on to the interconnect to perform the operation.
When the operation hits anywhere inside the cluster, or when the interconnect does not support far atomics, the L3 memory system performs the atomic operation. If the line is not already in the L3 cache, it is allocated there, provided that the DSU is configured with an L3 cache.
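The routing rules above can be summarized as a small decision function. This is a simplified illustrative model written from the text only, not a description of the actual hardware logic. The type and field names are hypothetical, and the model covers only atomics to Cacheable memory without any CPUECTLR_EL1 override.

```c
#include <assert.h>
#include <stdbool.h>

/* Where an atomic to Cacheable memory is performed (simplified model). */
typedef enum {
    ATOMIC_NEAR_L1,          /* near atomic in the L1 memory system */
    ATOMIC_FAR_INTERCONNECT, /* far atomic performed by the interconnect */
    ATOMIC_FAR_L3            /* far atomic performed by the L3 memory system */
} atomic_site;

/* Hypothetical inputs describing the state of the target line. */
typedef struct {
    bool l1_hit_unique;            /* line is in this core's L1 in a unique state */
    bool hit_in_cluster;           /* line is present anywhere in the cluster
                                      (including a shared copy in this L1) */
    bool interconnect_far_atomics; /* interconnect supports far atomics */
} atomic_ctx;

static atomic_site classify_atomic(atomic_ctx c) {
    if (c.l1_hit_unique)
        return ATOMIC_NEAR_L1;
    /* L1 miss or shared line: sent as a far atomic on the CHI interface. */
    if (!c.hit_in_cluster && c.interconnect_far_atomics)
        return ATOMIC_FAR_INTERCONNECT; /* missed everywhere in the cluster */
    /* Hit inside the cluster, or no far-atomic support in the interconnect. */
    return ATOMIC_FAR_L3;
}
```

Whether the L3 memory system also allocates the line is left out of the model because, as stated above, it depends on whether the DSU has an L3 cache.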
Therefore, if software prefers the atomic to be performed as a near atomic, it should precede the atomic instruction with a PLDW (AArch32) or PRFM PSTL1KEEP (AArch64) instruction.
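A minimal sketch of the prefetch-then-atomic idiom in C, assuming GCC or Clang (for inline assembly, `__builtin_prefetch`, and `__atomic_fetch_add`). The function name is hypothetical. Only the AArch64 `PRFM PSTL1KEEP` form is shown; the non-AArch64 branch is just a portable fallback so the sketch builds elsewhere. The prefetch is a hint: if the line is lost before the atomic executes, the atomic is still sent as a far atomic.

```c
#include <assert.h>
#include <stdint.h>

/* Request the line for writing into L1, then perform the atomic add.
 * Returns the value held at *p before the add. */
static inline uint64_t near_atomic_fetch_add(uint64_t *p, uint64_t v) {
#if defined(__aarch64__)
    /* Prefetch for store, into L1, temporal (keep). */
    __asm__ volatile("prfm pstl1keep, [%0]" : : "r"(p));
#else
    /* Portable fallback hint: rw = 1 (write), locality = 3 (keep in cache). */
    __builtin_prefetch(p, 1, 3);
#endif
    /* When Armv8.1 atomics are enabled (for example -march=armv8.1-a),
     * compilers typically lower this to an LDADD-family instruction. */
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
```

Called as `old = near_atomic_fetch_add(&counter, 1);`, this returns the previous value and increments `counter` atomically, regardless of whether the hardware ends up performing it near or far.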
Alternatively, CPUECTLR_EL1 can be programmed so that different types of atomic instructions attempt to execute as near atomics. In this case, the core makes one cache fill for the atomic. If the cache line is lost before the atomic operation can be performed, the atomic is sent as a far atomic.
The Cortex-A76 core supports atomics to Device or Non-cacheable memory; however, this relies on the interconnect also supporting atomics. If such an atomic instruction is executed when the interconnect does not support atomics, it results in an abort.
For more information on the CPUECTLR_EL1 register, see B2.26 CPUECTLR_EL1, CPU Extended Control.
LDAPR instructions
The core supports Load-Acquire instructions that follow the RCpc consistency semantics introduced in the Armv8.3 extensions for the A-profile. This is reflected in register ID_AA64ISAR1_EL1, where bits[23:20] are set to 0b0001 to indicate that the core implements the LDAPRB, LDAPRH, and LDAPR instructions in AArch64.
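Software can test the ID_AA64ISAR1_EL1 bits[23:20] field before relying on these instructions. The sketch below decodes the field from a raw register value; the helper names are hypothetical. The optional `MRS` read assumes execution at EL1, or under an operating system that emulates ID register reads from EL0 (Linux on arm64 does this).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract ID_AA64ISAR1_EL1 bits[23:20] (RCpc load-acquire support). */
static unsigned isar1_lrcpc(uint64_t isar1) {
    return (unsigned)((isar1 >> 20) & 0xF);
}

/* 0b0001 means LDAPRB, LDAPRH and LDAPR are implemented; higher values
 * indicate further RCpc instructions on top of these. */
static bool isar1_has_ldapr(uint64_t isar1) {
    return isar1_lrcpc(isar1) >= 1;
}

#if defined(__aarch64__)
/* Read the raw register; needs EL1 or OS emulation of EL0 ID reads. */
static uint64_t read_id_aa64isar1(void) {
    uint64_t v;
    __asm__ volatile("mrs %0, ID_AA64ISAR1_EL1" : "=r"(v));
    return v;
}
#endif
```

On this core the field reads as 0b0001, so `isar1_has_ldapr()` returns true for the value read from the register.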
Transient memory region
The core has specific behavior for memory regions that are marked as Write-Back Cacheable and Transient, as defined in the Armv8.0 architecture.
A6 Level 1 memory system
A6.4 L1 data memory system
100798_0300_00_en
Copyright © 2016–2018 Arm Limited or its affiliates. All rights reserved.
A6-77
Non-Confidential