ARM11
Hisilicon SD5113 (ARM11) 530 MHz, 16-bit DDR2-667, Huawei EchoLife HG8245 GPON Terminal.
- ARMv6 architecture.
- L1 Data cache = 16 KB. 32 B/line, 4-WAY.
- L1 Instruction cache = 16 KB. 32 B/line, 4-WAY.
- L1 TLB size = 10 items (Micro-TLB), fully associative.
- L2 TLB size = 64 items (Main TLB), 2-WAY.
- Single-issue out-of-order-completion CPU.
- Dynamic prediction: BTAC (Branch Target Address Cache): 128-entry, direct-mapped, 2-bit saturating prediction history scheme.
A BTAC hit allows the branch to be predicted with zero cycle delay.
- Static branch prediction: The processor predicts that all
forward conditional branches are not taken and all backward branches are taken.
- Return stack: 3-entry circular buffer used for the prediction of procedure calls
and procedure returns. Only unconditional procedure returns are predicted.
- Hit-under-miss: When an instruction requests data from a cache,
if the data is not there, ARM11 treats this as a non-blocking operation.
The cache is instructed to get the missing data, then the pipeline execution can continue
as long as the next instructions are not dependent on the missing data. Even if the next
instruction is another data load, the ARM11 microarchitecture permits this operation if
the data is in the cache (i.e. a hit-under-miss). The pipeline stalls only when three
successive data misses are encountered.
- The execution of an ALU or MAC
instruction will not be delayed by a waiting LS instruction.
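
The 2-bit saturating prediction history scheme named in the dynamic prediction item above can be illustrated with a small simulation. This is a minimal sketch, not the ARM11 implementation: the state encoding (0 to 3), the initial state, and the function name `count_mispredictions` are assumptions made for illustration, and it models a single branch that always hits in the BTAC. Only the 6-cycle misprediction penalty is taken from this page.

```python
# Sketch of one 2-bit saturating counter (assumed encoding):
# 0 = strongly not taken, 1 = weakly not taken,
# 2 = weakly taken,       3 = strongly taken.

MISPREDICT_PENALTY_CYCLES = 6  # branch misprediction penalty quoted on this page


def count_mispredictions(outcomes, state=1):
    """Count mispredictions for a sequence of branch outcomes (True = taken).

    `state` is the assumed initial counter value (hypothetical choice).
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# A loop branch taken 9 times and then not taken, executed 3 times in a row.
loop_pattern = ([True] * 9 + [False]) * 3
loop_misses = count_mispredictions(loop_pattern)
loop_penalty_cycles = loop_misses * MISPREDICT_PENALTY_CYCLES

if __name__ == "__main__":
    print("mispredictions:", loop_misses)
    print("penalty cycles:", loop_penalty_cycles)
```

In this sketch the loop branch is mispredicted once on its first execution and once at each loop exit. The counter never drops below "weakly taken" between passes, so re-entering the loop is predicted correctly.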
4 KB pages mode
Size | Latency | Description |
16 K | 4 | TLB + L1 |
40 K | 4 + 180 ns | + 180 ns (RAM) |
256 K | 12 + 180 ns | + 8 (L1 TLB Miss) |
... | 12 + 580 ns | + 400 ns (L2 TLB Miss) |
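
The size thresholds in the table line up with the TLB reach for 4 KB pages: 10 Micro-TLB entries cover 40 KB and 64 Main TLB entries cover 256 KB. The sketch below reproduces the table rows from those numbers and converts the cycle part to nanoseconds at 530 MHz. It treats each threshold as a hard step, which is a simplifying assumption, and the function names `latency_parts` and `latency_ns` are hypothetical.

```python
# Values taken from this page.
CLOCK_MHZ = 530          # CPU clock
PAGE_KB = 4              # 4 KB pages mode
L1_DATA_KB = 16          # L1 data cache size
MICRO_TLB_ENTRIES = 10   # L1 TLB (Micro-TLB)
MAIN_TLB_ENTRIES = 64    # L2 TLB (Main TLB)

# TLB reach = number of entries * page size.
MICRO_TLB_REACH_KB = MICRO_TLB_ENTRIES * PAGE_KB   # 40 KB
MAIN_TLB_REACH_KB = MAIN_TLB_ENTRIES * PAGE_KB     # 256 KB


def latency_parts(working_set_kb):
    """Return (cycles, extra_ns) for a working set, using hard thresholds."""
    cycles, extra_ns = 4, 0            # TLB + L1 hit
    if working_set_kb > L1_DATA_KB:
        extra_ns += 180                # RAM access
    if working_set_kb > MICRO_TLB_REACH_KB:
        cycles += 8                    # L1 TLB miss
    if working_set_kb > MAIN_TLB_REACH_KB:
        extra_ns += 400                # L2 TLB miss
    return cycles, extra_ns


def latency_ns(working_set_kb):
    """Total latency in nanoseconds, converting cycles at CLOCK_MHZ."""
    cycles, extra_ns = latency_parts(working_set_kb)
    return cycles * 1000.0 / CLOCK_MHZ + extra_ns


if __name__ == "__main__":
    for size_kb in (16, 40, 256, 1024):
        print(size_kb, "KB:", latency_parts(size_kb), round(latency_ns(size_kb), 1), "ns")
```

At 530 MHz one cycle is about 1.9 ns, so the cycle terms are small next to the 180 ns and 400 ns memory terms once the working set leaves the L1 cache.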
- Penalty for an access that crosses a 4-byte boundary = 1 cycle.
- RAM Read B/W (parallel random read) = 100 ns per cache line (32 bytes)
- RAM Read B/W (4-byte step) = 150 MB/s
- RAM Read B/W (32-byte step) = 280 MB/s
- RAM Write B/W (4-byte step) = 110 MB/s (no write allocate)
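
The bandwidth figures above mix two units (ns per cache line and MB/s). The following sketch converts between them and compares the results with the nominal peak of a 16-bit DDR2-667 interface. It assumes 1 MB = 10^6 bytes and treats DDR2-667 as 667 mega-transfers per second; the helper names are hypothetical, and the peak figure ignores refresh and protocol overhead.

```python
LINE_BYTES = 32  # cache line size from this page


def ns_per_line_to_mb_per_s(ns_per_line, line_bytes=LINE_BYTES):
    """Convert time per cache line to MB/s (bytes per ns is GB/s, so * 1000)."""
    return line_bytes * 1000.0 / ns_per_line


def mb_per_s_to_ns_per_line(mb_per_s, line_bytes=LINE_BYTES):
    """Convert MB/s to the time spent per cache line, in nanoseconds."""
    return line_bytes * 1000.0 / mb_per_s


def ddr_peak_mb_per_s(mega_transfers_per_s, bus_bits):
    """Nominal peak bandwidth: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * bus_bits / 8.0


if __name__ == "__main__":
    # Parallel random read: 100 ns per 32-byte line.
    print("random read:", ns_per_line_to_mb_per_s(100), "MB/s")
    # Read with a 32-byte step: 280 MB/s.
    print("32-byte step:", round(mb_per_s_to_ns_per_line(280), 1), "ns per line")
    # Nominal peak of 16-bit DDR2-667 (assumed 667 MT/s).
    print("nominal peak:", ddr_peak_mb_per_s(667, 16), "MB/s")
```

Under these assumptions, 100 ns per line corresponds to 320 MB/s, and the measured 280 MB/s is roughly a fifth of the nominal 1334 MB/s interface peak.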
Pipeline
Branch misprediction penalty = 6 cycles.
# | Stage | L/S | Description |
1 | Fe1 | | Instruction fetch + dynamic branch prediction |
2 | Fe2 | |
3 | De | | Decode + static branch prediction + Return Stack |
4 | Iss | | Instruction issue + Register read |
5 | Sh | ADD | Shifter / Address generation |
6 | ALU | DC1 | Main integer operation calculation / First stage of data cache access |
7 | Sat | DC2 | Saturation of integer results / Second stage of data cache access |
8 | WBex | WBls | Write back |
Links
ARM11 at Wikipedia
The ARM11 Microarchitecture. David Cormie
ARM1136JF-S and ARM1136J-S. Technical Reference Manual