MIPS 74K
Atheros AR9344 (MIPS 74K), 560 MHz, 128 MB (16-bit DDR2-667D x 2). TP-Link WDR3600.
- L1 Data cache = 32 KB, 32 B/line, 4-way, write-allocate.
- DTLB size = 32 entries (2 pages per entry).
- L1 Data cache latency = 4 cycles.
- The MIPS ISA has no complex addressing modes for load instructions, so an indexed load from an integer array (n = p[n]) needs extra address arithmetic. The latency of such a load is 7 cycles.
- RAM Latency = 4 cycles + 155 ns (32 cycles + 100 ns ?)
- DTLB miss penalty = 40 cycles + 100 ns ?
4 KB pages
Size  | Latency (cycles + ns) | Increase   | Description
32 K  | 4                     |            | TLB + L1
64 K  | 4 + 80 ns             | 80 ns      | + 150 ns RAM
128 K | 4 + 120 ns            | 40 ns      |
256 K | 4 + 140 ns            | 20 ns      |
512 K | 24 + 200 ns           | 20 + 60 ns | + 40 + 100 ns (TLB miss)
1 M   | 34 + 225 ns           | 10 + 25 ns |
2 M   | 39 + 237 ns           | 5 + 12 ns  |
4 M   | 42 + 246 ns           | 3 + 9 ns   |
8 M   | 44 + 260 ns           | 2 + 14 ns  |
16 M  | 44 + 290 ns           | 30 ns      | + ??? ns (Page walk)
32 M  | 44 + 340 ns           | 50 ns      |
64 M  | 44 + 370 ns           | 30 ns      |
16 KB pages
Size  | Latency (cycles + ns) | Increase   | Description
32 K  | 4                     |            | TLB + L1
64 K  | 4 + 80 ns             | 80 ns      | + 155 ns RAM
128 K | 4 + 120 ns            | 40 ns      |
256 K | 4 + 140 ns            | 20 ns      |
512 K | 4 + 150 ns            | 10 ns      |
1 M   | 4 + 155 ns            | 5 ns       |
2 M   | 24 + 207 ns           | 20 + 52 ns | + 40 + 100 ns (TLB miss)
4 M   | 34 + 230 ns           | 10 + 23 ns |
8 M   | 39 + 243 ns           | 5 + 13 ns  |
16 M  | 43 + 248 ns           | 3 + 5 ns   |
32 M  | 44 + 259 ns           | 2 + 11 ns  |
64 M  | 44 + 294 ns           | 35 ns      | + ??? ns (Page walk)
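Latency figures like those above usually come from a dependent-load walk of the n = p[n] kind, where every load address depends on the previous load. The notes do not say how the test array was filled, so the following is only a sketch under assumptions: one index per 32-byte cache line, shuffled with Sattolo's algorithm so that the walk is a single cycle covering the whole array. All function names are hypothetical, and Python is used only to show the logic; an actual measurement would time the same loop in C on the target.

```python
import random

LINE_BYTES = 32   # L1 line size from the notes above
ITEM_BYTES = 4    # 32-bit integer index


def build_chain(array_bytes, seed=0):
    """Build p so that n = p[n] touches one item per cache line in random order."""
    step = LINE_BYTES // ITEM_BYTES      # 8 items per line
    lines = array_bytes // LINE_BYTES
    perm = list(range(lines))
    rng = random.Random(seed)
    # Sattolo's algorithm: a random permutation that forms one single cycle
    for i in range(lines - 1, 0, -1):
        j = rng.randrange(i)             # 0 <= j < i, never i itself
        perm[i], perm[j] = perm[j], perm[i]
    p = [0] * (lines * step)
    for line, nxt in enumerate(perm):
        p[line * step] = nxt * step      # only the first item of each line is used
    return p


def chase(p, steps, start=0):
    """The dependent-load loop: each index comes from the previous load."""
    n = start
    for _ in range(steps):
        n = p[n]
    return n


def cycle_length(p, start=0):
    """Number of loads before the walk returns to its starting index."""
    n, count = p[start], 1
    while n != start:
        n = p[n]
        count += 1
    return count


p = build_chain(64 * 1024)
print(cycle_length(p))   # one load per cache line of the 64 KB array
```

A single cycle matters because a permutation with several short cycles could keep the walk inside a subset of the array that still fits in L1 or in the TLB reach, which would understate the latency for that array size.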
- Penalty for an access that crosses a 4-byte boundary = 320 cycles
- CPU can't process several TLB misses concurrently.
- L1 B/W (Parallel Random Read) = 1 cycle per access
- RAM Read B/W (Parallel Random Read) = 44 ns / cache line (720 MB/s)
- RAM Read B/W (Read, 4 Bytes step) = 200 MB/s
- RAM Read B/W (Read, 32 Bytes step) = 860 MB/s
- RAM Read B/W (Read, 32 Bytes step, pointer-chasing) = 260 MB/s (no hardware prefetch)
- RAM Write (4 Bytes step) = 220 MB/s
- RAM Write (32 Bytes step) = 120 ns per write = 270 MB/s (32-byte cache line). Write allocate?
Branch misprediction penalty = 10 cycles.
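The measurements above mix cycles and nanoseconds, so it can help to fold them into one unit at the 560 MHz clock and to cross-check the per-line timings against the quoted bandwidths. A small sketch of that arithmetic follows; the helper names are made up for illustration, and MB is taken as 10^6 bytes, which is an assumption about how the bandwidth figures were computed.

```python
CLOCK_MHZ = 560    # core clock from the header line
LINE_BYTES = 32    # L1 line size


def cycles_to_ns(cycles, clock_mhz=CLOCK_MHZ):
    """Convert core cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


def total_ns(cycles, ns):
    """Collapse a 'cycles + ns' entry into nanoseconds."""
    return cycles_to_ns(cycles) + ns


def mb_per_s(bytes_per_access, ns_per_access):
    """Bytes per nanosecond is GB/s; multiply by 1000 for MB/s."""
    return bytes_per_access / ns_per_access * 1000.0


print(round(total_ns(4, 155), 1))         # RAM latency: 162.1 ns
print(round(cycles_to_ns(10), 1))         # branch misprediction: 17.9 ns
print(round(mb_per_s(LINE_BYTES, 44)))    # parallel random read: 727 MB/s
print(round(mb_per_s(LINE_BYTES, 120)))   # 32-byte-step write: 267 MB/s
```

The last two results are close to the 720 MB/s and 270 MB/s quoted above, which suggests those bandwidth figures were derived from the per-line timings.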
Cache aliasing problem (32 KB data cache, 4-way, 4 KB pages):
Data cache accesses incur some penalty when the cache holds
uninitialized data (data from another process?).
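The geometry named in the heading above can be worked out directly. In a virtually indexed, physically tagged cache, aliasing becomes possible when one way is larger than a page, because some index bits then come from the virtual page number rather than the page offset. Whether this accounts for the penalty described above is not established here. The function name is hypothetical.

```python
def vipt_geometry(cache_bytes, ways, line_bytes, page_bytes):
    """Return (bytes per way, number of sets, number of page colors)."""
    way_bytes = cache_bytes // ways      # address range indexed by one way
    sets = way_bytes // line_bytes
    # Index bits above the page offset are taken from the virtual address;
    # each such bit doubles the number of sets one physical line can map to.
    colors = max(1, way_bytes // page_bytes)
    return way_bytes, sets, colors


KB = 1024
print(vipt_geometry(32 * KB, 4, 32, 4 * KB))    # (8192, 256, 2)
print(vipt_geometry(32 * KB, 4, 32, 16 * KB))   # (8192, 256, 1)
```

With 4 KB pages the 8 KB way gives two page colors, so two virtual mappings of the same physical page can index different sets. With 16 KB pages the whole index lies inside the page offset and that case cannot arise.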
MIPS 74K
- L1 Caches
  - 4-way set associative
  - 32-byte cache line size
  - Virtually indexed, physically tagged
  - Cache line locking support
  - Up to 4 outstanding I-cache misses
  - Virtual tag based hit prediction in data cache
  - Up to 4 unique outstanding D-cache misses and 9 total load misses
  - Writeback and write-through support in data cache
  - Non-blocking data cache prefetches
- L1 Data cache:
  - Cache protocols: uncached, write-back (with write-allocate), write-through (without write-allocate).
  - Data cache misses are non-blocking and up to 4 may be outstanding.
  - The tag array also has a virtual address portion, which is compared against the virtual address being accessed to generate a data cache hit prediction.
  - 64- or 128-bit wide access to the data cache
- L1 Instruction cache:
  - 128-bit wide access to the instruction cache
  - Instruction cache tag and data access are staggered across 2 cycles, with up to 4 instructions fetched per cycle.
- Instruction Fetch Unit
  - 4-instruction fetch per cycle
  - 8-entry Return Prediction Stack
  - Combined Majority Branch Predictor using three 256-entry Branch History Tables (BHT)
  - 64-entry (4-way) jump register cache to predict target for indirect jumps
  - Hardware prefetching of the next 1 or 2 sequential cache lines on a miss.
  - In MIPS16e mode, the IFU takes an additional 3 stages to recode and expand the compressed code.
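One plausible reading of "combined majority predictor using three 256-entry BHT" is three tables of 2-bit saturating counters whose predictions are combined by majority vote. The notes do not say how each table is indexed, so the sketch below is an illustration only: the global-history lengths (0, 4 and 8 bits), the XOR indexing, and the class and function names are all assumptions, not documented 74K behavior.

```python
class MajorityPredictor:
    """Three tables of 2-bit counters; the prediction is their majority vote."""

    ENTRIES = 256   # entries per table, from the notes above

    def __init__(self, history_bits=(0, 4, 8)):
        # history_bits is an illustrative assumption, not a documented detail
        self.history_bits = history_bits
        self.tables = [[1] * self.ENTRIES for _ in history_bits]  # 1 = weakly not taken
        self.history = 0

    def _indexes(self, pc):
        word = pc >> 2   # MIPS32 instructions are 4 bytes wide
        return [(word ^ (self.history & ((1 << b) - 1))) % self.ENTRIES
                for b in self.history_bits]

    def predict(self, pc):
        votes = sum(tbl[i] >= 2 for tbl, i in zip(self.tables, self._indexes(pc)))
        return votes >= 2

    def update(self, pc, taken):
        # Train every table with the actual outcome, then shift the history
        for tbl, i in zip(self.tables, self._indexes(pc)):
            tbl[i] = min(3, tbl[i] + 1) if taken else max(0, tbl[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & 0xFF


def accuracy(pattern, rounds=200, pc=0x400100):
    """Fraction of correct predictions for one branch repeating `pattern`."""
    bp, hits = MajorityPredictor(), 0
    for k in range(rounds):
        taken = pattern[k % len(pattern)]
        hits += bp.predict(pc) == taken
        bp.update(pc, taken)
    return hits / rounds


print(accuracy([True]))          # always-taken loop branch
print(accuracy([True, False]))   # strictly alternating branch
```

The alternating case shows why voting helps in this model: the history-free table keeps flipping, but the two history-indexed tables agree after warm-up and outvote it.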
- Dual Out-of-Order Instruction Issue
  - 12-stage ALU fetch and execution pipe. The latency of an ALU operation is 1 or 2 cycles.
  - 13-stage AGEN fetch and execution pipe. The AGEN pipe executes load/store and control transfer instructions.
  - Common 2-stage graduation pipe
  - 32 (18 ALU, 14 AGEN) completion buffers hold execution results until instructions are graduated in program order
  - 12-entry Instruction Buffer to decouple instruction fetch from execution. Up to 4 instructions can be written into this buffer per cycle, but at most 2 instructions can be read from it by the IDU.
  - Up to 4 instructions issued per cycle in 74Kf core with dual issue FPU
- Programmable Memory Management Unit
  - 16/32/48/64 dual-entry, dual-ported TLB shared by Instruction and Data MMU
  - 4-entry ITLB (4KB, 16KB page size)
  - 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M byte page size supported in JTLB
  - TLB: 2 virtual pages (odd and even) per entry. Dual-ported TLB shared by Instruction and Data MMU.
  - 4-entry ITLB (4KB, 1MB page size)
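The reach of the data TLB follows from the measured figures earlier in these notes (32 entries, 2 pages per entry), and it lines up with where the TLB-miss step first appears in the two latency tables. A quick check, with hypothetical helper names:

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, pages_per_entry, page_bytes):
    """Bytes of address space that can be mapped without a TLB miss."""
    return entries * pages_per_entry * page_bytes


def first_size_beyond(reach_bytes, sizes):
    """Smallest tested array size that no longer fits in the TLB reach."""
    return min(s for s in sizes if s > reach_bytes)


# Array sizes used in the latency tables: 32 K, 64 K, ... 64 M
SIZES = [(32 * KB) << i for i in range(12)]

for page in (4 * KB, 16 * KB):
    reach = tlb_reach(32, 2, page)
    print(page // KB, "KB pages: reach", reach // KB, "KB,",
          "first size beyond reach", first_size_beyond(reach, SIZES) // KB, "KB")
```

This gives a reach of 256 KB with 4 KB pages and 1 MB with 16 KB pages, so the first array sizes that exceed it are 512 K and 2 M, the same rows where the tables mark the TLB miss.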
Integer pipeline:
Unit | # | Stage | Name | Description
Fetch (IFU) | 1 | IT | Instruction Tag Read | I-cache tag arrays accessed. Branch History Table, JRC accessed. ITLB address translation performed. Instruction watch and EJTAG break compares done.
Fetch (IFU) | 2 | ID | Instruction Data Read | I-cache data array accessed. Tag compare, detect I-cache hit.
Fetch (IFU) | 3 | IS | Instruction Select | Way select. Target calculation start.
Fetch (IFU) | 4 | IB | Instruction Buffer | Instruction Buffer write. Target calculation done.
Decode & Dispatch (IDU) | 5 | DD | Decode | Access Rename Map, get source register availability to resolve source dependency. Decode instructions and assign pipe and instruction identifier. Check execution resources.
Decode & Dispatch (IDU) | 6 | DR | Rename | Update Rename Map at destination register to resolve output dependency. Send instruction information to Graduation Unit (GRU). Send instruction to Decode and Dispatch Queue (DDQ).
Decode & Dispatch (IDU) | 7 | DS | Select for Dispatch | Check for operand and resource availability and mark valid instructions as ready for dispatch. Select 1 out of 8 (6-entry DDQ + 2 staging registers) ready instructions in each ALU and AGEN pipe independently.
Decode & Dispatch (IDU) | 8 | DM | Instruction Mux | Read out the selected instruction from the previous stage and update the selection information. Generate controls for source-operand bypass mux. ALU pipe starts premuxing operands based on the selected instruction. AGEN pipe starts reading source operands from Register File and Completion Buffers.
ALU | 9 | AF | ALU Register File Read |
ALU | 10 | AM | ALU Operand Mux |
ALU | 11 | AC | ALU Compute |
ALU | 12 | AB | ALU Results Bypass |
Graduation Unit (GRU) | 13 | WB | Writeback |
Graduation Unit (GRU) | 14 | WC | Graduation Complete |