TLB L1 size = 32 entries. Miss penalty = 5 cycles. Only two misses can be processed at a time.
TLB L2 size = 512 entries. Miss penalty = 25 cycles. Only one miss can be processed at a time.
PDE cache size = 24 entries (covering 48 MB). Miss penalty = 30 cycles.
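The reach of each translation structure is entries × bytes mapped per entry. The sketch below is a minimal check of that arithmetic. It assumes 4 KB pages for the L1 and L2 TLBs, because the notes do not state the page size. It takes 2 MB per PDE-cache entry from 48 MB / 24 entries.

```python
KB = 1024
MB = 1024 * KB

def reach(entries: int, bytes_per_entry: int) -> int:
    """Memory range a translation cache can map without missing."""
    return entries * bytes_per_entry

# Assumption: 4 KB pages for the L1/L2 TLBs (not stated in the notes).
PAGE = 4 * KB
# 48 MB covered by 24 PDE-cache entries -> 2 MB per entry.
PDE_SPAN = 48 * MB // 24

l1_tlb_reach = reach(32, PAGE)     # 128 KB
l2_tlb_reach = reach(512, PAGE)    # 2 MB
pde_reach = reach(24, PDE_SPAN)    # 48 MB

for name, r in [("L1 TLB", l1_tlb_reach), ("L2 TLB", l2_tlb_reach), ("PDE cache", pde_reach)]:
    print(f"{name}: {r // KB} KB")
```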
Size | Latency (cycles) | Description |
---|---|---|
64 K | 3 | TLB + L1 |
128 K | 12 | +9 (L2) |
512 K | 17 | +5 (L1 TLB miss) |
2 M | 17 + 71 ns | +RAM |
48 M | 42 + 71 ns | +25 (L2 TLB miss) |
... | 72 + 71 ns | +30 (PDE cache miss) |
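Each row adds one penalty to the row above it, so the latency column can be rebuilt from the per-level penalties listed at the top. The sketch below does this while keeping the cycle count and the 71 ns DRAM time separate, because the notes give no clock frequency to convert between them. The step list is illustrative and simply mirrors the table rows.

```python
from typing import List, Tuple

# (size, extra cycles, adds the DRAM access?) for each row of the table above.
STEPS: List[Tuple[str, int, bool]] = [
    ("64 K", 3, False),   # TLB hit + L1 hit
    ("128 K", 9, False),  # + L2
    ("512 K", 5, False),  # + L1 TLB miss penalty
    ("2 M", 0, True),     # + RAM (time in ns, no extra cycles)
    ("48 M", 25, False),  # + 25-cycle L2 TLB miss penalty
    ("...", 30, False),   # + PDE cache miss penalty
]
RAM_NS = 71

def cumulative_latency(steps: List[Tuple[str, int, bool]]) -> List[Tuple[str, int, int]]:
    """Return (size, total cycles, DRAM ns) per row by accumulating penalties."""
    cycles, ram_ns, rows = 0, 0, []
    for size, extra, adds_ram in steps:
        cycles += extra
        if adds_ram:
            ram_ns = RAM_NS
        rows.append((size, cycles, ram_ns))
    return rows

for size, cyc, ns in cumulative_latency(STEPS):
    print(f"{size:>6}: {cyc} cycles" + (f" + {ns} ns" if ns else ""))
```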

TLB size = 8 entries (covering 16 MB, i.e. 2 MB per entry). Miss penalty = 30 cycles. Only one miss can be processed at a time.
Size | Latency (cycles) | Description |
---|---|---|
64 K | 3 | TLB + L1 |
512 K | 12 | +9 (L2) |
16 M | 12 + 71 ns | + RAM |
... | 42 + 71 ns | +30 (TLB miss) |
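The same accumulation applies to this 8-entry TLB. Dividing 16 MB by 8 entries gives 2 MB mapped per entry, and the single +30 step is its miss penalty. The sketch below is a minimal check of that arithmetic. The row labels are copied from the table.

```python
MB = 1024 * 1024

entries, covered = 8, 16 * MB
bytes_per_entry = covered // entries   # 2 MB mapped by each TLB entry

# Per-row penalties in cycles, in table order; RAM adds 71 ns but no cycles.
penalties = {"64 K": 3, "512 K": 9, "16 M": 0, "...": 30}
RAM_ROWS = ("16 M", "...")
RAM_NS = 71

total = 0
for size, extra in penalties.items():
    total += extra
    ns = f" + {RAM_NS} ns" if size in RAM_ROWS else ""
    print(f"{size:>6}: {total} cycles{ns}")
```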

Penalty for an access that crosses an 8-byte boundary = 3 cycles.
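The sketch below shows which accesses would pay this penalty. It assumes that "range" means a naturally aligned 8-byte block (bytes 0-7, 8-15, ...). The helper name is hypothetical.

```python
def crosses_8byte_boundary(addr: int, size: int) -> bool:
    """True if a `size`-byte access at `addr` spans two aligned 8-byte blocks."""
    return (addr % 8) + size > 8

PENALTY_CYCLES = 3  # from the notes

for addr, size in [(0, 8), (4, 4), (6, 4), (7, 2)]:
    extra = PENALTY_CYCLES if crosses_8byte_boundary(addr, size) else 0
    print(f"{size}-byte access at {addr}: +{extra} cycles")
```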
L1 B/W (Parallel Random Read) = 0.55-0.60 cycles per access (more than the conflict-free 0.50 because of bank conflicts)
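Cycles per access converts directly into accesses per cycle. The conflict-free 0.50 cycles therefore means two accesses per cycle. A quick check shows that the measured 0.55-0.60 reaches roughly 83-91% of that peak.

```python
IDEAL_CYCLES = 0.50  # conflict-free rate: two accesses per cycle

def l1_rate(measured_cycles: float) -> tuple:
    """Return (accesses per cycle, fraction of the conflict-free peak)."""
    return 1 / measured_cycles, IDEAL_CYCLES / measured_cycles

for m in (0.55, 0.60):
    per_cycle, eff = l1_rate(m)
    print(f"{m:.2f} cycles/access -> {per_cycle:.2f} accesses/cycle ({eff:.0%} of peak)")
```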
L2<->L1 B/W (Parallel Random Read) = 9 cycles per cache line in each direction
L2<->L1 B/W (Sequential Read or Write with 8-128 byte stride) = 10-13 cycles per cache line in each direction
L2<->L1 B/W (Sequential Read, 4-byte stride pointer chasing) = 3.55 cycles per access (3 cycles of L1 latency plus a 0.55-cycle penalty, i.e. the 9 cycles per cache line spread over the 4-byte accesses in that line; NO hardware prefetch to L1)
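The 3.55 figure can be decomposed as the 3-cycle L1 hit latency plus the 9-cycle line transfer, spread over the accesses that share one line. The sketch below assumes 64-byte cache lines, which is consistent with the RAM figure of 19 ns per line ≈ 3400 MB/s. The function name is hypothetical.

```python
L1_LATENCY = 3   # cycles, TLB + L1 hit (from the latency tables)
LINE_FILL = 9    # cycles to move one line from L2 to L1
LINE_BYTES = 64  # assumption: 64-byte cache lines

def chase_cost(stride: int = 4) -> float:
    """Average cycles per dependent load when lines stream from L2 with no L1 prefetch."""
    accesses_per_line = LINE_BYTES // stride
    return L1_LATENCY + LINE_FILL / accesses_per_line

print(f"{chase_cost():.4f} cycles per access")
```

This gives 3 + 9/16 = 3.5625 cycles, close to the measured 3.55. With a 64-byte stride every access needs a new line, and the same formula gives 12 cycles, which matches the L2 row of the first latency table.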
RAM Read B/W (Parallel Random Read) = 19 ns / cache line = 3400 MB/s
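Converting time per cache line into bandwidth is a single division. The sketch below assumes 64-byte lines and decimal megabytes (10^6 bytes).

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines

def mb_per_s(ns_per_line: float, line_bytes: int = LINE_BYTES) -> float:
    """Convert time per cache line into bandwidth in decimal MB/s."""
    return line_bytes / (ns_per_line * 1e-9) / 1e6

print(f"{mb_per_s(19):.0f} MB/s")
```

This prints about 3368 MB/s, which rounds to the 3400 MB/s above. Run backwards, the 64-byte-stride read figure of 5050 MB/s corresponds to about 12.7 ns per line.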
RAM Read B/W (Read with 8-byte stride) = 3000 MB/s ?
RAM Read B/W (Read with 64-byte stride) = 5050 MB/s ?
RAM Read B/W (Read with 4-byte stride, pointer chasing) = 2100 MB/s ?
RAM Read B/W (Read with 64-byte stride, pointer chasing) = 2600 MB/s ?
RAM Write B/W (4-byte stride) = 2000 MB/s ?
RAM Write B/W (64-byte stride) = 2500 MB/s ?
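In a pointer-chasing test, each load's address comes from the previous load, so requests cannot overlap. The sketch below shows one common way to build such a chain. It is not necessarily how the figures above were measured. It offers a sequential order, as in the strided tests, and a single random cycle built with Sattolo's algorithm, as in random-latency tests. The function names are hypothetical. Python only models the dependency structure here and cannot measure hardware latency.

```python
import random
from typing import List

def build_chain(n: int, shuffle: bool = False, seed: int = 0) -> List[int]:
    """Build nxt[] so that following it from slot 0 visits all n slots once,
    then returns to 0. Each slot stands for one stride-sized buffer element.

    shuffle=False: sequential chain (slot i -> i + 1).
    shuffle=True: one random cycle via Sattolo's algorithm.
    """
    if not shuffle:
        return [(i + 1) % n for i in range(n)]
    nxt = list(range(n))
    rng = random.Random(seed)
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i keeps the result a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt: List[int], steps: int) -> int:
    """Follow the chain: each slot index comes from the previous load."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i

chain = build_chain(16, shuffle=True)
print(chase(chain, 16))  # a full cycle returns to slot 0
```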
Branch misprediction penalty = 12 cycles.