AMD K8 (Athlon 64)

Configuration

AMD Athlon 64 X2 3800+ (90 nm) 2000 MHz + dual DDR-400 PC-3200 3-3-3-8-11-16-2T

4 KB page mode (64-bit Windows, 64-bit software)

L1 TLB size = 32 entries. Miss penalty = 5 cycles. Only two misses can be processed at a time.

L2 TLB size = 512 entries. Miss penalty = 25 cycles. Only one miss can be processed at a time.

PDE cache size = 24 entries (covers 48 MB). Miss penalty = 30 cycles.

Size     Latency (cycles)   Description
64 K     3                  TLB + L1
128 K    12                 +9 (L2)
512 K    17                 +5 (L1 TLB miss)
2 M      17 + 71 ns         + RAM
48 M     42 + 71 ns         +25 (L2 TLB miss)
...      72 + 71 ns         +30 (PDE cache miss)
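
The table above can be read as a simple lookup from working-set size to load latency. The sketch below is one way to express that, assuming the Size column is the largest working set for which a row's latency holds and that the cycle counts convert to time at the 2000 MHz clock from the Configuration line. The names LEVELS_4K and latency_ns are illustrative only.

```python
KB = 1024
MB = 1024 * KB
CPU_MHZ = 2000  # clock from the Configuration line

# (largest working set in bytes, latency in cycles, extra RAM latency in ns)
# Values copied from the 4 KB page table; None means "no upper bound".
LEVELS_4K = [
    (64 * KB, 3, 0),    # L1 cache hit, TLB hit
    (128 * KB, 12, 0),  # +9 cycles: L2 cache hit
    (512 * KB, 17, 0),  # +5 cycles: L1 TLB miss penalty
    (2 * MB, 17, 71),   # data comes from RAM
    (48 * MB, 42, 71),  # +25 cycles: L2 TLB miss penalty
    (None, 72, 71),     # +30 cycles: PDE cache miss penalty
]

def latency_ns(working_set_bytes, levels=LEVELS_4K, cpu_mhz=CPU_MHZ):
    """Return the modelled load latency in nanoseconds for a working set."""
    for limit, cycles, ram_ns in levels:
        if limit is None or working_set_bytes <= limit:
            # cycles -> ns at the given clock, plus the fixed RAM latency
            return cycles * 1000.0 / cpu_mhz + ram_ns

if __name__ == "__main__":
    for size in (32 * KB, 100 * KB, 256 * KB, 1 * MB, 32 * MB, 100 * MB):
        print(f"{size // KB:>8} KB: {latency_ns(size):.1f} ns")
```

Under these assumptions a 1 MB working set costs 17 cycles (8.5 ns) plus 71 ns, i.e. 79.5 ns per dependent load.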

2 MB page mode (64-bit)

TLB size = 8 entries (covers 16 MB). Miss penalty = 30 cycles. Only one miss can be processed at a time.

Size     Latency (cycles)   Description
64 K     3                  TLB + L1
512 K    12                 +9 (L2)
16 M     12 + 71 ns         + RAM
...      42 + 71 ns         +30 (TLB miss)
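
The coverage figures in both page modes follow from entries times page size. The sketch below recomputes them and compares the last row of each table, assuming the 2000 MHz clock from the Configuration line. The helper name tlb_reach is hypothetical.

```python
KB = 1024
MB = 1024 * KB
CPU_MHZ = 2000  # clock from the Configuration line

def tlb_reach(entries, page_size):
    """Bytes of address space a TLB can map without a miss."""
    return entries * page_size

reach = {
    "L1 TLB, 4 KB pages": tlb_reach(32, 4 * KB),
    "L2 TLB, 4 KB pages": tlb_reach(512, 4 * KB),
    "TLB, 2 MB pages": tlb_reach(8, 2 * MB),
}

# Last row of each table: cycles paid on top of the 71 ns RAM latency.
WORST_CYCLES_4K = 72
WORST_CYCLES_2M = 42
saving_ns = (WORST_CYCLES_4K - WORST_CYCLES_2M) * 1000.0 / CPU_MHZ

if __name__ == "__main__":
    for name, size in reach.items():
        print(f"{name}: {size // KB} KB")
    print(f"2 MB pages save {saving_ns:.1f} ns per access on very large working sets")
```

The reach values match the breakpoints in the tables (128 K, 2 M, 16 M). For working sets beyond every translation structure, 2 MB pages are 30 cycles cheaper per access in this model.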

MISC

Penalty for an access crossing an 8-byte boundary = 3 cycles.

L1 B/W (Parallel Random Read) = 0.55-0.60 cycles per access (more than 0.50 due to bank conflicts)

L2<->L1 B/W (Parallel Random Read) = 9 cycles per cache line in each direction

L2<->L1 B/W (Sequential Read or Write with 8-128 byte stride) = 10-13 cycles per cache line in each direction

L2<->L1 B/W (Sequential Read, 4-byte pointer chasing) = 3.55 cycles per access (0.55 cycles penalty per access = 9 cycles per cache line / 16 accesses per line; NO hardware prefetch to L1)
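
The 3.55 figure can be reproduced by spreading the 9-cycle line transfer over the accesses that share one cache line. This is a minimal sketch, assuming a 64-byte cache line and the 3-cycle L1 latency from the tables above. The function name is illustrative, and the model is only a reading of the numbers, not a description of the hardware.

```python
L1_HIT_CYCLES = 3             # L1 latency from the latency tables
L2_TO_L1_CYCLES_PER_LINE = 9  # L2 -> L1 transfer cost per cache line
CACHE_LINE_BYTES = 64         # assumption: K8 cache line size

def cycles_per_access(stride_bytes):
    """Average cycles per dependent load when walking L2-resident data."""
    accesses_per_line = CACHE_LINE_BYTES // stride_bytes
    # Each line is fetched once, then reused by the remaining accesses.
    return L1_HIT_CYCLES + L2_TO_L1_CYCLES_PER_LINE / accesses_per_line

if __name__ == "__main__":
    for stride in (4, 8, 64):
        print(f"stride {stride:>2} bytes: {cycles_per_access(stride):.2f} cycles")
```

With a 4-byte stride this gives 3 + 9/16 = 3.5625 cycles. With a 64-byte stride every access pays the full transfer, giving 12 cycles, which agrees with the L2 latency in the tables.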

RAM Read B/W (Parallel Random Read) = 19 ns / cache line = 3400 MB/s
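
The conversion from 19 ns per cache line to MB/s is shown below as a small sketch, assuming a 64-byte cache line and decimal megabytes. The comparison against the theoretical peak of dual-channel DDR-400 (2 channels x 8 bytes x 400 MT/s) is added for context and is not a figure from the measurements.

```python
CACHE_LINE_BYTES = 64  # assumption: K8 cache line size
NS_PER_LINE = 19       # measured: parallel random read

def bandwidth_mb_s(line_bytes, ns_per_line):
    """Bytes per nanosecond converted to decimal MB/s."""
    return line_bytes / ns_per_line * 1000

measured_mb_s = bandwidth_mb_s(CACHE_LINE_BYTES, NS_PER_LINE)

# Theoretical peak: 2 channels * 8 bytes per transfer * 400 million transfers/s
peak_mb_s = 2 * 8 * 400
fraction_of_peak = measured_mb_s / peak_mb_s

if __name__ == "__main__":
    print(f"measured: {measured_mb_s:.0f} MB/s")
    print(f"peak:     {peak_mb_s} MB/s")
    print(f"fraction of peak: {fraction_of_peak:.2f}")
```

The result is about 3368 MB/s, which the line above rounds to 3400 MB/s, or roughly half of the 6400 MB/s theoretical peak.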

RAM Read B/W (Read with 8 Bytes stride) = 3000 MB/s ?

RAM Read B/W (Read with 64 Bytes stride) = 5050 MB/s ?

RAM Read B/W (Read with 4 Bytes stride - pointer chasing) = 2100 MB/s ?

RAM Read B/W (Read with 64 Bytes stride - pointer chasing) = 2600 MB/s ?

RAM Write B/W (4 Bytes stride) = 2000 MB/s ?

RAM Write B/W (64 Bytes stride) = 2500 MB/s ?

Branch misprediction penalty = 12 cycles.