Configuration: IBM Power System LC922: IBM POWER9 (Nimbus) at 3800 MHz, 2 sockets, 16 cores per chip, 4 threads per core, 256 GB RAM (DDR4-2667).
Notes:
The POWER9 system now runs with mitigations for the Spectre/Meltdown vulnerabilities. One mitigation enables "L1D private per thread" mode. Each cache line in L1D carries a 2-bit thread ID, and in "L1D private" mode only one thread can read data from a given L1D line. If another thread reads data from that line, the line is reloaded from the L2 cache, so for the other 3 of the 4 SMT threads an access to shared data in L1D takes 16 cycles instead of 4. Performance therefore degrades when threads use the same data, for example when all threads read from common tables of constants.
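As a rough illustration of that cost, the sketch below averages the access latency over the SMT threads of one core when all of them read the same L1D line. The 4-cycle and 16-cycle figures come from the note above; the uniform access pattern and the function name are assumptions made for the example, not a model of the actual hardware policy.

```python
# Assumption: every SMT thread reads the shared line equally often and
# only one thread (the owner) gets the fast L1D hit.
L1D_HIT_CYCLES = 4    # owner thread, from the note above
RELOAD_CYCLES = 16    # other threads: line is reloaded from L2
SMT_THREADS = 4

def average_shared_latency(threads=SMT_THREADS):
    """Mean latency in cycles for one line read by `threads` SMT threads."""
    others = threads - 1
    return (L1D_HIT_CYCLES + others * RELOAD_CYCLES) / threads

print(average_shared_latency())   # 13.0
```

Under these assumptions a shared table costs about 13 cycles per access on average, compared with 4 cycles when each thread reads its own copy.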
Size    Latency       Increase     Description
32 K    4
64 K    10            6            + 12 (L2)
128 K   13            3
256 K   14            1
512 K   15            1
1 M     26            11           + 21 (L3)
2 M     31            5
4 M     34            3
8 M     36            2 + ns
16 M    37 + 28 ns    1 + 28 ns    + 64 ns (RAM)
32 M    37 + 29 ns    + 1 ns
64 M    37 + 46 ns    + 17 ns
128 M   47 + 58 ns    10 + 12 ns   + 20 (ERAT miss)
256 M   52 + 62 ns    5 + 4 ns
512 M   55 + 63 ns    3 + 1 ns
1 G     56 + 64 ns    1 + 1 ns
2 G     75 + 64 ns    19 + ns      + 36 (TLB miss)
4 G     84 + 64 ns    9 + ns
8 G     89 + 64 ns    5 + ns
16 G    91 + 64 ns    2 + ns
32 G    94 + 64 ns    3 + ns
64 G    98 + 64 ns    4 +
Size    Latency       Increase     Description
32 K    4
64 K    10            6            + 12 (L2)
128 K   13            3
256 K   14            1
512 K   15            1
1 M     26            11           + 21 (L3)
2 M     31            5
4 M     34            3
8 M     46 + 1 ns     12 + 1 ns    + 20 (ERAT miss)
16 M    52 + 25 ns    6 + 24 ns    + 64 ns (RAM)
32 M    55 + 29 ns    3 + 4 ns
64 M    74 + 44 ns    19 + 15 ns   + 36 (TLB miss)
128 M   84 + 57 ns    10 + 13 ns
256 M   89 + 62 ns    5 + 5 ns
512 M   97 + 63 ns    8 + 1 ns
1 G     117 + 64 ns   20 + 1 ns
2 G     129 + 64 ns   12 + ns
4 G     135 + 64 ns   14 + ns
8 G     140 + 64 ns   5 + ns
16 G    162 + 64 ns   22 + ns
32 G    265 + 64 ns   103 + ns
64 G    351 + 64 ns   86
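To compare rows of the latency tables above on a single scale, a cell such as "98 + 64 ns" can be converted to nanoseconds. The sketch below assumes the cell means "CPU cycles plus nanoseconds" and uses the 3800 MHz clock from the configuration line; the function name is hypothetical.

```python
import re

CPU_MHZ = 3800  # clock from the configuration line

def latency_to_ns(entry, mhz=CPU_MHZ):
    """Convert a latency cell such as '98 + 64 ns' or '26' to nanoseconds.

    Assumption: the leading number is CPU cycles, the optional part is ns.
    """
    m = re.fullmatch(r"\s*(\d+)(?:\s*\+\s*(\d+)\s*ns)?\s*", entry)
    if m is None:
        raise ValueError(f"unrecognized latency entry: {entry!r}")
    cycles = int(m.group(1))
    extra_ns = int(m.group(2) or 0)
    return cycles * 1000.0 / mhz + extra_ns

for cell in ("4", "26", "37 + 28 ns", "98 + 64 ns"):
    print(f"{cell:>12} -> {latency_to_ns(cell):.1f} ns")
```

Cells with a missing number (for example "2 + ns") are rejected on purpose rather than guessed.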
Notes:
7z b -mm=* : MIPS and Effectiveness values are normalized to the AMD K8 CPU.
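The relation between the columns of the benchmark output can be sketched as follows. This is inferred from the printed numbers in the logs (Rating is close to Usage × R/U, and E/U and Effec are close to R/U and Rating as a percentage of the 3800 MHz clock); it is not taken from the 7-Zip source, and the function names are hypothetical.

```python
FREQ_MHZ = 3800  # "freq= 3800" in the benchmark header

def rating_mips(usage_percent, r_per_u):
    """Rating column: R/U scaled by CPU usage (Usage is in percent)."""
    return usage_percent * r_per_u / 100.0

def effectiveness_percent(mips, freq_mhz=FREQ_MHZ):
    """E/U and Effec columns: MIPS as a percentage of the clock in MHz."""
    return 100.0 * mips / freq_mhz

# 4-thread LZMA:x1 compression row: Usage 399, R/U 2743, Rating 10945
print(round(rating_mips(399, 2743)))        # 10945
print(round(effectiveness_percent(2743)))   # 72  (E/U)
print(round(effectiveness_percent(10945)))  # 288 (Effec)
```

Because Usage is printed as a rounded percentage, recomputed ratings can differ from the logged ones by a few MIPS.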
Notes: The "L1D private per thread" mitigation affects the AES256CBC and CRC multithreading results. We also test a special version of p7zip in which each thread has its own copy of the common data tables.
### clang-7 -O3
## THP off, MY_CPU_LE_UNALIGN
# numactl -m0 -C28-31
freq= 3800
LE
CPU Freq: 1892 1893 1894 1893 1893 1894 1894 1894 1894
RAM size: 257614 MB, # CPU hardware threads: 4
RAM usage: 225 MB, # Benchmark threads: 1
Method Speed Usage R/U Rating E/U Effec
KiB/s % MIPS MIPS % %
CPU 100 1893 1891 50 50
CPU 100 1888 1890 50 50
CPU 100 1893 1893 50 50
LZMA:x1 13448 100 4930 4916 130 129
36461 100 2964 2969 78 78
LZMA:x5:mt1 3632 100 4538 4538 119 119
36500 100 3078 3079 81 81
LZMA:x5:mt2 4785 163 3675 5979 97 157
36536 100 3085 3082 81 81
Deflate:x1 38635 100 4903 4906 129 129
101874 100 3168 3165 83 83
Deflate:x5 11605 100 4458 4468 117 118
102011 100 3172 3167 83 83
Deflate:x7 4085 100 4527 4526 119 119
102711 100 3184 3187 84 84
Deflate64:x5 10844 100 4689 4686 123 123
102011 100 3196 3191 84 84
BZip2:x1 6207 100 3764 3751 99 99
27636 100 2991 2996 79 79
BZip2:x5 5697 100 4748 4755 125 125
25145 100 4947 4935 130 130
BZip2:x5:mt2 7673 196 3271 6404 86 169
29969 141 4176 5882 110 155
BZip2:x7 1783 100 4619 4620 122 122
25316 100 4962 4965 131 131
PPMD:x1 4530 100 4681 4685 123 123
3535 100 4167 4163 110 110
PPMD:x5 3526 100 5984 5976 157 157
2724 100 5100 5105 134 134
Delta:4 575429 100 3535 3535 93 93
563837 100 3470 3464 91 91
BCJ 1327653 100 5429 5438 143 143
1331109 100 5468 5452 144 143
AES256CBC:1 188021 100 4620 4621 122 122
207201 100 5084 5092 134 134
AES256CBC:2
CRC32:1 307853 100 2242 2241 59 59
CRC32:4 879880 100 1964 1964 52 52
CRC32:8 1194732 100 1620 1620 43 43
CRC64 805290 100 1650 1649 43 43
SHA256 206928 100 4222 4221 111 111
SHA1 429681 100 4021 4022 106 106
BLAKE2sp 280716 100 6178 6176 163 163
CPU 100 1891 1892 50 50
------------------------------------------------------
Tot: 109 3840 4161 101 110
RAM usage: 901 MB, # Benchmark threads: 4
Method Speed Usage R/U Rating E/U Effec
KiB/s % MIPS MIPS % %
CPU 395 1727 6822 45 180
CPU 392 1715 6721 45 177
CPU 396 1736 6874 46 181
LZMA:x1 29939 399 2743 10945 72 288
88856 399 1812 7236 48 190
LZMA:x5:mt1 7027 398 2204 8779 58 231
86137 399 1820 7264 48 191
LZMA:x5:mt2 7276 396 2295 9091 60 239
85371 399 1804 7199 47 189
Deflate:x1 72464 399 2305 9201 61 242
213091 399 1658 6621 44 174
Deflate:x5 21824 398 2111 8403 56 221
213482 399 1660 6627 44 174
Deflate:x7 7258 400 2013 8042 53 212
214428 399 1669 6654 44 175
Deflate64:x5 20815 398 2260 8995 59 237
213066 400 1668 6665 44 175
BZip2:x1 11327 400 1711 6844 45 180
55482 400 1505 6015 40 158
BZip2:x5 9783 399 2048 8165 54 215
49215 398 2425 9660 64 254
BZip2:x5:mt2 9870 396 2082 8238 55 217
48835 384 2498 9585 66 252
BZip2:x7 3066 399 1991 7945 52 209
49382 397 2439 9684 64 255
PPMD:x1 7374 399 1911 7627 50 201
6165 399 1818 7260 48 191
PPMD:x5 5428 400 2303 9201 61 242
4636 399 2177 8689 57 229
Delta:4 708191 400 1089 4351 29 115
686198 399 1056 4216 28 111
BCJ 1672737 400 1714 6852 45 180
1753549 400 1797 7183 47 189
AES256CBC:1 117317 399 723 2883 19 76
123355 399 761 3032 20 80
AES256CBC:2
CRC32:1 625149 399 1142 4551 30 120
CRC32:4 1444809 400 807 3225 21 85
CRC32:8 1673850 399 570 2270 15 60
CRC64 1477231 400 757 3025 20 80
SHA256 261507 399 1337 5335 35 140
SHA1 666711 400 1561 6240 41 164
BLAKE2sp 426756 400 2349 9389 62 247
CPU 395 1756 6934 46 182
------------------------------------------------------
Tot: 398 1941 7728 51 203
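The effect of going from 1 to 4 benchmark threads can be read from the Rating column of the two runs above. The sketch below copies a few Rating values (first line of each method) from the logs and prints the ratio. It assumes that CPUs 28-31 are the 4 SMT threads of a single core, so the ratio reflects SMT scaling rather than multi-core scaling.

```python
# Rating (MIPS) from the logs above: (1 thread, 4 threads)
ratings = {
    "LZMA:x1":     (4916, 10945),
    "BZip2:x5":    (4755, 8165),
    "AES256CBC:1": (4621, 2883),
    "CRC32:1":     (2241, 4551),
    "CRC64":       (1649, 3025),
    "Tot":         (4161, 7728),
}

# Ratio above 1 means 4 threads do more total work than 1 thread.
scaling = {name: mt4 / mt1 for name, (mt1, mt4) in ratings.items()}

for name, ratio in scaling.items():
    print(f"{name:<12} {ratio:.2f}x")
```

AES256CBC is the outlier: its total rating with 4 threads is lower than with 1 thread, which matches the note about shared data tables.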
### clang-7 -O3
## THP off, thread-local storage hack (aes, blake2sp, crc, deflate, sha256) + MY_CPU_LE_UNALIGN
# numactl -m0 -C28-31
RAM usage: 901 MB, # Benchmark threads: 4
Method Speed Usage R/U Rating E/U Effec
KiB/s % MIPS MIPS % %
CPU 396 1729 6855 46 180
CPU 393 1713 6723 45 177
CPU 396 1726 6829 45 180
LZMA:x1 29998 398 2754 10966 72 289
88450 399 1805 7203 47 190
LZMA:x5:mt1 6443 399 2019 8049 53 212
86521 399 1827 7296 48 192
LZMA:x5:mt2 7136 387 2306 8916 61 235
86474 399 1828 7292 48 192
Deflate:x1 72690 399 2315 9230 61 243
219825 399 1713 6830 45 180
Deflate:x5 22128 399 2134 8520 56 224
220260 399 1715 6838 45 180
Deflate:x7 7336 400 2035 8129 54 214
221478 399 1723 6873 45 181
Deflate64:x5 21106 399 2285 9121 60 240
219763 399 1723 6874 45 181
BZip2:x1 11345 400 1714 6854 45 180
55400 400 1503 6006 40 158
BZip2:x5 10036 396 2117 8376 56 220
49172 398 2424 9652 64 254
BZip2:x5:mt2 9768 400 2039 8152 54 215
48410 383 2480 9502 65 250
BZip2:x7 3057 400 1983 7921 52 208
49573 399 2439 9722 64 256
PPMD:x1 7351 399 1904 7603 50 200
6106 399 1801 7191 47 189
PPMD:x5 5408 399 2295 9166 60 241
4522 393 2154 8475 57 223
Delta:4 713635 400 1097 4385 29 115
702532 400 1080 4316 28 114
BCJ 1755966 400 1799 7192 47 189
1748352 399 1794 7161 47 188
AES256CBC:1 254067 397 1573 6244 41 164
271973 400 1673 6684 44 176
AES256CBC:2
CRC32:1 1136614 400 2071 8275 54 218
CRC32:4 2859582 400 1597 6383 42 168
CRC32:8 2981125 400 1011 4042 27 106
CRC64 2429159 399 1246 4975 33 131
SHA256 262536 400 1340 5356 35 141
SHA1 665986 400 1559 6234 41 164
BLAKE2sp 449079 400 2471 9880 65 260
CPU 395 1752 6916 46 182
------------------------------------------------------
Tot: 397 1981 7862 52 207
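The gain from the thread-local storage hack can be estimated by comparing the Speed column (KiB/s) of the two 4-thread runs above. The values below are copied from the logs (first line of each method); the dictionary and variable names are only for illustration.

```python
# Speed in KiB/s with 4 threads: (shared tables, thread-local tables)
speeds = {
    "AES256CBC:1": (117317, 254067),
    "CRC32:1":     (625149, 1136614),
    "CRC32:4":     (1444809, 2859582),
    "CRC32:8":     (1673850, 2981125),
    "CRC64":       (1477231, 2429159),
    "SHA256":      (261507, 262536),
    "BLAKE2sp":    (426756, 449079),
}

# Ratio above 1 means the thread-local version is faster.
speedup = {name: tls / shared for name, (shared, tls) in speeds.items()}

for name, ratio in speedup.items():
    print(f"{name:<12} {ratio:.2f}x")
```

In these two runs the table-driven methods (AES256CBC, CRC32, CRC64) gain roughly 1.6x to 2.2x, while SHA256 is essentially unchanged.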