Configuration: IBM Power System LC922: IBM POWER9 3800 MHz (Nimbus), 2 sockets, 16 cores per chip, 4 threads per core, 256 GB (DDR4-2667).
Notes:
The POWER9 system now uses mitigations for the Spectre/Meltdown vulnerabilities. One mitigation enables "L1D private per thread" mode. Each cache line in L1D has a 2-bit thread ID. In "L1D private" mode only one thread can read data from a given L1D line. If another thread reads data from that line, the line is reloaded from the L2 cache, so for the other 3 of the 4 SMT threads an access to shared data in L1D takes 16 cycles instead of 4 cycles. This degrades performance when threads use the same data, for example when all threads read from common tables of constants.
Size     Latency        Increase       Description
 32 K    4
 64 K    10             6              + 12 (L2)
128 K    13             3
256 K    14             1
512 K    15             1
  1 M    26             11             + 21 (L3)
  2 M    31             5
  4 M    34             3
  8 M    36             2 + ns
 16 M    37 + 28 ns     1 + 28 ns      + 64 ns (RAM)
 32 M    37 + 29 ns     + 1 ns
 64 M    37 + 46 ns     + 17 ns
128 M    47 + 58 ns     10 + 12 ns     + 20 (ERAT miss)
256 M    52 + 62 ns     5 + 4 ns
512 M    55 + 63 ns     3 + 1 ns
  1 G    56 + 64 ns     1 + 1 ns
  2 G    75 + 64 ns     19 + ns        + 36 (TLB miss)
  4 G    84 + 64 ns     9 + ns
  8 G    89 + 64 ns     5 + ns
 16 G    91 + 64 ns     2 + ns
 32 G    94 + 64 ns     3 + ns
 64 G    98 + 64 ns     4 +

Size     Latency        Increase       Description
 32 K    4
 64 K    10             6              + 12 (L2)
128 K    13             3
256 K    14             1
512 K    15             1
  1 M    26             11             + 21 (L3)
  2 M    31             5
  4 M    34             3
  8 M    46 + 1 ns      12 + 1 ns      + 20 (ERAT miss)
 16 M    52 + 25 ns     6 + 24 ns      + 64 ns (RAM)
 32 M    55 + 29 ns     3 + 4 ns
 64 M    74 + 44 ns     19 + 15 ns     + 36 (TLB miss)
128 M    84 + 57 ns     10 + 13 ns
256 M    89 + 62 ns     5 + 5 ns
512 M    97 + 63 ns     8 + 1 ns
  1 G    117 + 64 ns    20 + 1 ns
  2 G    129 + 64 ns    12 + ns
  4 G    135 + 64 ns    14 + ns
  8 G    140 + 64 ns    5 + ns
 16 G    162 + 64 ns    22 + ns
 32 G    265 + 64 ns    103 + ns
 64 G    351 + 64 ns    86
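The Increase column in the tables above appears to be the difference between the latency at one size and the latency at the previous size. A minimal sketch that recomputes it for the cycle-only rows (32 K to 4 M), which are identical in both tables; the names `latency_cycles` and `latency_increase` are hypothetical and used only for illustration:

```c
#include <stddef.h>

/* Latency for 32 K .. 4 M, copied from the tables above
   (assumed to be in cycles, as in the L1D note). */
static const int latency_cycles[] = {4, 10, 13, 14, 15, 26, 31, 34};
enum { N_SIZES = sizeof latency_cycles / sizeof latency_cycles[0] };

/* Writes the n-1 step-to-step differences into out and
   returns how many values were written. */
size_t latency_increase(const int *lat, size_t n, int *out)
{
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++)
        out[i - 1] = lat[i] - lat[i - 1];
    return n - 1;
}
```

Calling `latency_increase(latency_cycles, N_SIZES, out)` fills `out` with the seven differences for 64 K through 4 M, which can be compared against the Increase column.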
Notes:
7z b -mm=* : MIPS and Effectiveness values are normalized to the AMD K8 CPU.
Notes: The "L1D private per thread" mitigation affects the AES256CBC and CRC multi-threaded results. We also test a special version of p7zip in which each thread has its own copy of the common data tables.
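The per-thread copy idea can be sketched in C11 with `_Thread_local` storage. This is only an illustration under stated assumptions, not the actual p7zip change: the names `tls_crc_table`, `tls_crc_init` and `crc32_tls` are hypothetical, and a plain byte-at-a-time CRC-32 stands in for the real optimized code. Each thread builds its own table on first use, so table lookups from different SMT threads do not touch the same L1D lines.

```c
#include <stddef.h>
#include <stdint.h>

/* One private copy of the lookup table per thread (C11 thread-local storage). */
static _Thread_local uint32_t tls_crc_table[256];
static _Thread_local int tls_crc_ready = 0;

/* Fill this thread's table for the reflected CRC-32 polynomial 0xEDB88320. */
static void tls_crc_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t r = i;
        for (int k = 0; k < 8; k++)
            r = (r >> 1) ^ (0xEDB88320u & (0u - (r & 1u)));
        tls_crc_table[i] = r;
    }
    tls_crc_ready = 1;
}

/* Standard CRC-32 of a buffer, using only the calling thread's table. */
uint32_t crc32_tls(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    if (!tls_crc_ready)
        tls_crc_init();   /* lazy per-thread initialization */
    for (size_t i = 0; i < size; i++)
        crc = tls_crc_table[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

The trade-off is extra memory and one table build per thread in exchange for avoiding the L2 reloads described in the L1D note.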
### clang-7 -O3 ## THP off, MY_CPU_LE_UNALIGN # numactl -m0 -C28-31

freq= 3800
LE
CPU Freq: 1892 1893 1894 1893 1893 1894 1894 1894 1894

RAM size: 257614 MB, # CPU hardware threads: 4
RAM usage: 225 MB, # Benchmark threads: 1

Method           Speed Usage    R/U Rating   E/U Effec
                 KiB/s     %   MIPS   MIPS     %     %

CPU                      100   1893   1891    50    50
CPU                      100   1888   1890    50    50
CPU                      100   1893   1893    50    50
LZMA:x1          13448   100   4930   4916   130   129
                 36461   100   2964   2969    78    78
LZMA:x5:mt1       3632   100   4538   4538   119   119
                 36500   100   3078   3079    81    81
LZMA:x5:mt2       4785   163   3675   5979    97   157
                 36536   100   3085   3082    81    81
Deflate:x1       38635   100   4903   4906   129   129
                101874   100   3168   3165    83    83
Deflate:x5       11605   100   4458   4468   117   118
                102011   100   3172   3167    83    83
Deflate:x7        4085   100   4527   4526   119   119
                102711   100   3184   3187    84    84
Deflate64:x5     10844   100   4689   4686   123   123
                102011   100   3196   3191    84    84
BZip2:x1          6207   100   3764   3751    99    99
                 27636   100   2991   2996    79    79
BZip2:x5          5697   100   4748   4755   125   125
                 25145   100   4947   4935   130   130
BZip2:x5:mt2      7673   196   3271   6404    86   169
                 29969   141   4176   5882   110   155
BZip2:x7          1783   100   4619   4620   122   122
                 25316   100   4962   4965   131   131
PPMD:x1           4530   100   4681   4685   123   123
                  3535   100   4167   4163   110   110
PPMD:x5           3526   100   5984   5976   157   157
                  2724   100   5100   5105   134   134
Delta:4         575429   100   3535   3535    93    93
                563837   100   3470   3464    91    91
BCJ            1327653   100   5429   5438   143   143
               1331109   100   5468   5452   144   143
AES256CBC:1     188021   100   4620   4621   122   122
                207201   100   5084   5092   134   134
AES256CBC:2
CRC32:1         307853   100   2242   2241    59    59
CRC32:4         879880   100   1964   1964    52    52
CRC32:8        1194732   100   1620   1620    43    43
CRC64           805290   100   1650   1649    43    43
SHA256          206928   100   4222   4221   111   111
SHA1            429681   100   4021   4022   106   106
BLAKE2sp        280716   100   6178   6176   163   163
CPU                      100   1891   1892    50    50
------------------------------------------------------
Tot:                     109   3840   4161   101   110

RAM usage: 901 MB, # Benchmark threads: 4

Method           Speed Usage    R/U Rating   E/U Effec
                 KiB/s     %   MIPS   MIPS     %     %

CPU                      395   1727   6822    45   180
CPU                      392   1715   6721    45   177
CPU                      396   1736   6874    46   181
LZMA:x1          29939   399   2743  10945    72   288
                 88856   399   1812   7236    48   190
LZMA:x5:mt1       7027   398   2204   8779    58   231
                 86137   399   1820   7264    48   191
LZMA:x5:mt2       7276   396   2295   9091    60   239
                 85371   399   1804   7199    47   189
Deflate:x1       72464   399   2305   9201    61   242
                213091   399   1658   6621    44   174
Deflate:x5       21824   398   2111   8403    56   221
                213482   399   1660   6627    44   174
Deflate:x7        7258   400   2013   8042    53   212
                214428   399   1669   6654    44   175
Deflate64:x5     20815   398   2260   8995    59   237
                213066   400   1668   6665    44   175
BZip2:x1         11327   400   1711   6844    45   180
                 55482   400   1505   6015    40   158
BZip2:x5          9783   399   2048   8165    54   215
                 49215   398   2425   9660    64   254
BZip2:x5:mt2      9870   396   2082   8238    55   217
                 48835   384   2498   9585    66   252
BZip2:x7          3066   399   1991   7945    52   209
                 49382   397   2439   9684    64   255
PPMD:x1           7374   399   1911   7627    50   201
                  6165   399   1818   7260    48   191
PPMD:x5           5428   400   2303   9201    61   242
                  4636   399   2177   8689    57   229
Delta:4         708191   400   1089   4351    29   115
                686198   399   1056   4216    28   111
BCJ            1672737   400   1714   6852    45   180
               1753549   400   1797   7183    47   189
AES256CBC:1     117317   399    723   2883    19    76
                123355   399    761   3032    20    80
AES256CBC:2
CRC32:1         625149   399   1142   4551    30   120
CRC32:4        1444809   400    807   3225    21    85
CRC32:8        1673850   399    570   2270    15    60
CRC64          1477231   400    757   3025    20    80
SHA256          261507   399   1337   5335    35   140
SHA1            666711   400   1561   6240    41   164
BLAKE2sp        426756   400   2349   9389    62   247
CPU                      395   1756   6934    46   182
------------------------------------------------------
Tot:                     398   1941   7728    51   203

### clang-7 -O3 ## THP off, thread-local storage hack (aes, blake2sp, crc, deflate, sha256) + MY_CPU_LE_UNALIGN # numactl -m0 -C28-31

RAM usage: 901 MB, # Benchmark threads: 4

Method           Speed Usage    R/U Rating   E/U Effec
                 KiB/s     %   MIPS   MIPS     %     %

CPU                      396   1729   6855    46   180
CPU                      393   1713   6723    45   177
CPU                      396   1726   6829    45   180
LZMA:x1          29998   398   2754  10966    72   289
                 88450   399   1805   7203    47   190
LZMA:x5:mt1       6443   399   2019   8049    53   212
                 86521   399   1827   7296    48   192
LZMA:x5:mt2       7136   387   2306   8916    61   235
                 86474   399   1828   7292    48   192
Deflate:x1       72690   399   2315   9230    61   243
                219825   399   1713   6830    45   180
Deflate:x5       22128   399   2134   8520    56   224
                220260   399   1715   6838    45   180
Deflate:x7        7336   400   2035   8129    54   214
                221478   399   1723   6873    45   181
Deflate64:x5     21106   399   2285   9121    60   240
                219763   399   1723   6874    45   181
BZip2:x1         11345   400   1714   6854    45   180
                 55400   400   1503   6006    40   158
BZip2:x5         10036   396   2117   8376    56   220
                 49172   398   2424   9652    64   254
BZip2:x5:mt2      9768   400   2039   8152    54   215
                 48410   383   2480   9502    65   250
BZip2:x7          3057   400   1983   7921    52   208
                 49573   399   2439   9722    64   256
PPMD:x1           7351   399   1904   7603    50   200
                  6106   399   1801   7191    47   189
PPMD:x5           5408   399   2295   9166    60   241
                  4522   393   2154   8475    57   223
Delta:4         713635   400   1097   4385    29   115
                702532   400   1080   4316    28   114
BCJ            1755966   400   1799   7192    47   189
               1748352   399   1794   7161    47   188
AES256CBC:1     254067   397   1573   6244    41   164
                271973   400   1673   6684    44   176
AES256CBC:2
CRC32:1        1136614   400   2071   8275    54   218
CRC32:4        2859582   400   1597   6383    42   168
CRC32:8        2981125   400   1011   4042    27   106
CRC64          2429159   399   1246   4975    33   131
SHA256          262536   400   1340   5356    35   141
SHA1            665986   400   1559   6234    41   164
BLAKE2sp        449079   400   2471   9880    65   260
CPU                      395   1752   6916    46   182
------------------------------------------------------
Tot:                     397   1981   7862    52   207
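
To see how much the thread-local storage build changes the methods named in the note about the "L1D private per thread" mitigation, the 4-thread speeds of the two runs above can be divided directly. The sketch below copies the Speed (KiB/s) values from the first row of each method in the two 4-thread logs; the names `bench_row`, `rows`, `speedup` and `print_speedups` are hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* 4-thread Speed (KiB/s): regular build vs. thread-local storage build. */
struct bench_row {
    const char *method;
    double base_kibs;   /* clang-7 -O3, THP off */
    double tls_kibs;    /* same, plus thread-local storage hack */
};

static const struct bench_row rows[] = {
    {"AES256CBC:1",  117317.0,  254067.0},
    {"CRC32:1",      625149.0, 1136614.0},
    {"CRC32:4",     1444809.0, 2859582.0},
    {"CRC32:8",     1673850.0, 2981125.0},
    {"CRC64",       1477231.0, 2429159.0},
};

/* Ratio of the thread-local build's speed to the regular build's speed. */
double speedup(const struct bench_row *r)
{
    return r->tls_kibs / r->base_kibs;
}

/* Print one line per method with its speedup factor. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-12s %.2fx\n", rows[i].method, speedup(&rows[i]));
}
```

By this measure the AES256CBC:1 and CRC rows gain roughly 1.6x to 2.2x with 4 threads, while methods such as SHA256 and SHA1 stay essentially unchanged between the two logs.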