ISA | CPU | Threads | Frequency (MHz) | Compressing (MIPS) | Decompressing (MIPS) |
---|---|---|---|---|---|
ARM | |||||
Apple M1 8 cores 7-zip 21.03 clang-12 -O2 |
1 | 3200 | 7091 | 8068 | |
2 | 3100 | 20728 | 15293 | ||
4 | 3000 | 36372 | 29051 | ||
8 | 4x 3000 4x 2064 | 50365 | 45009 | ||
Apple A12Z 8 cores p7zip 16.02 |
1 | 2500 | 4410 | 3640 | |
2 | 2380 | 10530 | 6860 | ||
4 | 2325 | 19930 | 13270 | ||
8 | 4x 2325 4x 1538 | 29510 | 20310 | ||
Apple A9 (Twister)
2 cores 1850 MHz (1 core) 1800 MHz (2 cores) ? clang-4.0 -O3 |
1 | arm32-thumb2 | 1940 | 1900 | |
arm32 | 2100 | 1960 | |||
arm64-lsl | 2500 | 2510 | |||
1 | arm64 | 2480 | 2430 | ||
2 | 5760 | 4690 | |||
4 | 6380 | 4570 | |||
Hisilicon SD5113 (ARM11) | 530 | 113 | 283 | ||
Marvell XScale PXA270 | 520 | 85 | 270 | ||
Marvell Kirkwood 88F6281 (SheevaPlug) | 1200 | 385 | 710 | ||
Qualcomm QSD8250 (Snapdragon) | 1000 | 430 | 700 | ||
Qualcomm Krait 300 MSM8230AB Snapdragon 400 2 cores |
1 | 1728 | 850 | 1480 | |
2 | 1350 | 2900 | |||
4 | 1620 | 2900 | |||
Qualcomm Krait 400 MSM8974 Snapdragon 800 4 cores |
1 | 2260 | 1120 | 2060 | |
2 | 1800 | 3900 | |||
4 | 3100 | 7870 | |||
8 | 3500 | 8030 | |||
Qualcomm Snapdragon 835 8 cores aarch64 |
1 | 2450 | 1935 | 2470 | |
8 | 9500 | 13800 | |||
Freescale i.MX515 (Cortex-A8) | 800 | 325 | 645 | ||
TI DM3730 (Cortex-A8) | 1000 | 480 | 900 | ||
Samsung Hummingbird (Cortex-A8) | 1000 | 560 | 930 | ||
Allwinner A20 Cortex-A7 2 cores |
1 | 1000 | 470 | 810 | |
2 | 750 | 1560 | |||
4 | 880 | 1560 | |||
Samsung Exynos 4210 (Cortex-A9) 2 cores |
1 | 1200 | 790 | 1080 | |
2 | 1180 | 2140 | |||
4 | 1380 | 2130 | |||
Samsung Exynos 4412 (Cortex-A9) 4 cores Android |
1 (v5) | 1400 | 660 | 1180 | |
1 (v5 Thumb) | 530 | 745 | |||
1 (v7 Thumb-2) | 710 | 1010 | |||
1 | 740 | 1210 | |||
2 | 1200 | 2400 | |||
4 | 1700 | 4700 | |||
Samsung Exynos 4412 (Cortex-A9) 4 cores, Linux |
1 | 1600 | 900 | 1350 | |
4 | 2460 | 5200 | |||
Samsung Exynos 5250 (Cortex-A15) 2 cores | 1 | 1700 | 1350 | 1830 | |
2 | 2270 | 3560 | |||
4 | 2450 | 3540 | |||
Rockchip RK3288 (Cortex-A17) 4 cores 7-Zip 21.06 arm32 | 1 | 1800 | 1889 | 2026 | |
2 | 2907 | 3981 | |||
4 | 4921 | 7777 | |||
NVIDIA Tegra K1 (Cortex-A15) 4 cores | 1 | 2200 | 1680 | 2320 | |
2 | 2880 | 4600 | |||
4 | 5130 | 9100 | |||
8 | 5360 | 9100 | |||
NVIDIA Tegra K1 (Denver) 2 cores, 32-bit | 1 | 2500 | 2220 | 2940 | |
Amlogic S905
(Cortex-A53) 4 cores |
1 | 1536 32-bit arm32 |
880 | 1600 | |
2 | 1430 | 3150 | |||
4 | 2560 | 5940 | |||
6 | 2820 | 5940 | |||
1 | 1536 32-bit Thumb2 |
890 | 1370 | ||
2 | 1450 | 2700 | |||
4 | 2630 | 5130 | |||
6 | 2820 | 5150 | |||
1 | 1536 64-bit |
860 | 1420 | ||
2 | 1450 | 2800 | |||
4 | 2600 | 5250 | |||
6 | 2850 | 5300 | |||
Snapdragon 855
Cortex-A55 4 cores arm64, gcc-6 -O2, twrp |
1 | 1780 | 1120 | 1830 | |
2 | 2310 | 3490 | |||
4 | 4500 | 7080 | |||
Snapdragon 855
Cortex-A76 4 cores gcc-6 -O2 arm64 twrp |
1, Thumb2 | 2840 | 2640 | 3600 | |
1, arm32 | 2690 | 4000 | |||
1 | 1x A76 2840 3x A76 2420 |
2830 | 3360 | ||
2 | 6680 | 5800 | |||
4 | 12440 | 11720 | |||
Snapdragon 855
8 cores gcc-6 -O2, arm64, twrp |
8 | 1x A76 2840 3x A76 2420 4x A55 1780 |
15550 | 18970 | |
AMD Opteron A1170 (Cortex-A57)
8 cores 32-bit (ARMv7-A) 64-bit (aarch64) |
1 | 2000 32-bit |
2030 | 2650 | |
8 | 12700 | 19700 | |||
1 | 2000 64-bit |
2160 | 2040 | ||
8 | 13200 | 15800 | |||
APM X-Gene1
8 cores 32-bit (ARMv7-A) 64-bit (aarch64) |
1 | 2400 32-bit |
1620 | 2270 | |
8 | 9600 | 17000 | |||
1 | 2400 64-bit |
1770 | 1980 | ||
8 | 10500 | 14900 | |||
1 (2 MB pages) | 2080 | 1980 | |||
8 (2 MB pages) | 11400 | 14900 | |||
16 (2 MB pages) | 13300 | 14900 | |||
Cavium ThunderX
12 virtual cores 64-bit (aarch64) |
1 | 2000 | 1230 | 1970 | |
12 | 13700 | 22200 | |||
RISC-V | SiFive FU740 ( U74 ) 4 cores THP (2 MB pages) |
1 | 1200 | 844 | 1108 |
2 | 1498 | 2169 | |||
4 | 2526 | 4070 | |||
MIPS | |||||
Cavium Octeon II 2 cores THP (1 MB pages), n32 ABI |
1 | 1000 | 750 | 880 | |
2 | 1070 | 1700 | |||
4 | 1330 | 1700 | |||
TI AR7 (MIPS 4K) | 150 | 53 | 107 | ||
Broadcom BCM6338 (MIPS32) | 240 | 64 | 110 | ||
Atheros QCA9533 (MIPS 24Kc) | 650 | 229 | 447 | ||
Broadcom BCM4718 (MIPS32 74K) | 480 | 200 | 300 | ||
Atheros AR9344 (MIPS32 74K) | (4 KB pages) | 560 | 220 | 360 | |
(16 KB pages) | 253 | 363 | |||
Ingenic JZ4780 2 cores |
1 | 1200 | 360 | 690 | |
2 | 535 | 1300 | |||
4 | 657 | 1300 | |||
ICT Loongson 2F | 800 | 440 | 570 | ||
ICT Loongson 3A 4 cores |
1 (32MB pages) | 900 | 645 | 650 | |
1 | 476 | 650 | |||
2 | 900 | 1260 | |||
4 | 1400 | 2400 | |||
6 | 1440 | 2400 | |||
Broadcom BCM7356 (BRCM 5000 / Zephyr)
1 core, 2 threads |
1 | 1300 | 454 | 808 | |
2 | 688 | 1230 | |||
Baikal-T1 (MIPS P5600)
2 cores |
1 | 1200 | 870 | 990 | |
2 | 1460 | 1920 | |||
LoongArch | Loongson 3A5000 4 cores v21.06 |
1 | 2300 | 3764 | 2639 |
2 | 7408 | 5259 | |||
4 | 13783 | 10380 | |||
PowerPC | IBM Cell PPE 1 core, 2 threads | 1 | 3200 | 720 | 1060 |
2 | 900 | 1500 | |||
4 | 1000 | 1500 | |||
IBM PowerPC 970FX (G5) | 1800 | 750 | 1330 | ||
IBM PowerPC 970MP (G5) 4 cores |
1 | 2500 | 1230 | 2050 | |
2 | 2500 | 4000 | |||
4 | 4400 | 8000 | |||
IBM POWER7 8 cores 4 threads per core 32 threads per CPU |
1 (T0) | 3550 | 2700 | 3350 | |
1 (T3) | 2200 | 2870 | |||
2 (T0/T1) | 4100 | 5030 | |||
2 (T0/T3) | 3770 | 5000 | |||
2 (T2/T3) | 3250 | 3900 | |||
4 (1 core) | 4800 | 7100 | |||
6 (1 core) | 5200 | 7100 | |||
32 (8 cores) | 35000 | 56000 | |||
40 (8 cores) | 37000 | 56000 | |||
IBM POWER8 2 chips * 5 cores * 8 threads 10 cores 80 threads |
1 (1 core) | 3690 | 3200 | 3100 | |
2 (1 core) | 4400 | 5000 | |||
4 (1 core) | 5900 | 6900 | |||
8 (1 core) | 6900 | 7900 | |||
10 (1 core) | 7300 | 7900 | |||
10 (5 cores) | 15000 | 19000 | |||
20 (5 cores) | 25000 | 32500 | |||
40 (5 cores) | 30000 | 39000 | |||
80 (10 cores) | 57000 | 74000 | |||
IBM POWER9 2 chips * 16 cores * 4 threads 32 cores 128 threads |
1 | 3800 1 core |
4090 | 3140 | |
2 | 5540 | 4560 | |||
4 | 7260 | 7070 | |||
8 | 8370 | 7270 | |||
16 | 3300 1 socket |
42300 | 41500 | ||
32 | 67000 | 61400 | |||
64 | 3200 1 socket |
87900 | 92200 | ||
128 | 93600 | 90400 | |||
128 | 3200 2 sockets |
159000 | 177000 | ||
SPARC | Sun UltraSPARC II 6 cores | 1 | 400 | 280 | 290 |
6 | 1130 | 1700 | |||
10 | 1400 | 1700 | |||
Sun UltraSPARC IIe | 520 | 300 | 365 | ||
Sun UltraSPARC IIIi | 1000 | 600 | 780 | ||
Sun UltraSPARC T1 8 cores, 32 threads | 1 | 1000 | 344 | 426 | |
8 | 1740 | 2600 | |||
32 | 3000 | 6100 | |||
64 | 4000 | 6000 | |||
Fujitsu SPARC64_VII
4 cores 8 threads -m32 -O3 |
1 | 2520 -m64 | 1460 | 1940 | |
1 | 2520 -m32 | 1580 | 2190 | ||
2 | 2180 | 2880 | |||
8 | 6430 | 11070 | |||
Oracle SPARC T5 1 core 8 threads |
1 | 3600 | 2240 | 2100 | |
2 | 3600 | 3230 | |||
4 | 4320 | 4570 | |||
8 | 4600 | 5460 | |||
MCST R1000 4 CPUs, 16 cores 64-bit |
1 | 1000 | 577 | 793 | |
4 | 2130 | 3059 | |||
8 | 3527 | 5915 | |||
16 | 5269 | 11332 | |||
MCST R2000 4 CPUs, 32 cores 64-bit |
1 | 2000 | 826 | 1528 | |
4 | 2962 | 5872 | |||
8 | 4933 | 11752 | |||
16 | 8392 | 22353 | |||
32 | 12441 | 41212 | |||
Elbrus | MCST Elbrus-2C+ 2 cores, 32-bit |
1 | 500 | 675 | 644 |
2 | 937 | 1262 | |||
MCST Elbrus-4C Elbrus 401-PC 4 cores 32-bit |
1 | 800 | 1024 | 1038 | |
2 | 1593 | 2069 | |||
4 | 3130 | 3954 | |||
MCST Elbrus-4C Elbrus 404 4 sockets, 16 cores 64-bit |
1 | 800 | 1085 | 1045 | |
4 | 2936 | 4025 | |||
8 | 5891 | 7841 | |||
16 | 11048 | 14643 | |||
MCST Elbrus-1C+ Elbrus 101-PC 1 core, 32-bit |
1 | 1000 | 1301 | 1254 | |
MCST Elbrus-8C Elbrus 801-PC 8 cores |
1 | 1300 e2k, 32-bit |
1732 | 1689 | |
4 | 5008 | 6587 | |||
8 | 9625 | 12857 | |||
1 | 1300 x86-64 RTC GCC |
1673 | 1680 | ||
4 | 5073 | 6468 | |||
8 | 9400 | 12424 | |||
MCST Elbrus-8C Elbrus 804 4 sockets, 32 cores e2k 64-bit |
1 | 1200 | 1538 | 1536 | |
4 | 4313 | 5868 | |||
8 | 8483 | 11366 | |||
16 | 16202 | 21952 | |||
32 | 28449 | 39894 | |||
MCST Elbrus-8SV Elbrus 901 8 cores |
1 | 1500 e2k, 32-bit |
1922 | 1873 | |
4 | 5632 | 7293 | |||
8 | 10768 | 14297 | |||
1 | 1500 e2k, 64-bit |
1845 | 1791 | ||
4 | 5356 | 7013 | |||
8 | 10306 | 13754 | |||
1 | 1500 x86-64 RTC GCC |
1886 | 1813 | ||
4 | 5641 | 7283 | |||
8 | 10829 | 14370 | |||
MCST Elbrus-2S3 2 cores 2.0 GHz |
1 | 7-Zip 16.02 32-bit |
2349 | 2429 | |
2 | 3681 | 4842 | |||
1 | 7-Zip 21.07 64-bit |
2344 | 2560 | ||
2 | 3877 | 5106 | |||
MCST Elbrus-16S
2 sockets * 16 cores 32 cores e2k, 64-bit 7-Zip 16.02 |
1 | 2000 | 2301 | 2391 | |
8 | 12377 | 17218 | |||
16 | 23373 | 32186 | |||
32 | 41145 | 65514 | |||
MCST Elbrus-16S
16 cores 2.0 GHz 7-Zip 21.07 |
1 | e2k, 64-bit | 2553 | 2520 | |
8 | 14741 | 20127 | |||
16 | 28176 | 39802 | |||
1 | x86-64 RTC (no ASM) |
2873 | 2247 | ||
8 | 16577 | 17498 | |||
16 | 31544 | 34291 | |||
PA-RISC | HP PA-8600 2 CPUs | 1 | 552 | 400 | 327 |
2 | 780 | 645 | |||
IA-64 | Intel Itanium 2 2 cores | 1 | 1300 | 1210 | 1220 |
2 | 1500 | 2430 | |||
4 | 2230 | 2400 | |||
x86 | VIA C7 | 1500 | 470 | 730 | |
VIA L4700E (VIA Nano) 4 cores | 1 | 1200 | 1060 | 1010 | |
4 | 3400 | 3900 | |||
AMD Am386DX | 40 | 6 | 6 | ||
AMD Am486 | 80 | 19 | 19 | ||
Cyrix 486 dx2 | 66 | 13 | 23 | ||
AMD K5 | 75 | 69 | 81 | ||
AMD Geode LX800 | 500 | 280 | 270 | ||
AMD K6-2 | 500 | 260 | 440 | ||
AMD E-350 (Bobcat) | 1 | 1600 | 1120 | 1480 | |
2 | 2080 | 2900 | |||
AMD A8-6410 (Jaguar) (4 cores) @ 1800 MHz |
1 | 1800 | 1450 | 1650 | |
2 | 2600 | 3300 | |||
4 | 4400 | 6530 | |||
AMD Athlon 64 X2 (K8) | 1 | 2000 | 1800 | 2080 | |
2 | 3400 | 4170 | |||
AMD Phenom II X4 965 (K10) | 4 | 3400 | 11900 | 13500 | |
AMD Phenom II X6 1100T (K10) | 6 | 3300 | 16200 | 19600 | |
AMD FX-8350 (Piledriver) | 8 | 4000 | 22800 | 24900 | |
AMD FX-8300 (Piledriver) 4 modules, 8 threads 7-Zip 20.00 |
1 | 4200 | 4780 | 5390 | |
4 | 3900 | 17500 | 19000 | ||
8 | 3800 | 27000 | 34300 | ||
Ryzen 1400
4 cores, 8 threads Linux (THP off) gcc-6 -m64 -O3 |
1 | 3450 | 3750 | 3420 | |
8 | 3200 | 18700 | 20400 | ||
Ryzen 1700X
8 cores, 16 threads Linux (THP off) gcc-6 -m64 -O3 |
1 | 3900 | 4210 | 3900 | |
2 (1 core) | 6400 | 6300 | |||
8 (one CCX) | 3500 | 20200 | 21600 | ||
8 | 30700 | 27000 | |||
16 | 35200 | 43700 | |||
Ryzen 3950X (Zen2)
16 cores, 32 threads 7-Zip 20.00 |
1 | 4400- | 5930 | 8390 | |
16 | 76900 | 113700 | |||
32 | 84600 | 182400 | |||
Ryzen 5600G (Zen3)
6 cores, 12 threads 7-Zip 21.07 (Linux) |
1 | 4450 | 6775 | 9393 | |
6 | 40913 | 48622 | |||
12 | 55507 | 83299 | |||
Intel Pentium | 100 | 64 | 62 | ||
Intel Pentium MMX | 200 | 130 | 120 | ||
Intel Atom N270 (1 core, 2 threads) | 1 | 1600 | 700 | 900 | |
2 | 1000 | 1500 | |||
Intel Atom N2800 (2 cores, 4 threads) |
1 | 1862 | 870 | 1160 | |
2 | 1640 | 2260 | |||
4 | 2540 | 3530 | |||
8 | 2700 | 3500 | |||
Intel Atom Z2760 (2 cores) (Cloverview) | 4 | 1800 | 2100 | 3400 | |
Intel Celeron N2840 Silvermont 2 cores |
1 | 2580- | 1620 | 2070 | |
2 | 2420 | 4050 | |||
4 | 3080 | 4080 | |||
Intel Atom Z3740 (4 cores) (Silvermont) | 4 | 1860- | 3900 | 5900 | |
Intel Atom Z3770 (4 cores) (Silvermont) | 4 | 2400- | 4500 | 7000 | |
Intel Pentium N4200 (Goldmont) 4 cores |
1 | 2500- | 1600 | 2200 | |
Intel Pentium 4 (180 nm) | 1700 | 760 | 760 | ||
Intel Pentium 4 (130 nm) 1 core, 2 threads |
1 | 2400 | 1220 | 1080 | |
2 | 1500 | 1780 | |||
Intel Pentium 4 (65 nm) 1 core, 2 threads |
1 | 3000 | 1500 | 1530 | |
2 | 2000 | 2330 | |||
Intel Pentium II 2 CPUs |
1 | 350 | 290 | 300 | |
2 | 410 | 600 | |||
4 | 520 | 590 | |||
Intel Celeron (P6) | 1200 | 760 | 980 | ||
Intel Pentium III-S 2 CPUs |
1 | 1400 | 980 | 1250 | |
2 | 1600 | 2380 | |||
Intel Core 2 (1 core) | 2000 | 2000 | 2000 | ||
Intel Core 2 Quad Q9550 | 4 | 2833 | 9340 | 11100 | |
Intel i5-650 (Westmere) 2 cores, 4 threads Turbo Boost disabled |
1 | 3200 | 3150 | 3180 | |
2 | 6150 | 6200 | |||
4 | 8200 | 9460 | |||
Intel Xeon x5650 (Westmere) 1 CPU, 6 cores, 12 threads Turbo Boost disabled |
1 | 2670 | 3100 | 2600 | |
2 (1 core) | 4360 | 3800 | |||
12 (1 cpu) | 16200 | 22700 | |||
Intel i7 920 (4 cores) | 8 | 2666 | 15700 | 16800 | |
Intel i7 875K (4 cores) | 8 | 2933+ | 19000 | 19700 | |
Intel i7 980X (6 cores) | 12 | 3333 | 29000 | 30800 | |
Intel i3-2120 (Sandy Bridge) 2 cores, 4 threads |
1 | 3300 | 3800 | 3450 | |
2 | 7200 | 6800 | |||
4 | 9000 | 9400 | |||
6 | 10100 | 9300 | |||
Intel i7 2600K (4 cores) | 8 | 3400+ | 20100 | 20700 | |
Intel i7 3960X (6 cores) | 12 | 3300+ | 31900 | 31500 | |
Intel i7 3770 (Ivy Bridge)
4 cores, 8 threads Turbo Boost disabled Single RAM channel Linux (THP off) GCC-4.6.3 -O3 |
1 | 3400 | 4200 | 3760 | |
2 (1 core) | 6300 | 5000 | |||
2 | 8500 | 7400 | |||
4 | 15100 | 14700 | |||
8 | 21500 | 20300 | |||
Intel i7 3770K (4 cores) | 8 | 3500+ | 23700 | 22300 | |
Intel i7 4770 (Haswell)
4 cores, 8 threads Turbo Boost disabled |
1 | 3400 | 4000 | 4000 | |
8 | 20500 | 21000 | |||
Intel Core i7-5960X (8 cores, 16 threads) | 16 | 3000+ | 39600 | 40900 | |
Intel E5-2697 v2 (2 sockets, 24 cores) | 48 | 2700+ | 85000 | 102000 | |
Intel E5-2699 v3 (2 sockets, 36 cores) | 72 | 2300+ | 124000 | 141000 | |
Intel E7-8890 v3 (4 sockets, 72 cores) | 144 | 2500+ | 228000 | 255000 | |
Intel i7-6900K (Broadwell) ver 9.22 |
1 | 4000 | 4900 | 4800 | |
Intel Xeon E5-2699 v4 (Broadwell)
2 cpus, 44 cores, 88 threads THP on ver. 16.02 gcc-6 -O3 |
1 | 3600 | 5100 | 3900 | |
2 (1 core) | 3600 | 7350 | 5170 | ||
44 (1 cpu) | 2800 | 107000 | 87000 | ||
88 | 2800 | 186000 | 174000 | ||
Intel i7-7700K (Kaby Lake) ver 9.22 |
1 | 4000 | 4900 | 4700 | |
Intel i7-6700 (Skylake)
4 cores, 8 threads ver 9.22 |
1 | 4000 | 4640 | 4640 | |
8 | 3400-4000 | 24700 | 22900 | ||
Intel i7-7820X (Skylake X)
8 cores, 16 threads |
1 (9.22) | 4300 | 4950 | 5080 | |
1 (17.01) | 4300 | 5300 | 4600 | ||
2 (1 core) (17.01) | 4300 | 8270 | 6150 | ||
16 (17.01) | 4000 | 49900 | 46700 | ||
Intel i7-1065G7 (Ice Lake)
4 cores, 8 threads ver 9.22 |
1 | 3900- | 3800 | 4020 | |
8 | 3500- | 23300 | 21200 | ||
Intel i7-1065G7 (Ice Lake)
4 cores, 8 threads ver 19.02 |
1 | 3900- | 5000 | 7000 |
The LZMA benchmark reports a rating in MIPS (million instructions per second). The rating is calculated from the measured speed and normalized to the results of an Intel Core 2 CPU, measured with the multi-threading option switched off and with an old version of 7-Zip.
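Because the rating is normalized, one simple way to compare CPUs at equal clock speed is to divide the rating by the frequency. The sketch below does that for a few single-thread rows copied from the table above. The helper name `rating_per_mhz` is hypothetical, and the ratio is only a rough indicator: the table's Core 2 row lists 2000 MIPS at 2000 MHz, so the ratio is 1.0 for that row.

```python
# Single-thread rows copied from the table above:
# (CPU, frequency in MHz, compression rating, decompression rating)
ROWS = [
    ("Intel Core 2 (1 core)", 2000, 2000, 2000),
    ("Apple M1", 3200, 7091, 8068),
    ("Samsung Exynos 5250 (Cortex-A15)", 1700, 1350, 1830),
    ("Intel Pentium 4 (180 nm)", 1700, 760, 760),
]


def rating_per_mhz(rating_mips: float, freq_mhz: float) -> float:
    """Rating divided by clock frequency (hypothetical helper, not part of 7-Zip)."""
    return rating_mips / freq_mhz


for name, freq, comp, decomp in ROWS:
    print(f"{name:34s} comp {rating_per_mhz(comp, freq):.2f}  "
          f"decomp {rating_per_mhz(decomp, freq):.2f}")
```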
The test data used for compression is produced by a special algorithm that creates a data stream with some properties of real data, such as text or executable code. Note that the speed of LZMA on real data can be slightly different: the benchmark workload is rather artificial and more random than real-world data.
Compression speed depends strongly on memory (RAM) latency, data cache size and speed, and the TLB. The code also uses simple 32-bit integer instructions: "shift", "add", "multiply" and others. The CPU's out-of-order execution capability is also important for this test.
Decompression speed depends strongly on CPU integer operations. The most important factors for this test are the branch misprediction penalty (the length of the pipeline) and the latencies of 32-bit instructions ("multiply", "shift", "add" and others).
The decompression test has a very high number of unpredictable branches. Note that some CPU architectures (for example, 32-bit ARM) support instructions that can be executed conditionally. Such CPUs can work without branches (and without pipeline flushes) in many cases in the LZMA decompression code, so they can have a speed advantage over architectures that don't support complex conditional execution.
Note: the latest version of 7-Zip for x64 and arm64 contains an optimized LZMA decoder that uses conditional move instructions instead of some of the unpredictable branches. That optimized code increases the speed of LZMA decompression in the benchmark by up to 1.7 times.
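The idea of replacing an unpredictable branch with a select can be sketched on the bit-decoding step of a range decoder. This is only an illustration in Python, not the assembler code used by 7-Zip. The function names and the constants (11-bit probabilities, adaptation shift of 5) are assumptions made for the sketch, and range normalization is left out. In compiled code, the mask-based selects would typically become conditional moves.

```python
MASK32 = 0xFFFFFFFF
PROB_BITS = 11               # assumed probability precision
PROB_TOTAL = 1 << PROB_BITS  # 2048
MOVE_BITS = 5                # assumed adaptation speed


def decode_bit_branchy(rng, code, prob):
    """Decode one bit with an if/else; the branch is hard to predict."""
    bound = (rng >> PROB_BITS) * prob
    if code < bound:
        return bound, code, prob + ((PROB_TOTAL - prob) >> MOVE_BITS), 0
    return rng - bound, code - bound, prob - (prob >> MOVE_BITS), 1


def decode_bit_branchless(rng, code, prob):
    """Same result, but both outcomes are computed and one is selected."""
    bound = (rng >> PROB_BITS) * prob
    bit = int(code >= bound)            # comparison result used as data
    mask = -bit & MASK32                # 0x00000000 or 0xFFFFFFFF
    new_rng = bound ^ ((bound ^ (rng - bound)) & mask)
    new_code = code - (bound & mask)
    p0 = prob + ((PROB_TOTAL - prob) >> MOVE_BITS)
    p1 = prob - (prob >> MOVE_BITS)
    new_prob = p0 ^ ((p0 ^ p1) & mask)
    return new_rng, new_code, new_prob, bit


state = (0xFFFFFFFF, 0x12345678, 1024)
print(decode_bit_branchy(*state) == decode_bit_branchless(*state))
```

The branchless variant does slightly more arithmetic per bit, which is the usual trade: extra cheap integer work in exchange for no pipeline flush on a mispredicted branch.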
Out-of-order execution capability is less important for LZMA decompression.
The benchmark code doesn't use the FPU or SSE. Most of the code is 32-bit integer code; only a minor part of the compression code also uses 64-bit integers.
The latencies of RAM/Cache are very important for compression speed.
RAM bandwidth is not so important for single-threaded compression or decompression, or when only a small number of working threads is used. But RAM bandwidth can be the main limiting factor for LZMA compression speed when a large number of working threads are used.
The CPU's IPC (instructions per cycle) rate is not very high for the benchmark workloads. The estimated benchmark IPC is 1-2 for a modern CPU. The compression test makes a large number of random accesses to RAM and the data cache, so for a large part of the execution time the CPU waits for data from the data cache or RAM. The decompression test has a large number of pipeline flushes after mispredicted branches, and it waits on long dependency chains of instructions such as 32-bit multiply. Such a low IPC means that some CPU resources are left idle. A CPU with SMT (Hyper-Threading) can use these free resources by running two threads, so SMT (Hyper-Threading) provides a fairly large improvement in these tests.
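The SMT effect can be estimated from rows of the table above by dividing the rating with all hardware threads by the rating with one thread per core. The sketch below assumes that the row whose thread count equals the core count actually ran one thread per core; the table does not state thread placement for these CPUs, so treat the results as rough. The helper name `smt_gain` is hypothetical.

```python
# (CPU, rating at threads == cores, rating at threads == 2 * cores),
# as (compression, decompression) pairs copied from the table above.
SMT_ROWS = [
    ("Intel i7 3770 (4 cores)", (15100, 14700), (21500, 20300)),
    ("Ryzen 3950X (16 cores)", (76900, 113700), (84600, 182400)),
    ("Ryzen 5600G (6 cores)", (40913, 48622), (55507, 83299)),
]


def smt_gain(rating_one_per_core: float, rating_all_threads: float) -> float:
    """Ratio of the all-threads rating to the one-thread-per-core rating."""
    return rating_all_threads / rating_one_per_core


for name, base, full in SMT_ROWS:
    print(f"{name:26s} compression x{smt_gain(base[0], full[0]):.2f}  "
          f"decompression x{smt_gain(base[1], full[1]):.2f}")
```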
When you specify (N*2) threads for the test, the program creates N copies of the LZMA encoder, and each LZMA encoder instance compresses a separate block of test data. Each LZMA encoder instance creates 3 asymmetrical execution threads: two big threads and one small thread. The total CPU load for these 3 threads can vary from 140% to 200%. To load the CPU better during compression, you can also test a mode where the number of benchmark threads is larger than the number of hardware threads.
In multithreading mode, each LZMA encoder instance divides the compression task into 3 different tasks, each executed in a separate thread. Each of these tasks is simpler than the original task and uses less memory, so each thread uses the data cache and TLB more effectively. As a result, the LZMA encoder is slightly more efficient in multithreading mode, measured as speed divided by CPU usage.
Note that there is some data traffic between the 3 threads of the LZMA encoder. So the bandwidth of data exchange via memory between CPU threads can also be important, especially in a multi-core system with a large number of cores or CPUs.
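The thread arithmetic for the compression test can be written out as a small helper. This is a sketch that only restates the numbers given in the text (N encoder instances for N*2 benchmark threads, 3 threads per instance, 140% to 200% load per instance); the function name and the returned field names are hypothetical.

```python
def encoder_plan(benchmark_threads: int) -> dict:
    """Estimate encoder instances, OS threads and CPU load for the compression test."""
    if benchmark_threads < 2 or benchmark_threads % 2 != 0:
        raise ValueError("this sketch covers only the (N*2)-thread case")
    encoders = benchmark_threads // 2
    return {
        "encoders": encoders,
        "os_threads": 3 * encoders,       # two big threads and one small thread each
        "load_min_pct": 140 * encoders,   # lower bound stated in the text
        "load_max_pct": 200 * encoders,   # upper bound stated in the text
    }


for threads in (2, 8, 16):
    print(threads, encoder_plan(threads))
```

For example, with 8 benchmark threads on a CPU with 8 hardware threads, the estimated load can fall well below 800%, which is why testing with more benchmark threads than hardware threads can give a better compression result.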
All LZMA decoder threads are symmetrical and independent. So the decompression test uses all hardware threads if the number of benchmark threads equals the number of hardware threads.
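Since the decoder threads are independent, decompression should scale close to linearly until some shared resource becomes a limit. One way to check this against the table above is parallel efficiency: the multi-thread rating divided by the thread count times the single-thread rating. The sketch uses the decompression column of the MCST Elbrus-8C Elbrus 804 rows (4 sockets, 32 cores); the helper name is hypothetical.

```python
# Decompression ratings for MCST Elbrus-8C Elbrus 804, copied from the table:
# thread count -> rating in MIPS
ELBRUS_804_DECOMPRESS = {1: 1536, 4: 5868, 8: 11366, 16: 21952, 32: 39894}


def parallel_efficiency(single_rating: float, multi_rating: float, threads: int) -> float:
    """multi_rating / (threads * single_rating); 1.0 means perfect linear scaling."""
    return multi_rating / (threads * single_rating)


base = ELBRUS_804_DECOMPRESS[1]
for threads, rating in ELBRUS_804_DECOMPRESS.items():
    print(f"{threads:2d} threads: efficiency {parallel_efficiency(base, rating, threads):.2f}")
```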
We use benchmark results for a 32 MB dictionary (the "25:" line in the results of the console version). If 32 MB dictionary results are not available, we use the results for a smaller dictionary. Most x86 tests were performed on Windows with official 7-Zip binaries. Some tests were performed in 64-bit mode. Most of the tests for other platforms were performed with p7zip compiled by GCC with speed optimization.
Note: new versions of 7-Zip provide improved performance. For example, the latest versions of 7-Zip for x64 and arm64 platforms use optimized decompression code written in assembler, so the rating can be up to 1.7 times larger than with previous versions of 7-Zip. But most results in the table were measured with old versions of 7-Zip, before these optimizations. If a CPU was tested with a new version of 7-Zip, the entry is marked with the 7-Zip version number. If there is no version mark, the CPU was tested with a version whose performance is similar to 7-Zip (p7zip) 16.02.
You can download binaries and source code of 7-Zip benchmark here:
7-Benchmark (Memlat and Pipelen)
If you have new interesting results, write about them on 7-max forum: