https://cantrip.org/sortfast.html
Examples:
make all
make CC=g++-9 INDEXED
make CC=clang++-9 ANDANDOR
make CC=clang++-9 INDEXED_PESSIMAL_ON_GCC
make CC=g++-9 CHECK=CHECK BOG
Same input, same algorithm, varying performance:
/tr>Variant | Compiler | Haswell | SkylakeX | Zen2 -march=znver2 |
* |
---|---|---|---|---|---|
STD | G++ | 1 (10.2s) | 1 (7.3s) | 1 (9.0s) | |
Clang | 1 (10.0s) | 1 (7.3s) | 1 (8.5s) | ||
BOG | G++ | 1.1 (11.5s) | 1.1 (8.2s) | 1.1 (10.3s) | |
Clang | 1.1 (11.3s) | 1.1 (7.9s) | 1.1 (9.4s) | ||
&&| | G++ | .56 (5.7s) | .61 (4.5s) | .82 (7.4s) | |
Clang | .49 (4.9s) | .47 (3.4s) | .75 (6.4) | ! | |
IX | G++ | .79 (8.1s) | .89 (6.5s) | .94 (8.5s) | |
Clang | .66 (6.6s) | .63 (4.6s) | .87 (7.4s) | ||
IX-P | G++ | 1.2 (12.0s) | 1.2 (8.4s) | 2.7 (24s) | ! |
Clang | .64 (6.4s) | .60 (4.4s) | .77 (7.4s) | ||
ROT | G++ | .72 (7.3s) | .70 (5.1s) | .73 (6.6s) | |
Clang | .68 (6.8s) | .66 (4.8s) | 1.2 (9.9s) | ||
CMOV | G++ | 1.1 (11.3s) | 1.1 (8.0s) | .82 (7.4s) | |
Clang | .49 (4.9s) | .47 (3.4s) | .74 (6.3s) | ! | |
IX1 | G++ | 1.6 (15.9s) | 1.6 (11.8s) | .82 (7.4) | ! |
Clang | .61 (6.1s) | .62 (4.5s) | .73 (6.6s) | ||
IX2 | G++ | 1.1 (11.2s) | 1.1 (8.2s) | 2.1 (19s) | ! |
Clang | .69 (6.9s) | .70 (5.1s) | 2.2 (19s) | ! |
Note that compiler tunings for Zen2 are very immature, resulting in severely suboptimal code generation in several cases seen here.
Radix sorts, for reference
Variant | Compiler | Haswell | SkylakeX | Zen2 -march=znver2 |
* |
---|---|---|---|---|---|
RX256 | G++ | .13 (1.3s) | .14 (1.0s) | .20 (1.8s) | ! |
Clang | .12 (1.2s) | .12 (0.9s) | .21 (1.8s) | ! | |
RX2 | G++ | .66 (6.7s) | .62 (4.5s) | 1.4 (12.7s) | |
Clang | .71 (7.1s) | .66 (4.8s) | 1.4 (11.7s) |