ggml: Optimize Q4_0 into Q4_0_X_Y repack #10324

eddnjjn · 2024-11-15T21:56:50Z

This pull request optimizes the code for repacking Q4_0 into Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8.
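
For readers unfamiliar with the repacked layouts, here is a minimal sketch of what interleaving four Q4_0 blocks could look like. It is an illustration only, not the code from this pull request. The type names `blk_q4_0` and `blk_q4_0x4`, the function `repack_x4`, and storing the scale as a raw 16-bit value are all hypothetical. The sketch assumes a Q4_0 block holds 32 4-bit weights in 16 bytes, that the repack interleaves the four blocks in chunks of 4 or 8 bytes, and that each byte is XORed with `0x88` to turn the offset-encoded nibbles into signed ones. Copying whole chunks with `memcpy` instead of computing a source index per byte is the kind of change that can make such a repack faster.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK 32 /* weights per Q4_0 block (assumed) */

/* Hypothetical stand-ins for the ggml block types. */
typedef struct { uint16_t d;    uint8_t qs[QK / 2]; } blk_q4_0;   /* one block   */
typedef struct { uint16_t d[4]; uint8_t qs[QK * 2]; } blk_q4_0x4; /* four blocks */

/* Interleave the quantized bytes of four blocks in chunks of `chunk` bytes. */
static blk_q4_0x4 repack_x4(const blk_q4_0 in[4], size_t chunk) {
    blk_q4_0x4 out;
    assert(chunk == 4 || chunk == 8);

    /* The four scales are stored side by side. */
    for (int i = 0; i < 4; i++) {
        out.d[i] = in[i].d;
    }

    /* Output chunk c comes from block (c % 4), at byte offset (c / 4) * chunk. */
    const size_t nchunks = (QK * 2) / chunk;
    for (size_t c = 0; c < nchunks; c++) {
        uint8_t tmp[8];
        memcpy(tmp, in[c % 4].qs + (c / 4) * chunk, chunk);
        for (size_t b = 0; b < chunk; b++) {
            tmp[b] ^= 0x88; /* assumed sign conversion of both nibbles */
        }
        memcpy(out.qs + c * chunk, tmp, chunk);
    }
    return out;
}
```

Under these assumptions, `repack_x4(blocks, 4)` would correspond to a 4-byte interleave and `repack_x4(blocks, 8)` to an 8-byte interleave; an 8-block variant would follow the same pattern with eight inputs.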

slaren

Very nice. For me this cuts the load time by two-thirds on x86, and by even more on M3 Max.

Commit 8007cb0: ggml: Optimize Q4_0 into Q4_0_X_Y repack

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Nov 15, 2024

slaren approved these changes Nov 16, 2024

slaren merged commit 1e58ee1 into ggerganov:master Nov 16, 2024
54 checks passed