Bug: Using llama_batch_init+add+free instead of llama_batch_get_one() permanently slows down llama_decode significantly #10322
Labels
bug-unconfirmed
high severity
What happened?
I have roughly the following code executed at some point for prompt processing:
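The original snippet did not survive in this copy of the report. Based on the description, it presumably looked something like the following sketch: a `llama_batch` is allocated with `llama_batch_init`, filled token by token (the same fields that the `common_batch_add` helper sets), decoded, and freed. The names `process_prompt`, `ctx`, and `prompt_tokens` are assumptions, not from the original issue.

```cpp
#include "llama.h"
#include <vector>

// Hypothetical reconstruction of the prompt-processing code described above.
void process_prompt(llama_context * ctx, const std::vector<llama_token> & prompt_tokens) {
    llama_batch batch = llama_batch_init((int32_t) prompt_tokens.size(), /*embd=*/0, /*n_seq_max=*/1);

    for (size_t i = 0; i < prompt_tokens.size(); ++i) {
        batch.token   [batch.n_tokens] = prompt_tokens[i];
        batch.pos     [batch.n_tokens] = (llama_pos) i;
        batch.n_seq_id[batch.n_tokens] = 1;
        batch.seq_id  [batch.n_tokens][0] = 0;
        // request logits only for the last prompt token
        batch.logits  [batch.n_tokens] = (i == prompt_tokens.size() - 1);
        batch.n_tokens++;
    }

    llama_decode(ctx, batch);
    llama_batch_free(batch);
}
```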
Afterwards, llama_decode for token generation becomes significantly slower (roughly 14 t/s versus 36 t/s).
However, if this code is replaced by the llama_batch_get_one equivalent, performance remains high.
I'm not sure why this happens; maybe I'm using llama_batch incorrectly.
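For context, the `llama_batch_get_one` equivalent mentioned here would look roughly like the sketch below (assuming the simplified two-argument signature used in builds from this period; `ctx` and `prompt_tokens` are the same assumed names as above):

```cpp
// llama_batch_get_one wraps an existing token array; positions and sequence
// IDs are filled in with defaults by llama_decode, and nothing is allocated,
// so no llama_batch_free is needed.
llama_batch batch = llama_batch_get_one(prompt_tokens.data(), (int32_t) prompt_tokens.size());
llama_decode(ctx, batch);
```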
Name and Version
~ 4083 (09ecbcb)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response