Prerequisites
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
llama.cpp already supports ternary quantization for LLMs, e.g., BitNet b1.58. We have trained a ternary diffusion transformer model, TerDiT. Due to the limits of our engineering ability, I am wondering whether llama.cpp could support deploying this model; it would help our research a lot.
Motivation
Ternary quantization has become popular and has demonstrated computational speedups and power reductions, as shown in projects like llama.cpp and bitnet.cpp. We have trained the first ternary DiT network; DiT is a popular architecture today for text-to-image generation. We would like to know whether we could get help deploying it through llama.cpp.
Possible Implementation
Our engineering ability is limited. TerDiT's architecture is similar to LLaMA's, so we think the existing implementations in llama.cpp and bitnet.cpp could serve as useful references 😊. A minimal sketch of the conversion step we imagine is below.
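As a concrete starting point, here is a minimal sketch of what the weight-conversion side could look like, assuming the TerDiT weights can be exported as float rows and that ggml's ternary types (the TQ1_0/TQ2_0 types added for BitNet b1.58 in llama.cpp) are the target. All names and sizes are illustrative; reading the actual TerDiT checkpoint is omitted.

```c
// Sketch: quantizing a float weight matrix into ggml's ternary TQ2_0 type.
// Illustrative only; a real converter would read rows from the trained
// TerDiT checkpoint instead of synthesizing them.
#include <stdio.h>
#include <stdlib.h>
#include "ggml.h"

int main(void) {
    const int64_t n_per_row = 512; // must be a multiple of the block size (256)
    const int64_t nrows     = 4;

    // Fake ternary-ish weights in float, values in {-1, 0, +1}.
    float * src = malloc(sizeof(float) * n_per_row * nrows);
    for (int64_t i = 0; i < n_per_row * nrows; i++) {
        src[i] = (float)((i % 3) - 1);
    }

    // Allocate the destination buffer using the quantized row size.
    const size_t row_sz = ggml_row_size(GGML_TYPE_TQ2_0, n_per_row);
    void * dst = malloc(row_sz * nrows);

    // Quantize all rows; the last argument (importance matrix) is optional.
    const size_t written = ggml_quantize_chunk(
        GGML_TYPE_TQ2_0, src, dst, /*start=*/0, nrows, n_per_row, /*imatrix=*/NULL);

    printf("quantized %lld values into %zu bytes\n",
           (long long)(n_per_row * nrows), written);

    free(src);
    free(dst);
    return 0;
}
```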
llama.cpp does not support image generation models, but I would suggest taking a look at https://github.com/leejet/stable-diffusion.cpp, which is built using the same ggml library and could use the same ternary tensor types that are available in llama.cpp.
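For anyone picking this up, a rough sketch of how a ggml-based project such as stable-diffusion.cpp could run a matmul against a ternary weight tensor follows. This is an assumption-laden illustration against ggml's public API, not stable-diffusion.cpp's actual loading code; in practice the weight data would come from a quantized GGUF file.

```c
// Sketch: a ggml matmul with a ternary (TQ2_0) weight tensor, the same
// mechanism llama.cpp uses for BitNet b1.58. Illustrative only.
// Note: on newer ggml versions ggml_graph_compute_with_ctx is declared
// in ggml-cpu.h rather than ggml.h.
#include <stdio.h>
#include <string.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t n_in = 256, n_out = 4;

    // Ternary weight [n_in, n_out]; zeroed here for determinism, but a real
    // model would load quantized blocks from a GGUF tensor.
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_TQ2_0, n_in, n_out);
    memset(w->data, 0, ggml_nbytes(w));

    // f32 activation [n_in, 1].
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_in, 1);
    memset(x->data, 0, ggml_nbytes(x));

    // y = w @ x  -> [n_out, 1]
    struct ggml_tensor * y = ggml_mul_mat(ctx, w, x);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("output tensor has %lld elements\n", (long long)ggml_nelements(y));
    ggml_free(ctx);
    return 0;
}
```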