Issues: NVIDIA/TensorRT-LLM

Issues list

trtllm-build with chatglm3-6b out of memory on RTX4070 12G
Labels: bug
#2451 opened Nov 15, 2024 by bushnerd
2 of 4 tasks

Does recurrentgemma support quantization?
#2450 opened Nov 15, 2024 by daiwk

Unable to profile cpp benchmark due to NCCL error
Labels: bug
#2448 opened Nov 15, 2024 by YJHMITWEB
2 of 4 tasks

Build Qwen2-72B-Instruct model by INT4-AWQ quantization failed
Labels: bug
#2445 opened Nov 14, 2024 by wangpeilin
2 of 4 tasks

[feature request] Qserve + int8_kv_cache
Labels: feature request, triaged
#2444 opened Nov 14, 2024 by lkm2835

tensorrtllm backend fails when kv cache is disabled
Labels: bug, triaged, Triton backend
#2443 opened Nov 13, 2024 by ShuaiShao93
4 tasks

Assertion failed: noRepeatNgramSize.value() > 0
Labels: bug, triaged
#2442 opened Nov 13, 2024 by krishnanpooja
2 of 4 tasks

Regarding server performance with LoRA
Labels: performance issue, triaged
#2441 opened Nov 13, 2024 by binhtranmcs
2 of 4 tasks

Inference RoBERTa on Triton server using TRT_LLM
Labels: triaged, Triton backend
#2440 opened Nov 13, 2024 by DeekshithaDPrakash

[bug] unnecessary batch logits post processor calls
Labels: triaged
#2439 opened Nov 12, 2024 by akhoroshev

FA V2 Nonusage during Decode/Generation Phase
Labels: question, triaged
#2438 opened Nov 12, 2024 by usajid14

Error in data types: using model with lora
Labels: bug, triaged
#2434 opened Nov 11, 2024 by Alireza3242
2 of 4 tasks

integrating support for structured decoding library outlines
Labels: feature request, triaged
#2432 opened Nov 11, 2024 by kumar-devesh

trtllm-build ignores --model_cls_file and --model_cls_name
Labels: bug, triaged
#2430 opened Nov 9, 2024 by abhishekudupa
2 of 4 tasks

trt_build for Llama 3.1 70B fp8 fails with CUDA error
Labels: bug, triaged
#2429 opened Nov 8, 2024 by chrisreese-if
2 of 4 tasks

trt_build for Llama 3.1 70B w4a8 fails with CUDA error
Labels: bug, quantization, triaged
#2428 opened Nov 8, 2024 by chrisreese-if
2 of 4 tasks

why Dit does not support pp_size > 1
Labels: question, triaged
#2427 opened Nov 8, 2024 by algorithmconquer

[Question] Document/examples to enable draft model speculative decoding using c++ executor API
Labels: question, triaged
#2424 opened Nov 7, 2024 by ynwang007

support FLUX?
Labels: question, triaged
#2421 opened Nov 7, 2024 by algorithmconquer

qwen 2-1.5B model build error
Labels: bug, duplicate, triaged
#2420 opened Nov 6, 2024 by rexmxw02
4 tasks

Assertion failed: Must set crossKvCacheFraction for encoder-decoder model
Labels: bug, triaged
#2419 opened Nov 6, 2024 by Saeedmatt3r
2 of 4 tasks

CUDA runtime error in cudaMemcpyAsync when enabling kv cache reuse with prompt table and TP > 1.
Labels: bug, Investigating, triaged
#2417 opened Nov 6, 2024 by jxchenus
2 of 4 tasks