Issues: NVIDIA/TensorRT-LLM
#783: [Issue Template] Short one-line summary of the issue (Open) - opened Jan 1, 2024 by juney-nvidia
Issues list
Label key: bug (something isn't working) · feature request (new feature or request) · question (further information is requested) · triaged (issue has been triaged by maintainers) · performance issue (issue about performance numbers) · quantization (lower-bit quantization, including int8, int4, fp8) · Triton backend · Investigating · waiting for feedback

#2451: trtllm-build with chatglm3-6b out of memory on RTX4070 12G [bug] - opened Nov 15, 2024 by bushnerd
#2449: TensorRT-LLM for Whisper: AttributeError: 'PretrainedConfig' object has no attribute 'n_audio_ctx' - opened Nov 15, 2024 by DeekshithaDPrakash
#2448: Unable to profile cpp benchmark due to NCCL error [bug] - opened Nov 15, 2024 by YJHMITWEB
#2445: Build Qwen2-72B-Instruct model by INT4-AWQ quantization failed [bug] - opened Nov 14, 2024 by wangpeilin
#2444: [feature request] Qserve + int8_kv_cache [feature request, triaged] - opened Nov 14, 2024 by lkm2835
#2443: tensorrtllm backend fails when kv cache is disabled [bug, triaged, Triton backend] - opened Nov 13, 2024 by ShuaiShao93
#2442: Assertion failed: noRepeatNgramSize.value() > 0 [bug, triaged] - opened Nov 13, 2024 by krishnanpooja
#2441: Regarding server performance with LoRA [performance issue, triaged] - opened Nov 13, 2024 by binhtranmcs
#2440: Inference RoBERTa on Triton server using TRT_LLM [triaged, Triton backend] - opened Nov 13, 2024 by DeekshithaDPrakash
#2439: [bug] unnecessary batch logits post processor calls [triaged] - opened Nov 12, 2024 by akhoroshev
#2438: FA V2 Nonusage during Decode/Generation Phase [question, triaged] - opened Nov 12, 2024 by usajid14
#2434: Error in data types: using model with lora [bug, triaged] - opened Nov 11, 2024 by Alireza3242
#2432: integrating support for structured decoding library outlines [feature request, triaged] - opened Nov 11, 2024 by kumar-devesh
#2430: trtllm-build ignores --model_cls_file and --model_cls_name [bug, triaged] - opened Nov 9, 2024 by abhishekudupa
#2429: trt_build for Llama 3.1 70B fp8 fails with CUDA error [bug, triaged] - opened Nov 8, 2024 by chrisreese-if
#2428: trt_build for Llama 3.1 70B w4a8 fails with CUDA error [bug, quantization, triaged] - opened Nov 8, 2024 by chrisreese-if
#2427: why Dit does not support pp_size > 1 [question, triaged] - opened Nov 8, 2024 by algorithmconquer
#2424: [Question] Document/examples to enable draft model speculative decoding using c++ executor API [question, triaged] - opened Nov 7, 2024 by ynwang007
#2423: [Question] Can I build the tritonserver, tensorrtllm_backend and tensorrtllm and then use these build files across servers? [question, triaged] - opened Nov 7, 2024 by chrisreese-if
#2422: attempt to run benchmark with batch_size>=512 and input_output_len=1024,128 results in tensor volume exceeds 2147483647 error [triaged, waiting for feedback] - opened Nov 7, 2024 by dmonakhov
#2421: support FLUX? [question, triaged] - opened Nov 7, 2024 by algorithmconquer
#2419: Assertion failed: Must set crossKvCacheFraction for encoder-decoder model [bug, triaged] - opened Nov 6, 2024 by Saeedmatt3r
#2417: CUDA runtime error in cudaMemcpyAsync when enabling kv cache reuse with prompt table and TP > 1 [bug, Investigating, triaged] - opened Nov 6, 2024 by jxchenus