The current DequantizeLinear CPU operator does not use threads.
I have implemented a quick prototype that shows a 4x speedup on that operator when running a Qwen 2.5 0.5B model.
I do see a comment about this:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/quantization/quantize_linear.cc#L302
@fajin-corp is this something you were planning to implement? I'd be happy to help under your guidance.
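To make the idea concrete, here is a minimal sketch of the partitioning scheme. It is not the prototype and not ONNX Runtime code. It assumes per-tensor quantization, where `y = (x - zero_point) * scale`, and the names `dequantize_chunk`, `dequantize_linear_parallel`, and `num_threads` are hypothetical. Pure-Python threads will not show a real speedup because of the GIL. The point is only that the output can be split into independent contiguous ranges, which is what a C++ thread pool would exploit.

```python
from concurrent.futures import ThreadPoolExecutor


def dequantize_chunk(x, scale, zero_point, start, end):
    # Per-tensor DequantizeLinear formula: y = (x - zero_point) * scale
    return [(x[i] - zero_point) * scale for i in range(start, end)]


def dequantize_linear_parallel(x, scale, zero_point=0, num_threads=4):
    """Dequantize x by splitting it into contiguous chunks, one task per chunk."""
    n = len(x)
    if n == 0:
        return []
    # Ceiling division so that every element falls in exactly one chunk.
    chunk = -(-n // num_threads)
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]
    out = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in submission order, so the chunks
        # can be concatenated directly.
        for part in pool.map(
            lambda b: dequantize_chunk(x, scale, zero_point, b[0], b[1]), bounds
        ):
            out.extend(part)
    return out
```

Each chunk reads and writes a disjoint range, so no synchronization is needed beyond waiting for all tasks to finish.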
Go ahead and PR it.
@tarekziade I'm not working on it. You are very welcome to open a PR for it.
To reproduce
n/a
Urgency
No response
Platform
Windows
OS Version
any
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
main
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes