[Performance] Max operator became 4.5X slower after Fixing NaN propagation for float16 min and max operators. #23337
Bug Analysis Report

Summary

A performance regression was identified when comparing two versions: cfa45df and ce13f65. Profiling revealed significant differences in execution times for key operations, indicating inefficiencies introduced in the newer version.

Key Findings
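The semantics the commit fixed can be illustrated with NumPy, which exposes both a NaN-propagating and a NaN-ignoring elementwise maximum (a sketch for intuition only; the actual ONNX Runtime kernels are C++ and are not shown here):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0], dtype=np.float16)
b = np.array([2.0, 2.0, np.nan], dtype=np.float16)

# np.maximum propagates NaN: if either operand is NaN, the result is NaN.
# This matches the corrected Min/Max behavior, but the extra NaN check
# tends to be slower than a plain comparison.
print(np.maximum(a, b))  # element 0 is 2.0; elements 1 and 2 are NaN

# np.fmax ignores NaN, returning the non-NaN operand when possible.
# This corresponds to the faster pre-fix behavior.
print(np.fmax(a, b))     # 2.0, 2.0, 3.0
```

This is why a correctness fix for NaN propagation can plausibly cost throughput: the NaN-aware path cannot use the simplest compare-and-select sequence.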
How to Reproduce
import time
import onnxruntime
import numpy as np
# Set the random seed
np.random.seed(0)
onnx_model_path = 'model.onnx'
# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()
nth = 100000
# Warm-up inference to cache optimizations
input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)
# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns
avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6
print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')

Profile Log
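The cost difference between NaN-propagating and NaN-ignoring elementwise max can also be measured in isolation. The microbenchmark below times the two NumPy variants on float16 arrays using the same `perf_counter_ns` pattern as the repro script; it is an analogy for the kernel-level tradeoff, not a measurement of ONNX Runtime itself, and the array size and repetition count are arbitrary choices:

```python
import time
import numpy as np

# Random float16 inputs; size chosen only to make timings measurable.
x = np.random.rand(1_000_000).astype(np.float16)
y = np.random.rand(1_000_000).astype(np.float16)

def bench(fn, reps=50):
    """Return the average wall time of fn(x, y) in milliseconds."""
    fn(x, y)  # warm-up call
    t0 = time.perf_counter_ns()
    for _ in range(reps):
        fn(x, y)
    return (time.perf_counter_ns() - t0) / reps / 1e6

print(f"NaN-propagating np.maximum: {bench(np.maximum):.3f} ms")
print(f"NaN-ignoring   np.fmax:    {bench(np.fmax):.3f} ms")
```

The absolute numbers depend on hardware and NumPy build, so only the relative gap between the two calls is meaningful.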
SuhwanSong changed the title from "[Performance] Fixing NaN propagation for float16 min and max operators introduces 50% slowdown." to "[Performance] Max operator became 4.5X slower after Fixing NaN propagation for float16 min and max operators." on Jan 14, 2025.
Describe the issue

ONNX Runtime 1.20.1 runs this model roughly 50% slower than version 1.17.0.
The performance regression originates from commit ce13f65.
Bisected commit range:
cfa45df6b5060af6327a98a625eb9fe74580f56c..ce13f651d86952335a126f04e741d68bc41323fa
Model
Environment
To reproduce
poc.onnx.zip
Urgency
No response
Platform
Linux
OS Version
6.8.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No