About low accuracy on converted models #339
Comments
@marcoslucianops could you please share the code you use to evaluate the .engine model?
I will share it in the future.
Do I need the "libnvdsinfer_custom_impl_Yolo.so" file generated by the command "CUDA_VER=11.8 make -C nvdsinfer_custom_impl_Yolo" for evaluation, or can I use only the .engine model?
My eval code is based on deepstream_python_apps with some custom implementations (batched image input, pycocotools, etc.). It uses DeepStream to generate the JSON that pycocotools then evaluates.
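For readers who want to try something similar, the step from DeepStream's text output to a pycocotools-style results list can be sketched as below. This is not the maintainer's eval code. It assumes KITTI-style label lines with the confidence appended as a final column (15 numeric fields after the label), and the function names and the `label_to_category_id` mapping are hypothetical. Note that COCO results store boxes as `[x, y, width, height]` rather than as corner pairs.

```python
import json


def kitti_line_to_coco(line, image_id, label_to_category_id):
    """Convert one KITTI-style detection line into a COCO result dict.

    Assumed layout (an assumption, check your own files):
    label truncated occluded alpha left top right bottom
    h w l x y z rotation_y score
    """
    # Split from the right so that multi-word labels such as
    # "traffic light" stay in one piece.
    parts = line.rsplit(maxsplit=15)
    label = parts[0]
    numbers = [float(v) for v in parts[1:]]
    left, top, right, bottom = numbers[3:7]
    score = numbers[14]
    return {
        "image_id": image_id,
        "category_id": label_to_category_id[label],
        # COCO boxes are [x, y, width, height].
        "bbox": [left, top, right - left, bottom - top],
        "score": score,
    }


def kitti_to_coco_results(kitti_text, image_id, label_to_category_id):
    """Convert all non-empty lines of one KITTI file."""
    return [
        kitti_line_to_coco(line, image_id, label_to_category_id)
        for line in kitti_text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    sample = "traffic light 0.0 0 0.0 10.0 20.0 110.0 220.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9"
    results = kitti_to_coco_results(sample, 139, {"traffic light": 10})
    print(json.dumps(results))
```

The category mapping matters: COCO category ids are not contiguous, so a model's class index generally cannot be used as `category_id` directly.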
I run inference on each image in COCO val and collect the labels to generate a JSON file. But I got a low mAP for the YOLOv7 FP32 .engine model:
In the models I've tested, there's no mAP difference between FP32 and FP16 engines. Are you using DeepStream to output the bboxes?
Yes. I run the DeepStream app on the images and save the output (labels) to a file by setting
In the KITTI output, the bbox coordinates are relative to the streammux resolution you set. You need to rescale them to each validation image's resolution.
Yes, I noticed that and rescaled them to the image size, but the mAP is still too low.
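The rescaling described here can be sketched as follows. This is a minimal example that assumes streammux stretched each image to its configured width and height without padding; if padding was enabled, the pad offsets would have to be removed first. The function name and the example resolutions are hypothetical.

```python
def rescale_box(box, mux_size, image_size):
    """Map a box from streammux coordinates back to the original image.

    box:        (left, top, right, bottom) in streammux pixels
    mux_size:   (width, height) configured on streammux
    image_size: (width, height) of the original validation image
    Assumes a plain stretch (no aspect-ratio padding).
    """
    left, top, right, bottom = box
    mux_w, mux_h = mux_size
    img_w, img_h = image_size
    return (
        left * img_w / mux_w,
        top * img_h / mux_h,
        right * img_w / mux_w,
        bottom * img_h / mux_h,
    )


if __name__ == "__main__":
    # A box covering the bottom-right quadrant of a 1920x1080 mux frame,
    # mapped back to a 640x480 source image.
    print(rescale_box((960, 540, 1920, 1080), (1920, 1080), (640, 480)))
```

Each validation image has its own resolution, so `image_size` has to be looked up per image rather than fixed once.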
Did you set
in the config_infer_primary_yoloV7.txt file?
Did you use the above config to get your benchmark results? I used the default setup.
The evaluation uses different NMS and confidence thresholds. Try with the values I sent.
Thanks a lot for your support. I am going to try it now 😍
I used this setup and the mAP is better, but it is still lower than your benchmark for YOLOv7. Here is my result for the FP32 .engine model
I attached my config (I have several configs; this one is attached as final_config_1.txt)
My eval code is fine-tuned to extract the best possible mAP from DeepStream; that's why I got a slightly higher mAP.
@marcoslucianops Do you mean the YOLOv7 model? I saw that your FP16 .engine model has mAP0.5:0.95 = 0.476, which means the FP32 .engine model also has mAP0.5:0.95 = 0.476. That is too low compared with the reference .pt model, which has mAP0.5:0.95 = 0.514: https://github.com/WongKinYiu/yolov7#performance
There's a drop on TensorRT compared to the PyTorch model. In some models, it's a significant drop. In other models (like PPYOLOE and YOLO-NAS), it's a small one. In the test I did, I was comparing the ONNX export method with the
Thanks a lot. I expected FP32 not to lose much mAP. If the FP32 or FP16 mAP drops this much, the INT8 mAP will be lower still.
The FP16 and FP32 mAP are equal.
Yeah, I think so. In your opinion, what is the reason for the big mAP drop of the FP32 and FP16 engines compared with the .pt models? I mean for some models, including YOLOv7. I saw that YOLOv7 FP16 dropped about 4%.
In my opinion, TensorRT layers are performance focused and make some tweaks to precisions and parameters. So it's faster, but it loses some accuracy.
Thanks for sharing.
Could this be related to inputs being different, not only TensorRT tweaks? For instance, in YOLOv8 it looks like symmetric padding is done with a grayscale value rather than with black color like DeepStream's
Edit: I also saw the following warning when running with exported ONNX models. Could this be another reason for the drop in performance? Is it possible to export using INT32 instead of INT64?
In any case, it would be good to have a table of the expected drop for each of the models, as a reference.
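The padding difference raised in this comment can be illustrated with a small letterbox sketch on a toy grayscale image stored as nested lists. It is a minimal example, not the preprocessing code of either project: the function names are hypothetical, and the gray value 114 is an assumption based on the value commonly used by the Ultralytics letterbox, while 0 stands for the black padding the comment attributes to DeepStream.

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (scale, new_w, new_h, pad_left, pad_top) for symmetric padding."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def pad_canvas(resized, dst_w, dst_h, pad_left, pad_top, pad_value):
    """Place an already-resized image on a canvas filled with pad_value."""
    canvas = [[pad_value] * dst_w for _ in range(dst_h)]
    for y, row in enumerate(resized):
        canvas[pad_top + y][pad_left:pad_left + len(row)] = row
    return canvas


if __name__ == "__main__":
    # A 4x2 image placed in a 4x4 network input: one padded row above and below.
    image = [[200, 200, 200, 200], [200, 200, 200, 200]]
    _, _, _, left, top = letterbox_geometry(4, 2, 4, 4)
    gray = pad_canvas(image, 4, 4, left, top, 114)  # assumed training-time value
    black = pad_canvas(image, 4, 4, left, top, 0)   # black padding
    differing = sum(g != b for gr, br in zip(gray, black) for g, b in zip(gr, br))
    print(differing, "of 16 input pixels differ between the two paddings")
```

The image content is identical in both cases; only the border pixels differ, so any effect on accuracy would come from the network seeing a border it was not trained on.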
@cgrtrifork any update? I get the same warning: "Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32." How can I solve this problem?
When I run inference with yolov8s in DeepStream 6.3 on an NVIDIA AGX Orin DK, I have some questions. Could someone give me an explanation and some guidance?
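On the INT64 warning: a cast from INT64 to INT32 preserves every value that lies within the INT32 range, so one way to judge whether the cast could matter is to check the model's integer constants against that range. A minimal sketch, assuming the INT64 values have already been read out of the ONNX file into a plain Python list (the helper name is hypothetical, and no ONNX or TensorRT API is used here):

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def out_of_int32_range(values):
    """Return the values that a cast to INT32 could not represent exactly."""
    return [v for v in values if not INT32_MIN <= v <= INT32_MAX]


if __name__ == "__main__":
    # Typical shape and index constants are small and survive the cast.
    print(out_of_int32_range([1, 3, 640, 640, -1]))
    # A value beyond the INT32 range would be flagged.
    print(out_of_int32_range([2 ** 31, 80]))
```

An empty result means the cast cannot change any of the checked values; it says nothing about other sources of accuracy loss.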
I ran the following experiment: I am trying out YOLOv8 object detection on an image that contains an object.
Having used the same TensorRT model, this makes me think there is an issue either in the parsing and interpretation of the model's output, or deeper, in DeepStream's lower-level preprocessing of the image.
For completeness:
# extract all the frames from the original video into a folder
# frames are enumerated starting from 1
ffmpeg -i original_video.mp4 original_video/%05d.jpg

Then I chose the frame to use (number 84), and I created the single-frame video by doing:

# frames start from 0, that's why we choose 84-1=83
ffmpeg -i original_video.mp4 -vf "select=eq(n\,83)" single_frame_video.mp4
@marcoslucianops have you tried evaluating the engine file outside of DeepStream?
Following up on this I found out that the parsing from
The DeepStream version I'm using is 6.2; I will test this in newer versions too. EDIT: It seems to be fixed after upgrading to DeepStream 6.3. Now all the detections are found if NMS is disabled, and only the correct maximum-confidence detection is found when NMS is enabled.
I evaluated the mAP of the get_wts model and the ONNX model, and both showed an accuracy drop after TensorRT conversion. The conclusion is that TensorRT loses accuracy when optimizing the layers.
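The behaviour reported here (many overlapping detections without NMS, a single highest-confidence detection with it) is what greedy IoU-based suppression produces. A minimal sketch for reference, not DeepStream's actual clustering code; the function names and the 0.45 threshold are illustrative assumptions:

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold):
    """Keep the highest-scoring box of each overlapping group.

    detections: list of (box, score) tuples for a single class.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    detections = [
        ((0, 0, 10, 10), 0.9),
        ((1, 1, 11, 11), 0.8),    # overlaps the first box heavily
        ((50, 50, 60, 60), 0.7),  # separate object
    ]
    print(greedy_nms(detections, 0.45))
```

Because the NMS and confidence thresholds decide which boxes reach the evaluator, they also affect the measured mAP, which is consistent with the earlier advice in this thread to use evaluation-specific threshold values.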
YOLOv8n ONNX:
YOLOv8n get_wts_yolov8.py