RV1106: RKNN Model Inference Coordinate Issues
Hey guys, let's dive into a tricky situation I've been wrestling with: getting a model to behave consistently on an RV1106 device after it's been converted using the RKNN toolkit. Specifically, the issue revolves around the output of bounding box coordinates during inference. My model works perfectly fine in the simulator, but the results are off on the actual device. Let's break down the problem, the steps I've taken, and where I'm scratching my head.
The Setup: Model, Conversion, and Expectations
I'm working with a model that spits out a single output tensor. This tensor is supposed to contain four coordinates representing the bounding box (all values between 0 and 1), plus a confidence score indicating how sure the model is about the detection. I used the rknn-toolkit's simulator, and the inference results were spot-on. The bounding boxes were drawn correctly over the output images, indicating everything was working as expected. This initially gave me a confidence boost!
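For concreteness, here's roughly how I interpret that single output. The exact layout (corner format, ordering) is my own illustration, not anything dictated by the toolkit:

import numpy as np

# Hypothetical single-detection output: four normalized corner coordinates
# followed by a confidence score (values invented for illustration).
output = np.array([0.18, 0.22, 0.74, 0.81, 0.93], dtype=np.float32)
x1, y1, x2, y2, conf = output
assert all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)), 'coords should be normalized'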
The Conversion Process
After the successful simulator run, the next logical step was to port the model to the actual device. I converted the model targeting the RV1106, and on the device the output tensor is dequantized using the deqnt_affine_to_f32 function.
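For reference, the deqnt_affine_to_f32 helper in Rockchip's C example code is a plain affine dequantization: subtract the zero point, multiply by the scale. A Python equivalent looks like this (zp and scale come from the output tensor's quantization attributes):

def deqnt_affine_to_f32(q, zp, scale):
    # Map an int8 quantized value back to float32 using the tensor's
    # zero point (zp) and scale, i.e. f = (q - zp) * scale.
    return (float(q) - float(zp)) * scale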
The Problem: Coordinate Mismatch
Here's the rub: after dequantization on the device, the bounding box coordinates aren't in the expected 0-1 range. The confidence score is correct, which suggests the dequantization of that part is fine. But the coordinates are way off, leading to incorrect bounding box placements.
For instance, after dequantization I might see something like [-0.309309, 1.701199, 0.141767, 2.062060]. Those values fall outside the 0-1 range that normalized bounding box coordinates should occupy, so the resulting boxes don't line up with the ground truth at all.
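One cheap experiment before blaming the quantizer: those numbers look suspiciously like pre-activation logits. Running them through a sigmoid lands every one of them inside (0, 1), which would point at an activation that the simulator path applies but the device path skips. That's only a hypothesis, but it takes seconds to check:

import numpy as np

coords = np.array([-0.309309, 1.701199, 0.141767, 2.062060])
print(1.0 / (1.0 + np.exp(-coords)))
# -> approximately [0.423, 0.846, 0.535, 0.887], all within (0, 1)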
Conversion Code Snippet
Here's the Python code I used to convert the model. The same script produces correct results when I run inference in the simulator, which makes debugging harder: the discrepancy only shows up on the device.
import os
import cv2
import numpy as np
from rknn.api import RKNN
# Define constants
PLATFORM = 'rv1106' # Replace with your target platform
ONNX_MODEL = 'your_model.onnx' # Replace with your ONNX model file
RKNN_MODEL = 'your_model.rknn' # Replace with your RKNN model file
QUANTIZE_ON = True
DATASET = './dataset.txt'
IMG_SIZE = 224
CLASSES = ['cat'] # Replace with your classes
# Helper function (replace with your actual post-processing function)
def yolo_pro_post_process(output_data, img_size, minimum_confidence_rating=0.1):
    # Implement your post-processing logic here. This is a placeholder;
    # adapt it to your model's output layout.
    # Assuming output_data is already processed (e.g., sigmoid applied),
    # extract the bounding boxes, classes, and scores here and return
    # them as a list of (box, class_id, score) tuples.
    results = []
    return results
# Helper function (replace with your actual draw function)
def draw(img, boxes, scores, classes):
    # Implement your draw logic here. This is a placeholder.
    # Example: draw the bounding boxes on the image.
    return img
if __name__ == '__main__':
    # Create RKNN object
    rknn = RKNN(verbose=True)

    # Pre-process config
    print('--> Config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], target_platform=PLATFORM)
    print('done')

    # Load ONNX model
    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=QUANTIZE_ON, dataset=DATASET)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export RKNN model
    print('--> Export rknn model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # Init runtime environment (no target given, so this runs in the simulator)
    print('--> Init runtime environment')
    ret = rknn.init_runtime()
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')

    # Set inputs
    IMG_DIR = './cat_front_face'
    img_files = [f for f in os.listdir(IMG_DIR) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    for img_file in img_files:
        img_path = os.path.join(IMG_DIR, img_file)
        img = cv2.imread(img_path)
        if img is None:
            print(f"Failed to read image: {img_path}")
            continue
        # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

        # Inference
        print(f'--> Running model on {img_file}')
        img2 = np.expand_dims(img, 0)
        outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])
        np.save(f'./onnx_yolo_pro_{img_file}.npy', outputs[0])
        print('done')

        # output_data = np.load(f'./onnx_yolo_pro_{img_file}.npy', allow_pickle=True)
        results = yolo_pro_post_process(outputs[0], IMG_SIZE, minimum_confidence_rating=0.1)
        if not results:
            print('No detections')
            boxes, classes, scores = [], [], []
        else:
            boxes, classes, scores = zip(*results)
            print('Results:')
            for box, cl, score in results:
                print(f'Class: {CLASSES[cl]}, Score: {score}, Box: {box}')

        img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        if boxes:
            draw(img_1, boxes, scores, classes)
        out_path = f'cat_front_face_results/result_{img_file}'
        os.makedirs('cat_front_face_results', exist_ok=True)  # cv2.imwrite fails silently if the directory is missing
        cv2.imwrite(out_path, img_1)
        print(f'Save results to {out_path}!')

    rknn.release()
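One more tool worth running before digging deeper: rknn-toolkit2's accuracy_analysis, which compares the float and quantized models layer by layer and reports where error creeps in. Here's a minimal sketch reusing the constants from the script above; the sample image path is a hypothetical placeholder, and (with a board attached) passing target/device_id should include on-device results in the report:

from rknn.api import RKNN

rknn = RKNN(verbose=True)
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], target_platform=PLATFORM)
rknn.load_onnx(model=ONNX_MODEL)
rknn.build(do_quantization=True, dataset=DATASET)

# Writes a per-layer error report (cosine distance etc.) to output_dir.
rknn.accuracy_analysis(inputs=['./cat_front_face/sample.jpg'], output_dir='./snapshot')

rknn.release()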
Possible Causes and Where to Focus
Here are some things I've been considering, and where you guys can hopefully shed some light:
- Dequantization Differences: Since dequantization is handled by the runtime on the device, I don't have direct access to the process to debug it. This is my prime suspect, simply because it's the most apparent difference between the two environments. Is there a known issue with the deqnt_affine_to_f32 function or its related quantization settings, particularly on the RV1106? Does the simulator use a different dequantization method?
- Data Type Mismatch: Could the simulator and the device handle data types differently, affecting the scaling or interpretation of the coordinates? Are there specific data type considerations for bounding box coordinates in the RKNN environment, and how are they affected by the conversion process?
- Quantization Settings: I've set do_quantization=QUANTIZE_ON. Are there specific quantization parameters that need careful tuning for the RV1106? How do the quantization settings affect the range of values after dequantization, especially when the box coordinates share a single output tensor with the confidence score? (See the sketch after this list.)
- Post-Processing on the Device: Although I'm confident in the post-processing steps, it's worth noting that the device runs its own copy of that code. Even though it works in the simulator, a different runtime environment can cause unexpected behavior, so I'll make sure all libraries are compatible and that the device is running the correct versions.
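To make the quantization-settings concern concrete: with per-tensor affine quantization, every element of the output shares one (scale, zero_point) pair. If the 0-1 coordinates sit in the same tensor as a value with a much larger range, the quantization step gets coarse and the coordinates lose precision. A minimal numpy sketch of the effect, with invented value ranges:

import numpy as np

# Hypothetical output: four normalized coords plus one value with a much
# larger dynamic range, all forced through one scale/zero-point pair.
out = np.array([0.12, 0.34, 0.56, 0.78, 9.5], dtype=np.float32)

qmin, qmax = -128, 127
rmin, rmax = min(0.0, float(out.min())), max(0.0, float(out.max()))
scale = (rmax - rmin) / (qmax - qmin)      # ~0.037 per int8 step
zp = int(round(qmin - rmin / scale))

q = np.clip(np.round(out / scale) + zp, qmin, qmax).astype(np.int8)
deq = (q.astype(np.float32) - zp) * scale  # same affine dequantization as the device

print('scale:', scale)
print('original:   ', out)
print('dequantized:', deq)  # coords only recoverable to roughly +/- scale/2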
What I've Tried
So far, I've tried these steps:
- Double-checked the model: verified the model architecture and its outputs.
- Verified the code: ensured that the Python conversion script and the device-side code match.
- Examined the input: made sure the input image data is identical across both environments (see the comparison sketch after this list).
- Studied the documentation: consulted the RKNN toolkit documentation and example code.
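To push that comparison further, my next step is a direct numeric diff between the two environments: dump the raw quantized output tensor on the device, dequantize it offline with the same zero point and scale, and compare it against the float output the simulator saved. A minimal sketch; the device dump file name and the zp/scale values here are hypothetical placeholders you'd read from your own output attributes:

import numpy as np

# Simulator output saved by the conversion script above.
sim = np.load('./onnx_yolo_pro_sample.jpg.npy').astype(np.float32).ravel()

# Raw int8 tensor dumped from the device, plus the zp/scale the runtime
# reports for that output (both placeholders here).
dev_q = np.fromfile('./device_output_raw.bin', dtype=np.int8).astype(np.float32)
zp, scale = -128, 0.0078125  # example values; read yours from the output attrs
dev = (dev_q - zp) * scale

n = min(sim.size, dev.size)
diff = np.abs(sim[:n] - dev[:n])
print('max abs diff:', diff.max(), 'mean abs diff:', diff.mean())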
Conclusion and Next Steps
I'm a bit stumped right now. Has anyone encountered similar issues with bounding box coordinate ranges after converting and running models on the RV1106 or other Rockchip devices? Any insights into common pitfalls around quantization, dequantization, or data type handling, or any specific troubleshooting tips, would be hugely appreciated.
Thanks in advance for any help you can provide!
For further reading and support, I'd recommend checking out the Rockchip official documentation as a good starting point for understanding their RKNN tool and the RV1106 platform. You can also check out their community forums to engage with other developers.