# Bug Fix Summary - ONNX Input Shape Error
## The Exact Issue
Error Message:

```
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT :
Got invalid dimensions for input: images for the following indices
 index: 1 Got: 480 Expected: 3
 index: 3 Got: 3 Expected: 640
```
## Root Cause
Problem: The YOLO ONNX model expects input in the format (batch, channels, height, width) = (1, 3, 640, 640), but the code was passing (1, 480, 640, 3).
What was happening:
- Frame was resized to (640, 480) → OpenCV shape: (480, 640, 3) (height, width, channels)
- Code did `frame[None]` → shape became (1, 480, 640, 3) (batch, height, width, channels)
- The ONNX model expected (1, 3, 640, 640) (batch, channels, height, width)
The mismatch:
- Position 1 (channels): Got 480, Expected 3
- Position 3 (width): Got 3, Expected 640
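A minimal NumPy repro of the mismatch (the array names are illustrative only):

```python
import numpy as np

# A 480x640 frame as OpenCV delivers it: HWC layout
frame = np.zeros((480, 640, 3), dtype=np.uint8)
bad_input = frame[None]  # (1, 480, 640, 3) - batch added, layout unchanged
print(bad_input.shape)   # index 1 is 480 (height), where the model expects 3 (channels)
```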
## Why This Happened
- Wrong resize dimensions: YOLO needs square input (640x640), not rectangular (640x480)
- Wrong format: OpenCV uses HWC (Height, Width, Channels), but ONNX expects CHW (Channels, Height, Width)
- Missing transpose: Need to convert from HWC to CHW format
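When in doubt, the expected layout can be read straight off the ONNX Runtime session. A minimal sketch, assuming the model file is named `yolov8n.onnx` (substitute the actual export):

```python
import onnxruntime as ort

session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
print(inp.name, inp.shape)  # e.g. "images" [1, 3, 640, 640]
```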
## The Fix
### 1. Fixed Input Preprocessing
Before:

```python
def detect_objects(self, frame):
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: frame[None].astype(np.float32) / 255.0}
```
After:

```python
def detect_objects(self, frame):
    # Resize to square for YOLO (640x640); note this plain resize does not
    # preserve aspect ratio (the Ultralytics pipeline letterboxes instead)
    yolo_input = cv2.resize(frame, (640, 640))
    # If the export expects RGB (the Ultralytics default) and frames come
    # from OpenCV in BGR, convert with cv2.cvtColor(..., cv2.COLOR_BGR2RGB) first
    # Convert HWC to CHW: (640, 640, 3) -> (3, 640, 640)
    yolo_input = yolo_input.transpose(2, 0, 1)
    # Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
    yolo_input = yolo_input[None].astype(np.float32) / 255.0
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: yolo_input}
```
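A quick shape sanity check for the preprocessing above, using a dummy frame (no model required):

```python
import cv2
import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
x = cv2.resize(frame, (640, 640)).transpose(2, 0, 1)[None].astype(np.float32) / 255.0
assert x.shape == (1, 3, 640, 640), x.shape
```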
### 2. Fixed Output Parsing
Before:

```python
# Incorrect - assumes (1, 8400, 84) format
bboxes = outputs[0][0, :, :4]   # Wrong!
confs = outputs[0][0, :, 4]     # Wrong!
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # Wrong!
```
After:

```python
# Correct - YOLOv8 ONNX output: (1, 84, 8400) = (batch, features, detections)
output = outputs[0]  # Shape: (1, 84, 8400)
# Extract bboxes: first 4 features -> (4, 8400) -> transpose to (8400, 4)
# Note: standard Ultralytics exports emit center-format (cx, cy, w, h) here,
# so convert to corner (xyxy) coordinates before drawing
bboxes = output[0, :4, :].transpose()  # (8400, 4)
# Extract class scores: features 4:84 -> (80, 8400)
class_scores = output[0, 4:, :]  # (80, 8400)
# Get class indices and confidences
classes = np.argmax(class_scores, axis=0)  # (8400,) class indices
confs = np.max(class_scores, axis=0)       # (8400,) confidence scores
```
## YOLOv8 ONNX Output Format
YOLOv8 ONNX exports produce output with shape: (1, 84, 8400)
- 1: Batch size
- 84: Features per detection (4 bbox coords + 80 COCO classes)
- 8400: Number of anchor points/detections
Structure:
- `output[0, 0:4, :]` = bounding box coordinates; for standard Ultralytics exports these are center-format (cx, cy, w, h), not corner (xyxy), so convert before drawing
- `output[0, 4:84, :]` = class scores for the 80 COCO classes
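Putting that structure to work, here is a sketch of a confidence filter plus the center-to-corner conversion. The function name and threshold are illustrative, and a real pipeline would still apply NMS afterwards (e.g. `cv2.dnn.NMSBoxes`):

```python
import numpy as np

def filter_detections(output, conf_thres=0.25):
    """Filter raw YOLOv8 ONNX output (1, 84, 8400) by confidence.

    Assumes a standard Ultralytics export: boxes are (cx, cy, w, h)
    in the 640x640 input space.
    """
    pred = output[0]                 # (84, 8400)
    boxes = pred[:4, :].T            # (8400, 4) as (cx, cy, w, h)
    scores = pred[4:, :]             # (80, 8400)
    confs = scores.max(axis=0)       # best class score per anchor
    classes = scores.argmax(axis=0)
    keep = confs > conf_thres
    cx, cy, w, h = boxes[keep].T
    # Center format -> corner (xyxy) format for drawing/NMS
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return xyxy, confs[keep], classes[keep]
```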
## Testing
After the fix, the application should:
- ✅ Load models without errors
- ✅ Process frames without ONNX shape errors
- ✅ Detect objects correctly
- ⚠️ Note: Bounding boxes are in the 640x640 model input space and may need scaling for display
## Next Steps
- Test the fix: Run `streamlit run track_drive.py` and verify no ONNX errors
- Bbox scaling: If displaying on the original frame size, scale bboxes from 640x640 to the original frame dimensions (see the sketch below)
- Performance: Monitor FPS and CPU usage
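For the bbox scaling item, a minimal sketch assuming the plain (non-letterbox) resize above; the helper name is illustrative:

```python
import numpy as np

def scale_boxes(xyxy, orig_w, orig_h, input_size=640):
    """Map xyxy boxes from the 640x640 model space back to the original frame."""
    scaled = np.asarray(xyxy, dtype=np.float32).copy()
    scaled[:, [0, 2]] *= orig_w / input_size  # x coordinates
    scaled[:, [1, 3]] *= orig_h / input_size  # y coordinates
    return scaled
```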
## Related Issues Fixed
- ✅ ONNX input shape mismatch
- ✅ YOLO output parsing corrected
- ✅ Frame preprocessing for YOLO standardized