# Bug Fix Summary - ONNX Input Shape Error

## The Exact Issue

### Error Message

```
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: images for the following indices
 index: 1 Got: 480 Expected: 3
 index: 3 Got: 3 Expected: 640
```

### Root Cause

**Problem**: The YOLO ONNX model expects input in format `(batch, channels, height, width)` = `(1, 3, 640, 640)`, but the code was passing `(1, 480, 640, 3)`.

**What was happening:**

1. The frame was resized to `(640, 480)` → OpenCV shape: `(480, 640, 3)` (height, width, channels)
2. The code did `frame[None]` → shape became `(1, 480, 640, 3)` (batch, height, width, channels)
3. The ONNX model expected `(1, 3, 640, 640)` (batch, channels, height, width)

**The mismatch:**

- Position 1 (channels): got 480, expected 3
- Position 3 (width): got 3, expected 640

### Why This Happened

1. **Wrong resize dimensions**: YOLO needs a square input (640x640), not a rectangular one (640x480)
2. **Wrong format**: OpenCV uses HWC (Height, Width, Channels), but the ONNX model expects CHW (Channels, Height, Width)
3. **Missing transpose**: the array must be converted from HWC to CHW before inference

## The Fix

### 1. Fixed Input Preprocessing

**Before:**

```python
def detect_objects(self, frame):
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: frame[None].astype(np.float32) / 255.0}
```

**After:**

```python
def detect_objects(self, frame):
    # Resize to square for YOLO (640x640)
    yolo_input = cv2.resize(frame, (640, 640))

    # Convert HWC to CHW: (640, 640, 3) -> (3, 640, 640)
    yolo_input = yolo_input.transpose(2, 0, 1)

    # Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
    yolo_input = yolo_input[None].astype(np.float32) / 255.0

    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: yolo_input}
```

### 2. Fixed Output Parsing

**Before:**

```python
# Incorrect - assumes (1, 8400, 84) format
bboxes = outputs[0][0, :, :4]                      # Wrong!
confs = outputs[0][0, :, 4]                        # Wrong!
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # Wrong!
```

**After:**

```python
# Correct - YOLOv8 ONNX output: (1, 84, 8400) = (batch, features, detections)
output = outputs[0]  # Shape: (1, 84, 8400)

# Extract bboxes: first 4 features -> (4, 8400) -> transpose to (8400, 4)
bboxes = output[0, :4, :].transpose()  # (8400, 4) in xyxy format

# Extract class scores: features 4:84 -> (80, 8400)
class_scores = output[0, 4:, :]  # (80, 8400)

# Get class indices and confidences
classes = np.argmax(class_scores, axis=0)  # (8400,) class indices
confs = np.max(class_scores, axis=0)       # (8400,) confidence scores
```

## YOLOv8 ONNX Output Format

YOLOv8 ONNX exports produce output with shape `(1, 84, 8400)`:

- **1**: batch size
- **84**: features per detection (4 bbox coords + 80 COCO classes)
- **8400**: number of anchor points/detections

**Structure:**

- `output[0, 0:4, :]` = bounding box coordinates (x, y, x, y) in xyxy format
- `output[0, 4:84, :]` = class scores for the 80 COCO classes

## Testing

After the fix, the application should:

1. ✅ Load models without errors
2. ✅ Process frames without ONNX shape errors
3. ✅ Detect objects correctly
4. ⚠️ Note: bounding boxes are in 640x640 coordinate space and may need scaling for display

## Next Steps

1. **Test the fix**: Run `streamlit run track_drive.py` and verify there are no ONNX errors
2. **Bbox scaling**: If displaying on the original frame size, scale bboxes from 640x640 to the original frame dimensions
3. **Performance**: Monitor FPS and CPU usage

## Related Issues Fixed

- ✅ ONNX input shape mismatch
- ✅ YOLO output parsing corrected
- ✅ Frame preprocessing for YOLO standardized
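As a quick sanity check, the preprocessing steps from the fix can be exercised with plain NumPy. This is a minimal sketch: the `cv2.resize` call is replaced by a dummy, already-resized HWC frame so it runs without OpenCV or a model.

```python
import numpy as np

# Stand-in for cv2.resize(frame, (640, 640)): a dummy HWC frame
# already at the model's square input size
resized = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)

# HWC -> CHW: (640, 640, 3) -> (3, 640, 640)
chw = resized.transpose(2, 0, 1)

# Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
tensor = chw[None].astype(np.float32) / 255.0

print(tensor.shape)  # (1, 3, 640, 640)
```

Running this confirms the tensor matches the `(1, 3, 640, 640)` shape the model demands, which would have caught the original bug before an inference call.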
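The corrected output parsing can likewise be smoke-tested against a synthetic array of the right shape. The 0.5 confidence threshold below is an illustrative value, not one taken from the original code:

```python
import numpy as np

# Synthetic model output with the YOLOv8 layout: (1, 84, 8400)
rng = np.random.default_rng(0)
output = rng.random((1, 84, 8400), dtype=np.float32)

bboxes = output[0, :4, :].transpose()      # (8400, 4)
class_scores = output[0, 4:, :]            # (80, 8400)
classes = np.argmax(class_scores, axis=0)  # (8400,) class indices
confs = np.max(class_scores, axis=0)       # (8400,) confidence scores

# Keep only confident detections (0.5 is an arbitrary example threshold)
keep = confs > 0.5
kept_boxes, kept_classes = bboxes[keep], classes[keep]
```

With the wrong slicing from the "Before" version, `bboxes` would come out as `(84, 4)` instead of `(8400, 4)`, so a shape assertion like this makes the regression obvious.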
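The bbox-scaling step listed under Next Steps could look like the sketch below. The `scale_bboxes` helper name is hypothetical, and it assumes the frame was resized straight to 640x640 with no letterboxing, matching the preprocessing in the fix:

```python
import numpy as np

def scale_bboxes(bboxes, orig_w, orig_h, model_size=640):
    """Scale xyxy boxes from model input space back to the original frame.

    Assumes a direct resize to (model_size, model_size) with no
    letterboxing; hypothetical helper, not from the original code.
    """
    scaled = bboxes.astype(np.float32).copy()
    scaled[:, [0, 2]] *= orig_w / model_size  # x1, x2
    scaled[:, [1, 3]] *= orig_h / model_size  # y1, y2
    return scaled

# A box in 640x640 space mapped back to a 1280x720 frame
boxes = np.array([[64.0, 64.0, 320.0, 320.0]])
print(scale_bboxes(boxes, orig_w=1280, orig_h=720))  # [[128. 72. 640. 360.]]
```

If letterboxed preprocessing is adopted later, the padding offsets would also need to be subtracted before scaling.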