
Bug Fix Summary - ONNX Input Shape Error

The Exact Issue

Error Message:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : 
Got invalid dimensions for input: images for the following indices
 index: 1 Got: 480 Expected: 3
 index: 3 Got: 3 Expected: 640

Root Cause

Problem: The YOLO ONNX model expects input in format (batch, channels, height, width) = (1, 3, 640, 640), but the code was passing (1, 480, 640, 3).

What was happening:

  1. Frame was resized to (640, 480) → OpenCV shape: (480, 640, 3) (height, width, channels)
  2. Code did frame[None] → Shape became (1, 480, 640, 3) (batch, height, width, channels)
  3. ONNX model expected (1, 3, 640, 640) (batch, channels, height, width)
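
The mismatch can be reproduced in a few lines of NumPy (a minimal sketch simulating the resized frame):

import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # OpenCV frame: (height, width, channels)
print(frame[None].shape)  # (1, 480, 640, 3) - what the buggy code fed the model
# The model wants (1, 3, 640, 640), so indices 1 and 3 disagree: 480 vs 3 and 3 vs 640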

The mismatch:

  • Index 1 (should be channels in NCHW): got 480 (the frame height), expected 3
  • Index 3 (should be width): got 3 (the channel count), expected 640

Why This Happened

  1. Wrong resize dimensions: this YOLO export expects a square 640x640 input, not a rectangular 640x480 one
  2. Wrong format: OpenCV uses HWC (Height, Width, Channels), but ONNX expects CHW (Channels, Height, Width)
  3. Missing transpose: Need to convert from HWC to CHW format
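
When in doubt, the expected layout can be read straight off the session; this sketch assumes it runs inside the detector class, where self.yolo_session is the onnxruntime InferenceSession used throughout:

inp = self.yolo_session.get_inputs()[0]
print(inp.name, inp.shape)  # e.g. 'images' [1, 3, 640, 640] = (batch, channels, height, width)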

The Fix

1. Fixed Input Preprocessing

Before:

def detect_objects(self, frame):
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: frame[None].astype(np.float32) / 255.0}
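    # Bug: frame is HWC (480, 640, 3), so frame[None] is (1, 480, 640, 3);
    # the model expects NCHW (1, 3, 640, 640)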

After:

def detect_objects(self, frame):
    # Resize to square for YOLO (640x640)
    yolo_input = cv2.resize(frame, (640, 640))
    
    # Convert HWC to CHW: (640, 640, 3) -> (3, 640, 640)
    yolo_input = yolo_input.transpose(2, 0, 1)
    
    # Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
    yolo_input = yolo_input[None].astype(np.float32) / 255.0
    
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: yolo_input}
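
With the corrected input, a quick sanity check confirms inference runs and produces the shape parsed in the next section (assuming the standard YOLOv8 export):

outputs = self.yolo_session.run(None, inputs)
print(outputs[0].shape)  # expected: (1, 84, 8400)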

2. Fixed Output Parsing

Before:

# Incorrect - assumes (1, 8400, 84) format
bboxes = outputs[0][0, :, :4]  # Wrong!
confs = outputs[0][0, :, 4]    # Wrong!
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # Wrong!
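# Bug: this indexing assumes features on the last axis, i.e. (1, 8400, 84),
# but the YOLOv8 export puts features on axis 1: (1, 84, 8400)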

After:

# Correct - YOLOv8 ONNX output: (1, 84, 8400) = (batch, features, detections)
output = outputs[0]  # Shape: (1, 84, 8400)

# Extract bboxes: first 4 features -> (4, 8400) -> transpose to (8400, 4)
bboxes = output[0, :4, :].transpose()  # (8400, 4) in xyxy format

# Extract class scores: features 4:84 -> (80, 8400)
class_scores = output[0, 4:, :]  # (80, 8400)

# Get class indices and confidences
classes = np.argmax(class_scores, axis=0)  # (8400,) class indices
confs = np.max(class_scores, axis=0)  # (8400,) confidence scores
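
These raw arrays still cover all 8400 anchor points; to get usable detections, filter by confidence (the 0.25 threshold below is a hypothetical choice, and non-maximum suppression would normally follow):

# Keep only detections above the confidence threshold
keep = confs > 0.25
bboxes, classes, confs = bboxes[keep], classes[keep], confs[keep]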

YOLOv8 ONNX Output Format

YOLOv8 ONNX exports produce output with shape: (1, 84, 8400)

  • 1: Batch size
  • 84: Features per detection (4 bbox coords + 80 COCO classes)
  • 8400: Number of anchor points/detections

Structure:

  • output[0, 0:4, :] = Bounding box coordinates (x1, y1, x2, y2) in xyxy format
  • output[0, 4:84, :] = Class scores for 80 COCO classes

Testing

After the fix, the application should:

  1. Load models without errors
  2. Process frames without ONNX shape errors
  3. Detect objects correctly
  4. ⚠️ Note: Bounding boxes are in 640x640 coordinate space - may need scaling for display

Next Steps

  1. Test the fix: Run streamlit run track_drive.py and verify no ONNX errors
  2. Bbox scaling: if displaying on the original frame size, scale bboxes from the 640x640 model space back to the original frame dimensions (see the sketch below)
  3. Performance: Monitor FPS and CPU usage
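
A minimal sketch of that scaling, assuming orig_h and orig_w hold the original frame dimensions:

# Map bboxes from the 640x640 model space back to the original frame
scale_x = orig_w / 640.0
scale_y = orig_h / 640.0
bboxes[:, [0, 2]] *= scale_x  # x1, x2
bboxes[:, [1, 3]] *= scale_y  # y1, y2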

Summary

  • ONNX input shape mismatch resolved
  • YOLO output parsing corrected
  • Frame preprocessing for YOLO standardized