117 lines
3.7 KiB
Markdown
117 lines
3.7 KiB
Markdown
# Bug Fix Summary - ONNX Input Shape Error
|
|
|
|
## The Exact Issue
|
|
|
|
### Error Message:
|
|
```
|
|
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT :
|
|
Got invalid dimensions for input: images for the following indices
|
|
index: 1 Got: 480 Expected: 3
|
|
index: 3 Got: 3 Expected: 640
|
|
```
|
|
|
|
### Root Cause
|
|
|
|
**Problem**: The YOLO ONNX model expects input in format `(batch, channels, height, width)` = `(1, 3, 640, 640)`, but the code was passing `(1, 480, 640, 3)`.
|
|
|
|
**What was happening:**
|
|
1. Frame was resized to `(640, 480)` → OpenCV shape: `(480, 640, 3)` (height, width, channels)
|
|
2. Code did `frame[None]` → Shape became `(1, 480, 640, 3)` (batch, height, width, channels)
|
|
3. ONNX model expected `(1, 3, 640, 640)` (batch, channels, height, width)
|
|
|
|
**The mismatch:**
|
|
- Position 1 (channels): Got 480, Expected 3
|
|
- Position 3 (width): Got 3, Expected 640
|
|
|
|
### Why This Happened
|
|
|
|
1. **Wrong resize dimensions**: YOLO needs square input (640x640), not rectangular (640x480)
|
|
2. **Wrong format**: OpenCV uses HWC (Height, Width, Channels), but ONNX expects CHW (Channels, Height, Width)
|
|
3. **Missing transpose**: Need to convert from HWC to CHW format
|
|
|
|
## The Fix
|
|
|
|
### 1. Fixed Input Preprocessing
|
|
|
|
**Before:**
|
|
```python
|
|
def detect_objects(self, frame):
|
|
input_name = self.yolo_session.get_inputs()[0].name
|
|
inputs = {input_name: frame[None].astype(np.float32) / 255.0}
|
|
```
|
|
|
|
**After:**
|
|
```python
|
|
def detect_objects(self, frame):
|
|
# Resize to square for YOLO (640x640)
|
|
yolo_input = cv2.resize(frame, (640, 640))
|
|
|
|
# Convert HWC to CHW: (640, 640, 3) -> (3, 640, 640)
|
|
yolo_input = yolo_input.transpose(2, 0, 1)
|
|
|
|
# Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
|
|
yolo_input = yolo_input[None].astype(np.float32) / 255.0
|
|
|
|
input_name = self.yolo_session.get_inputs()[0].name
|
|
inputs = {input_name: yolo_input}
|
|
```
|
|
|
|
### 2. Fixed Output Parsing
|
|
|
|
**Before:**
|
|
```python
|
|
# Incorrect - assumes (1, 8400, 84) format
|
|
bboxes = outputs[0][0, :, :4] # Wrong!
|
|
confs = outputs[0][0, :, 4] # Wrong!
|
|
classes = np.argmax(outputs[0][0, :, 5:], axis=1) # Wrong!
|
|
```
|
|
|
|
**After:**
|
|
```python
|
|
# Correct - YOLOv8 ONNX output: (1, 84, 8400) = (batch, features, detections)
|
|
output = outputs[0] # Shape: (1, 84, 8400)
|
|
|
|
# Extract bboxes: first 4 features -> (4, 8400) -> transpose to (8400, 4)
|
|
bboxes = output[0, :4, :].transpose() # (8400, 4) in xyxy format
|
|
|
|
# Extract class scores: features 4:84 -> (80, 8400)
|
|
class_scores = output[0, 4:, :] # (80, 8400)
|
|
|
|
# Get class indices and confidences
|
|
classes = np.argmax(class_scores, axis=0) # (8400,) class indices
|
|
confs = np.max(class_scores, axis=0) # (8400,) confidence scores
|
|
```
|
|
|
|
## YOLOv8 ONNX Output Format
|
|
|
|
YOLOv8 ONNX exports produce output with shape: `(1, 84, 8400)`
|
|
|
|
- **1**: Batch size
|
|
- **84**: Features per detection (4 bbox coords + 80 COCO classes)
|
|
- **8400**: Number of anchor points/detections
|
|
|
|
**Structure:**
|
|
- `output[0, 0:4, :]` = Bounding box coordinates (x, y, x, y) in xyxy format
|
|
- `output[0, 4:84, :]` = Class scores for 80 COCO classes
|
|
|
|
## Testing
|
|
|
|
After the fix, the application should:
|
|
1. ✅ Load models without errors
|
|
2. ✅ Process frames without ONNX shape errors
|
|
3. ✅ Detect objects correctly
|
|
4. ⚠️ Note: Bounding boxes are in 640x640 coordinate space - may need scaling for display
|
|
|
|
## Next Steps
|
|
|
|
1. **Test the fix**: Run `streamlit run track_drive.py` and verify no ONNX errors
|
|
2. **Bbox scaling**: If displaying on original frame size, scale bboxes from 640x640 to original frame dimensions
|
|
3. **Performance**: Monitor FPS and CPU usage
|
|
|
|
## Related Issues Fixed
|
|
|
|
- ✅ ONNX input shape mismatch
|
|
- ✅ YOLO output parsing corrected
|
|
- ✅ Frame preprocessing for YOLO standardized
|
|
|