# Bug Fix Summary - ONNX Input Shape Error
## The Exact Issue
### Error Message:
```
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT :
Got invalid dimensions for input: images for the following indices
index: 1 Got: 480 Expected: 3
index: 3 Got: 3 Expected: 640
```
### Root Cause
**Problem**: The YOLO ONNX model expects input in format `(batch, channels, height, width)` = `(1, 3, 640, 640)`, but the code was passing `(1, 480, 640, 3)`.
**What was happening:**
1. Frame was resized to `(640, 480)` → OpenCV shape: `(480, 640, 3)` (height, width, channels)
2. Code did `frame[None]` → Shape became `(1, 480, 640, 3)` (batch, height, width, channels)
3. ONNX model expected `(1, 3, 640, 640)` (batch, channels, height, width)
**The mismatch:**
- Position 1 (channels): Got 480, Expected 3
- Position 3 (width): Got 3, Expected 640
### Why This Happened
1. **Wrong resize dimensions**: YOLO needs square input (640x640), not rectangular (640x480)
2. **Wrong format**: OpenCV uses HWC (Height, Width, Channels), but ONNX expects CHW (Channels, Height, Width)
3. **Missing transpose**: the code never converted from HWC to CHW (see the sketch below)
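
To make the mismatch concrete, here is a minimal numpy reproduction (not the app's code) of how the buggy shapes arise from an OpenCV frame:

```python
import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # OpenCV frame: HWC
print(frame[None].shape)                         # (1, 480, 640, 3) -- what the buggy code fed to ONNX
print(frame.transpose(2, 0, 1)[None].shape)      # (1, 3, 480, 640) -- CHW, but still not 640x640
```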
## The Fix
### 1. Fixed Input Preprocessing
**Before:**
```python
def detect_objects(self, frame):
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: frame[None].astype(np.float32) / 255.0}
```
**After:**
```python
def detect_objects(self, frame):
    # Resize to square for YOLO (640x640)
    yolo_input = cv2.resize(frame, (640, 640))
    # Convert HWC to CHW: (640, 640, 3) -> (3, 640, 640)
    yolo_input = yolo_input.transpose(2, 0, 1)
    # Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
    yolo_input = yolo_input[None].astype(np.float32) / 255.0
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: yolo_input}
```
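
As a sanity check, a small standalone sketch can confirm the model's expected input shape before wiring it into the app. The model path `yolov8n.onnx` is a placeholder for whatever export DriverTrac actually uses:

```python
import cv2
import numpy as np
import onnxruntime as ort

# Standalone shape check; "yolov8n.onnx" is a placeholder path.
session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
print(session.get_inputs()[0].shape)  # expect [1, 3, 640, 640]

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # fake camera frame
x = cv2.resize(frame, (640, 640)).transpose(2, 0, 1)[None].astype(np.float32) / 255.0
outputs = session.run(None, {session.get_inputs()[0].name: x})
print(outputs[0].shape)               # expect (1, 84, 8400)
```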
### 2. Fixed Output Parsing
**Before:**
```python
# Incorrect - assumes (1, 8400, 84) format
bboxes = outputs[0][0, :, :4] # Wrong!
confs = outputs[0][0, :, 4] # Wrong!
classes = np.argmax(outputs[0][0, :, 5:], axis=1) # Wrong!
```
**After:**
```python
# Correct - YOLOv8 ONNX output: (1, 84, 8400) = (batch, features, detections)
output = outputs[0] # Shape: (1, 84, 8400)
# Extract bboxes: first 4 features -> (4, 8400) -> transpose to (8400, 4)
bboxes = output[0, :4, :].transpose() # (8400, 4) in (cx, cy, w, h) format
# Extract class scores: features 4:84 -> (80, 8400)
class_scores = output[0, 4:, :] # (80, 8400)
# Get class indices and confidences
classes = np.argmax(class_scores, axis=0) # (8400,) class indices
confs = np.max(class_scores, axis=0) # (8400,) confidence scores
```
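
The parsed arrays still hold 8400 overlapping candidates; a confidence filter plus NMS reduces them to final detections. The following is a sketch on synthetic data, not code from `track_drive.py`; note that `cv2.dnn.NMSBoxes` expects top-left `(x, y, w, h)` rectangles, so the center-based boxes are shifted first:

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
output = rng.random((1, 84, 8400)).astype(np.float32)  # stand-in for outputs[0]

bboxes = output[0, :4, :].transpose()   # (8400, 4), (cx, cy, w, h)
class_scores = output[0, 4:, :]         # (80, 8400)
classes = np.argmax(class_scores, axis=0)
confs = np.max(class_scores, axis=0)

conf_thresh, iou_thresh = 0.25, 0.45
keep = confs > conf_thresh
xywh = bboxes[keep].copy()
xywh[:, 0] -= xywh[:, 2] / 2            # cx -> top-left x
xywh[:, 1] -= xywh[:, 3] / 2            # cy -> top-left y
idx = cv2.dnn.NMSBoxes(xywh.tolist(), confs[keep].tolist(), conf_thresh, iou_thresh)
print(len(idx), "detections survive NMS")
```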
## YOLOv8 ONNX Output Format
YOLOv8 ONNX exports produce output with shape: `(1, 84, 8400)`
- **1**: Batch size
- **84**: Features per detection (4 bbox coords + 80 COCO classes)
- **8400**: Number of anchor points/detections
**Structure:**
- `output[0, 0:4, :]` = Bounding box coordinates as `(cx, cy, w, h)`: box center plus width and height (see the conversion sketch below)
- `output[0, 4:84, :]` = Class scores for 80 COCO classes
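
Because the raw boxes are center-based, they need converting to corner `(x1, y1, x2, y2)` form before drawing or comparing against corner-based APIs. A minimal helper, assuming the `(8400, 4)` array from the parsing step above:

```python
import numpy as np

def cxcywh_to_xyxy(boxes: np.ndarray) -> np.ndarray:
    # (cx, cy, w, h) -> (x1, y1, x2, y2)
    x1y1 = boxes[:, :2] - boxes[:, 2:] / 2
    x2y2 = boxes[:, :2] + boxes[:, 2:] / 2
    return np.concatenate([x1y1, x2y2], axis=1)

print(cxcywh_to_xyxy(np.array([[320.0, 320.0, 100.0, 50.0]])))
# -> [[270. 295. 370. 345.]]
```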
## Testing
After the fix, the application should:
1. ✅ Load models without errors
2. ✅ Process frames without ONNX shape errors
3. ✅ Detect objects correctly
4. ⚠️ Note: Bounding boxes are in 640x640 coordinate space and may need scaling for display
## Next Steps
1. **Test the fix**: Run `streamlit run track_drive.py` and verify no ONNX errors
2. **Bbox scaling**: If displaying on the original frame size, scale bboxes from 640x640 to the original frame dimensions (see the scaling sketch after this list)
3. **Performance**: Monitor FPS and CPU usage
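
For step 2, a minimal scaling helper could look like the following. This is an illustration, assuming plain `cv2.resize` (no letterboxing) was used and the boxes are already in `(x1, y1, x2, y2)` form:

```python
import numpy as np

def scale_boxes(xyxy: np.ndarray, w0: int, h0: int, size: int = 640) -> np.ndarray:
    # Map boxes from size x size model space back to a w0 x h0 frame.
    scaled = xyxy.astype(np.float32).copy()
    scaled[:, [0, 2]] *= w0 / size  # x coordinates
    scaled[:, [1, 3]] *= h0 / size  # y coordinates
    return scaled

boxes = np.array([[100, 200, 300, 400]], dtype=np.float32)  # dummy xyxy in 640-space
print(scale_boxes(boxes, w0=1280, h0=720))  # -> [[200. 225. 600. 450.]]
```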
## Related Issues Fixed
- ✅ ONNX input shape mismatch
- ✅ YOLO output parsing corrected
- ✅ Frame preprocessing for YOLO standardized