# Bug Fix Summary - ONNX Input Shape Error

## The Exact Issue

### Error Message
```
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT :
Got invalid dimensions for input: images for the following indices
 index: 1 Got: 480 Expected: 3
 index: 3 Got: 3 Expected: 640
```

### Root Cause

**Problem**: The YOLO ONNX model expects input in format `(batch, channels, height, width)` = `(1, 3, 640, 640)`, but the code was passing `(1, 480, 640, 3)`.

**What was happening:**
1. Frame was resized with target size `(640, 480)`, which `cv2.resize` reads as (width, height) → OpenCV shape: `(480, 640, 3)` (height, width, channels)
2. Code did `frame[None]` → Shape became `(1, 480, 640, 3)` (batch, height, width, channels)
3. ONNX model expected `(1, 3, 640, 640)` (batch, channels, height, width)

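**The mismatch:**
- Index 1 (channels): Got 480, Expected 3
- Index 3 (width): Got 3, Expected 640
- Index 2 was not reported only because the frame width (640) happened to equal the expected height (640)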

### Why This Happened

1. **Wrong resize dimensions**: The exported model has a fixed square input (640x640), but frames were resized to a rectangle (640x480)
2. **Wrong layout**: OpenCV arrays are HWC (Height, Width, Channels), but the model expects CHW (Channels, Height, Width)
3. **Missing transpose**: Nothing in the preprocessing converted HWC to CHW before adding the batch dimension
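
The mismatch reported by ONNX Runtime can be reproduced with NumPy alone. The sketch below is illustrative: it uses a zero-filled array as a stand-in for the resized frame, and the names `expected`, `actual`, and `mismatches` are not taken from the application code.

```python
import numpy as np

# Stand-in for an OpenCV frame resized to width 640, height 480 (HWC layout).
frame = np.zeros((480, 640, 3), dtype=np.uint8)

expected = (1, 3, 640, 640)   # NCHW shape the model asks for
actual = frame[None].shape    # (1, 480, 640, 3): batch axis added, still HWC

# Indices where the two shapes disagree, as (index, got, expected).
mismatches = [(i, got, want)
              for i, (got, want) in enumerate(zip(actual, expected))
              if got != want]
print(mismatches)  # [(1, 480, 3), (3, 3, 640)]
```

The two tuples correspond to the two `index:` lines in the error message.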

## The Fix

### 1. Fixed Input Preprocessing

**Before:**
```python
def detect_objects(self, frame):
    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: frame[None].astype(np.float32) / 255.0}
```

**After:**
```python
def detect_objects(self, frame):
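    # Resize to the model's square input; cv2.resize takes (width, height).
    # A plain resize stretches non-square frames (aspect ratio is not preserved).
    yolo_input = cv2.resize(frame, (640, 640))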

    # Convert HWC to CHW: (640, 640, 3) -> (3, 640, 640)
    yolo_input = yolo_input.transpose(2, 0, 1)

    # Add batch dimension and normalize: (3, 640, 640) -> (1, 3, 640, 640)
    yolo_input = yolo_input[None].astype(np.float32) / 255.0

    input_name = self.yolo_session.get_inputs()[0].name
    inputs = {input_name: yolo_input}
```
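
To catch this class of error before calling `run`, the tensor shape can be compared with the shape the session reports for its input. The following is a minimal sketch, not part of the actual fix: `shape_mismatches` is a hypothetical helper, and `expected_shape` is hard-coded here where the application would presumably read `self.yolo_session.get_inputs()[0].shape`.

```python
import numpy as np

def shape_mismatches(expected, actual):
    """Return (index, got, expected) for each fixed dimension that differs.

    `expected` mimics the list onnxruntime reports for an input:
    ints for fixed dims, strings or None for dynamic ones (skipped here).
    """
    if len(expected) != len(actual):
        raise ValueError(f"rank mismatch: expected {len(expected)}, got {len(actual)}")
    return [(i, got, want)
            for i, (got, want) in enumerate(zip(actual, expected))
            if isinstance(want, int) and got != want]

# Hypothetical expected shape, assumed to match the exported model.
expected_shape = [1, 3, 640, 640]

hwc = np.zeros((640, 640, 3), dtype=np.uint8)                   # resized frame stand-in
nchw = hwc.transpose(2, 0, 1)[None].astype(np.float32) / 255.0  # same steps as the fix

assert shape_mismatches(expected_shape, nchw.shape) == []
```

Calling the helper on the old `(1, 480, 640, 3)` tensor returns the same two mismatches that ONNX Runtime reported.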

### 2. Fixed Output Parsing

**Before:**
```python
# Incorrect - assumes a detections-first layout (1, N, 5 + classes) with an objectness score at index 4
bboxes = outputs[0][0, :, :4]  # Wrong!
confs = outputs[0][0, :, 4]    # Wrong!
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # Wrong!
```

**After:**
```python
# Correct - YOLOv8 ONNX output: (1, 84, 8400) = (batch, features, detections)
output = outputs[0]  # Shape: (1, 84, 8400)

# Extract bboxes: first 4 features -> (4, 8400) -> transpose to (8400, 4)
bboxes = output[0, :4, :].transpose()  # (8400, 4) as (cx, cy, w, h) in a standard YOLOv8 export

# Extract class scores: features 4:84 -> (80, 8400)
class_scores = output[0, 4:, :]  # (80, 8400)

# Get class indices and confidences
classes = np.argmax(class_scores, axis=0)  # (8400,) class indices
confs = np.max(class_scores, axis=0)  # (8400,) confidence scores
```
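
Most of the 8400 candidates have low scores, so the parsed arrays are usually filtered before further use. The sketch below shows one way to do that. `filter_detections` and the `0.25` threshold are assumptions for illustration, not values from the application, and the function does not perform non-maximum suppression.

```python
import numpy as np

def filter_detections(output, conf_threshold=0.25):
    """Keep candidates whose best class score reaches conf_threshold.

    output: array shaped (1, 4 + num_classes, num_candidates).
    Returns (bboxes, classes, confs) for the surviving candidates.
    """
    preds = output[0]                          # (4 + C, N)
    bboxes = preds[:4, :].transpose()          # (N, 4)
    class_scores = preds[4:, :]                # (C, N)
    classes = np.argmax(class_scores, axis=0)  # (N,) best class per candidate
    confs = np.max(class_scores, axis=0)       # (N,) score of that class
    keep = confs >= conf_threshold
    return bboxes[keep], classes[keep], confs[keep]

# Synthetic tensor: a single candidate with a strong score for class 2.
dummy = np.zeros((1, 84, 8400), dtype=np.float32)
dummy[0, :4, 10] = [320.0, 240.0, 100.0, 50.0]
dummy[0, 4 + 2, 10] = 0.9

bboxes, classes, confs = filter_detections(dummy)
print(bboxes.shape, classes, confs)
```

Only the one synthetic candidate survives the threshold; overlapping boxes for the same object would still need to be merged in a separate step.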

## YOLOv8 ONNX Output Format

YOLOv8 ONNX exports produce output with shape: `(1, 84, 8400)`

- **1**: Batch size
- **84**: Features per detection (4 bbox coords + 80 COCO classes)
- **8400**: Number of candidate predictions for a 640x640 input (80x80 + 40x40 + 20x20 grid cells at strides 8, 16, and 32)

**Structure:**
- `output[0, 0:4, :]` = Bounding box coordinates (center x, center y, width, height), i.e. xywh format in a standard YOLOv8 export, in 640x640 input pixels
- `output[0, 4:84, :]` = Class scores for 80 COCO classes
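
If the export emits center-format boxes `(cx, cy, w, h)`, as standard Ultralytics YOLOv8 exports do, drawing code that wants corner coordinates needs a conversion first. The helper below is a hypothetical sketch of that step; `xywh_to_xyxy` is not a function from the application.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert (N, 4) center-format boxes (cx, cy, w, h) to corners (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=np.float32)
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

# A 100x50 box centered at (320, 240).
corners = xywh_to_xyxy([[320.0, 240.0, 100.0, 50.0]])
print(corners)  # [[270. 215. 370. 265.]]
```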

## Testing

After the fix, the application should:
1. ✅ Load models without errors
2. ✅ Process frames without ONNX shape errors
3. ✅ Detect objects correctly
4. ⚠️ Note: Bounding boxes are in the 640x640 input coordinate space and must be scaled back to the original frame size before being drawn on it

## Next Steps

1. **Test the fix**: Run `streamlit run track_drive.py` and verify no ONNX errors
2. **Bbox scaling**: If displaying on original frame size, scale bboxes from 640x640 to original frame dimensions
3. **Performance**: Monitor FPS and CPU usage
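
The bbox scaling step can be sketched as follows. This assumes the frame was stretched to 640x640 with a plain resize (no letterbox padding), so x and y scale independently; `scale_boxes` and its parameters are illustrative names, not existing application code.

```python
import numpy as np

def scale_boxes(boxes, orig_width, orig_height, input_size=640):
    """Map (N, 4) boxes from input_size x input_size space to the original frame.

    Columns 0 and 2 are scaled by the width ratio, columns 1 and 3 by the
    height ratio. Valid only for a plain stretch resize without padding.
    """
    scaled = np.asarray(boxes, dtype=np.float32).copy()
    scaled[:, [0, 2]] *= orig_width / input_size
    scaled[:, [1, 3]] *= orig_height / input_size
    return scaled

# Box covering the bottom-right quadrant of the 640x640 input,
# mapped to a 1280x720 frame (x scaled by 2.0, y by 1.125).
scaled = scale_boxes([[320.0, 320.0, 640.0, 640.0]], orig_width=1280, orig_height=720)
print(scaled)
```

Because columns 0 and 2 are horizontal quantities and columns 1 and 3 are vertical ones in both `(x1, y1, x2, y2)` and `(cx, cy, w, h)` layouts, the same scaling applies to either format.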

## Related Issues Fixed

- ✅ ONNX input shape mismatch
- ✅ YOLO output parsing corrected
- ✅ Frame preprocessing for YOLO standardized