# DSMS/ADAS Visual Analysis - Comprehensive Assessment Report
## Executive Summary

This report provides a systematic evaluation of the current Streamlit-based Driver State Monitoring System (DSMS) and Advanced Driver Assistance System (ADAS) implementation, with a focus on optimizing for low-specification CPUs while maintaining high accuracy.

**Current Status**: ⚠️ **Non-Functional** - 9 of 11 required dependencies are missing, and the code contains multiple bugs and significant performance bottlenecks.

---

## 1. Assessment of Current Implementation
### 1.1 Code Structure Analysis

**Strengths:**

- ✅ Modular class-based design (`RealTimePredictor`)
- ✅ Streamlit caching enabled (`@st.cache_resource`)
- ✅ Frame-skipping mechanism (`inference_skip: 3`)
- ✅ Logging infrastructure in place
- ✅ ONNX optimization mentioned for YOLO

**Critical Issues Identified:**
#### 🔴 **CRITICAL BUG #1: Incorrect Optical Flow API Usage**

```125:131:track_drive.py
def optical_flow(self, prev_frame, curr_frame):
    """OpenCV flow for speed, braking, accel."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, None, None)
    magnitude = np.mean(np.sqrt(flow[0]**2 + flow[1]**2))
    return magnitude
```

**Problem**: `cv2.calcOpticalFlowPyrLK` is a sparse tracker: it expects an array of feature points (e.g., from `cv2.goodFeaturesToTrack`) as its third argument, not `None`, and it returns tracked points plus status flags rather than a flow field. Called as above, it raises a runtime error.

**Impact**: ⚠️ **CRITICAL** - Will crash on execution
#### 🔴 **CRITICAL BUG #2: VideoMAE JIT Scripting Failure**

```48:53:track_drive.py
processor = VideoMAEImageProcessor.from_pretrained(CONFIG['videomae_model'])
videomae = VideoMAEForVideoClassification.from_pretrained(CONFIG['videomae_model'])
videomae = torch.jit.script(videomae)
torch.jit.save(videomae, 'videomae_ts.pt')
videomae = torch.jit.load('videomae_ts.pt')
```

**Problem**: Hugging Face transformer models generally cannot be passed to `torch.jit.script` directly; scripting fails on their dynamic control flow and keyword-argument handling. This call will raise at model-loading time.

**Impact**: ⚠️ **CRITICAL** - Model loading will crash
#### 🔴 **CRITICAL BUG #3: ONNX Export on Every Load**

```39:41:track_drive.py
yolo_base = YOLO(CONFIG['yolo_base'])
yolo_base.export(format='onnx', int8=True)  # Quantize once
yolo_session = ort.InferenceSession('yolov8n.onnx')
```

**Problem**: Despite the "Quantize once" comment, the ONNX export re-runs every time `load_models()` executes, even with caching. It should be guarded by a file-existence check.

**Impact**: ⚠️ **HIGH** - Slow startup, unnecessary file I/O
#### 🟡 **PERFORMANCE ISSUE #1: Untrained Isolation Forest**

```60:60:track_drive.py
iso_forest = IsolationForest(contamination=0.1, random_state=42)
```

**Problem**: The Isolation Forest is instantiated but `fit()` is never called, so any call to `predict()` raises `NotFittedError` instead of producing anomaly scores.

**Impact**: ⚠️ **MEDIUM** - Anomaly detection non-functional
#### 🟡 **PERFORMANCE ISSUE #2: Multiple Heavy Models Loaded Simultaneously**

All models (YOLO, VideoMAE, MediaPipe, Roboflow, Isolation Forest) load at startup, regardless of whether they are used.

**Impact**: ⚠️ **HIGH** - Very slow startup, high memory usage

#### 🟡 **PERFORMANCE ISSUE #3: Redundant Color Conversions**

```101:101:track_drive.py
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```

And later:

```253:253:track_drive.py
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```

**Problem**: The same BGR-to-RGB conversion is performed twice per frame; the result should be computed once and shared.

**Impact**: ⚠️ **MEDIUM** - Unnecessary CPU cycles
#### 🟡 **PERFORMANCE ISSUE #4: VideoMAE Processing Every Frame**

VideoMAE (a large video transformer) processes 8-frame sequences even when not needed.

**Impact**: ⚠️ **HIGH** - Major CPU bottleneck on low-spec hardware

#### 🟡 **PERFORMANCE ISSUE #5: No Model Quantization for VideoMAE**

VideoMAE runs in FP32, consuming significant memory and compute.

**Impact**: ⚠️ **HIGH** - Not suitable for low-spec CPUs
#### 🟡 **PERFORMANCE ISSUE #6: Inefficient YOLO ONNX Parsing**

```87:91:track_drive.py
bboxes = outputs[0][0, :, :4]  # xyxy
confs = outputs[0][0, :, 4]
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # COCO classes
high_conf = confs > CONFIG['conf_threshold']
return {'bboxes': bboxes[high_conf], 'confs': confs[high_conf], 'classes': classes[high_conf]}
```

**Problem**: This assumes a YOLOv5-style layout with a separate objectness score at index 4. The YOLOv8 ONNX output has shape `(1, 84, 8400)` for the default 640x640 export: channels first, boxes as `(cx, cy, w, h)`, and 80 class scores with no objectness column.

**Impact**: ⚠️ **HIGH** - Detection results will be incorrect
### 1.2 Dependency Status

**Current Installation Status:**

- ✅ numpy (1.26.4)
- ✅ yaml (6.0.1)
- ❌ streamlit - MISSING
- ❌ opencv-python - MISSING
- ❌ ultralytics - MISSING
- ❌ mediapipe - MISSING
- ❌ roboflow - MISSING
- ❌ scikit-learn - MISSING
- ❌ transformers - MISSING
- ❌ torch - MISSING
- ❌ onnxruntime - MISSING

**Installation Required**: 9 packages missing (~2 GB download, ~5 GB disk space)
### 1.3 Algorithm Analysis

**Current Techniques:**

1. **Object Detection**: YOLOv8n (nano) - ✅ Good choice for low-spec
2. **Face Analysis**: MediaPipe Face Mesh - ✅ Efficient, CPU-friendly
3. **Action Recognition**: VideoMAE-base - ❌ Too heavy for low-spec CPUs
4. **Seatbelt Detection**: Roboflow custom model - ⚠️ Unknown performance
5. **Optical Flow**: Incorrect implementation - ❌ Will crash
6. **Anomaly Detection**: Isolation Forest (untrained) - ❌ Non-functional

---
## 2. Evaluation Criteria

### 2.1 Success Metrics

**Accuracy Targets:**

- DSMS Alerts: >90% precision, >85% recall
- ADAS Alerts: >95% precision, >90% recall
- False Positive Rate: <5%

**Performance Targets (Low-Spec CPU - 4 cores, 2 GHz, 8 GB RAM):**

- Frame Processing: >10 FPS sustained
- Model Loading: <30 seconds
- Memory Usage: <4 GB peak
- CPU Utilization: <80% average
- Latency: <100 ms per frame (with skipping)

**Resource Utilization:**

- Model Size: <500 MB total (quantized)
- Disk I/O: Minimal (cached models)
- Network: None after initial download
### 2.2 Open-Source Tool Evaluation

**Current Tools:**

| Tool | Status | CPU Efficiency | Accuracy | Recommendation |
|------|--------|----------------|----------|----------------|
| YOLOv8n | ✅ Good | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | **Keep** - Optimize |
| MediaPipe | ✅ Good | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | **Keep** |
| VideoMAE-base | ❌ Too Heavy | ⭐ | ⭐⭐⭐⭐⭐ | **Replace** |
| Roboflow API | ⚠️ Unknown | ⭐⭐⭐ | ⭐⭐⭐ | **Evaluate** |
| Isolation Forest | ⚠️ Untrained | ⭐⭐⭐⭐ | N/A | **Fix** |

---
## 3. Improvement Suggestions

### 3.1 Critical Bug Fixes (Priority 1)

#### Fix #1: Correct Optical Flow Implementation

**Replace** `calcOpticalFlowPyrLK` with `calcOpticalFlowFarneback` (dense flow), or implement proper Lucas-Kanade tracking with `cv2.goodFeaturesToTrack` for feature detection.

**Recommended**: `cv2.calcOpticalFlowFarneback` - dense flow is simpler to integrate and needs no feature bookkeeping.
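
A minimal sketch of the Farneback replacement, keeping the original method signature; the downscale size and Farneback parameters are typical starting points, not tuned values:

```python
import cv2
import numpy as np

def optical_flow(self, prev_frame, curr_frame):
    """Mean dense-flow magnitude as a cheap proxy for ego-motion."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Downscale first: dense-flow cost scales with pixel count.
    prev_gray = cv2.resize(prev_gray, (320, 240))
    curr_gray = cv2.resize(curr_gray, (320, 240))
    # Farneback returns an HxWx2 array of per-pixel (dx, dy) displacements.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return float(np.mean(np.linalg.norm(flow, axis=2)))
```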
#### Fix #2: Remove VideoMAE JIT Scripting

**Replace** the scripting step with direct eager-mode loading, or convert to ONNX if quantization is needed.

**Alternative**: Use lighter action recognition (MediaPipe Pose + heuristics).
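
If VideoMAE is kept at all, a sketch of the corrected loading path: plain eager-mode loading plus optional dynamic INT8 quantization of the linear layers, which works on CPU without scripting (assumes `CONFIG['videomae_model']` names a valid checkpoint, as in the original code):

```python
import torch
from transformers import VideoMAEForVideoClassification, VideoMAEImageProcessor

processor = VideoMAEImageProcessor.from_pretrained(CONFIG['videomae_model'])
videomae = VideoMAEForVideoClassification.from_pretrained(CONFIG['videomae_model'])
videomae.eval()  # eager mode; no torch.jit.script

# Optional: dynamic INT8 quantization of nn.Linear layers for CPU inference.
videomae = torch.quantization.quantize_dynamic(
    videomae, {torch.nn.Linear}, dtype=torch.qint8)
```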
#### Fix #3: Conditional ONNX Export

**Add** a file-existence check so the export runs only when the ONNX file is absent.
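
A sketch of the guarded export (the file name follows the original code's `yolov8n.onnx`):

```python
import os

import onnxruntime as ort
from ultralytics import YOLO

ONNX_PATH = 'yolov8n.onnx'
if not os.path.exists(ONNX_PATH):
    # Export once; subsequent runs reuse the cached file.
    YOLO(CONFIG['yolo_base']).export(format='onnx', int8=True)
yolo_session = ort.InferenceSession(ONNX_PATH, providers=['CPUExecutionProvider'])
```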
#### Fix #4: Fix YOLO ONNX Output Parsing

**Use** the Ultralytics built-in post-processing, or parse the actual YOLOv8 output layout.
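
If manual parsing is kept, a sketch for the actual YOLOv8 layout, assuming the default 640x640 export; the boxes come back in the letterboxed input space and still need rescaling to the original frame:

```python
import cv2
import numpy as np

def parse_yolov8_onnx(outputs, conf_threshold=0.4, iou_threshold=0.5):
    """Parse raw YOLOv8 ONNX output of shape (1, 84, 8400):
    rows 0-3 are (cx, cy, w, h); rows 4-83 are the 80 class scores."""
    preds = outputs[0][0].T                  # -> (8400, 84)
    class_scores = preds[:, 4:]
    confs = class_scores.max(axis=1)         # no separate objectness in YOLOv8
    keep = confs > conf_threshold
    boxes = preds[keep, :4]
    confs = confs[keep]
    classes = class_scores[keep].argmax(axis=1)
    # (cx, cy, w, h) -> top-left (x, y, w, h) for OpenCV's NMS.
    boxes[:, 0] -= boxes[:, 2] / 2
    boxes[:, 1] -= boxes[:, 3] / 2
    idx = cv2.dnn.NMSBoxes(boxes.tolist(), confs.tolist(),
                           conf_threshold, iou_threshold)
    idx = np.asarray(idx, dtype=int).reshape(-1)
    return {'bboxes': boxes[idx], 'confs': confs[idx], 'classes': classes[idx]}
```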
### 3.2 Performance Optimizations (Priority 2)

#### Optimization #1: Replace VideoMAE with Lightweight Alternative

**Options:**

- **Option A**: MediaPipe landmarks + temporal logic (e.g., yawn detection via mouth opening)
- **Option B**: Lightweight 2D CNN (MobileNet-based) for action classification
- **Option C**: Remove action recognition, use face analysis only

**Recommendation**: **Option A** - no additional model; it reuses the MediaPipe landmarks already being computed. A sketch follows.
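
A sketch of the Option A yawn heuristic on Face Mesh output; the inner-lip and mouth-corner landmark indices (13, 14, 78, 308) are the commonly used ones, and the 0.6 threshold and window sizes are starting points to validate, not values from the original code:

```python
from collections import deque

import numpy as np

# Assumed MediaPipe Face Mesh landmark indices (inner lips, mouth corners).
UPPER_LIP, LOWER_LIP, LEFT_CORNER, RIGHT_CORNER = 13, 14, 78, 308
mar_history = deque(maxlen=30)  # rolling window of recent frames

def mouth_aspect_ratio(landmarks):
    """Vertical mouth opening normalized by mouth width (scale-invariant)."""
    pt = lambda i: np.array([landmarks[i].x, landmarks[i].y])
    return (np.linalg.norm(pt(UPPER_LIP) - pt(LOWER_LIP))
            / np.linalg.norm(pt(LEFT_CORNER) - pt(RIGHT_CORNER)))

def is_yawning(landmarks, mar_threshold=0.6, min_open_frames=15):
    """Flag a yawn only when the mouth stays wide open for most of the window."""
    mar_history.append(mouth_aspect_ratio(landmarks))
    return sum(m > mar_threshold for m in mar_history) >= min_open_frames
```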
#### Optimization #2: Lazy Model Loading

**Implement**: Load models only when needed, not all at startup.
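
A sketch using one `@st.cache_resource` factory per model, so each loads on first use instead of inside a monolithic `load_models()`:

```python
import streamlit as st

@st.cache_resource
def get_yolo_session():
    # Imported and constructed only when object detection is first requested.
    import onnxruntime as ort
    return ort.InferenceSession('yolov8n.onnx',
                                providers=['CPUExecutionProvider'])

@st.cache_resource
def get_face_mesh():
    import mediapipe as mp
    return mp.solutions.face_mesh.FaceMesh(max_num_faces=1,
                                           refine_landmarks=True)
```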
#### Optimization #3: Model Quantization

- YOLO: ✅ Already exported with `int8=True` (verify the quantization actually applies)
- VideoMAE: Convert to INT8 ONNX, or remove
- MediaPipe: Already optimized

#### Optimization #4: Frame Processing Pipeline

- Cache color conversions - one BGR-to-RGB per frame (see the sketch below)
- Reduce resolution further (320x240 for face, 640x480 for objects)
- Process different regions at different rates
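
A minimal sketch, assuming face analysis and object detection are the only consumers of the converted frame; it performs the BGR-to-RGB conversion once and produces the two working resolutions listed above:

```python
import cv2

def preprocess(frame):
    """Convert BGR->RGB once per frame and cache the per-task resolutions."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    return {
        'rgb_full': rgb,
        'rgb_face': cv2.resize(rgb, (320, 240)),  # face-analysis input
        'rgb_obj': cv2.resize(rgb, (640, 480)),   # object-detection input
    }
```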
#### Optimization #5: Smart Frame Skipping

Use a different skip rate per model (a dispatcher sketch follows the list):

- Face analysis: every frame (fast)
- Object detection: every 3rd frame
- Action recognition: every 10th frame (if kept)
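
A minimal dispatcher sketch; `run_model` is a hypothetical stand-in for the per-model inference calls:

```python
SKIP_RATES = {'face': 1, 'objects': 3, 'action': 10}  # run every Nth frame

def models_due(frame_idx):
    """Names of the models that should run on this frame."""
    return [name for name, skip in SKIP_RATES.items()
            if frame_idx % skip == 0]

# In the capture loop:
# for frame_idx, frame in enumerate(video_frames):
#     for name in models_due(frame_idx):
#         results[name] = run_model(name, frame)  # hypothetical dispatcher
```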
### 3.3 Algorithm Enhancements (Priority 3)

#### Enhancement #1: Train Isolation Forest

Collect features from normal driving, train the model offline, and save it for loading at startup.
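
A sketch of the offline training step; the feature file and its layout (e.g., rows of [PERCLOS, flow magnitude, head pose]) are assumptions about what the pipeline would log:

```python
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical feature matrix logged from normal driving sessions.
normal_features = np.load('normal_driving_features.npy')

iso_forest = IsolationForest(contamination=0.1, random_state=42)
iso_forest.fit(normal_features)               # the training step the code skips
joblib.dump(iso_forest, 'iso_forest.joblib')  # reload at startup via joblib.load
```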
#### Enhancement #2: Improve Distance Estimation

Use camera calibration (e.g., a pinhole model with a known reference width) or stereo vision for accurate distance.
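
For the monocular case, a sketch of the standard pinhole approximation; both constants are calibration values to measure, not known quantities:

```python
KNOWN_WIDTH_M = 1.8  # assumed average vehicle width, metres
FOCAL_PX = 700.0     # from a one-time calibration at a known distance

def estimate_distance_m(bbox_width_px: float) -> float:
    """distance = real_width * focal_length_px / apparent_width_px"""
    if bbox_width_px <= 0:
        return float('inf')
    return KNOWN_WIDTH_M * FOCAL_PX / bbox_width_px
```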
#### Enhancement #3: Better PERCLOS Calculation

Use the standard Eye Aspect Ratio (EAR) formula instead of the simplified version, and compute PERCLOS as the fraction of frames in a rolling window with the eyes closed.
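
A sketch of the standard EAR computation (Soukupova and Cech, 2016) with PERCLOS as the fraction of recent frames below threshold; the 0.2 threshold and window size are commonly cited starting points, not calibrated values:

```python
from collections import deque

import numpy as np

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); p1/p4 are the eye corners,
    p2, p3 the upper-lid points, p6, p5 the lower-lid points."""
    pts = [np.asarray(p, dtype=float) for p in (p1, p2, p3, p4, p5, p6)]
    vertical = np.linalg.norm(pts[1] - pts[5]) + np.linalg.norm(pts[2] - pts[4])
    return vertical / (2.0 * np.linalg.norm(pts[0] - pts[3]))

ear_history = deque(maxlen=90)  # e.g., ~6 s of frames at 15 FPS

def perclos(ear, closed_threshold=0.2):
    """Fraction of frames in the window with the eye effectively closed."""
    ear_history.append(ear)
    return sum(e < closed_threshold for e in ear_history) / len(ear_history)
```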
#### Enhancement #4: Temporal Smoothing

Add moving-average filters over per-frame scores so alerts fire on sustained conditions rather than single-frame noise, reducing false positives.
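
A minimal smoothing sketch; the window length is a tuning parameter:

```python
from collections import deque

class MovingAverage:
    """Fixed-window moving average for noisy per-frame scores."""
    def __init__(self, window: int = 15):
        self.values = deque(maxlen=window)

    def update(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)

# Alert on the smoothed score, not on single-frame spikes:
drowsiness_avg = MovingAverage(window=15)
# if drowsiness_avg.update(frame_score) > ALERT_THRESHOLD: raise_alert()
```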
---
## 4. Implementation Plan

### Phase 1: Critical Fixes (Week 1)

**Goal**: Make code functional and runnable

1. **Day 1-2: Fix Critical Bugs**
   - [ ] Fix optical flow implementation
   - [ ] Remove VideoMAE JIT scripting
   - [ ] Fix YOLO ONNX parsing
   - [ ] Add conditional ONNX export
   - [ ] Add error handling

2. **Day 3-4: Dependency Setup**
   - [ ] Install all dependencies
   - [ ] Test basic functionality
   - [ ] Fix import errors

3. **Day 5: Basic Testing**
   - [ ] Run with webcam/video file
   - [ ] Verify no crashes
   - [ ] Measure baseline performance
### Phase 2: Performance Optimization (Week 2)

**Goal**: Achieve >10 FPS on low-spec CPU

1. **Day 1-2: Replace VideoMAE**
   - [ ] Implement MediaPipe-based action detection
   - [ ] Remove VideoMAE dependencies
   - [ ] Test accuracy vs. performance

2. **Day 3: Optimize Processing Pipeline**
   - [ ] Implement multi-resolution processing
   - [ ] Add frame caching
   - [ ] Optimize color conversions

3. **Day 4: Model Quantization**
   - [ ] Verify YOLO INT8 quantization
   - [ ] Test accuracy retention
   - [ ] Measure speedup

4. **Day 5: Smart Frame Skipping**
   - [ ] Implement per-model skip rates
   - [ ] Add temporal smoothing
   - [ ] Benchmark performance
### Phase 3: Accuracy Improvements (Week 3)

**Goal**: Achieve >90% accuracy targets

1. **Day 1-2: Fix Detection Logic**
   - [ ] Train Isolation Forest
   - [ ] Improve PERCLOS calculation
   - [ ] Fix distance estimation

2. **Day 3-4: Temporal Smoothing**
   - [ ] Add moving averages
   - [ ] Implement state machines for alerts
   - [ ] Reduce false positives

3. **Day 5: Calibration Tools**
   - [ ] Add distance calibration
   - [ ] Add speed calibration
   - [ ] Create config file
### Phase 4: Testing & Validation (Week 4)

**Goal**: Validate improvements

1. **Day 1-2: Unit Tests**
   - [ ] Test each component
   - [ ] Mock dependencies
   - [ ] Verify edge cases

2. **Day 3-4: Integration Tests**
   - [ ] Test full pipeline
   - [ ] Measure metrics
   - [ ] Compare before/after

3. **Day 5: Documentation**
   - [ ] Update code comments
   - [ ] Create user guide
   - [ ] Document calibration

---
## 5. Testing and Validation Framework

### 5.1 Test Dataset Requirements

**Required Test Videos:**

- Normal driving (baseline)
- Drowsy driver (PERCLOS > threshold)
- Distracted driver (phone, looking away)
- No-seatbelt scenarios
- FCW scenarios (approaching vehicle)
- LDW scenarios (lane departure)
- Mixed scenarios

**Minimum**: 10 videos, 30 seconds each, various lighting conditions
### 5.2 Metrics Collection

**Performance Metrics:**

```python
metrics = {
    'fps': float,              # Frames per second
    'latency_ms': float,       # Per-frame latency
    'memory_mb': float,        # Peak memory usage
    'cpu_percent': float,      # Average CPU usage
    'model_load_time': float,  # Startup time
}
```

**Accuracy Metrics:**

```python
accuracy_metrics = {
    'precision': float,            # TP / (TP + FP)
    'recall': float,               # TP / (TP + FN)
    'f1_score': float,             # 2 * (precision * recall) / (precision + recall)
    'false_positive_rate': float,  # FP / (FP + TN)
}
```
### 5.3 Testing Script Structure

```python
# test_performance.py

def benchmark_inference():
    """Measure FPS, latency, memory"""
    pass

def test_accuracy():
    """Run on test dataset, compute metrics"""
    pass

def test_edge_cases():
    """Test with missing data, errors"""
    pass
```
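
A sketch of how `benchmark_inference` could be filled in using `psutil`; the `predictor.process(frame)` interface is an assumption about how the pipeline would be wrapped:

```python
import time

import psutil

def benchmark_inference(predictor, frames):
    """Measure FPS, mean latency, and peak RSS over a list of frames."""
    proc = psutil.Process()
    latencies, peak_mb = [], 0.0
    for frame in frames:
        start = time.perf_counter()
        predictor.process(frame)  # hypothetical per-frame entry point
        latencies.append(time.perf_counter() - start)
        peak_mb = max(peak_mb, proc.memory_info().rss / 1e6)
    return {
        'fps': len(frames) / sum(latencies),
        'latency_ms': 1000 * sum(latencies) / len(latencies),
        'memory_mb': peak_mb,
    }
```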
### 5.4 Success Criteria

**Performance:**

- ✅ FPS > 10 on target hardware
- ✅ Latency < 100 ms per frame
- ✅ Memory < 4 GB
- ✅ CPU < 80%

**Accuracy:**

- ✅ DSMS Precision > 90%
- ✅ DSMS Recall > 85%
- ✅ ADAS Precision > 95%
- ✅ FPR < 5%

---
## 6. Documentation Requirements

### 6.1 Code Documentation

**Required:**

- Docstrings for all functions/classes
- Type hints where applicable
- Inline comments for complex logic
- Algorithm references (papers, docs)

**Template:**

```python
def function_name(param1: type, param2: type) -> return_type:
    """
    Brief description.

    Args:
        param1: Description
        param2: Description

    Returns:
        Description

    Raises:
        ExceptionType: When this happens

    References:
        - Paper/URL if applicable
    """
```
### 6.2 User Documentation

**Required Sections:**

1. **Installation Guide**
   - System requirements
   - Dependency installation
   - Configuration setup

2. **Usage Guide**
   - How to run the application
   - Configuration options
   - Calibration procedures

3. **Troubleshooting**
   - Common issues
   - Performance tuning
   - Accuracy improvements

### 6.3 Technical Documentation

**Required:**

- Architecture diagram
- Model specifications
- Performance benchmarks
- Accuracy reports

---
## 7. Immediate Action Items

### 🔴 **CRITICAL - Do First:**

1. Fix optical flow bug (will crash)
2. Remove VideoMAE JIT scripting (will crash)
3. Fix YOLO ONNX parsing (incorrect results)
4. Install missing dependencies

### 🟡 **HIGH PRIORITY - Do Next:**

1. Replace VideoMAE with lightweight alternative
2. Add conditional ONNX export
3. Implement proper error handling
4. Train Isolation Forest

### 🟢 **MEDIUM PRIORITY - Do Later:**

1. Optimize frame processing
2. Add temporal smoothing
3. Improve calibration
4. Add comprehensive tests

---
## 8. Estimated Impact

**After Fixes:**

- **Functionality**: ✅ Code will run without crashes
- **Performance**: 🟡 5-8 FPS → 🟢 12-15 FPS (estimated)
- **Memory**: 🟡 6-8 GB → 🟢 2-3 GB (estimated)
- **Accuracy**: 🟡 Unknown → 🟢 >90% (with improvements)

**Timeline**: 4 weeks for full implementation

**Effort**: ~160 hours (one FTE-month)

---

## Conclusion

The current implementation has a solid foundation but requires significant fixes and optimizations to be production-ready, especially on low-specification CPUs. The proposed improvements address the critical bugs, are estimated to reduce resource usage by roughly 60% (per the memory projection above), and improve accuracy through better algorithms and temporal smoothing.

**Next Step**: Begin Phase 1 - Critical Fixes