DSMS/ADAS Visual Analysis - Comprehensive Assessment Report
Executive Summary
This report provides a systematic evaluation of the current Streamlit-based Driver State Monitoring System (DSMS) and Advanced Driver Assistance System (ADAS) implementation, with focus on optimizing for low-specification CPUs while maintaining high accuracy.
Current Status: ⚠️ Non-Functional - Missing 9/11 critical dependencies, multiple code bugs, and significant performance bottlenecks.
1. Assessment of Current Implementation
1.1 Code Structure Analysis
Strengths:
- ✅ Modular class-based design (RealTimePredictor)
- ✅ Streamlit caching enabled (@st.cache_resource)
- ✅ Frame skipping mechanism (inference_skip: 3)
- ✅ Logging infrastructure in place
- ✅ ONNX optimization mentioned for YOLO
Critical Issues Identified:
🔴 CRITICAL BUG #1: Incorrect Optical Flow API Usage
```python
def optical_flow(self, prev_frame, curr_frame):
    """OpenCV flow for speed, braking, accel."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, None, None)
    magnitude = np.mean(np.sqrt(flow[0]**2 + flow[1]**2))
    return magnitude
```
Problem: calcOpticalFlowPyrLK requires feature points as input, not full images. This will cause a runtime error.
Impact: ⚠️ CRITICAL - Will crash on execution
🔴 CRITICAL BUG #2: VideoMAE JIT Scripting Failure
```python
processor = VideoMAEImageProcessor.from_pretrained(CONFIG['videomae_model'])
videomae = VideoMAEForVideoClassification.from_pretrained(CONFIG['videomae_model'])
videomae = torch.jit.script(videomae)
torch.jit.save(videomae, 'videomae_ts.pt')
videomae = torch.jit.load('videomae_ts.pt')
```
Problem: Hugging Face transformer models generally cannot be compiled with torch.jit.script; scripting fails on their dynamic Python control flow (torch.jit.trace with example inputs is the supported export path). This will fail at runtime.
Impact: ⚠️ CRITICAL - Model loading will crash
🔴 CRITICAL BUG #3: ONNX Export on Every Load
```python
yolo_base = YOLO(CONFIG['yolo_base'])
yolo_base.export(format='onnx', int8=True)  # Quantize once
yolo_session = ort.InferenceSession('yolov8n.onnx')
```
Problem: ONNX export runs every time load_models() is called, even with caching. Should be conditional.
Impact: ⚠️ HIGH - Slow startup, unnecessary file I/O
🟡 PERFORMANCE ISSUE #1: Untrained Isolation Forest
```python
iso_forest = IsolationForest(contamination=0.1, random_state=42)
```
Problem: Isolation Forest is instantiated but never trained. Will produce random predictions.
Impact: ⚠️ MEDIUM - Anomaly detection non-functional
🟡 PERFORMANCE ISSUE #2: Multiple Heavy Models Loaded Simultaneously
All models (YOLO, VideoMAE, MediaPipe, Roboflow, Isolation Forest) load at startup regardless of usage.
Impact: ⚠️ HIGH - Very slow startup, high memory usage
🟡 PERFORMANCE ISSUE #3: Redundant Color Conversions
```python
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```
And later:
```python
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```
Impact: ⚠️ MEDIUM - Unnecessary CPU cycles
🟡 PERFORMANCE ISSUE #4: VideoMAE Processing Every Frame
VideoMAE (large transformer) processes 8-frame sequences even when not needed.
Impact: ⚠️ HIGH - Major CPU bottleneck on low-spec hardware
🟡 PERFORMANCE ISSUE #5: No Model Quantization for VideoMAE
VideoMAE runs in FP32, consuming significant memory and compute.
Impact: ⚠️ HIGH - Not suitable for low-spec CPUs
🟡 PERFORMANCE ISSUE #6: Inefficient YOLO ONNX Parsing
```python
bboxes = outputs[0][0, :, :4]  # xyxy
confs = outputs[0][0, :, 4]
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # COCO classes
high_conf = confs > CONFIG['conf_threshold']
return {'bboxes': bboxes[high_conf], 'confs': confs[high_conf], 'classes': classes[high_conf]}
```
Problem: Assumes a YOLOv5-style layout with a separate objectness column. YOLOv8 ONNX exports instead produce a (1, 84, 8400) tensor: 4 box coordinates (cx, cy, w, h) followed by 80 class scores per candidate, with anchors along the last axis and no objectness score.
Impact: ⚠️ HIGH - Detection results will be incorrect
1.2 Dependency Status
Current Installation Status:
- ✅ numpy (1.26.4)
- ✅ yaml (6.0.1)
- ❌ streamlit - MISSING
- ❌ opencv-python - MISSING
- ❌ ultralytics - MISSING
- ❌ mediapipe - MISSING
- ❌ roboflow - MISSING
- ❌ scikit-learn - MISSING
- ❌ transformers - MISSING
- ❌ torch - MISSING
- ❌ onnxruntime - MISSING
Installation Required: 9 packages missing (~2GB download, ~5GB disk space)
1.3 Algorithm Analysis
Current Techniques:
- Object Detection: YOLOv8n (nano) - ✅ Good choice for low-spec
- Face Analysis: MediaPipe Face Mesh - ✅ Efficient, CPU-friendly
- Action Recognition: VideoMAE-base - ❌ Too heavy for low-spec CPUs
- Seatbelt Detection: Roboflow custom model - ⚠️ Unknown performance
- Optical Flow: Incorrect implementation - ❌ Will crash
- Anomaly Detection: Isolation Forest (untrained) - ❌ Non-functional
2. Evaluation Criteria
2.1 Success Metrics
Accuracy Targets:
- DSMS Alerts: >90% precision, >85% recall
- ADAS Alerts: >95% precision, >90% recall
- False Positive Rate: <5%
Performance Targets (Low-Spec CPU - 4 cores, 2GHz, 8GB RAM):
- Frame Processing: >10 FPS sustained
- Model Loading: <30 seconds
- Memory Usage: <4GB peak
- CPU Utilization: <80% average
- Latency: <100ms per frame (with skipping)
Resource Utilization:
- Model Size: <500MB total (quantized)
- Disk I/O: Minimal (cached models)
- Network: None after initial download
2.2 Open-Source Tool Evaluation
Current Tools:
| Tool | Status | CPU Efficiency | Accuracy | Recommendation |
|---|---|---|---|---|
| YOLOv8n | ✅ Good | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Keep - Optimize |
| MediaPipe | ✅ Good | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Keep |
| VideoMAE-base | ❌ Too Heavy | ⭐ | ⭐⭐⭐⭐⭐ | Replace |
| Roboflow API | ⚠️ Unknown | ⭐⭐⭐ | ⭐⭐⭐ | Evaluate |
| Isolation Forest | ⚠️ Untrained | ⭐⭐⭐⭐ | N/A | Fix |
3. Improvement Suggestions
3.1 Critical Bug Fixes (Priority 1)
Fix #1: Correct Optical Flow Implementation
Replace calcOpticalFlowPyrLK with calcOpticalFlowFarneback (dense flow), or implement Lucas-Kanade properly by first detecting feature points with cv2.goodFeaturesToTrack.
Recommended: Use cv2.calcOpticalFlowFarneback for dense flow over whole frames; no feature detection or tracking bookkeeping required, as sketched below.
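A minimal sketch of the Farneback replacement, keeping the original method signature and assuming the module-level cv2/numpy imports already in the file (the Farneback parameters below are OpenCV's commonly cited defaults, not tuned values):
```python
def optical_flow(self, prev_frame, curr_frame):
    """Mean dense-flow magnitude, used as a proxy for ego-motion."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Farneback takes whole grayscale images; no feature points required.
    # Positional args: flow, pyr_scale, levels, winsize, iterations,
    # poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # flow has shape (H, W, 2): per-pixel (dx, dy) displacement.
    return float(np.mean(np.linalg.norm(flow, axis=2)))
```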
Fix #2: Remove VideoMAE JIT Scripting
Replace with direct model loading or ONNX conversion if quantization needed.
Alternative: Use lighter action recognition (MediaPipe Pose + heuristics).
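A sketch of the direct-loading path with optional dynamic INT8 quantization; the quantization step is a suggestion layered on top of the fix, not part of the original design, and its accuracy impact should be validated:
```python
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

processor = VideoMAEImageProcessor.from_pretrained(CONFIG['videomae_model'])
videomae = VideoMAEForVideoClassification.from_pretrained(CONFIG['videomae_model'])
videomae.eval()  # plain eager-mode inference; no torch.jit.script involved

# Optional CPU speedup: dynamically quantize the linear layers to INT8.
videomae = torch.ao.quantization.quantize_dynamic(
    videomae, {torch.nn.Linear}, dtype=torch.qint8)
```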
Fix #3: Conditional ONNX Export
Add file existence check before export.
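A sketch of the guard around the existing export call, reusing the file name the current code already assumes:
```python
import os

ONNX_PATH = 'yolov8n.onnx'

if not os.path.exists(ONNX_PATH):
    yolo_base = YOLO(CONFIG['yolo_base'])
    yolo_base.export(format='onnx', int8=True)  # runs once; cached on disk after
yolo_session = ort.InferenceSession(ONNX_PATH)
```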
Fix #4: Fix YOLO ONNX Output Parsing
Use Ultralytics built-in ONNX post-processing or correct output format.
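If manual parsing is kept instead of Ultralytics' built-in post-processing, a sketch assuming the standard (1, 84, 8400) export layout; boxes come back in the letterboxed input scale and still need rescaling to the original frame, plus NMS (e.g. cv2.dnn.NMSBoxes):
```python
import numpy as np

def parse_yolov8_onnx(outputs, conf_threshold):
    """Parse raw YOLOv8 ONNX output: (1, 84, 8400) = 4 box coords + 80
    class scores per candidate, anchors along the last axis."""
    preds = outputs[0][0].T              # -> (8400, 84)
    class_scores = preds[:, 4:]          # no separate objectness in YOLOv8
    confs = class_scores.max(axis=1)
    classes = class_scores.argmax(axis=1)
    keep = confs > conf_threshold
    cxcywh = preds[keep, :4]
    xyxy = np.empty_like(cxcywh)         # convert center format to corners
    xyxy[:, 0] = cxcywh[:, 0] - cxcywh[:, 2] / 2
    xyxy[:, 1] = cxcywh[:, 1] - cxcywh[:, 3] / 2
    xyxy[:, 2] = cxcywh[:, 0] + cxcywh[:, 2] / 2
    xyxy[:, 3] = cxcywh[:, 1] + cxcywh[:, 3] / 2
    return {'bboxes': xyxy, 'confs': confs[keep], 'classes': classes[keep]}
```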
3.2 Performance Optimizations (Priority 2)
Optimization #1: Replace VideoMAE with Lightweight Alternative
Options:
- Option A: MediaPipe Pose + Temporal Logic (yawn detection via mouth opening)
- Option B: Lightweight 2D CNN (MobileNet-based) for action classification
- Option C: Remove action recognition, use face analysis only
Recommendation: Option A - Zero additional model, uses existing MediaPipe.
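As a sketch of Option A, yawn detection can be approximated with a mouth-opening ratio computed from Face Mesh landmarks; the indices below are the commonly used inner-lip and mouth-corner points and should be verified against the Face Mesh topology in use:
```python
import numpy as np

UPPER_LIP, LOWER_LIP = 13, 14        # inner-lip midpoints (assumed indices)
LEFT_CORNER, RIGHT_CORNER = 61, 291  # mouth corners (assumed indices)

def mouth_aspect_ratio(landmarks: np.ndarray) -> float:
    """Vertical mouth opening normalized by mouth width; values that stay
    high for ~1-2 s suggest a yawn. `landmarks` is an (N, 2) array of
    Face Mesh points in pixel coordinates."""
    vertical = np.linalg.norm(landmarks[UPPER_LIP] - landmarks[LOWER_LIP])
    horizontal = np.linalg.norm(landmarks[LEFT_CORNER] - landmarks[RIGHT_CORNER])
    return float(vertical / (horizontal + 1e-6))
```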
Optimization #2: Lazy Model Loading
Implement: Load models only when needed, not all at startup.
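A sketch of lazy loading built on the Streamlit caching the code already uses; each model is loaded on first call only (function names are illustrative):
```python
import streamlit as st

@st.cache_resource
def get_yolo():
    from ultralytics import YOLO  # import deferred until first use
    return YOLO(CONFIG['yolo_base'])

@st.cache_resource
def get_face_mesh():
    import mediapipe as mp
    return mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
```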
Optimization #3: Model Quantization
- YOLO: ✅ Already ONNX INT8 (verify)
- VideoMAE: Convert to INT8 ONNX or remove
- MediaPipe: Already optimized
Optimization #4: Frame Processing Pipeline
- Cache color conversions
- Reduce resolution further (320x240 for face, 640x480 for objects)
- Process different regions at different rates
Optimization #5: Smart Frame Skipping
- Different skip rates for different models
- Face analysis: Every frame (fast)
- Object detection: Every 3rd frame
- Action recognition: Every 10th frame (if kept)
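A minimal dispatch sketch implementing the per-model rates listed above (values illustrative):
```python
SKIP_RATES = {'face': 1, 'objects': 3, 'action': 10}  # 1 = every frame

def should_run(model_name: str, frame_idx: int) -> bool:
    """Run each model only on every Nth frame."""
    return frame_idx % SKIP_RATES[model_name] == 0
```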
3.3 Algorithm Enhancements (Priority 3)
Enhancement #1: Train Isolation Forest
Collect normal driving features, train offline, save model.
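A sketch of the offline step; the feature file name is hypothetical, and the feature layout must match whatever the runtime pipeline extracts:
```python
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

normal = np.load('normal_driving_features.npy')  # (N, D) normal-driving features
iso_forest = IsolationForest(contamination=0.1, random_state=42)
iso_forest.fit(normal)                           # train once, offline
joblib.dump(iso_forest, 'iso_forest.joblib')

# At startup, load the fitted model instead of instantiating a fresh one:
iso_forest = joblib.load('iso_forest.joblib')
```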
Enhancement #2: Improve Distance Estimation
Use camera calibration or stereo vision for accurate distance.
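For the monocular case, a sketch of the standard pinhole-model estimate; the focal length comes from a one-time calibration (e.g. cv2.calibrateCamera) and the assumed physical object width is an approximation:
```python
def estimate_distance_m(focal_px: float, real_width_m: float,
                        bbox_width_px: float) -> float:
    """Pinhole model: distance = f * W / w (e.g. W ~ 1.8 m for a car)."""
    return focal_px * real_width_m / max(bbox_width_px, 1e-6)
```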
Enhancement #3: Better PERCLOS Calculation
Use proper Eye Aspect Ratio (EAR) formula instead of simplified version.
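A sketch of the standard EAR formula (Soukupová & Čech, 2016), assuming six eye landmarks in the conventional p1..p6 order (corners at p1/p4, upper lid p2/p3, lower lid p5/p6); PERCLOS is then the fraction of frames in a sliding window with EAR below a calibrated threshold:
```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) for a (6, 2) array of
    eye landmarks ordered p1..p6. Eyes count as closed while EAR stays
    below a calibrated threshold (often ~0.2)."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # p2-p6
    v2 = np.linalg.norm(eye[2] - eye[4])  # p3-p5
    h = np.linalg.norm(eye[0] - eye[3])   # p1-p4
    return float((v1 + v2) / (2.0 * h + 1e-6))
```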
Enhancement #4: Temporal Smoothing
Add moving average filters to reduce false positives.
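A minimal sketch of a fixed-window moving average for per-frame alert scores:
```python
from collections import deque

class MovingAverage:
    """Smooths noisy per-frame scores before they trigger alerts."""
    def __init__(self, window: int = 15):
        self.values = deque(maxlen=window)

    def update(self, x: float) -> float:
        self.values.append(x)
        return sum(self.values) / len(self.values)
```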
4. Implementation Plan
Phase 1: Critical Fixes (Week 1)
Goal: Make code functional and runnable
- Day 1-2: Fix Critical Bugs
  - Fix optical flow implementation
  - Remove VideoMAE JIT scripting
  - Fix YOLO ONNX parsing
  - Add conditional ONNX export
  - Add error handling
- Day 3-4: Dependency Setup
  - Install all dependencies
  - Test basic functionality
  - Fix import errors
- Day 5: Basic Testing
  - Run with webcam/video file
  - Verify no crashes
  - Measure baseline performance
Phase 2: Performance Optimization (Week 2)
Goal: Achieve >10 FPS on low-spec CPU
- Day 1-2: Replace VideoMAE
  - Implement MediaPipe Pose-based action detection
  - Remove VideoMAE dependencies
  - Test accuracy vs. performance
- Day 3: Optimize Processing Pipeline
  - Implement multi-resolution processing
  - Add frame caching
  - Optimize color conversions
- Day 4: Model Quantization
  - Verify YOLO INT8 quantization
  - Test accuracy retention
  - Measure speedup
- Day 5: Smart Frame Skipping
  - Implement per-model skip rates
  - Add temporal smoothing
  - Benchmark performance
Phase 3: Accuracy Improvements (Week 3)
Goal: Achieve >90% accuracy targets
- Day 1-2: Fix Detection Logic
  - Train Isolation Forest
  - Improve PERCLOS calculation
  - Fix distance estimation
- Day 3-4: Temporal Smoothing
  - Add moving averages
  - Implement state machines for alerts
  - Reduce false positives
- Day 5: Calibration Tools
  - Add distance calibration
  - Add speed calibration
  - Create config file
Phase 4: Testing & Validation (Week 4)
Goal: Validate improvements
- Day 1-2: Unit Tests
  - Test each component
  - Mock dependencies
  - Verify edge cases
- Day 3-4: Integration Tests
  - Test full pipeline
  - Measure metrics
  - Compare before/after
- Day 5: Documentation
  - Update code comments
  - Create user guide
  - Document calibration
5. Testing and Validation Framework
5.1 Test Dataset Requirements
Required Test Videos:
- Normal driving (baseline)
- Drowsy driver (PERCLOS > threshold)
- Distracted driver (phone, looking away)
- No seatbelt scenarios
- FCW scenarios (approaching vehicle)
- LDW scenarios (lane departure)
- Mixed scenarios
Minimum: 10 videos, 30 seconds each, various lighting conditions
5.2 Metrics Collection
Performance Metrics:
```python
metrics = {
    'fps': float,             # Frames per second
    'latency_ms': float,      # Per-frame latency
    'memory_mb': float,       # Peak memory usage
    'cpu_percent': float,     # Average CPU usage
    'model_load_time': float  # Startup time
}
```
Accuracy Metrics:
```python
accuracy_metrics = {
    'precision': float,            # TP / (TP + FP)
    'recall': float,               # TP / (TP + FN)
    'f1_score': float,             # 2 * (precision * recall) / (precision + recall)
    'false_positive_rate': float   # FP / (FP + TN)
}
```
5.3 Testing Script Structure
```python
# test_performance.py

def benchmark_inference():
    """Measure FPS, latency, memory."""
    pass

def test_accuracy():
    """Run on test dataset, compute metrics."""
    pass

def test_edge_cases():
    """Test with missing data, errors."""
    pass
```
5.4 Success Criteria
Performance:
- ✅ FPS > 10 on target hardware
- ✅ Latency < 100ms per frame
- ✅ Memory < 4GB
- ✅ CPU < 80%
Accuracy:
- ✅ DSMS Precision > 90%
- ✅ DSMS Recall > 85%
- ✅ ADAS Precision > 95%
- ✅ FPR < 5%
6. Documentation Requirements
6.1 Code Documentation
Required:
- Docstrings for all functions/classes
- Type hints where applicable
- Inline comments for complex logic
- Algorithm references (papers, docs)
Template:
```python
def function_name(param1: type, param2: type) -> return_type:
    """
    Brief description.

    Args:
        param1: Description
        param2: Description

    Returns:
        Description

    Raises:
        ExceptionType: When this happens

    References:
        - Paper/URL if applicable
    """
```
6.2 User Documentation
Required Sections:
- Installation Guide
  - System requirements
  - Dependency installation
  - Configuration setup
- Usage Guide
  - How to run the application
  - Configuration options
  - Calibration procedures
- Troubleshooting
  - Common issues
  - Performance tuning
  - Accuracy improvements
6.3 Technical Documentation
Required:
- Architecture diagram
- Model specifications
- Performance benchmarks
- Accuracy reports
7. Immediate Action Items
🔴 CRITICAL - Do First:
- Fix optical flow bug (will crash)
- Remove VideoMAE JIT scripting (will crash)
- Fix YOLO ONNX parsing (incorrect results)
- Install missing dependencies
🟡 HIGH PRIORITY - Do Next:
- Replace VideoMAE with lightweight alternative
- Add conditional ONNX export
- Implement proper error handling
- Train Isolation Forest
🟢 MEDIUM PRIORITY - Do Later:
- Optimize frame processing
- Add temporal smoothing
- Improve calibration
- Add comprehensive tests
8. Estimated Impact
After Fixes:
- Functionality: ✅ Code will run without crashes
- Performance: 🟡 5-8 FPS → 🟢 12-15 FPS (estimated)
- Memory: 🟡 6-8GB → 🟢 2-3GB (estimated)
- Accuracy: 🟡 Unknown → 🟢 >90% (with improvements)
Timeline: 4 weeks for full implementation
Effort: ~160 hours (1 FTE-month)
Conclusion
The current implementation has a solid foundation but requires significant fixes and optimizations to be production-ready, especially for low-specification CPUs. The proposed improvements will address critical bugs, reduce resource usage by ~60%, and improve accuracy through better algorithms and temporal smoothing.
Next Step: Begin Phase 1 - Critical Fixes