DSMS/ADAS Visual Analysis - Comprehensive Assessment Report
Executive Summary
This report provides a systematic evaluation of the current Streamlit-based Driver State Monitoring System (DSMS) and Advanced Driver Assistance System (ADAS) implementation, with focus on optimizing for low-specification CPUs while maintaining high accuracy.
Current Status: ⚠️ Non-Functional - Missing 9/11 critical dependencies, multiple code bugs, and significant performance bottlenecks.
1. Assessment of Current Implementation
1.1 Code Structure Analysis
Strengths:
- ✅ Modular class-based design (RealTimePredictor)
- ✅ Streamlit caching enabled (@st.cache_resource)
- ✅ Frame skipping mechanism (inference_skip: 3)
- ✅ Logging infrastructure in place
- ✅ ONNX optimization mentioned for YOLO
Critical Issues Identified:
🔴 CRITICAL BUG #1: Incorrect Optical Flow API Usage
```python
def optical_flow(self, prev_frame, curr_frame):
    """OpenCV flow for speed, braking, accel."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, None, None)
    magnitude = np.mean(np.sqrt(flow[0]**2 + flow[1]**2))
    return magnitude
```
Problem: calcOpticalFlowPyrLK requires feature points as input, not full images. This will cause a runtime error.
Impact: ⚠️ CRITICAL - Will crash on execution
🔴 CRITICAL BUG #2: VideoMAE JIT Scripting Failure
```python
processor = VideoMAEImageProcessor.from_pretrained(CONFIG['videomae_model'])
videomae = VideoMAEForVideoClassification.from_pretrained(CONFIG['videomae_model'])
videomae = torch.jit.script(videomae)
torch.jit.save(videomae, 'videomae_ts.pt')
videomae = torch.jit.load('videomae_ts.pt')
```
Problem: Hugging Face transformer models generally cannot be compiled with torch.jit.script; scripting fails on their dynamic Python control flow (torch.jit.trace with example inputs is the supported export path). This will fail at runtime.
Impact: ⚠️ CRITICAL - Model loading will crash
🔴 CRITICAL BUG #3: ONNX Export on Every Load
```python
yolo_base = YOLO(CONFIG['yolo_base'])
yolo_base.export(format='onnx', int8=True)  # Quantize once
yolo_session = ort.InferenceSession('yolov8n.onnx')
```
Problem: ONNX export runs every time load_models() is called, even with caching. Should be conditional.
Impact: ⚠️ HIGH - Slow startup, unnecessary file I/O
🟡 PERFORMANCE ISSUE #1: Untrained Isolation Forest
```python
iso_forest = IsolationForest(contamination=0.1, random_state=42)
```
Problem: Isolation Forest is instantiated but never trained. Will produce random predictions.
Impact: ⚠️ MEDIUM - Anomaly detection non-functional
🟡 PERFORMANCE ISSUE #2: Multiple Heavy Models Loaded Simultaneously
All models (YOLO, VideoMAE, MediaPipe, Roboflow, Isolation Forest) load at startup regardless of usage.
Impact: ⚠️ HIGH - Very slow startup, high memory usage
🟡 PERFORMANCE ISSUE #3: Redundant Color Conversions
```python
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```
And later:
```python
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```
Impact: ⚠️ MEDIUM - Unnecessary CPU cycles
🟡 PERFORMANCE ISSUE #4: VideoMAE Processing Every Frame
VideoMAE (large transformer) processes 8-frame sequences even when not needed.
Impact: ⚠️ HIGH - Major CPU bottleneck on low-spec hardware
🟡 PERFORMANCE ISSUE #5: No Model Quantization for VideoMAE
VideoMAE runs in FP32, consuming significant memory and compute.
Impact: ⚠️ HIGH - Not suitable for low-spec CPUs
🟡 PERFORMANCE ISSUE #6: Inefficient YOLO ONNX Parsing
```python
bboxes = outputs[0][0, :, :4]  # xyxy
confs = outputs[0][0, :, 4]
classes = np.argmax(outputs[0][0, :, 5:], axis=1)  # COCO classes
high_conf = confs > CONFIG['conf_threshold']
return {'bboxes': bboxes[high_conf], 'confs': confs[high_conf], 'classes': classes[high_conf]}
```
Problem: Assumes a YOLOv5-style layout with a separate objectness column. YOLOv8 ONNX exports instead produce a (1, 84, 8400) tensor: 4 box coordinates (cx, cy, w, h) followed by 80 class scores per candidate, with anchors along the last axis and no objectness score.
Impact: ⚠️ HIGH - Detection results will be incorrect
1.2 Dependency Status
Current Installation Status:
- ✅ numpy (1.26.4)
- ✅ yaml (6.0.1)
- ❌ streamlit - MISSING
- ❌ opencv-python - MISSING
- ❌ ultralytics - MISSING
- ❌ mediapipe - MISSING
- ❌ roboflow - MISSING
- ❌ scikit-learn - MISSING
- ❌ transformers - MISSING
- ❌ torch - MISSING
- ❌ onnxruntime - MISSING
Installation Required: 9 packages missing (~2GB download, ~5GB disk space)
1.3 Algorithm Analysis
Current Techniques:
- Object Detection: YOLOv8n (nano) - ✅ Good choice for low-spec
- Face Analysis: MediaPipe Face Mesh - ✅ Efficient, CPU-friendly
- Action Recognition: VideoMAE-base - ❌ Too heavy for low-spec CPUs
- Seatbelt Detection: Roboflow custom model - ⚠️ Unknown performance
- Optical Flow: Incorrect implementation - ❌ Will crash
- Anomaly Detection: Isolation Forest (untrained) - ❌ Non-functional
2. Evaluation Criteria
2.1 Success Metrics
Accuracy Targets:
- DSMS Alerts: >90% precision, >85% recall
- ADAS Alerts: >95% precision, >90% recall
- False Positive Rate: <5%
Performance Targets (Low-Spec CPU - 4 cores, 2GHz, 8GB RAM):
- Frame Processing: >10 FPS sustained
- Model Loading: <30 seconds
- Memory Usage: <4GB peak
- CPU Utilization: <80% average
- Latency: <100ms per frame (with skipping)
Resource Utilization:
- Model Size: <500MB total (quantized)
- Disk I/O: Minimal (cached models)
- Network: None after initial download
2.2 Open-Source Tool Evaluation
Current Tools:
| Tool | Status | CPU Efficiency | Accuracy | Recommendation |
|---|---|---|---|---|
| YOLOv8n | ✅ Good | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Keep - Optimize |
| MediaPipe | ✅ Good | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Keep |
| VideoMAE-base | ❌ Too Heavy | ⭐ | ⭐⭐⭐⭐⭐ | Replace |
| Roboflow API | ⚠️ Unknown | ⭐⭐⭐ | ⭐⭐⭐ | Evaluate |
| Isolation Forest | ⚠️ Untrained | ⭐⭐⭐⭐ | N/A | Fix |
3. Improvement Suggestions
3.1 Critical Bug Fixes (Priority 1)
Fix #1: Correct Optical Flow Implementation
Replace calcOpticalFlowPyrLK with calcOpticalFlowFarneback (dense flow), or implement Lucas-Kanade properly by first detecting feature points with cv2.goodFeaturesToTrack.
Recommended: Use cv2.calcOpticalFlowFarneback for dense flow over whole frames; no feature detection or tracking bookkeeping required, as sketched below.
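A minimal sketch of the Farneback replacement, keeping the original method signature and assuming the module-level cv2/numpy imports already in the file (the Farneback parameters below are OpenCV's commonly cited defaults, not tuned values):
```python
def optical_flow(self, prev_frame, curr_frame):
    """Mean dense-flow magnitude, used as a proxy for ego-motion."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Farneback takes whole grayscale images; no feature points required.
    # Positional args: flow, pyr_scale, levels, winsize, iterations,
    # poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # flow has shape (H, W, 2): per-pixel (dx, dy) displacement.
    return float(np.mean(np.linalg.norm(flow, axis=2)))
```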
Fix #2: Remove VideoMAE JIT Scripting
Replace with direct model loading or ONNX conversion if quantization needed.
Alternative: Use lighter action recognition (MediaPipe Pose + heuristics).
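A sketch of the direct-loading path with optional dynamic INT8 quantization; the quantization step is a suggestion layered on top of the fix, not part of the original design, and its accuracy impact should be validated:
```python
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

processor = VideoMAEImageProcessor.from_pretrained(CONFIG['videomae_model'])
videomae = VideoMAEForVideoClassification.from_pretrained(CONFIG['videomae_model'])
videomae.eval()  # plain eager-mode inference; no torch.jit.script involved

# Optional CPU speedup: dynamically quantize the linear layers to INT8.
videomae = torch.ao.quantization.quantize_dynamic(
    videomae, {torch.nn.Linear}, dtype=torch.qint8)
```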
Fix #3: Conditional ONNX Export
Add file existence check before export.
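A sketch of the guard around the existing export call, reusing the file name the current code already assumes:
```python
import os

ONNX_PATH = 'yolov8n.onnx'

if not os.path.exists(ONNX_PATH):
    yolo_base = YOLO(CONFIG['yolo_base'])
    yolo_base.export(format='onnx', int8=True)  # runs once; cached on disk after
yolo_session = ort.InferenceSession(ONNX_PATH)
```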
Fix #4: Fix YOLO ONNX Output Parsing
Use Ultralytics built-in ONNX post-processing or correct output format.
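If manual parsing is kept instead of Ultralytics' built-in post-processing, a sketch assuming the standard (1, 84, 8400) export layout; boxes come back in the letterboxed input scale and still need rescaling to the original frame, plus NMS (e.g. cv2.dnn.NMSBoxes):
```python
import numpy as np

def parse_yolov8_onnx(outputs, conf_threshold):
    """Parse raw YOLOv8 ONNX output: (1, 84, 8400) = 4 box coords + 80
    class scores per candidate, anchors along the last axis."""
    preds = outputs[0][0].T              # -> (8400, 84)
    class_scores = preds[:, 4:]          # no separate objectness in YOLOv8
    confs = class_scores.max(axis=1)
    classes = class_scores.argmax(axis=1)
    keep = confs > conf_threshold
    cxcywh = preds[keep, :4]
    xyxy = np.empty_like(cxcywh)         # convert center format to corners
    xyxy[:, 0] = cxcywh[:, 0] - cxcywh[:, 2] / 2
    xyxy[:, 1] = cxcywh[:, 1] - cxcywh[:, 3] / 2
    xyxy[:, 2] = cxcywh[:, 0] + cxcywh[:, 2] / 2
    xyxy[:, 3] = cxcywh[:, 1] + cxcywh[:, 3] / 2
    return {'bboxes': xyxy, 'confs': confs[keep], 'classes': classes[keep]}
```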
3.2 Performance Optimizations (Priority 2)
Optimization #1: Replace VideoMAE with Lightweight Alternative
Options:
- Option A: MediaPipe Pose + Temporal Logic (yawn detection via mouth opening)
- Option B: Lightweight 2D CNN (MobileNet-based) for action classification
- Option C: Remove action recognition, use face analysis only
Recommendation: Option A - Zero additional model, uses existing MediaPipe.
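As a sketch of Option A, yawn detection can be approximated with a mouth-opening ratio computed from Face Mesh landmarks; the indices below are the commonly used inner-lip and mouth-corner points and should be verified against the Face Mesh topology in use:
```python
import numpy as np

UPPER_LIP, LOWER_LIP = 13, 14        # inner-lip midpoints (assumed indices)
LEFT_CORNER, RIGHT_CORNER = 61, 291  # mouth corners (assumed indices)

def mouth_aspect_ratio(landmarks: np.ndarray) -> float:
    """Vertical mouth opening normalized by mouth width; values that stay
    high for ~1-2 s suggest a yawn. `landmarks` is an (N, 2) array of
    Face Mesh points in pixel coordinates."""
    vertical = np.linalg.norm(landmarks[UPPER_LIP] - landmarks[LOWER_LIP])
    horizontal = np.linalg.norm(landmarks[LEFT_CORNER] - landmarks[RIGHT_CORNER])
    return float(vertical / (horizontal + 1e-6))
```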
Optimization #2: Lazy Model Loading
Implement: Load models only when needed, not all at startup.
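A sketch of lazy loading built on the Streamlit caching the code already uses; each model is loaded on first call only (function names are illustrative):
```python
import streamlit as st

@st.cache_resource
def get_yolo():
    from ultralytics import YOLO  # import deferred until first use
    return YOLO(CONFIG['yolo_base'])

@st.cache_resource
def get_face_mesh():
    import mediapipe as mp
    return mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
```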
Optimization #3: Model Quantization
- YOLO: ✅ Already ONNX INT8 (verify)
- VideoMAE: Convert to INT8 ONNX or remove
- MediaPipe: Already optimized
Optimization #4: Frame Processing Pipeline
- Cache color conversions
- Reduce resolution further (320x240 for face, 640x480 for objects)
- Process different regions at different rates
Optimization #5: Smart Frame Skipping
- Different skip rates for different models
- Face analysis: Every frame (fast)
- Object detection: Every 3rd frame
- Action recognition: Every 10th frame (if kept)
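A minimal dispatch sketch implementing the per-model rates listed above (values illustrative):
```python
SKIP_RATES = {'face': 1, 'objects': 3, 'action': 10}  # 1 = every frame

def should_run(model_name: str, frame_idx: int) -> bool:
    """Run each model only on every Nth frame."""
    return frame_idx % SKIP_RATES[model_name] == 0
```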
3.3 Algorithm Enhancements (Priority 3)
Enhancement #1: Train Isolation Forest
Collect normal driving features, train offline, save model.
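A sketch of the offline step; the feature file name is hypothetical, and the feature layout must match whatever the runtime pipeline extracts:
```python
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

normal = np.load('normal_driving_features.npy')  # (N, D) normal-driving features
iso_forest = IsolationForest(contamination=0.1, random_state=42)
iso_forest.fit(normal)                           # train once, offline
joblib.dump(iso_forest, 'iso_forest.joblib')

# At startup, load the fitted model instead of instantiating a fresh one:
iso_forest = joblib.load('iso_forest.joblib')
```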
Enhancement #2: Improve Distance Estimation
Use camera calibration or stereo vision for accurate distance.
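For the monocular case, a sketch of the standard pinhole-model estimate; the focal length comes from a one-time calibration (e.g. cv2.calibrateCamera) and the assumed physical object width is an approximation:
```python
def estimate_distance_m(focal_px: float, real_width_m: float,
                        bbox_width_px: float) -> float:
    """Pinhole model: distance = f * W / w (e.g. W ~ 1.8 m for a car)."""
    return focal_px * real_width_m / max(bbox_width_px, 1e-6)
```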
Enhancement #3: Better PERCLOS Calculation
Use proper Eye Aspect Ratio (EAR) formula instead of simplified version.
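A sketch of the standard EAR formula (Soukupová & Čech, 2016), assuming six eye landmarks in the conventional p1..p6 order (corners at p1/p4, upper lid p2/p3, lower lid p5/p6); PERCLOS is then the fraction of frames in a sliding window with EAR below a calibrated threshold:
```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) for a (6, 2) array of
    eye landmarks ordered p1..p6. Eyes count as closed while EAR stays
    below a calibrated threshold (often ~0.2)."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # p2-p6
    v2 = np.linalg.norm(eye[2] - eye[4])  # p3-p5
    h = np.linalg.norm(eye[0] - eye[3])   # p1-p4
    return float((v1 + v2) / (2.0 * h + 1e-6))
```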
Enhancement #4: Temporal Smoothing
Add moving average filters to reduce false positives.
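A minimal sketch of a fixed-window moving average for per-frame alert scores:
```python
from collections import deque

class MovingAverage:
    """Smooths noisy per-frame scores before they trigger alerts."""
    def __init__(self, window: int = 15):
        self.values = deque(maxlen=window)

    def update(self, x: float) -> float:
        self.values.append(x)
        return sum(self.values) / len(self.values)
```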
4. Implementation Plan
Phase 1: Critical Fixes (Week 1)
Goal: Make code functional and runnable
- Day 1-2: Fix Critical Bugs
  - Fix optical flow implementation
  - Remove VideoMAE JIT scripting
  - Fix YOLO ONNX parsing
  - Add conditional ONNX export
  - Add error handling
- Day 3-4: Dependency Setup
  - Install all dependencies
  - Test basic functionality
  - Fix import errors
- Day 5: Basic Testing
  - Run with webcam/video file
  - Verify no crashes
  - Measure baseline performance
Phase 2: Performance Optimization (Week 2)
Goal: Achieve >10 FPS on low-spec CPU
- Day 1-2: Replace VideoMAE
  - Implement MediaPipe Pose-based action detection
  - Remove VideoMAE dependencies
  - Test accuracy vs. performance
- Day 3: Optimize Processing Pipeline
  - Implement multi-resolution processing
  - Add frame caching
  - Optimize color conversions
- Day 4: Model Quantization
  - Verify YOLO INT8 quantization
  - Test accuracy retention
  - Measure speedup
- Day 5: Smart Frame Skipping
  - Implement per-model skip rates
  - Add temporal smoothing
  - Benchmark performance
Phase 3: Accuracy Improvements (Week 3)
Goal: Achieve >90% accuracy targets
- Day 1-2: Fix Detection Logic
  - Train Isolation Forest
  - Improve PERCLOS calculation
  - Fix distance estimation
- Day 3-4: Temporal Smoothing
  - Add moving averages
  - Implement state machines for alerts
  - Reduce false positives
- Day 5: Calibration Tools
  - Add distance calibration
  - Add speed calibration
  - Create config file
Phase 4: Testing & Validation (Week 4)
Goal: Validate improvements
- Day 1-2: Unit Tests
  - Test each component
  - Mock dependencies
  - Verify edge cases
- Day 3-4: Integration Tests
  - Test full pipeline
  - Measure metrics
  - Compare before/after
- Day 5: Documentation
  - Update code comments
  - Create user guide
  - Document calibration
5. Testing and Validation Framework
5.1 Test Dataset Requirements
Required Test Videos:
- Normal driving (baseline)
- Drowsy driver (PERCLOS > threshold)
- Distracted driver (phone, looking away)
- No seatbelt scenarios
- FCW scenarios (approaching vehicle)
- LDW scenarios (lane departure)
- Mixed scenarios
Minimum: 10 videos, 30 seconds each, various lighting conditions
5.2 Metrics Collection
Performance Metrics:
```python
metrics = {
    'fps': float,             # Frames per second
    'latency_ms': float,      # Per-frame latency
    'memory_mb': float,       # Peak memory usage
    'cpu_percent': float,     # Average CPU usage
    'model_load_time': float  # Startup time
}
```
Accuracy Metrics:
```python
accuracy_metrics = {
    'precision': float,            # TP / (TP + FP)
    'recall': float,               # TP / (TP + FN)
    'f1_score': float,             # 2 * (precision * recall) / (precision + recall)
    'false_positive_rate': float   # FP / (FP + TN)
}
```
5.3 Testing Script Structure
```python
# test_performance.py

def benchmark_inference():
    """Measure FPS, latency, memory."""
    pass

def test_accuracy():
    """Run on test dataset, compute metrics."""
    pass

def test_edge_cases():
    """Test with missing data, errors."""
    pass
```
5.4 Success Criteria
Performance:
- ✅ FPS > 10 on target hardware
- ✅ Latency < 100ms per frame
- ✅ Memory < 4GB
- ✅ CPU < 80%
Accuracy:
- ✅ DSMS Precision > 90%
- ✅ DSMS Recall > 85%
- ✅ ADAS Precision > 95%
- ✅ FPR < 5%
6. Documentation Requirements
6.1 Code Documentation
Required:
- Docstrings for all functions/classes
- Type hints where applicable
- Inline comments for complex logic
- Algorithm references (papers, docs)
Template:
```python
def function_name(param1: type, param2: type) -> return_type:
    """
    Brief description.

    Args:
        param1: Description
        param2: Description

    Returns:
        Description

    Raises:
        ExceptionType: When this happens

    References:
        - Paper/URL if applicable
    """
```
6.2 User Documentation
Required Sections:
- Installation Guide
  - System requirements
  - Dependency installation
  - Configuration setup
- Usage Guide
  - How to run the application
  - Configuration options
  - Calibration procedures
- Troubleshooting
  - Common issues
  - Performance tuning
  - Accuracy improvements
6.3 Technical Documentation
Required:
- Architecture diagram
- Model specifications
- Performance benchmarks
- Accuracy reports
7. Immediate Action Items
🔴 CRITICAL - Do First:
- Fix optical flow bug (will crash)
- Remove VideoMAE JIT scripting (will crash)
- Fix YOLO ONNX parsing (incorrect results)
- Install missing dependencies
🟡 HIGH PRIORITY - Do Next:
- Replace VideoMAE with lightweight alternative
- Add conditional ONNX export
- Implement proper error handling
- Train Isolation Forest
🟢 MEDIUM PRIORITY - Do Later:
- Optimize frame processing
- Add temporal smoothing
- Improve calibration
- Add comprehensive tests
8. Estimated Impact
After Fixes:
- Functionality: ✅ Code will run without crashes
- Performance: 🟡 5-8 FPS → 🟢 12-15 FPS (estimated)
- Memory: 🟡 6-8GB → 🟢 2-3GB (estimated)
- Accuracy: 🟡 Unknown → 🟢 >90% (with improvements)
Timeline: 4 weeks for full implementation
Effort: ~160 hours (1 FTE-month)
Conclusion
The current implementation has a solid foundation but requires significant fixes and optimizations to be production-ready, especially for low-specification CPUs. The proposed improvements will address critical bugs, reduce resource usage by ~60%, and improve accuracy through better algorithms and temporal smoothing.
Next Step: Begin Phase 1 - Critical Fixes