# File Flow Analysis: Git Integration → AI Analysis Service
## 📊 **Performance Analysis for 500 Files**
### **Current Enhanced Configuration:**
- **Batch Size**: 50 files per batch
- **Max Workers**: 20 parallel workers
- **Cache TTL**: 1 hour (Redis)
- **Max File Size**: 100KB (skip larger files)
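
For reference, these settings might be grouped into a single config object along the following lines; the class and field names are illustrative, not the service's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnalysisConfig:
    """Pipeline tuning knobs, using the values documented above."""
    batch_size: int = 50                    # files per batch
    max_workers: int = 20                   # parallel worker threads
    cache_ttl_seconds: int = 60 * 60        # 1-hour Redis TTL for file analyses
    max_file_size_bytes: int = 100 * 1024   # skip files larger than 100KB

CONFIG = AnalysisConfig()
```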
### **Time Estimates for 500 Files:**
#### **📈 Theoretical Performance:**
```
📊 Performance Analysis for 500 files:
   Batch Size: 50 files per batch
   Max Workers: 20 parallel workers
   Batches Needed: 10 batches

⏱️ Time Estimates:
   Time per batch: 30 seconds
   Total time: 300 seconds (5.0 minutes)

🚀 With Parallel Processing:
   Speedup factor: 20x
   Parallel time: 15.0 seconds (0.2 minutes)

📈 Processing Rate:
   Files per second: 33.3
   Files per minute: 2000.0
```
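
These figures follow from simple arithmetic, which the snippet below reproduces; note that the 20x speedup is an idealized ceiling that assumes every worker stays busy and no external limits apply:

```python
import math

files, batch_size, workers, secs_per_batch = 500, 50, 20, 30

batches = math.ceil(files / batch_size)   # 10 batches
serial = batches * secs_per_batch         # 300 s = 5.0 minutes
parallel = serial / workers               # 15.0 s under an ideal 20x speedup
print(f"batches={batches}, serial={serial}s, parallel={parallel}s")
print(f"rate: {files / parallel:.1f} files/s, {files / parallel * 60:.1f} files/min")
# batches=10, serial=300s, parallel=15.0s
# rate: 33.3 files/s, 2000.0 files/min
```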
#### **🎯 Realistic Performance (with API limits):**
- **API Rate Limiting**: 90 requests/minute (Claude API)
- **Network Latency**: ~200ms per request
- **File Processing**: ~2-3 seconds per file
- **Total Time**: **8-12 minutes for 500 files**
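
The rate limit, not the worker pool, is the binding constraint: 500 requests at 90 requests/minute is a hard floor of roughly 5.6 minutes for the Claude calls alone, and content retrieval plus report generation push the total into the 8-12 minute range. A minimal sliding-window limiter that all workers could share might look like the sketch below (illustrative, not the service's actual implementation):

```python
import threading
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Block callers so at most `limit` acquisitions occur per `window` seconds."""

    def __init__(self, limit: int = 90, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._calls: deque = deque()   # timestamps of recent acquisitions
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the window.
                while self._calls and now - self._calls[0] >= self.window:
                    self._calls.popleft()
                if len(self._calls) < self.limit:
                    self._calls.append(now)
                    return
                # Window is full: wait until the oldest call ages out.
                wait = self.window - (now - self._calls[0])
            time.sleep(wait)

rate_limiter = SlidingWindowRateLimiter(limit=90)  # shared across all workers
```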
## 🔄 **File Flow: How Files Reach AI Analysis Service**
### **Step-by-Step Process:**
#### **1. Repository Discovery (Git Integration → AI Analysis)**
```
Frontend → API Gateway → AI Analysis Service
AI Analysis Service → Git Integration Service
    GET /api/github/repository/{id}/ui-view
    Returns: repository_info, local_path, file_tree
```
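
A minimal client for this call, assuming the `requests` library and a hypothetical in-cluster hostname, could look like:

```python
import requests

# Hypothetical in-cluster hostname; substitute the real Git Integration address.
GIT_INTEGRATION_URL = "http://git-integration:8012"

def get_repository_view(repository_id: str) -> dict:
    """Step 1: fetch repository_info, local_path, and file_tree."""
    resp = requests.get(
        f"{GIT_INTEGRATION_URL}/api/github/repository/{repository_id}/ui-view",
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```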
#### **2. File Content Retrieval**
```
For each file in repository:
    AI Analysis Service → Git Integration Service
    GET /api/github/repository/{id}/file-content?file_path={path}
    Returns: file content (text)
```
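
The per-file call is analogous (same hostname assumption as above):

```python
import requests

GIT_INTEGRATION_URL = "http://git-integration:8012"  # assumed hostname, as above

def get_file_content(repository_id: str, file_path: str) -> str:
    """Step 2: fetch one file's text content by path."""
    resp = requests.get(
        f"{GIT_INTEGRATION_URL}/api/github/repository/{repository_id}/file-content",
        params={"file_path": file_path},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text
```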
#### **3. File Processing Flow**
```
1. Get Repository Info
   ├── repository_id, local_path, file_tree
   └── Check Redis cache for existing analysis

2. For each file (parallel batches):
   ├── Get file content from Git Integration
   ├── Check Redis cache for file analysis
   ├── If cache miss:
   │   ├── Apply rate limiting (90 req/min)
   │   ├── Optimize content (truncate if >8000 tokens)
   │   ├── Send to Claude API
   │   ├── Parse response
   │   └── Cache result in Redis
   └── Add to results

3. Repository-level Analysis:
   ├── Architecture assessment
   ├── Security review
   └── Code quality metrics

4. Generate Report:
   ├── Create PDF/JSON report
   └── Store in /reports/ directory
```
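
Step 2's cache-miss branch might look like the following sketch. Here `rate_limiter` is the limiter sketched earlier, `call_claude` stands in for whatever wrapper the service uses around the Claude API, and the key scheme, Redis host, and character-based truncation are all illustrative:

```python
import hashlib
import json

import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)  # assumed host
FILE_CACHE_TTL = 60 * 60  # 1 hour

def analyze_file_cached(repository_id: str, file_path: str, content: str) -> dict:
    """Per-file pipeline: cache check -> rate limit -> Claude -> cache the result."""
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    key = f"ai-analysis:file:{repository_id}:{content_hash}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)      # cache hit: no API call needed
    rate_limiter.acquire()             # stay under 90 req/min (limiter sketched above)
    prompt = content[:32_000]          # crude stand-in for token-based truncation
    result = call_claude(file_path, prompt)  # hypothetical Claude API wrapper
    r.setex(key, FILE_CACHE_TTL, json.dumps(result))
    return result
```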
## 🚀 **Performance Optimizations Implemented**
### **1. Parallel Processing:**
- **Batch Processing**: 50 files per batch
- **Worker Threads**: 20 parallel workers
- **Error Handling**: Graceful failure handling
- **Memory Management**: Skip files >100KB
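
A batch might be fanned out across the pool roughly as below; `analyze_file` is a placeholder for the per-file pipeline, and the shape of the file dicts (`path` and `size` keys) is assumed:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 20
MAX_FILE_SIZE = 100 * 1024  # bytes

def analyze_batch(files: list[dict]) -> list[dict]:
    """Analyze one batch of up to 50 files across the worker pool."""
    results = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {
            pool.submit(analyze_file, f): f["path"]   # analyze_file: assumed helper
            for f in files
            if f.get("size", 0) <= MAX_FILE_SIZE      # memory management: skip >100KB
        }
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:                  # graceful failure: record and move on
                results.append({"path": futures[future], "error": str(exc)})
    return results
```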
### **2. Caching Strategy:**
- **Redis Cache**: 1-hour TTL for file analyses
- **Repository Cache**: 2-hour TTL for complete analyses
- **Cache Keys**: Structured keys for efficient retrieval
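
One plausible key scheme matching the two TTLs above; the prefixes are illustrative, and keying the repository cache on a commit SHA is an assumption about how invalidation might work:

```python
FILE_TTL = 60 * 60       # 1 hour for per-file analyses
REPO_TTL = 2 * 60 * 60   # 2 hours for complete repository analyses

def file_key(repo_id: str, content_hash: str) -> str:
    """Per-file results keyed by content hash, so unchanged files stay cached."""
    return f"ai-analysis:file:{repo_id}:{content_hash}"

def repo_key(repo_id: str, commit_sha: str) -> str:
    """Whole-repository results keyed by commit, invalidated by new pushes."""
    return f"ai-analysis:repo:{repo_id}:{commit_sha}"
```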
### **3. Database Storage:**
- **PostgreSQL**: Repository metadata and analysis results
- **MongoDB**: Episodic and persistent memory
- **Redis**: Working memory and caching
## ⏱️ **Actual Performance for 500 Files**
### **Conservative Estimate:**
- **File Processing**: 2-3 seconds per file
- **API Rate Limiting**: 90 requests/minute
- **Parallel Processing**: 20 workers
- **Total Time**: **8-12 minutes**
### **Optimistic Estimate (with caching):**
- **First Analysis**: 8-12 minutes
- **Subsequent Analyses**: 2-3 minutes (cached results)
### **Performance Breakdown:**
```
📊 500 Files Analysis:
   ├── File Discovery: 30 seconds
   ├── Content Retrieval: 2-3 minutes
   ├── AI Analysis: 5-8 minutes
   ├── Report Generation: 1-2 minutes
   └── Database Storage: 30 seconds

Total: 8-12 minutes
```
## 🔧 **File Flow Architecture**
### **Data Flow Diagram:**
```
Frontend
    ↓ POST /api/ai-analysis/analyze-repository
API Gateway (Port 8000)
    ↓ Proxy to AI Analysis Service
AI Analysis Service (Port 8022)
    ↓ GET /api/github/repository/{id}/ui-view
Git Integration Service (Port 8012)
    ↓ Returns repository metadata
AI Analysis Service
    ↓ For each file: GET /api/github/repository/{id}/file-content
Git Integration Service
    ↓ Returns file content
AI Analysis Service
    ↓ Process with Claude API (parallel batches)
Claude API
    ↓ Returns analysis results
AI Analysis Service
    ↓ Store in databases (PostgreSQL, MongoDB, Redis)
    ↓ Generate report
    ↓ Return results to API Gateway
API Gateway
    ↓ Return to Frontend
```
### **Key Endpoints Used:**
1. **Repository Info**: `GET /api/github/repository/{id}/ui-view`
2. **File Content**: `GET /api/github/repository/{id}/file-content?file_path={path}`
3. **Analysis**: `POST /analyze-repository` (AI Analysis Service)
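
Tying the diagram and endpoints together, the top-level flow might read as follows; `flatten_file_tree`, `build_report`, and `store_results` are hypothetical names for the stages described above:

```python
BATCH_SIZE = 50

def analyze_repository(repository_id: str) -> dict:
    """End-to-end flow from the diagram above (hypothetical orchestration)."""
    repo = get_repository_view(repository_id)      # repository_info, local_path, file_tree
    files = flatten_file_tree(repo["file_tree"])   # assumed helper: nested tree -> flat list
    results: list[dict] = []
    for i in range(0, len(files), BATCH_SIZE):     # 10 batches for 500 files
        results.extend(analyze_batch(files[i:i + BATCH_SIZE]))
    report = build_report(repository_id, results)  # assumed: PDF/JSON under /reports/
    store_results(repository_id, results, report)  # assumed: PostgreSQL/MongoDB/Redis writes
    return report
```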
## 📈 **Performance Monitoring**
### **Metrics to Track:**
- **Files per second**: 33+ files/second theoretical peak (≈1 file/second effective under the 90 req/min API limit)
- **Cache hit rate**: Target 80%+ for repeated analyses
- **API success rate**: Target 95%+
- **Memory usage**: Monitor for large repositories
- **Database connections**: Ensure all databases connected
### **Optimization Opportunities:**
1. **Pre-fetching**: Load file contents in parallel (see the sketch after this list)
2. **Smart Caching**: Cache based on file hash
3. **Batch API Calls**: Reduce individual API calls
4. **Memory Optimization**: Stream large files
5. **Database Indexing**: Optimize query performance
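
As an example of the first opportunity, file contents could be pre-fetched concurrently before any Claude calls start. This sketch assumes the `httpx` library and the Git Integration hostname used earlier:

```python
import asyncio

import httpx

GIT_INTEGRATION_URL = "http://git-integration:8012"  # assumed hostname

async def prefetch_contents(repository_id: str, paths: list[str]) -> dict[str, str]:
    """Fetch every file's content concurrently, keyed by path."""
    url = f"{GIT_INTEGRATION_URL}/api/github/repository/{repository_id}/file-content"
    async with httpx.AsyncClient(timeout=30.0) as client:
        async def fetch(path: str) -> tuple[str, str]:
            resp = await client.get(url, params={"file_path": path})
            resp.raise_for_status()
            return path, resp.text
        pairs = await asyncio.gather(*(fetch(p) for p in paths))
    return dict(pairs)

# Example: contents = asyncio.run(prefetch_contents("repo-123", ["src/app.py", "README.md"]))
```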
## 🎯 **Summary**
### **For 500 Files:**
- **⏱️ Analysis Time**: 8-12 minutes (first time)
- **⚡ With Caching**: 2-3 minutes (subsequent)
- **📊 Processing Rate**: 33+ files/second theoretical peak (≈1 file/second effective under rate limits)
- **🔄 File Flow**: Git Integration → AI Analysis → Claude API → Databases
### **Key Performance Factors:**
1. **API Rate Limits**: Claude API (90 req/min)
2. **Network Latency**: ~200ms per request
3. **File Size**: Skip files >100KB
4. **Caching**: Redis cache for repeated analyses
5. **Parallel Processing**: 20 workers × 50 files/batch
The system is optimized for analyzing 500 files in 8-12 minutes with parallel processing, intelligent caching, and robust error handling.