dld_backend/nodejs_chartjs_prompt.md
2025-10-30 12:13:02 +05:30

# Node.js Application Prompt: Dubai Land Department Analytics API
## Project Overview
Create a Node.js application that provides a REST API endpoint to process natural language queries about Dubai real estate data and return structured responses optimized for Chart.js visualization and frontend card displays.
## Technical Requirements
### Architecture
- **Clean Architecture**: Implement layered architecture with clear separation of concerns
- **Database**: MySQL with the provided Dubai Land Department schema
- **Response Format**: JSON responses compatible with Chart.js
- **Framework**: Express.js with TypeScript support
- **Database ORM**: Prisma or Sequelize for MySQL integration
### Core Functionality
#### 1. Natural Language Processing with Node.js NLP Libraries
- **Primary NLP Library**: Natural.js or Compromise.js for natural language understanding
- **Alternative Options**:
  - **Natural.js**: General-purpose NLP toolkit with tokenization, stemming, classification, and sentiment analysis
  - **Compromise.js**: Lightweight, fast NLP with rule-based tagging and entity matching
  - **Wink NLP**: NLP pipeline with built-in and custom entity recognition
  - **Node-nlp**: AXA Group's NLP library (NLP.js) with intent recognition
- Accept user queries in natural language
- Parse queries using Node.js NLP libraries to identify:
  - Time periods (last 6 months, weekly, monthly)
  - Geographic areas (Business Bay, specific zones)
  - Property types (apartments, villas, commercial)
  - Aggregation types (trends, averages, summaries)
  - Transaction types (rental, sales, off-plan)
- Implement custom Named Entity Recognition (NER) for:
  - Dubai area names
  - Property type classification
  - Time period extraction
  - Metric type identification
- Use tokenization, stemming, and part-of-speech tagging for query structure analysis (the libraries listed above do not provide dependency parsing)
#### 2. Query Processing Logic
The application should handle these specific query patterns:
**Single Value Queries** (Card Display):
- Calculate single metrics (averages, totals, counts)
- Return both descriptive text and executable SQL query
- Format results for frontend card components
**Multi-Query Scenarios** (Chart Visualization):
- Process complex queries requiring multiple data points
- Generate multiple SQL queries for comprehensive analysis
- Format results for Chart.js (line charts, bar charts, pie charts)
#### 3. Supported Query Types
**Rental Price Analysis:**
- "Give me the last 6 months rental price trend for Business Bay"
- "Summarise by week" (refinement of previous query)
- "Apartments only" (further refinement)
**Project Analysis:**
- "Brief about the Project" (transaction summary)
- "List of fast moving projects in last 6 months"
- "Which area is seeing uptick in off-plan projects in last 6 months"
**Area Performance:**
- "Which area is having more rental transactions?"
- "Top 5 areas for Commercial leasing and why?"
- "Top 5 areas for Residential leasing and why?"
- "Avg price of 3BHK apartment by area in last 6 months, group it by month. Show top 5 areas only."
## Database Schema Context
The application will work with the Dubai Land Department MySQL database containing:
### Core Tables:
- **transactions**: Real estate transaction records
- **rents**: Property rental contracts (Ejari system)
- **projects**: Development projects with developer relationships
- **developers**: Registered real estate developers
- **buildings**: Building registry information
- **lands**: Land registry information
- **valuations**: Official property valuations
- **brokers**: Registered real estate brokers
### Key Fields for Analysis:
- **Time Fields**: `instance_date`, `registration_date`, `start_date`, `end_date`
- **Location Fields**: `area_en`, `zone_en`, `nearest_metro_en`
- **Property Fields**: `prop_type_en`, `prop_sub_type_en`, `rooms_en`
- **Financial Fields**: `trans_value`, `contract_amount`, `annual_amount`
- **Project Fields**: `project_en`, `master_project_en`, `is_offplan_en`
## API Specification
### Endpoint Structure
```
POST /api/query
Content-Type: application/json

Request Body:
{
  "query": "Give me the last 6 months rental price trend for Business Bay"
}

Response Format:
{
  "success": true,
  "data": {
    "text": "Rental price trend for Business Bay over the last 6 months",
    "visualizations": [
      {
        "type": "line",
        "title": "Monthly Rental Price Trend",
        "data": {
          "labels": ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"],
          "datasets": [{
            "label": "Average Rental Price (AED)",
            "data": [85000, 87000, 89000, 92000, 95000, 98000],
            "borderColor": "rgb(75, 192, 192)",
            "backgroundColor": "rgba(75, 192, 192, 0.2)"
          }]
        }
      }
    ],
    "cards": [
      {
        "title": "Average Price",
        "value": "92,500 AED",
        "subtitle": "Last 6 months",
        "trend": "+15.3%"
      }
    ],
    "sql_queries": [
      "SELECT DATE_FORMAT(start_date, '%Y-%m') as month, AVG(annual_amount) as avg_price FROM rents WHERE area_en = 'Business Bay' AND start_date >= DATE_SUB(NOW(), INTERVAL 6 MONTH) GROUP BY DATE_FORMAT(start_date, '%Y-%m') ORDER BY month"
    ]
  }
}
```
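
The `trend` string on a card can be derived from the same series that feeds the chart. The sketch below assumes the trend is the percentage change between the first and last points of the series, which the specification does not state explicitly; `formatTrend` and `buildCard` are hypothetical helper names.

```javascript
// Hypothetical helpers for building a card in the response shape shown above.
// Assumption: "trend" is the percentage change from the first to the last value.
function formatTrend(series) {
  if (series.length < 2 || series[0] === 0) return null;
  const first = series[0];
  const last = series[series.length - 1];
  const pct = ((last - first) / first) * 100;
  const sign = pct >= 0 ? '+' : '';
  return `${sign}${pct.toFixed(1)}%`;
}

function buildCard({ title, value, subtitle, series }) {
  return {
    title,
    // Format the numeric value with thousands separators and the currency code
    value: `${Math.round(value).toLocaleString('en-US')} AED`,
    subtitle,
    trend: formatTrend(series)
  };
}

const card = buildCard({
  title: 'Average Price',
  value: 92500,
  subtitle: 'Last 6 months',
  series: [85000, 87000, 89000, 92000, 95000, 98000]
});
console.log(card.trend); // "+15.3%"
```

With the sample series, the change from 85,000 to 98,000 is about 15.3%, which matches the `trend` value in the example response.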
## Implementation Requirements
### 1. Project Structure
```
src/
├── controllers/
│   └── queryController.js
├── services/
│   ├── nlpService.js        # NLP library integration and processing
│   ├── queryParser.js       # Query parsing logic
│   ├── sqlGenerator.js      # SQL query generation
│   ├── chartFormatter.js    # Chart.js data formatting
│   └── contextManager.js    # Context tracking for follow-up queries
├── models/
│   └── database.js
├── middleware/
│   └── validation.js
├── utils/
│   ├── dateUtils.js
│   └── textProcessor.js
├── config/
│   ├── nlpConfig.js         # NLP configuration and patterns
│   ├── customPatterns.js    # Custom matcher patterns (loaded by nlpConfig.js)
│   └── areaMapping.js       # Dubai area name mappings
└── routes/
    └── api.js
```
### 1.1. Required Dependencies
**Node.js Dependencies:**
```json
{
  "dependencies": {
    "express": "^4.18.2",
    "mysql2": "^3.6.0",
    "natural": "^6.5.0",
    "compromise": "^14.10.0",
    "wink-nlp": "^1.12.0",
    "node-nlp": "^4.27.0",
    "moment": "^2.29.4",
    "dotenv": "^16.3.1",
    "joi": "^17.11.0",
    "redis": "^4.6.0"
  },
  "devDependencies": {
    "@types/node": "^20.10.0",
    "typescript": "^5.3.3",
    "ts-node": "^10.9.1",
    "nodemon": "^3.0.2"
  }
}
```
**No Python Dependencies Required** - Pure Node.js implementation
### 2. Key Components
#### Query Parser Service (Node.js NLP-based)
- Load Natural.js or Compromise.js NLP library
- Extract time periods using custom regex patterns and date parsing
- Identify geographic areas using custom entity recognition
- Extract property types using keyword matching and classification
- Determine aggregation requirements using intent recognition
- Handle query refinements and follow-ups with context tracking
- Implement custom matcher patterns for Dubai-specific terminology
- Use tokenization and stemming for query normalization
#### SQL Generator Service
- Convert parsed queries to MySQL statements
- Handle date range calculations
- Implement proper JOIN operations for related tables
- Optimize queries for performance
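
A minimal sketch of how the SQL generator could turn a parsed query into a statement with bound parameters. The table and column names come from the schema section; the function name, the `grouping` field, and the shape of the parsed-query object are assumptions made for illustration.

```javascript
// Grouping formats are whitelisted because a DATE_FORMAT pattern is inlined
// into the SQL text; user-supplied values only ever travel as parameters.
const GROUPING_FORMATS = new Map([
  ['month', '%Y-%m'],
  ['week', '%x-W%v'] // week-based year and week number (weeks start on Monday)
]);

function buildRentTrendQuery(parsed) {
  const grouping = parsed.grouping || 'month';
  const format = GROUPING_FORMATS.get(grouping);
  if (!format) throw new Error(`Unsupported grouping: ${grouping}`);

  const where = [];
  const params = [];
  if (Array.isArray(parsed.areas) && parsed.areas.length > 0) {
    // One placeholder per area keeps the values out of the SQL string
    where.push(`area_en IN (${parsed.areas.map(() => '?').join(', ')})`);
    params.push(...parsed.areas);
  }
  if (parsed.time_period && parsed.time_period.startDate) {
    where.push('start_date >= ?');
    params.push(parsed.time_period.startDate);
  }

  const sql = [
    `SELECT DATE_FORMAT(start_date, '${format}') AS period,`,
    'AVG(annual_amount) AS avg_price, COUNT(*) AS contract_count',
    'FROM rents',
    where.length > 0 ? `WHERE ${where.join(' AND ')}` : '',
    'GROUP BY period',
    'ORDER BY period'
  ].filter(Boolean).join(' ');

  return { sql, params };
}

const rentTrend = buildRentTrendQuery({
  areas: ['Business Bay'],
  time_period: { startDate: '2025-04-30' },
  grouping: 'month'
});
console.log(rentTrend.sql, rentTrend.params);
```

The returned pair is intended to be handed to a driver call that accepts placeholders, such as `execute(sql, params)` in `mysql2`.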
#### Chart Formatter Service
- Convert SQL results to Chart.js compatible format
- Support multiple chart types (line, bar, pie, doughnut)
- Generate appropriate labels and datasets
- Handle data aggregation and grouping
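
As an illustration of the formatter's job, the sketch below converts SQL result rows into a line-chart object in the response shape used by this API. The function name and option names are assumptions; the row keys mirror the aliases in the sample SQL.

```javascript
// Sketch: convert SQL result rows into a Chart.js-compatible line chart.
function toLineChart(rows, { labelKey, valueKey, title, datasetLabel }) {
  return {
    type: 'line',
    title,
    data: {
      labels: rows.map(row => String(row[labelKey])),
      datasets: [{
        label: datasetLabel,
        // Aggregates over DECIMAL columns can arrive as strings, so coerce
        data: rows.map(row => Number(row[valueKey])),
        borderColor: 'rgb(75, 192, 192)',
        backgroundColor: 'rgba(75, 192, 192, 0.2)'
      }]
    }
  };
}

const chart = toLineChart(
  [
    { month: '2024-01', avg_price: '85000.00' },
    { month: '2024-02', avg_price: 87000 }
  ],
  {
    labelKey: 'month',
    valueKey: 'avg_price',
    title: 'Monthly Rental Price Trend',
    datasetLabel: 'Average Rental Price (AED)'
  }
);
console.log(JSON.stringify(chart.data.labels));
```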
### 3. Chart.js Compatibility
#### Line Charts (Trends)
```javascript
{
  type: 'line',
  data: {
    labels: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    datasets: [{
      label: 'Rental Price Trend',
      data: [85000, 87000, 89000, 92000, 95000, 98000],
      borderColor: 'rgb(75, 192, 192)',
      backgroundColor: 'rgba(75, 192, 192, 0.2)'
    }]
  }
}
```
#### Bar Charts (Comparisons)
```javascript
{
  type: 'bar',
  data: {
    labels: ['Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC'],
    datasets: [{
      label: 'Average Rental Price',
      data: [92000, 85000, 78000, 95000, 110000],
      backgroundColor: 'rgba(54, 162, 235, 0.6)'
    }]
  }
}
```
#### Pie Charts (Distribution)
```javascript
{
  type: 'pie',
  data: {
    labels: ['Apartments', 'Villas', 'Commercial', 'Off-plan'],
    datasets: [{
      data: [45, 25, 20, 10],
      backgroundColor: ['#FF6384', '#36A2EB', '#FFCE56', '#4BC0C0']
    }]
  }
}
```
### 4. Node.js NLP Implementation Details
#### Option 1: Natural.js Implementation
```javascript
// Install dependencies
// npm install natural moment
const natural = require('natural');
const moment = require('moment');

// Initialize NLP components
const tokenizer = new natural.WordTokenizer();
const stemmer = natural.PorterStemmer; // available for query normalization

// Custom entity recognition for Dubai real estate
class DubaiRealEstateParser {
  constructor() {
    this.areaPatterns = [
      'Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC', 'JLT',
      'Palm Jumeirah', 'Dubai Hills', 'Arabian Ranches', 'Jumeirah'
    ];
    this.propertyTypes = [
      'apartment', 'villa', 'commercial', 'office', 'retail', 'warehouse'
    ];
    this.roomTypes = ['studio', '1BHK', '2BHK', '3BHK', '4BHK', '5BHK'];
    // Checked in insertion order; the first intent with a matching keyword wins
    this.intentKeywords = {
      trend: ['trend', 'over time', 'change', 'last', 'months', 'weekly', 'monthly'],
      compare: ['top', 'compare', 'versus', 'vs', 'better', 'best', 'highest', 'lowest'],
      average: ['average', 'avg', 'mean', 'typical', 'median'],
      summary: ['brief', 'summary', 'summarize', 'overview', 'overall'],
      count: ['how many', 'count', 'number of', 'total']
    };
  }

  async parseQuery(query) {
    const tokens = tokenizer.tokenize(query.toLowerCase());
    return {
      time_period: this.extractTimePeriod(query),
      areas: this.extractAreas(query),
      property_types: this.extractPropertyTypes(query),
      room_types: this.extractRoomTypes(query),
      intent: this.classifyIntent(query),
      tokens: tokens
    };
  }

  extractTimePeriod(query) {
    // Relative periods: "last 6 months", "last 4 weeks", "last 1 year"
    const timePatterns = [
      { pattern: /last (\d+) months?/i, value: 'months' },
      { pattern: /last (\d+) weeks?/i, value: 'weeks' },
      { pattern: /last (\d+) years?/i, value: 'years' }
    ];
    for (const timePattern of timePatterns) {
      const match = query.match(timePattern.pattern);
      if (match) {
        const count = parseInt(match[1], 10);
        return {
          type: timePattern.value,
          count,
          startDate: this.calculateStartDate(timePattern.value, count)
        };
      }
    }
    // Explicit calendar year, e.g. "in 2024": cover the whole year
    const yearMatch = query.match(/\b(?:19|20)\d{2}\b/);
    if (yearMatch) {
      const year = parseInt(yearMatch[0], 10);
      return { type: 'year', count: null, startDate: `${year}-01-01`, endDate: `${year}-12-31` };
    }
    return null;
  }

  extractAreas(query) {
    return this.areaPatterns.filter(area =>
      query.toLowerCase().includes(area.toLowerCase())
    );
  }

  extractPropertyTypes(query) {
    return this.propertyTypes.filter(type =>
      query.toLowerCase().includes(type.toLowerCase())
    );
  }

  extractRoomTypes(query) {
    return this.roomTypes.filter(room =>
      query.toLowerCase().includes(room.toLowerCase())
    );
  }

  classifyIntent(query) {
    const queryLower = query.toLowerCase();
    for (const [intent, keywords] of Object.entries(this.intentKeywords)) {
      // Match whole words so that "top" does not fire on "stop"
      if (keywords.some(keyword => new RegExp(`\\b${keyword}\\b`).test(queryLower))) {
        return intent;
      }
    }
    return 'unknown';
  }

  calculateStartDate(period, count) {
    const now = moment();
    switch (period) {
      case 'months': return now.subtract(count, 'months').format('YYYY-MM-DD');
      case 'weeks': return now.subtract(count, 'weeks').format('YYYY-MM-DD');
      case 'years': return now.subtract(count, 'years').format('YYYY-MM-DD');
      default: return now.subtract(6, 'months').format('YYYY-MM-DD');
    }
  }
}
```
#### Option 2: Compromise.js Implementation
```javascript
// Install Compromise.js
// npm install compromise
const nlp = require('compromise');

class CompromiseRealEstateParser {
  constructor() {
    this.areaMapping = {
      'business bay': 'Business Bay',
      'downtown': 'Downtown Dubai',
      'marina': 'Dubai Marina',
      'jbr': 'JBR',
      'difc': 'DIFC'
    };
  }

  async parseQuery(query) {
    const doc = nlp(query);
    return {
      time_period: this.extractTimePeriod(doc),
      areas: this.extractAreas(doc),
      property_types: this.extractPropertyTypes(doc),
      intent: this.classifyIntent(doc),
      // Terms Compromise tags as people, places, or organizations
      entities: doc.topics().out('array')
    };
  }

  extractTimePeriod(doc) {
    // Relative durations such as "6 months" or "2 weeks"
    const timeExpressions = doc.match('#Value+ (month|months|week|weeks|year|years)');
    if (timeExpressions.found) {
      const match = timeExpressions.text().match(/(\d+)\s*(month|week|year)s?/i);
      if (match) {
        return {
          type: match[2].toLowerCase(),
          count: parseInt(match[1], 10)
        };
      }
    }
    // doc.dates() exists only when the compromise-dates plugin is registered
    if (typeof doc.dates === 'function') {
      const dates = doc.dates();
      if (dates.found) {
        return { type: 'specific_date', value: dates.first().text() };
      }
    }
    return null;
  }

  extractAreas(doc) {
    // Match the known-area table directly; the built-in place tagger may not
    // recognize local names such as "JBR" or "DIFC"
    return Object.keys(this.areaMapping)
      .filter(alias => doc.has(alias))
      .map(alias => this.areaMapping[alias]);
  }

  extractPropertyTypes(doc) {
    const propertyKeywords = ['apartment', 'villa', 'commercial', 'office', 'retail'];
    // Accept the simple plural as well ("apartments", "villas")
    return propertyKeywords.filter(keyword =>
      doc.has(`(${keyword}|${keyword}s)`)
    );
  }

  classifyIntent(doc) {
    if (doc.has('trend') || doc.has('over time')) return 'trend';
    if (doc.has('top') || doc.has('best')) return 'compare';
    if (doc.has('average') || doc.has('avg')) return 'average';
    if (doc.has('summary') || doc.has('brief')) return 'summary';
    return 'unknown';
  }
}
```
#### Option 3: Wink NLP Implementation
```javascript
// Install Wink NLP and its English language model
// npm install wink-nlp wink-eng-lite-web-model
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const nlp = winkNLP(model);
const its = nlp.its; // accessors such as its.type and its.detail

class WinkRealEstateParser {
  constructor() {
    this.customEntities = {
      AREA: ['Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC'],
      PROPERTY_TYPE: ['apartment', 'villa', 'commercial', 'office'],
      ROOM_TYPE: ['studio', '1BHK', '2BHK', '3BHK', '4BHK']
    };
  }

  async parseQuery(query) {
    const doc = nlp.readDoc(query);
    const text = doc.out().toLowerCase();
    return {
      time_period: this.extractTimePeriod(doc),
      areas: this.extractCustomEntities(text, 'AREA'),
      property_types: this.extractCustomEntities(text, 'PROPERTY_TYPE'),
      room_types: this.extractCustomEntities(text, 'ROOM_TYPE'),
      intent: this.classifyIntent(text),
      // Plain { value, type } objects for each built-in entity
      entities: doc.entities().out(its.detail)
    };
  }

  extractTimePeriod(doc) {
    // "last 6 months" is typically tagged DURATION rather than DATE
    const dates = doc.entities()
      .filter(e => ['DATE', 'DURATION'].includes(e.out(its.type)))
      .out();
    return dates.length > 0 ? dates[0] : null;
  }

  extractCustomEntities(text, entityType) {
    // wink-nlp documents have no has() method, so match on the lower-cased text
    return this.customEntities[entityType].filter(entity =>
      text.includes(entity.toLowerCase())
    );
  }

  classifyIntent(text) {
    const intentPatterns = {
      trend: ['trend', 'over time', 'change'],
      compare: ['top', 'compare', 'best'],
      average: ['average', 'avg', 'mean'],
      summary: ['summary', 'brief', 'overview']
    };
    for (const [intent, keywords] of Object.entries(intentPatterns)) {
      if (keywords.some(keyword => text.includes(keyword))) {
        return intent;
      }
    }
    return 'unknown';
  }
}
```
#### Context Management for Follow-up Queries
```javascript
class QueryContextManager {
  constructor() {
    this.contexts = new Map(); // sessionId -> context
  }

  storeContext(sessionId, parsedQuery, rawQuery = '') {
    const context = {
      ...parsedQuery,
      timestamp: Date.now(),
      queryHistory: rawQuery ? [rawQuery] : []
    };
    this.contexts.set(sessionId, context);
    return context;
  }

  getContext(sessionId) {
    return this.contexts.get(sessionId);
  }

  // newQuery is the parser output; rawQuery is the original query text
  updateContext(sessionId, newQuery, isFollowUp = false, rawQuery = '') {
    const existingContext = this.getContext(sessionId);
    if (isFollowUp && existingContext) {
      // Merge with existing context, keeping earlier values for fields the
      // follow-up left empty (null, empty array, or 'unknown' intent)
      const updatedContext = { ...existingContext };
      for (const [key, value] of Object.entries(newQuery)) {
        const isEmpty = value == null || value === 'unknown' ||
          (Array.isArray(value) && value.length === 0);
        if (!isEmpty) updatedContext[key] = value;
      }
      updatedContext.refinements = this.extractRefinements(rawQuery);
      updatedContext.queryHistory = [...existingContext.queryHistory, rawQuery];
      updatedContext.timestamp = Date.now();
      this.contexts.set(sessionId, updatedContext);
      return updatedContext;
    }
    // Store new context
    return this.storeContext(sessionId, newQuery, rawQuery);
  }

  extractRefinements(rawQuery) {
    const text = rawQuery.toLowerCase();
    const refinements = [];
    if (text.includes('by week')) refinements.push('weekly_grouping');
    if (text.includes('by month')) refinements.push('monthly_grouping');
    if (text.includes('only')) refinements.push('filter_specific');
    if (text.includes('apartments only')) refinements.push('property_type_filter');
    return refinements;
  }

  // Keyword heuristic; broad indicators such as 'show' can also match new queries
  isFollowUpQuery(query) {
    const followUpIndicators = [
      'by week', 'by month', 'only', 'filter', 'show',
      'apartments only', 'villas only', 'commercial only'
    ];
    return followUpIndicators.some(indicator =>
      query.toLowerCase().includes(indicator)
    );
  }
}
```
### 5. Frontend Integration Support
#### Card Components
```javascript
{
  cards: [
    {
      title: "Average Rental Price",
      value: "92,500 AED",
      subtitle: "Business Bay - Last 6 months",
      trend: "+15.3%",
      icon: "trending-up"
    }
  ]
}
```
#### Visualization Decision Logic
- Single value results → Display as cards
- Time series data → Line charts
- Category comparisons → Bar charts
- Distribution data → Pie/doughnut charts
- Multiple metrics → Combination of cards and charts
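
The rules above could be encoded roughly as follows. This is a sketch under stated assumptions: the function name, the returned strings, the `distribution` intent, and the `bar` fallback are not part of the specification, and the combined cards-plus-chart case is left out for brevity.

```javascript
// Sketch: pick a display type from the parsed intent and the result rows.
const CHART_BY_INTENT = new Map([
  ['trend', 'line'],        // time series data
  ['compare', 'bar'],       // category comparisons
  ['distribution', 'pie']   // hypothetical intent for distribution data
]);

function chooseVisualization(intent, rows) {
  if (rows.length === 0) return 'none';
  // A single row holds one value (or a few metrics): show it as cards
  if (rows.length === 1) return 'card';
  // Assumed fallback: unrecognized intents with several rows become a bar chart
  return CHART_BY_INTENT.get(intent) || 'bar';
}

console.log(chooseVisualization('trend', [{ month: '2024-01' }, { month: '2024-02' }]));
```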
## Development Guidelines
### 1. Error Handling
- Implement comprehensive error handling for SQL queries
- Provide meaningful error messages for invalid queries
- Handle database connection issues gracefully
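
One way to keep error responses consistent with the `success` flag in the response format is a single mapping function. In the sketch below, the `QueryParseError` class, the error codes in the body, and the `{ success: false, error }` shape are assumptions; `ECONNREFUSED` and `PROTOCOL_CONNECTION_LOST` are error codes that connection failures commonly carry.

```javascript
// Sketch: map internal errors to a client-safe status and response body.
class QueryParseError extends Error {}

function toErrorResponse(err) {
  const body = (code, message) => ({ success: false, error: { code, message } });

  if (err instanceof QueryParseError) {
    // The parser message is written for the user, so it can be returned as-is
    return { status: 400, body: body('INVALID_QUERY', err.message) };
  }
  if (err && (err.code === 'ECONNREFUSED' || err.code === 'PROTOCOL_CONNECTION_LOST')) {
    return { status: 503, body: body('DATABASE_UNAVAILABLE', 'The database is temporarily unavailable.') };
  }
  // Never leak SQL text or stack traces to the client
  return { status: 500, body: body('INTERNAL_ERROR', 'Unexpected error while processing the query.') };
}

console.log(toErrorResponse(new QueryParseError('No area found in query')).status);
```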
### 2. Performance Optimization
- Use database indexes effectively
- Implement query result caching for common queries
- Optimize SQL queries for large datasets
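
Result caching can start as a small in-memory map with a time-to-live. The class name, the five-minute default, and the normalization rule below are illustrative assumptions; a shared store such as Redis would replace the `Map` when several instances run side by side.

```javascript
// Sketch: in-memory TTL cache keyed by the normalized query text.
class QueryCache {
  constructor(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which keeps expiry testable
    this.entries = new Map();
  }

  static normalize(query) {
    return query.trim().toLowerCase().replace(/\s+/g, ' ');
  }

  get(query) {
    const key = QueryCache.normalize(query);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(query, value) {
    this.entries.set(QueryCache.normalize(query), {
      value,
      expiresAt: this.now() + this.ttlMs
    });
  }
}
```

Follow-up queries such as "Apartments only" depend on session context, so only self-contained queries are safe to cache by text alone.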
### 3. Security
- Implement SQL injection prevention
- Validate and sanitize user inputs
- Use parameterized queries
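
Input validation for the request body can be sketched without any library, as below. The 500-character limit and the function name are arbitrary assumptions; a schema validator such as Joi from the dependency list could express the same checks.

```javascript
// Sketch: validate and normalize the body of POST /api/query.
const MAX_QUERY_LENGTH = 500; // assumed limit

function validateQueryBody(body) {
  if (!body || typeof body.query !== 'string') {
    return { ok: false, error: '"query" must be a string' };
  }
  // Replace control characters, collapse whitespace, and trim
  const query = body.query
    .replace(/[\u0000-\u001f\u007f]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  if (query.length === 0) {
    return { ok: false, error: '"query" must not be empty' };
  }
  if (query.length > MAX_QUERY_LENGTH) {
    return { ok: false, error: `"query" must be at most ${MAX_QUERY_LENGTH} characters` };
  }
  return { ok: true, query };
}

console.log(validateQueryBody({ query: '  Top 5\n areas ' }));
```

Validation of this kind complements parameterized queries and does not replace them: the cleaned text still reaches the database only through bound parameters.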
### 4. Testing
- Unit tests for query parsing logic
- Integration tests for SQL generation
- API endpoint testing with sample queries
## Sample Implementation Flow
1. **Receive Query**: `"Give me the last 6 months rental price trend for Business Bay"`
2. **Process with Node.js NLP**:
   ```javascript
   const parser = new DubaiRealEstateParser();
   const parsed = await parser.parseQuery("Give me the last 6 months rental price trend for Business Bay");
   // The parser extracts (startDate depends on the current date):
   // - Time period: { type: 'months', count: 6, startDate: '2024-06-01' }
   // - Areas: ['Business Bay']
   // - Intent: 'trend'
   // - Property types: []
   ```
3. **Parse Query** (using Node.js NLP output):
   - Time period: Last 6 months (from time extraction)
   - Area: Business Bay (from area patterns)
   - Property type: All (not specified)
   - Analysis type: Trend (from intent classification)
4. **Generate SQL**:
   ```sql
   SELECT
     DATE_FORMAT(start_date, '%Y-%m') AS month,
     AVG(annual_amount) AS avg_price,
     COUNT(*) AS transaction_count
   FROM rents
   WHERE area_en = 'Business Bay'
     AND start_date >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
   GROUP BY DATE_FORMAT(start_date, '%Y-%m')
   ORDER BY month
   ```
5. **Format Response**:
   - Create line chart data for trend visualization
   - Generate summary cards with key metrics
   - Provide descriptive text for the analysis
6. **Return JSON**: Structured response ready for Chart.js and frontend cards
## Environment Configuration
### Required Environment Variables
Create a `.env` file in the project root:
```env
# Server Configuration
PORT=3000
NODE_ENV=development
# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_NAME=dubai_dld
DB_USER=root
DB_PASSWORD=your_password
# NLP Configuration
NLP_LIBRARY=natural
# Language model package; only used with Wink NLP
NLP_MODEL=wink-eng-lite-web-model
# Redis Configuration (optional - for context management)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
# API Configuration
API_KEY=your_api_key_here
RATE_LIMIT=100
```
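
A small loader can turn these variables into a typed configuration object and fail fast when required values are missing. The sketch below is hypothetical: the function name, the choice of required variables, and the defaults are assumptions taken from the example values above.

```javascript
// Sketch: build a config object from environment variables.
// The environment is passed in so the loader can be tested without process.env.
function loadConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  // Assumption: the database name and user have no sensible default
  const missing = ['DB_NAME', 'DB_USER'].filter(name => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  return {
    port: toInt(env.PORT, 3000),
    db: {
      host: env.DB_HOST || 'localhost',
      port: toInt(env.DB_PORT, 3306),
      name: env.DB_NAME,
      user: env.DB_USER,
      password: env.DB_PASSWORD || ''
    },
    nlpLibrary: env.NLP_LIBRARY || 'natural',
    rateLimit: toInt(env.RATE_LIMIT, 100)
  };
}

const config = loadConfig({ DB_NAME: 'dubai_dld', DB_USER: 'root', PORT: '8080' });
console.log(config.port, config.db.port);
```

In the application, `require('dotenv').config()` would run before `loadConfig()` so that the `.env` file is merged into `process.env`.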
### Node.js NLP Service Configuration
```javascript
// config/nlpConfig.js
module.exports = {
  library: process.env.NLP_LIBRARY || 'natural',
  // Language model package; only used with Wink NLP
  model: process.env.NLP_MODEL || 'wink-eng-lite-web-model',
  customPatterns: require('./customPatterns'),
  areaMapping: require('./areaMapping'),
  // Query classification keywords
  intentPatterns: {
    trend: ['trend', 'over time', 'change', 'last', 'months', 'weekly', 'monthly'],
    compare: ['top', 'compare', 'versus', 'vs', 'better', 'best', 'highest', 'lowest'],
    average: ['average', 'avg', 'mean', 'typical', 'median'],
    summary: ['brief', 'summary', 'summarize', 'overview', 'overall'],
    count: ['how many', 'count', 'number of', 'total'],
    filter: ['only', 'filter', 'show', 'exclude', 'include']
  }
};
```
## Additional Features
### Query Refinement Support with Node.js NLP Context Tracking
- Use Node.js NLP entity recognition to maintain context between queries
- Track extracted entities (areas, property types, time periods) across conversations
- Implement context-aware parsing for follow-up queries
- Store and retrieve previous query context from session management
- Support progressive data drilling with contextual refinements
- Use similarity matching for related area name variations
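
Similarity matching for area names can be sketched with plain edit distance. The alias table and the threshold of two edits below are illustrative assumptions; Natural.js also ships string-distance functions that could stand in for the hand-written one.

```javascript
// Sketch: resolve misspelled or shortened area names to a canonical name.
const AREA_ALIASES = {
  'business bay': 'Business Bay',
  'downtown': 'Downtown Dubai',
  'downtown dubai': 'Downtown Dubai',
  'marina': 'Dubai Marina',
  'dubai marina': 'Dubai Marina'
};

// Levenshtein edit distance using a single rolling row
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of D[i-1][j]
      row[j] = Math.min(
        above + 1,                                   // deletion
        row[j - 1] + 1,                              // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

function resolveArea(input, maxDistance = 2) {
  const needle = input.trim().toLowerCase();
  let best = null;
  for (const [alias, canonical] of Object.entries(AREA_ALIASES)) {
    const distance = levenshtein(needle, alias);
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { canonical, distance };
    }
  }
  return best ? best.canonical : null;
}

console.log(resolveArea('Bussiness Bay')); // "Business Bay"
```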
### Export Capabilities
- Provide raw data export options
- Support multiple output formats (CSV, JSON)
- Include metadata about queries and data sources
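
For the CSV option, result rows can be serialized with standard quoting rules. The sketch below assumes every row has the same keys as the first row; `toCsv` is a hypothetical helper name.

```javascript
// Sketch: serialize result rows to CSV, quoting fields that contain
// commas, double quotes, or line breaks.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const columns = Object.keys(rows[0]);
  const escapeField = value => {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [columns.map(escapeField).join(',')];
  for (const row of rows) {
    lines.push(columns.map(column => escapeField(row[column])).join(','));
  }
  return lines.join('\r\n');
}

const csv = toCsv([
  { area: 'Business Bay', avg_price: 92000 },
  { area: 'Jumeirah, Palm', avg_price: null }
]);
console.log(csv);
```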
### Analytics Dashboard
- Track query patterns and popular analyses
- Monitor API usage and performance metrics
- Provide insights into data trends
This prompt is intended as a foundation for a Node.js application that translates natural language queries into structured, visualization-ready responses for Dubai real estate analytics.