# Node.js Application Prompt: Dubai Land Department Analytics API

## Project Overview

Create a Node.js application that provides a REST API endpoint to process natural language queries about Dubai real estate data and return structured responses optimized for Chart.js visualization and frontend card displays.

## Technical Requirements

### Architecture

- **Clean Architecture**: Implement a layered architecture with clear separation of concerns
- **Database**: MySQL with the provided Dubai Land Department schema
- **Response Format**: JSON responses compatible with Chart.js
- **Framework**: Express.js with TypeScript support
- **Database ORM**: Prisma or Sequelize for MySQL integration

### Core Functionality

#### 1. Natural Language Processing with Node.js NLP Libraries

- **Primary NLP Library**: Natural.js or Compromise.js for natural language understanding
- **Library Options**:
  - **Natural.js**: General-purpose NLP library with tokenization, stemming, classification, and sentiment analysis
  - **Compromise.js**: Lightweight, fast NLP with built-in tagging of people, places, and values
  - **Wink NLP**: Advanced NLP with custom entity recognition
  - **node-nlp**: NLP.js from AXA Group, with intent recognition
- Accept user queries in natural language
- Parse queries using Node.js NLP libraries to identify:
  - Time periods (last 6 months, weekly, monthly)
  - Geographic areas (Business Bay, specific zones)
  - Property types (apartments, villas, commercial)
  - Aggregation types (trends, averages, summaries)
  - Transaction types (rental, sales, off-plan)
- Implement custom Named Entity Recognition (NER) for:
  - Dubai area names
  - Property type classification
  - Time period extraction
  - Metric type identification
- Use tokenization and part-of-speech tagging (or dependency parsing where the chosen library supports it) for query structure analysis

#### 2. Query Processing Logic

The application should handle these specific query patterns:

**Single Value Queries** (Card Display):

- Calculate single metrics (averages, totals, counts)
- Return both descriptive text and the executable SQL query
- Format results for frontend card components

**Multi-Query Scenarios** (Chart Visualization):

- Process complex queries requiring multiple data points
- Generate multiple SQL queries for comprehensive analysis
- Format results for Chart.js (line charts, bar charts, pie charts)

#### 3. Supported Query Types

**Rental Price Analysis:**

- "Give me the last 6 months rental price trend for Business Bay"
- "Summarise by week" (refinement of previous query)
- "Apartments only" (further refinement)

**Project Analysis:**

- "Brief about the Project" (transaction summary)
- "List of fast moving projects in last 6 months"
- "Which area is seeing uptick in off-plan projects in last 6 months"

**Area Performance:**

- "Which area is having more rental transactions?"
- "Top 5 areas for Commercial leasing and why?"
- "Top 5 areas for Residential leasing and why?"
- "Avg price of 3BHK apartment by area in last 6 months, group it by month. Show top 5 areas only."
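
The card-versus-chart split described under Query Processing Logic could be sketched roughly as follows. This is a minimal illustration, not a prescribed design: `chooseDisplay` is a hypothetical helper, and it assumes result rows arrive as an array of plain objects whose first column holds labels and whose second column holds values.

```javascript
// Hypothetical helper (not part of the spec): decides how a SQL result set
// should be presented. `rows` is assumed to be an array of plain objects.
function chooseDisplay(rows, intent) {
  if (rows.length === 0) {
    return { kind: 'empty' };
  }
  const columns = Object.keys(rows[0]);

  // One row with one column is a single metric: render it as a card.
  if (rows.length === 1 && columns.length === 1) {
    return { kind: 'card', value: rows[0][columns[0]] };
  }

  // Otherwise treat the first column as labels and the second as values.
  const [labelKey, valueKey] = columns;
  return {
    kind: 'chart',
    type: intent === 'trend' ? 'line' : 'bar', // trends read best as lines
    data: {
      labels: rows.map(row => row[labelKey]),
      datasets: [{ label: valueKey, data: rows.map(row => Number(row[valueKey])) }]
    }
  };
}
```

A single-metric result such as `[{ avg_price: 92500 }]` yields a card, while a month-by-month result with a `trend` intent yields a line chart payload in the shape Chart.js expects for `data`.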
## Database Schema Context

The application will work with the Dubai Land Department MySQL database containing:

### Core Tables:

- **transactions**: Real estate transaction records
- **rents**: Property rental contracts (Ejari system)
- **projects**: Development projects with developer relationships
- **developers**: Registered real estate developers
- **buildings**: Building registry information
- **lands**: Land registry information
- **valuations**: Official property valuations
- **brokers**: Registered real estate brokers

### Key Fields for Analysis:

- **Time Fields**: `instance_date`, `registration_date`, `start_date`, `end_date`
- **Location Fields**: `area_en`, `zone_en`, `nearest_metro_en`
- **Property Fields**: `prop_type_en`, `prop_sub_type_en`, `rooms_en`
- **Financial Fields**: `trans_value`, `contract_amount`, `annual_amount`
- **Project Fields**: `project_en`, `master_project_en`, `is_offplan_en`

## API Specification

### Endpoint Structure

```
POST /api/query
Content-Type: application/json

Request Body:
{
  "query": "Give me the last 6 months rental price trend for Business Bay"
}

Response Format:
{
  "success": true,
  "data": {
    "text": "Rental price trend for Business Bay over the last 6 months",
    "visualizations": [
      {
        "type": "line",
        "title": "Monthly Rental Price Trend",
        "data": {
          "labels": ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"],
          "datasets": [{
            "label": "Average Rental Price (AED)",
            "data": [85000, 87000, 89000, 92000, 95000, 98000],
            "borderColor": "rgb(75, 192, 192)",
            "backgroundColor": "rgba(75, 192, 192, 0.2)"
          }]
        }
      }
    ],
    "cards": [
      {
        "title": "Average Price",
        "value": "92,500 AED",
        "subtitle": "Last 6 months",
        "trend": "+15.3%"
      }
    ],
    "sql_queries": [
      "SELECT DATE_FORMAT(start_date, '%Y-%m') as month, AVG(annual_amount) as avg_price FROM rents WHERE area_en = 'Business Bay' AND start_date >= DATE_SUB(NOW(), INTERVAL 6 MONTH) GROUP BY DATE_FORMAT(start_date, '%Y-%m') ORDER BY month"
    ]
  }
}
```

## Implementation Requirements

### 1. Project Structure

```
src/
├── controllers/
│   └── queryController.js
├── services/
│   ├── nlpService.js        # NLP library integration and processing
│   ├── queryParser.js       # Query parsing logic
│   ├── sqlGenerator.js      # SQL query generation
│   ├── chartFormatter.js    # Chart.js data formatting
│   └── contextManager.js    # Context tracking for follow-up queries
├── models/
│   └── database.js
├── middleware/
│   └── validation.js
├── utils/
│   ├── dateUtils.js
│   └── textProcessor.js
├── config/
│   ├── nlpConfig.js         # NLP configuration and patterns
│   ├── customPatterns.js    # Custom matcher patterns (required by nlpConfig.js)
│   └── areaMapping.js       # Dubai area name mappings
└── routes/
    └── api.js
```

### 1.1. Required Dependencies

**Node.js Dependencies:**

```json
{
  "dependencies": {
    "express": "^4.18.2",
    "mysql2": "^3.6.0",
    "natural": "^6.5.0",
    "compromise": "^14.10.0",
    "wink-nlp": "^1.12.0",
    "node-nlp": "^4.27.0",
    "moment": "^2.29.4",
    "dotenv": "^16.3.1",
    "joi": "^17.11.0",
    "redis": "^4.6.0"
  },
  "devDependencies": {
    "@types/node": "^20.10.0",
    "typescript": "^5.3.3",
    "ts-node": "^10.9.1",
    "nodemon": "^3.0.2"
  }
}
```

**No Python Dependencies Required**: this is a pure Node.js implementation. Option 3 (Wink NLP) additionally requires the `wink-eng-lite-web-model` package, which its code loads.

### 2. Key Components

#### Query Parser Service (Node.js NLP-based)

- Load the Natural.js or Compromise.js NLP library
- Extract time periods using custom regex patterns and date parsing
- Identify geographic areas using custom entity recognition
- Extract property types using keyword matching and classification
- Determine aggregation requirements using intent recognition
- Handle query refinements and follow-ups with context tracking
- Implement custom matcher patterns for Dubai-specific terminology
- Use tokenization and stemming for query normalization

#### SQL Generator Service

- Convert parsed queries to MySQL statements
- Handle date range calculations
- Implement proper JOIN operations for related tables
- Optimize queries for performance

#### Chart Formatter Service

- Convert SQL results to a Chart.js-compatible format
- Support multiple chart types (line, bar, pie, doughnut)
- Generate appropriate labels and datasets
- Handle data aggregation and grouping

### 3. Chart.js Compatibility

#### Line Charts (Trends)

```javascript
const lineChartConfig = {
  type: 'line',
  data: {
    labels: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    datasets: [{
      label: 'Rental Price Trend',
      data: [85000, 87000, 89000, 92000, 95000, 98000],
      borderColor: 'rgb(75, 192, 192)',
      backgroundColor: 'rgba(75, 192, 192, 0.2)'
    }]
  }
};
```

#### Bar Charts (Comparisons)

```javascript
const barChartConfig = {
  type: 'bar',
  data: {
    labels: ['Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC'],
    datasets: [{
      label: 'Average Rental Price',
      data: [92000, 85000, 78000, 95000, 110000],
      backgroundColor: 'rgba(54, 162, 235, 0.6)'
    }]
  }
};
```

#### Pie Charts (Distribution)

```javascript
const pieChartConfig = {
  type: 'pie',
  data: {
    labels: ['Apartments', 'Villas', 'Commercial', 'Off-plan'],
    datasets: [{
      data: [45, 25, 20, 10],
      backgroundColor: ['#FF6384', '#36A2EB', '#FFCE56', '#4BC0C0']
    }]
  }
};
```

### 4. Node.js NLP Implementation Details

#### Option 1: Natural.js Implementation

```javascript
// Install: npm install natural moment
const natural = require('natural');
const moment = require('moment');

// Initialize NLP components
const tokenizer = new natural.WordTokenizer();
const stemmer = natural.PorterStemmer; // available for query normalization

// Custom entity recognition for Dubai real estate
class DubaiRealEstateParser {
  constructor() {
    this.areaPatterns = [
      'Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC', 'JLT',
      'Palm Jumeirah', 'Dubai Hills', 'Arabian Ranches', 'Jumeirah'
    ];
    this.propertyTypes = [
      'apartment', 'villa', 'commercial', 'office', 'retail', 'warehouse'
    ];
    this.roomTypes = ['studio', '1BHK', '2BHK', '3BHK', '4BHK', '5BHK'];
    this.intentKeywords = {
      trend: ['trend', 'over time', 'change', 'last', 'months', 'weekly', 'monthly'],
      compare: ['top', 'compare', 'versus', 'vs', 'better', 'best', 'highest', 'lowest'],
      average: ['average', 'avg', 'mean', 'typical', 'median'],
      summary: ['brief', 'summary', 'summarize', 'overview', 'overall'],
      count: ['how many', 'count', 'number of', 'total']
    };
  }

  async parseQuery(query) {
    const tokens = tokenizer.tokenize(query.toLowerCase());
    return {
      time_period: this.extractTimePeriod(query),
      areas: this.extractAreas(query),
      property_types: this.extractPropertyTypes(query),
      room_types: this.extractRoomTypes(query),
      intent: this.classifyIntent(query),
      tokens: tokens
    };
  }

  extractTimePeriod(query) {
    // Relative periods are tried first, then a bare four-digit year
    const timePatterns = [
      { pattern: /last (\d+) months?/i, value: 'months' },
      { pattern: /last (\d+) weeks?/i, value: 'weeks' },
      { pattern: /last (\d+) years?/i, value: 'years' },
      { pattern: /\b((?:19|20)\d{2})\b/, value: 'year' }
    ];

    for (const timePattern of timePatterns) {
      const match = query.match(timePattern.pattern);
      if (!match) continue;
      const number = parseInt(match[1], 10);
      if (timePattern.value === 'year') {
        // A bare year such as "2024" covers that whole calendar year
        return {
          type: 'year',
          count: null,
          startDate: `${number}-01-01`,
          endDate: `${number}-12-31`
        };
      }
      return {
        type: timePattern.value,
        count: number,
        startDate: this.calculateStartDate(timePattern.value, number)
      };
    }
    return null;
  }

  extractAreas(query) {
    return this.areaPatterns.filter(area =>
      query.toLowerCase().includes(area.toLowerCase())
    );
  }

  extractPropertyTypes(query) {
    return this.propertyTypes.filter(type =>
      query.toLowerCase().includes(type.toLowerCase())
    );
  }

  extractRoomTypes(query) {
    return this.roomTypes.filter(room =>
      query.toLowerCase().includes(room.toLowerCase())
    );
  }

  classifyIntent(query) {
    const queryLower = query.toLowerCase();
    // The first matching intent wins, so broad words such as "last" or
    // "months" steer any time-bounded query toward 'trend'.
    for (const [intent, keywords] of Object.entries(this.intentKeywords)) {
      // Whole-word matching avoids hits inside other words ("top" in "stop")
      if (keywords.some(keyword => new RegExp(`\\b${keyword}\\b`).test(queryLower))) {
        return intent;
      }
    }
    return 'unknown';
  }

  calculateStartDate(period, count) {
    const now = moment();
    switch (period) {
      case 'months':
        return now.subtract(count, 'months').format('YYYY-MM-DD');
      case 'weeks':
        return now.subtract(count, 'weeks').format('YYYY-MM-DD');
      case 'years':
        return now.subtract(count, 'years').format('YYYY-MM-DD');
      default:
        return now.subtract(6, 'months').format('YYYY-MM-DD');
    }
  }
}

module.exports = DubaiRealEstateParser;
```

#### Option 2: Compromise.js Implementation

```javascript
// Install: npm install compromise
const nlp = require('compromise');

class CompromiseRealEstateParser {
  constructor() {
    this.areaMapping = {
      'business bay': 'Business Bay',
      'downtown': 'Downtown Dubai',
      'marina': 'Dubai Marina',
      'jbr': 'JBR',
      'difc': 'DIFC'
    };
  }

  async parseQuery(query) {
    const doc = nlp(query);
    return {
      time_period: this.extractTimePeriod(doc),
      areas: this.extractAreas(doc),
      property_types: this.extractPropertyTypes(doc),
      intent: this.classifyIntent(doc),
      entities: doc.topics().out('array') // people, places, organizations
    };
  }

  extractTimePeriod(doc) {
    // Relative expressions such as "last 6 months" are matched on the raw
    // text first, so a date plugin cannot swallow them as a single date.
    const match = doc.text().match(/(\d+)\s*(months?|weeks?|years?)/i);
    if (match) {
      return {
        type: match[2].toLowerCase().replace(/s$/, ''),
        count: parseInt(match[1], 10)
      };
    }

    // doc.dates() is available only when the compromise-dates plugin is registered
    if (typeof doc.dates === 'function') {
      const dates = doc.dates();
      if (dates.found) {
        return { type: 'specific_date', value: dates.first().text() };
      }
    }
    return null;
  }

  extractAreas(doc) {
    // Match against the mapping directly: the built-in place tagger may not
    // recognize Dubai community names or abbreviations such as "JBR".
    const text = doc.text().toLowerCase();
    return Object.keys(this.areaMapping)
      .filter(key => text.includes(key))
      .map(key => this.areaMapping[key]);
  }

  extractPropertyTypes(doc) {
    const propertyKeywords = ['apartment', 'villa', 'commercial', 'office', 'retail'];
    // Substring matching also catches plurals ("apartments", "villas")
    const text = doc.text().toLowerCase();
    return propertyKeywords.filter(keyword => text.includes(keyword));
  }

  classifyIntent(doc) {
    if (doc.has('trend') || doc.has('over time')) return 'trend';
    if (doc.has('top') || doc.has('best')) return 'compare';
    if (doc.has('average') || doc.has('avg')) return 'average';
    if (doc.has('summary') || doc.has('brief')) return 'summary';
    return 'unknown';
  }
}
```

#### Option 3: Wink NLP Implementation

```javascript
// Install: npm install wink-nlp wink-eng-lite-web-model
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');

const nlp = winkNLP(model);
const its = nlp.its; // accessors such as its.type and its.detail

class WinkRealEstateParser {
  constructor() {
    this.customEntities = {
      AREA: ['Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC'],
      PROPERTY_TYPE: ['apartment', 'villa', 'commercial', 'office'],
      ROOM_TYPE: ['studio', '1BHK', '2BHK', '3BHK', '4BHK']
    };
  }

  async parseQuery(query) {
    const doc = nlp.readDoc(query);
    return {
      time_period: this.extractTimePeriod(doc),
      areas: this.extractCustomEntities(doc, 'AREA'),
      property_types: this.extractCustomEntities(doc, 'PROPERTY_TYPE'),
      room_types: this.extractCustomEntities(doc, 'ROOM_TYPE'),
      intent: this.classifyIntent(doc),
      entities: doc.entities().out(its.detail) // array of { value, type }
    };
  }

  extractTimePeriod(doc) {
    // Keep date-like entities; out() returns their text as an array of strings
    const dates = doc.entities()
      .filter(e => ['DATE', 'DURATION'].includes(e.out(its.type)))
      .out();
    return dates.length > 0 ? dates[0] : null;
  }

  extractCustomEntities(doc, entityType) {
    // Case-insensitive substring match against the document text
    const text = doc.out().toLowerCase();
    return this.customEntities[entityType].filter(entity =>
      text.includes(entity.toLowerCase())
    );
  }

  classifyIntent(doc) {
    const intentPatterns = {
      trend: ['trend', 'over time', 'change'],
      compare: ['top', 'compare', 'best'],
      average: ['average', 'avg', 'mean'],
      summary: ['summary', 'brief', 'overview']
    };

    const text = doc.out().toLowerCase();
    for (const [intent, keywords] of Object.entries(intentPatterns)) {
      if (keywords.some(keyword => text.includes(keyword))) {
        return intent;
      }
    }
    return 'unknown';
  }
}
```

#### Context Management for Follow-up Queries

```javascript
class QueryContextManager {
  constructor() {
    this.contexts = new Map(); // sessionId -> context
  }

  storeContext(sessionId, parsedQuery) {
    this.contexts.set(sessionId, {
      ...parsedQuery,
      timestamp: Date.now(),
      queryHistory: []
    });
  }

  getContext(sessionId) {
    return this.contexts.get(sessionId);
  }

  // newQuery is the parsed query object; rawText is the user's original wording
  updateContext(sessionId, newQuery, isFollowUp = false, rawText = '') {
    const existingContext = this.getContext(sessionId);

    if (isFollowUp && existingContext) {
      // Merge with existing context
      const updatedContext = {
        ...existingContext,
        refinements: this.extractRefinements(rawText),
        timestamp: Date.now()
      };
      // Only non-empty fields override earlier values, so a follow-up such as
      // "Apartments only" keeps the area and time period already established.
      for (const [key, value] of Object.entries(newQuery)) {
        const isEmpty = value == null || value === 'unknown' ||
          (Array.isArray(value) && value.length === 0);
        if (!isEmpty) updatedContext[key] = value;
      }
      this.contexts.set(sessionId, updatedContext);
      return updatedContext;
    }

    // Store new context
    this.storeContext(sessionId, newQuery);
    return newQuery;
  }

  extractRefinements(query) {
    const text = query.toLowerCase();
    const refinements = [];
    if (text.includes('by week')) refinements.push('weekly_grouping');
    if (text.includes('by month')) refinements.push('monthly_grouping');
    if (text.includes('only')) refinements.push('filter_specific');
    if (text.includes('apartments only')) refinements.push('property_type_filter');
    return refinements;
  }

  isFollowUpQuery(query) {
    const followUpIndicators = [
      'by week', 'by month', 'only', 'filter', 'show',
      'apartments only', 'villas only', 'commercial only'
    ];
    return followUpIndicators.some(indicator =>
      query.toLowerCase().includes(indicator)
    );
  }
}
```

### 5. Frontend Integration Support

#### Card Components

```javascript
const cardResponse = {
  cards: [
    {
      title: "Average Rental Price",
      value: "92,500 AED",
      subtitle: "Business Bay - Last 6 months",
      trend: "+15.3%",
      icon: "trending-up"
    }
  ]
};
```

#### Visualization Decision Logic

- Single value results → Display as cards
- Time series data → Line charts
- Category comparisons → Bar charts
- Distribution data → Pie/doughnut charts
- Multiple metrics → Combination of cards and charts

## Development Guidelines

### 1. Error Handling

- Implement comprehensive error handling for SQL queries
- Provide meaningful error messages for invalid queries
- Handle database connection issues gracefully

### 2. Performance Optimization

- Use database indexes effectively
- Implement query result caching for common queries
- Optimize SQL queries for large datasets

### 3. Security

- Implement SQL injection prevention
- Validate and sanitize user inputs
- Use parameterized queries

### 4. Testing

- Unit tests for query parsing logic
- Integration tests for SQL generation
- API endpoint testing with sample queries

## Sample Implementation Flow

1. **Receive Query**: `"Give me the last 6 months rental price trend for Business Bay"`

2. **Process with Node.js NLP**:

   ```javascript
   const parser = new DubaiRealEstateParser();
   const parsed = await parser.parseQuery("Give me the last 6 months rental price trend for Business Bay");

   // Natural.js extracts (assuming the query runs on 2024-07-01):
   // - Time period: { type: 'months', count: 6, startDate: '2024-01-01' }
   // - Areas: ['Business Bay']
   // - Intent: 'trend'
   // - Property types: []
   ```

3. **Parse Query** (using Node.js NLP output):
   - Time period: Last 6 months (from time extraction)
   - Area: Business Bay (from area patterns)
   - Property type: All (not specified)
   - Analysis type: Trend (from intent classification)

4. **Generate SQL**:

   ```sql
   SELECT
     DATE_FORMAT(start_date, '%Y-%m') as month,
     AVG(annual_amount) as avg_price,
     COUNT(*) as transaction_count
   FROM rents
   WHERE area_en = 'Business Bay'
     AND start_date >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
   GROUP BY DATE_FORMAT(start_date, '%Y-%m')
   ORDER BY month
   ```

5. **Format Response**:
   - Create line chart data for trend visualization
   - Generate summary cards with key metrics
   - Provide descriptive text for the analysis

6. **Return JSON**: Structured response ready for Chart.js and frontend cards

## Environment Configuration

### Required Environment Variables

Create a `.env` file in the project root:

```env
# Server Configuration
PORT=3000
NODE_ENV=development

# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_NAME=dubai_dld
DB_USER=root
DB_PASSWORD=your_password

# NLP Configuration
NLP_LIBRARY=natural
# Only used when NLP_LIBRARY=wink
NLP_MODEL=wink-eng-lite-web-model

# Redis Configuration (optional - for context management)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=

# API Configuration
API_KEY=your_api_key_here
RATE_LIMIT=100
```

### Node.js NLP Service Configuration

```javascript
// config/nlpConfig.js
module.exports = {
  library: process.env.NLP_LIBRARY || 'natural',
  model: process.env.NLP_MODEL || 'wink-eng-lite-web-model',
  customPatterns: require('./customPatterns'),
  areaMapping: require('./areaMapping'),

  // Query classification keywords
  intentPatterns: {
    trend: ['trend', 'over time', 'change', 'last', 'months', 'weekly', 'monthly'],
    compare: ['top', 'compare', 'versus', 'vs', 'better', 'best', 'highest', 'lowest'],
    average: ['average', 'avg', 'mean', 'typical', 'median'],
    summary: ['brief', 'summary', 'summarize', 'overview', 'overall'],
    count: ['how many', 'count', 'number of', 'total'],
    filter: ['only', 'filter', 'show', 'exclude', 'include']
  }
};
```

## Additional Features

### Query Refinement Support with Node.js NLP Context Tracking

- Use Node.js NLP entity recognition to maintain context between queries
- Track extracted entities (areas, property types, time periods) across conversations
- Implement context-aware parsing for follow-up queries
- Store and retrieve previous query context from session management
- Support progressive data drilling with contextual refinements
- Use similarity matching for related area name variations

### Export Capabilities

- Provide raw data export options
- Support multiple output formats (CSV, JSON)
- Include metadata about queries and data sources

### Analytics Dashboard

- Track query patterns and popular analyses
- Monitor API usage and performance metrics
- Provide insights into data trends

This prompt is a foundation for a Node.js application that turns natural language queries about Dubai real estate into SQL, Chart.js visualizations, and frontend cards.
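
As one illustration of the CSV option listed under Export Capabilities, the sketch below serializes result rows using standard CSV quoting. `toCsv` is a hypothetical helper, not something the spec names, and it assumes every row is a plain object with the same keys as the first row.

```javascript
// Hypothetical export helper (an assumption, not part of the spec):
// converts an array of row objects into CSV text.
function toCsv(rows) {
  if (rows.length === 0) {
    return '';
  }
  const columns = Object.keys(rows[0]);

  // Quote a field only when it contains a comma, quote, or line break;
  // embedded quotes are doubled. null and undefined become empty fields.
  const escapeField = value => {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };

  const lines = [columns.map(escapeField).join(',')];
  for (const row of rows) {
    lines.push(columns.map(column => escapeField(row[column])).join(','));
  }
  return lines.join('\r\n');
}
```

The header row comes from the keys of the first result row, so a query that aliases its columns (for example `avg_price`) produces matching CSV headers without extra configuration.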