dld_backend/nodejs_chartjs_prompt.md
2025-10-30 12:13:02 +05:30

Node.js Application Prompt: Dubai Land Department Analytics API

Project Overview

Create a Node.js application that provides a REST API endpoint to process natural language queries about Dubai real estate data and return structured responses optimized for Chart.js visualization and frontend card displays.

Technical Requirements

Architecture

  • Clean Architecture: Implement layered architecture with clear separation of concerns
  • Database: MySQL with the provided Dubai Land Department schema
  • Response Format: JSON responses compatible with Chart.js
  • Framework: Express.js with TypeScript support
  • Database ORM: Prisma or Sequelize for MySQL integration

Core Functionality

1. Natural Language Processing with Node.js NLP Libraries

  • Primary NLP Library: Natural.js or Compromise.js for natural language understanding
  • Alternative Options:
    • Natural.js: General-purpose NLP toolkit with tokenization, stemming, classification, and sentiment analysis
    • Compromise.js: Lightweight, fast NLP with excellent entity recognition
    • Wink NLP: Advanced NLP with custom entity recognition
    • Node-nlp: AXA Group's NLP library (NLP.js) with intent recognition
  • Accept user queries in natural language
  • Parse queries using Node.js NLP libraries to identify:
    • Time periods (last 6 months, weekly, monthly)
    • Geographic areas (Business Bay, specific zones)
    • Property types (apartments, villas, commercial)
    • Aggregation types (trends, averages, summaries)
    • Transaction types (rental, sales, off-plan)
  • Implement custom Named Entity Recognition (NER) for:
    • Dubai area names
    • Property type classification
    • Time period extraction
    • Metric type identification
  • Use tokenization and part-of-speech tagging for query structure analysis

2. Query Processing Logic

The application should handle these specific query patterns:

Single Value Queries (Card Display):

  • Calculate single metrics (averages, totals, counts)
  • Return both the descriptive text and the executable SQL query
  • Format results for frontend card components

Multi-Query Scenarios (Chart Visualization):

  • Process complex queries requiring multiple data points
  • Generate multiple SQL queries for comprehensive analysis
  • Format results for Chart.js (line charts, bar charts, pie charts)

3. Supported Query Types

Rental Price Analysis:

  • "Give me the last 6 months rental price trend for Business Bay"
  • "Summarise by week" (refinement of previous query)
  • "Apartments only" (further refinement)

Project Analysis:

  • "Brief about the Project" (transaction summary)
  • "List of fast moving projects in last 6 months"
  • "Which area is seeing uptick in off-plan projects in last 6 months"

Area Performance:

  • "Which area is having more rental transactions?"
  • "Top 5 areas for Commercial leasing and why?"
  • "Top 5 areas for Residential leasing and why?"
  • "Avg price of 3BHK apartment by area in last 6 months, group it by month. Show top 5 areas only."

Database Schema Context

The application will work with the Dubai Land Department MySQL database containing:

Core Tables:

  • transactions: Real estate transaction records
  • rents: Property rental contracts (Ejari system)
  • projects: Development projects with developer relationships
  • developers: Registered real estate developers
  • buildings: Building registry information
  • lands: Land registry information
  • valuations: Official property valuations
  • brokers: Registered real estate brokers

Key Fields for Analysis:

  • Time Fields: instance_date, registration_date, start_date, end_date
  • Location Fields: area_en, zone_en, nearest_metro_en
  • Property Fields: prop_type_en, prop_sub_type_en, rooms_en
  • Financial Fields: trans_value, contract_amount, annual_amount
  • Project Fields: project_en, master_project_en, is_offplan_en

API Specification

Endpoint Structure

POST /api/query
Content-Type: application/json

Request Body:
{
  "query": "Give me the last 6 months rental price trend for Business Bay"
}

Response Format:
{
  "success": true,
  "data": {
    "text": "Rental price trend for Business Bay over the last 6 months",
    "visualizations": [
      {
        "type": "line",
        "title": "Monthly Rental Price Trend",
        "data": {
          "labels": ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"],
          "datasets": [{
            "label": "Average Rental Price (AED)",
            "data": [85000, 87000, 89000, 92000, 95000, 98000],
            "borderColor": "rgb(75, 192, 192)",
            "backgroundColor": "rgba(75, 192, 192, 0.2)"
          }]
        }
      }
    ],
    "cards": [
      {
        "title": "Average Price",
        "value": "91,000 AED",
        "subtitle": "Last 6 months",
        "trend": "+15.3%"
      }
    ],
    "sql_queries": [
      "SELECT DATE_FORMAT(start_date, '%Y-%m') as month, AVG(annual_amount) as avg_price FROM rents WHERE area_en = 'Business Bay' AND start_date >= DATE_SUB(NOW(), INTERVAL 6 MONTH) GROUP BY DATE_FORMAT(start_date, '%Y-%m') ORDER BY month"
    ]
  }
}

Implementation Requirements

1. Project Structure

src/
├── controllers/
│   └── queryController.js
├── services/
│   ├── nlpService.js          # Node.js NLP library integration and processing
│   ├── queryParser.js         # Query parsing logic
│   ├── sqlGenerator.js        # SQL query generation
│   ├── chartFormatter.js      # Chart.js data formatting
│   └── contextManager.js      # Context tracking for follow-up queries
├── models/
│   └── database.js
├── middleware/
│   └── validation.js
├── utils/
│   ├── dateUtils.js
│   └── textProcessor.js
├── config/
│   ├── nlpConfig.js           # NLP library configuration and patterns
│   ├── customPatterns.js      # Custom matcher patterns (required by nlpConfig.js)
│   └── areaMapping.js         # Dubai area name mappings
└── routes/
    └── api.js

1.1. Required Dependencies

Node.js Dependencies:

{
  "dependencies": {
    "express": "^4.18.2",
    "mysql2": "^3.6.0",
    "natural": "^6.5.0",
    "compromise": "^14.10.0",
    "wink-nlp": "^1.12.0",
    "node-nlp": "^4.27.0",
    "moment": "^2.29.4",
    "dotenv": "^16.3.1",
    "joi": "^17.11.0",
    "redis": "^4.6.0"
  },
  "devDependencies": {
    "@types/node": "^20.10.0",
    "typescript": "^5.3.3",
    "ts-node": "^10.9.1",
    "nodemon": "^3.0.2"
  }
}

No Python Dependencies Required - Pure Node.js implementation

2. Key Components

Query Parser Service (Node.js NLP-based)

  • Load Natural.js or Compromise.js NLP library
  • Extract time periods using custom regex patterns and date parsing
  • Identify geographic areas using custom entity recognition
  • Extract property types using keyword matching and classification
  • Determine aggregation requirements using intent recognition
  • Handle query refinements and follow-ups with context tracking
  • Implement custom matcher patterns for Dubai-specific terminology
  • Use tokenization and stemming for query normalization

SQL Generator Service

  • Convert parsed queries to MySQL statements
  • Handle date range calculations
  • Implement proper JOIN operations for related tables
  • Optimize queries for performance
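
The conversion step above can be sketched as a small builder that turns a parsed query into a parameterized statement. This is a minimal sketch, not a definitive implementation: the function name `buildRentTrendQuery` and its input shape are assumptions, while the table and column names (`rents`, `area_en`, `start_date`, `annual_amount`) are taken from the examples in this document. Grouping formats come from a fixed whitelist, and all user-derived values travel as placeholders.

```javascript
// Hypothetical helper: builds a parameterized MySQL statement for a rental trend query.
// Only whitelisted DATE_FORMAT patterns are ever interpolated into the SQL text.
const GROUP_FORMATS = new Map([
  ['month', '%Y-%m'],  // calendar month, e.g. 2024-06
  ['week', '%x-W%v'],  // ISO-style year and week, e.g. 2024-W23
]);

function buildRentTrendQuery({ areas = [], startDate, groupBy = 'month' }) {
  const format = GROUP_FORMATS.get(groupBy);
  if (!format) throw new Error(`Unsupported grouping: ${groupBy}`);

  // Every value that originates from the user query becomes a "?" placeholder
  const conditions = ['start_date >= ?'];
  const params = [startDate];
  if (areas.length > 0) {
    conditions.push(`area_en IN (${areas.map(() => '?').join(', ')})`);
    params.push(...areas);
  }

  const sql = [
    `SELECT DATE_FORMAT(start_date, '${format}') AS period,`,
    '       AVG(annual_amount) AS avg_price,',
    '       COUNT(*) AS transaction_count',
    'FROM rents',
    `WHERE ${conditions.join(' AND ')}`,
    'GROUP BY period',
    'ORDER BY period',
  ].join('\n');

  return { sql, params };
}
```

With the `mysql2` promise API, the result could be passed along as `pool.execute(sql, params)`, where `pool` is an assumed connection pool created elsewhere.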

Chart Formatter Service

  • Convert SQL results to Chart.js compatible format
  • Support multiple chart types (line, bar, pie, doughnut)
  • Generate appropriate labels and datasets
  • Handle data aggregation and grouping
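
As an illustration of the formatter's job, the sketch below maps SQL result rows to the line-chart structure used in the response format. The function name `toLineChart` and its options are assumptions; the row keys `month` and `avg_price` follow the sample SQL in this document.

```javascript
// Hypothetical formatter: converts SQL result rows into a Chart.js line-chart payload.
function toLineChart(rows, { title, label, labelKey = 'month', valueKey = 'avg_price' }) {
  return {
    type: 'line',
    title,
    data: {
      labels: rows.map((row) => String(row[labelKey])),
      datasets: [{
        label,
        // Aggregates such as AVG() may arrive as strings from the driver, so coerce them
        data: rows.map((row) => Number(row[valueKey])),
        borderColor: 'rgb(75, 192, 192)',
        backgroundColor: 'rgba(75, 192, 192, 0.2)',
      }],
    },
  };
}
```

Keeping the formatter free of database or HTTP concerns makes it straightforward to unit test with plain arrays of rows.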

3. Chart.js Compatibility

Line Charts (Trends)

{
  type: 'line',
  data: {
    labels: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    datasets: [{
      label: 'Rental Price Trend',
      data: [85000, 87000, 89000, 92000, 95000, 98000],
      borderColor: 'rgb(75, 192, 192)',
      backgroundColor: 'rgba(75, 192, 192, 0.2)'
    }]
  }
}

Bar Charts (Comparisons)

{
  type: 'bar',
  data: {
    labels: ['Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC'],
    datasets: [{
      label: 'Average Rental Price',
      data: [92000, 85000, 78000, 95000, 110000],
      backgroundColor: 'rgba(54, 162, 235, 0.6)'
    }]
  }
}

Pie Charts (Distribution)

{
  type: 'pie',
  data: {
    labels: ['Apartments', 'Villas', 'Commercial', 'Off-plan'],
    datasets: [{
      data: [45, 25, 20, 10],
      backgroundColor: ['#FF6384', '#36A2EB', '#FFCE56', '#4BC0C0']
    }]
  }
}

4. Node.js NLP Implementation Details

Option 1: Natural.js Implementation

// Install Natural.js and Moment.js
// npm install natural moment
const natural = require('natural');
const moment = require('moment');

// Initialize NLP components
const tokenizer = new natural.WordTokenizer();
const stemmer = natural.PorterStemmer;

// Custom entity recognition for Dubai real estate
class DubaiRealEstateParser {
  constructor() {
    this.areaPatterns = [
      'Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC', 'JLT', 
      'Palm Jumeirah', 'Dubai Hills', 'Arabian Ranches', 'Jumeirah'
    ];
    
    this.propertyTypes = [
      'apartment', 'villa', 'commercial', 'office', 'retail', 'warehouse'
    ];
    
    this.roomTypes = ['studio', '1BHK', '2BHK', '3BHK', '4BHK', '5BHK'];
    
    this.intentKeywords = {
      trend: ['trend', 'over time', 'change', 'last', 'months', 'weekly', 'monthly'],
      compare: ['top', 'compare', 'versus', 'vs', 'better', 'best', 'highest', 'lowest'],
      average: ['average', 'avg', 'mean', 'typical', 'median'],
      summary: ['brief', 'summary', 'summarize', 'overview', 'overall'],
      count: ['how many', 'count', 'number of', 'total']
    };
  }

  async parseQuery(query) {
    const tokens = tokenizer.tokenize(query.toLowerCase());
    
    return {
      time_period: this.extractTimePeriod(query),
      areas: this.extractAreas(query),
      property_types: this.extractPropertyTypes(query),
      room_types: this.extractRoomTypes(query),
      intent: this.classifyIntent(query),
      tokens: tokens
    };
  }

  extractTimePeriod(query) {
    // Relative periods such as "last 6 months"; the number is optional ("last month" means 1)
    const relative = query.match(/\blast\s+(?:(\d+)\s+)?(month|week|year)s?\b/i);
    if (relative) {
      const count = relative[1] ? parseInt(relative[1], 10) : 1;
      const type = `${relative[2].toLowerCase()}s`;
      return { type, count, startDate: this.calculateStartDate(type, count) };
    }

    // A bare four-digit year such as "2024" covers that calendar year
    const year = query.match(/\b(?:19|20)\d{2}\b/);
    if (year) {
      return {
        type: 'year',
        count: null,
        startDate: `${year[0]}-01-01`,
        endDate: `${year[0]}-12-31`
      };
    }
    return null;
  }

  extractAreas(query) {
    return this.areaPatterns.filter(area => 
      query.toLowerCase().includes(area.toLowerCase())
    );
  }

  extractPropertyTypes(query) {
    return this.propertyTypes.filter(type => 
      query.toLowerCase().includes(type.toLowerCase())
    );
  }

  extractRoomTypes(query) {
    return this.roomTypes.filter(room => 
      query.toLowerCase().includes(room.toLowerCase())
    );
  }

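  classifyIntent(query) {
    const queryLower = query.toLowerCase();

    // Intents are checked in declaration order and the first hit wins; keywords are
    // matched on word boundaries so that "top" does not match "stop"
    for (const [intent, keywords] of Object.entries(this.intentKeywords)) {
      if (keywords.some(keyword => new RegExp(`\\b${keyword}\\b`).test(queryLower))) {
        return intent;
      }
    }
    return 'unknown';
  }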

  calculateStartDate(period, count) {
    const now = moment();
    switch (period) {
      case 'months': return now.subtract(count, 'months').format('YYYY-MM-DD');
      case 'weeks': return now.subtract(count, 'weeks').format('YYYY-MM-DD');
      case 'years': return now.subtract(count, 'years').format('YYYY-MM-DD');
      default: return now.subtract(6, 'months').format('YYYY-MM-DD');
    }
  }
}

Option 2: Compromise.js Implementation

// Install Compromise.js
// npm install compromise
const nlp = require('compromise');

class CompromiseRealEstateParser {
  constructor() {
    this.areaMapping = {
      'business bay': 'Business Bay',
      'downtown': 'Downtown Dubai',
      'marina': 'Dubai Marina',
      'jbr': 'JBR',
      'difc': 'DIFC'
    };
  }

  async parseQuery(query) {
    const doc = nlp(query);
    
    return {
      time_period: this.extractTimePeriod(doc),
      areas: this.extractAreas(doc),
      property_types: this.extractPropertyTypes(doc),
      intent: this.classifyIntent(doc),
      entities: doc.out('array')
    };
  }

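  extractTimePeriod(doc) {
    // Relative durations such as "6 months". In compromise match syntax a trailing "?"
    // marks the term as optional, so singular and plural forms are listed explicitly.
    const timeExpressions = doc.match('#Value+ (month|months|week|weeks|year|years)');
    if (timeExpressions.found) {
      const match = timeExpressions.text().match(/(\d+)\s*(month|week|year)s?/i);
      if (match) {
        return { type: match[2].toLowerCase(), count: parseInt(match[1], 10) };
      }
    }

    // doc.dates() is assumed to come from the separate compromise-dates plugin,
    // so only call it when the plugin has been registered
    if (typeof doc.dates === 'function') {
      const dates = doc.dates();
      if (dates.found) {
        return { type: 'specific_date', value: dates.first().text() };
      }
    }

    return null;
  }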

  extractAreas(doc) {
    // doc.places() depends on compromise's built-in place tagging, which may not
    // recognize Dubai community names, so match the known area keys against the text
    const text = doc.text().toLowerCase();
    return Object.entries(this.areaMapping)
      .filter(([key]) => text.includes(key))
      .map(([, canonicalName]) => canonicalName);
  }

  extractPropertyTypes(doc) {
    const propertyKeywords = ['apartment', 'villa', 'commercial', 'office', 'retail'];
    return propertyKeywords.filter(keyword => 
      doc.has(keyword)
    );
  }

  classifyIntent(doc) {
    if (doc.has('trend') || doc.has('over time')) return 'trend';
    if (doc.has('top') || doc.has('best')) return 'compare';
    if (doc.has('average') || doc.has('avg')) return 'average';
    if (doc.has('summary') || doc.has('brief')) return 'summary';
    return 'unknown';
  }
}

Option 3: Wink NLP Implementation

// Install Wink NLP and its English language model
// npm install wink-nlp wink-eng-lite-web-model
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');

const nlp = winkNLP(model);

class WinkRealEstateParser {
  constructor() {
    this.customEntities = {
      AREA: ['Business Bay', 'Downtown', 'Marina', 'JBR', 'DIFC'],
      PROPERTY_TYPE: ['apartment', 'villa', 'commercial', 'office'],
      ROOM_TYPE: ['studio', '1BHK', '2BHK', '3BHK', '4BHK']
    };
  }

  async parseQuery(query) {
    const doc = nlp.readDoc(query);
    
    return {
      time_period: this.extractTimePeriod(doc),
      areas: this.extractCustomEntities(doc, 'AREA'),
      property_types: this.extractCustomEntities(doc, 'PROPERTY_TYPE'),
      room_types: this.extractCustomEntities(doc, 'ROOM_TYPE'),
      intent: this.classifyIntent(doc),
      entities: doc.entities().out(nlp.its.detail)
    };
  }

  extractTimePeriod(doc) {
    // wink-nlp collections expose length() and itemAt(); item values are read with out().
    // Relative periods may be tagged DURATION rather than DATE, so both are accepted.
    const dates = doc.entities().filter(e =>
      ['DATE', 'DURATION'].includes(e.out(nlp.its.type))
    );
    return dates.length() > 0 ? dates.itemAt(0).out() : null;
  }

  extractCustomEntities(doc, entityType) {
    // The wink-nlp document has no has() method, so compare against the raw text
    const text = doc.out().toLowerCase();
    return this.customEntities[entityType].filter(entity =>
      text.includes(entity.toLowerCase())
    );
  }

  classifyIntent(doc) {
    const intentPatterns = {
      trend: ['trend', 'over time', 'change'],
      compare: ['top', 'compare', 'best'],
      average: ['average', 'avg', 'mean'],
      summary: ['summary', 'brief', 'overview']
    };

    const text = doc.out().toLowerCase();
    for (const [intent, keywords] of Object.entries(intentPatterns)) {
      if (keywords.some(keyword => text.includes(keyword))) {
        return intent;
      }
    }
    return 'unknown';
  }
}

Context Management for Follow-up Queries

class QueryContextManager {
  constructor() {
    this.contexts = new Map(); // sessionId -> context
  }

  storeContext(sessionId, parsedQuery) {
    this.contexts.set(sessionId, {
      ...parsedQuery,
      timestamp: Date.now(),
      queryHistory: []
    });
  }

  getContext(sessionId) {
    return this.contexts.get(sessionId);
  }

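  updateContext(sessionId, newQuery, isFollowUp = false, rawText = '') {
    const existingContext = this.getContext(sessionId);
    
    if (isFollowUp && existingContext) {
      // Keep earlier values when the follow-up leaves a field empty
      // (for example, "Summarise by week" names no area and has no clear intent)
      const provided = Object.fromEntries(
        Object.entries(newQuery).filter(([, value]) =>
          value != null && value !== 'unknown' &&
          !(Array.isArray(value) && value.length === 0)
        )
      );
      const updatedContext = {
        ...existingContext,
        ...provided,
        refinements: this.extractRefinements(rawText),
        queryHistory: [...existingContext.queryHistory, rawText],
        timestamp: Date.now()
      };
      
      this.contexts.set(sessionId, updatedContext);
      return updatedContext;
    }
    
    // Store new context
    this.storeContext(sessionId, newQuery);
    return newQuery;
  }

  extractRefinements(queryText) {
    // Expects the raw query string, not the parsed query object
    const text = String(queryText).toLowerCase();
    const refinements = [];
    
    if (text.includes('by week')) refinements.push('weekly_grouping');
    if (text.includes('by month')) refinements.push('monthly_grouping');
    if (text.includes('only')) refinements.push('filter_specific');
    if (text.includes('apartments only')) refinements.push('property_type_filter');
    
    return refinements;
  }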

  isFollowUpQuery(query) {
    const followUpIndicators = [
      'by week', 'by month', 'only', 'filter', 'show', 
      'apartments only', 'villas only', 'commercial only'
    ];
    
    return followUpIndicators.some(indicator => 
      query.toLowerCase().includes(indicator)
    );
  }
}

5. Frontend Integration Support

Card Components

{
  cards: [
    {
      title: "Average Rental Price",
      value: "91,000 AED",
      subtitle: "Business Bay - Last 6 months",
      trend: "+15.3%",
      icon: "trending-up"
    }
  ]
}
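
A card like this could be derived from the same series that feeds the chart. The sketch below is one possible approach under stated assumptions: the function name `buildAverageCard`, the first-to-last definition of the trend percentage, and the icon names other than `trending-up` are all illustrative.

```javascript
// Hypothetical helper: summarizes a numeric series into a card object.
// Assumption: "trend" is the percentage change from the first to the last value.
function buildAverageCard(values, { title, subtitle }) {
  if (values.length === 0) return null;

  const average = values.reduce((sum, value) => sum + value, 0) / values.length;
  const first = values[0];
  const last = values[values.length - 1];
  // A zero starting value has no meaningful percentage change
  const changePct = first === 0 ? null : ((last - first) / first) * 100;

  let trend = null;
  let icon = 'minus';
  if (changePct !== null) {
    trend = `${changePct >= 0 ? '+' : ''}${changePct.toFixed(1)}%`;
    icon = changePct >= 0 ? 'trending-up' : 'trending-down';
  }

  return {
    title,
    value: `${Math.round(average).toLocaleString('en-US')} AED`,
    subtitle,
    trend,
    icon,
  };
}
```

For the monthly series `[85000, 87000, 89000, 92000, 95000, 98000]`, this yields a value of `91,000 AED` and a trend of `+15.3%`.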

Visualization Decision Logic

  • Single value results → Display as cards
  • Time series data → Line charts
  • Category comparisons → Bar charts
  • Distribution data → Pie/doughnut charts
  • Multiple metrics → Combination of cards and charts
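
The rules above could be encoded as a small lookup. In this sketch the function name `chooseDisplays`, the `distribution` intent, and the fallback to a bar chart are assumptions; the other intent names follow the parsers in this document.

```javascript
// Hypothetical mapping from classified intent to Chart.js chart type
const CHART_BY_INTENT = new Map([
  ['trend', 'line'],        // time series
  ['compare', 'bar'],       // category comparisons
  ['distribution', 'pie'],  // share-of-total data (assumed intent name)
]);

function chooseDisplays(intent, rows) {
  // One row with one column is a single metric, which is shown as a card only
  const isSingleValue = rows.length === 1 && Object.keys(rows[0]).length === 1;
  if (isSingleValue) return ['card'];

  // Multi-row results get a chart plus a summary card; unknown intents fall back to bar
  const chart = CHART_BY_INTENT.get(intent) ?? 'bar';
  return [chart, 'card'];
}
```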

Development Guidelines

1. Error Handling

  • Implement comprehensive error handling for SQL queries
  • Provide meaningful error messages for invalid queries
  • Handle database connection issues gracefully
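
One way to keep error responses consistent is to translate every failure into a status code and a JSON body in a single place. The sketch below assumes a hypothetical `QueryParseError` class and an `error` field in the response body; the connection error codes are ones commonly surfaced by Node.js and MySQL drivers.

```javascript
// Hypothetical error type thrown by the parser when a query cannot be understood
class QueryParseError extends Error {}

// Error codes that indicate the database is unreachable rather than the query being wrong
const CONNECTION_ERROR_CODES = new Set(['ECONNREFUSED', 'ETIMEDOUT', 'PROTOCOL_CONNECTION_LOST']);

function toErrorResponse(err) {
  if (err instanceof QueryParseError) {
    return {
      status: 400,
      body: { success: false, error: { code: 'INVALID_QUERY', message: err.message } },
    };
  }
  if (CONNECTION_ERROR_CODES.has(err.code)) {
    return {
      status: 503,
      body: { success: false, error: { code: 'DATABASE_UNAVAILABLE', message: 'The database is temporarily unavailable.' } },
    };
  }
  // Avoid leaking SQL text or stack traces to the client for unexpected failures
  return {
    status: 500,
    body: { success: false, error: { code: 'INTERNAL_ERROR', message: 'Unexpected error while processing the query.' } },
  };
}
```

In an Express error-handling middleware this could be used as `const { status, body } = toErrorResponse(err); res.status(status).json(body);`.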

2. Performance Optimization

  • Use database indexes effectively
  • Implement query result caching for common queries
  • Optimize SQL queries for large datasets
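
Result caching can be illustrated with a small time-to-live cache keyed by the SQL text and its parameters. This is a minimal in-memory sketch; the class name `QueryCache` and the five-minute default are assumptions, and a production setup might back the same interface with Redis, which is listed in the dependencies.

```javascript
// Hypothetical in-memory cache for query results with a fixed time-to-live
class QueryCache {
  constructor({ ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which keeps expiry testable
    this.entries = new Map();
  }

  // The key combines statement text and parameter values
  static keyFor(sql, params) {
    return JSON.stringify([sql, params]);
  }

  get(sql, params) {
    const key = QueryCache.keyFor(sql, params);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.rows;
  }

  set(sql, params, rows) {
    const key = QueryCache.keyFor(sql, params);
    this.entries.set(key, { rows, expiresAt: this.now() + this.ttlMs });
  }
}
```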

3. Security

  • Implement SQL injection prevention
  • Validate and sanitize user inputs
  • Use parameterized queries
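
Input validation for the request body could look like the sketch below. It uses plain JavaScript so that it stands alone; the same rules could be expressed as a Joi schema. The function name `validateQueryRequest` and the 500-character limit are assumptions.

```javascript
// Assumed upper bound on query length; tune to the expected usage
const MAX_QUERY_LENGTH = 500;

// Hypothetical validator for the POST /api/query request body
function validateQueryRequest(body) {
  if (body === null || typeof body !== 'object') {
    return { ok: false, error: 'Request body must be a JSON object.' };
  }
  if (typeof body.query !== 'string') {
    return { ok: false, error: '"query" must be a string.' };
  }

  // Replace control characters with spaces, then collapse runs of whitespace
  const cleaned = body.query
    .replace(/[\u0000-\u001f\u007f]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  if (cleaned.length === 0) {
    return { ok: false, error: '"query" must not be empty.' };
  }
  if (cleaned.length > MAX_QUERY_LENGTH) {
    return { ok: false, error: `"query" must be at most ${MAX_QUERY_LENGTH} characters.` };
  }
  return { ok: true, query: cleaned };
}
```

Validation of this kind narrows the input but does not replace parameterized queries; values extracted from the text should still reach MySQL only as placeholders.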

4. Testing

  • Unit tests for query parsing logic
  • Integration tests for SQL generation
  • API endpoint testing with sample queries

Sample Implementation Flow

  1. Receive Query: "Give me the last 6 months rental price trend for Business Bay"

  2. Process with Node.js NLP:

    const parser = new DubaiRealEstateParser();
    const parsed = await parser.parseQuery("Give me the last 6 months rental price trend for Business Bay");
    // The parser extracts:
    // - Time period: { type: 'months', count: 6, startDate: '2024-06-01' } (startDate depends on the current date)
    // - Areas: ['Business Bay']
    // - Intent: 'trend'
    // - Property types: []
    
  3. Parse Query (using Node.js NLP output):

    • Time period: Last 6 months (from time extraction)
    • Area: Business Bay (from area patterns)
    • Property type: All (not specified)
    • Analysis type: Trend (from intent classification)
  4. Generate SQL:

    SELECT 
      DATE_FORMAT(start_date, '%Y-%m') as month,
      AVG(annual_amount) as avg_price,
      COUNT(*) as transaction_count
    FROM rents 
    WHERE area_en = 'Business Bay' 
      AND start_date >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
    GROUP BY DATE_FORMAT(start_date, '%Y-%m')
    ORDER BY month
    
  5. Format Response:

    • Create line chart data for trend visualization
    • Generate summary cards with key metrics
    • Provide descriptive text for the analysis
  6. Return JSON: Structured response ready for Chart.js and frontend cards

Environment Configuration

Required Environment Variables

Create a .env file in the project root:

# Server Configuration
PORT=3000
NODE_ENV=development

# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_NAME=dubai_dld
DB_USER=root
DB_PASSWORD=your_password

# NLP Configuration
NLP_LIBRARY=natural
# Language model package; only used when NLP_LIBRARY=wink-nlp
NLP_MODEL=wink-eng-lite-web-model

# Redis Configuration (optional - for context management)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=

# API Configuration
API_KEY=your_api_key_here
RATE_LIMIT=100

Node.js NLP Service Configuration

// config/nlpConfig.js
module.exports = {
  library: process.env.NLP_LIBRARY || 'natural',
  model: process.env.NLP_MODEL || 'wink-eng-lite-web-model', // only used with wink-nlp
  customPatterns: require('./customPatterns'),
  areaMapping: require('./areaMapping'),
  
  // Query classification keywords
  intentPatterns: {
    trend: ['trend', 'over time', 'change', 'last', 'months', 'weekly', 'monthly'],
    compare: ['top', 'compare', 'versus', 'vs', 'better', 'best', 'highest', 'lowest'],
    average: ['average', 'avg', 'mean', 'typical', 'median'],
    summary: ['brief', 'summary', 'summarize', 'overview', 'overall'],
    count: ['how many', 'count', 'number of', 'total'],
    filter: ['only', 'filter', 'show', 'exclude', 'include']
  }
};

Additional Features

Query Refinement Support with Node.js NLP Context Tracking

  • Use Node.js NLP entity recognition to maintain context between queries
  • Track extracted entities (areas, property types, time periods) across conversations
  • Implement context-aware parsing for follow-up queries
  • Store and retrieve previous query context from session management
  • Support progressive data drilling with contextual refinements
  • Use similarity matching for related area name variations
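
Similarity matching for area name variations can be sketched with an edit-distance comparison against the known area list. The helper names `levenshtein` and `resolveArea` and the distance threshold of 2 are assumptions. Natural also ships string-distance utilities that could replace the hand-written function here.

```javascript
// Classic Levenshtein edit distance using a single rolling row
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical resolver: returns the closest known area, or null if nothing is close enough
function resolveArea(input, knownAreas, maxDistance = 2) {
  const needle = input.trim().toLowerCase();
  let best = null;
  let bestDistance = Infinity;
  for (const area of knownAreas) {
    const distance = levenshtein(needle, area.toLowerCase());
    if (distance < bestDistance) {
      best = area;
      bestDistance = distance;
    }
  }
  return bestDistance <= maxDistance ? best : null;
}
```

With a known list of `['Business Bay', 'Dubai Marina', 'Downtown Dubai']`, the misspelling `'Buisness Bay'` resolves to `'Business Bay'`, while an unrelated name such as `'Abu Dhabi'` resolves to `null`.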

Export Capabilities

  • Provide raw data export options
  • Support multiple output formats (CSV, JSON)
  • Include metadata about queries and data sources
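
A CSV export of query results could be produced with a small serializer such as the one below. The function name `toCsv` is an assumption; quoting follows the common RFC 4180 convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes.

```javascript
// Hypothetical serializer: converts result rows to CSV text with a header line
function toCsv(rows, columns = rows.length > 0 ? Object.keys(rows[0]) : []) {
  const escapeField = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    // Quote only when needed and double any embedded quotes
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };

  const lines = [columns.map(escapeField).join(',')];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeField(row[column])).join(','));
  }
  return lines.join('\r\n');
}
```

Passing an explicit `columns` array fixes the column order and lets the export omit internal fields.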

Analytics Dashboard

  • Track query patterns and popular analyses
  • Monitor API usage and performance metrics
  • Provide insights into data trends

This prompt provides a comprehensive foundation for building a sophisticated Node.js application that bridges natural language queries with structured data visualization, specifically tailored for Dubai real estate analytics.