Re_Backend/docs/CPC-CDC.md
2026-04-17 19:58:45 +05:30


CPC-CSD module (re-workflow)

This module (formerly referred to as CPC-CDC in code comments) covers CPC/CSD document upload, OCR/extraction, validation against MSD payloads, audit history, dashboards, and Excel reports. It was consolidated from the standalone CPC-CSD app into this backend.

HTTP API

CPC-CSD-compatible URLs (same as CPC-CSD/server/src/routes/index.js and Postman CPC-CSD-Full-Flow): POST /api/upload, GET /api/documents/*, POST /api/v1/ocr/validate, POST /api/v1/ocr/validate-upload (multipart field file), POST /api/v1/ocr/upload (multipart field files, max 20), and report downloads under /api/v1/ocr/report/... These routes are registered from src/routes/cpc-csd-compat.mount.ts before /api/v1 is mounted; set CPC_LEGACY_COMPAT_ROUTES=false to disable them.
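The toggle described above can be sketched as a small pure function. This is an illustration only: `legacyCompatEnabled` and `mountPlan` are hypothetical names, and treating any value other than the literal `false` (case-insensitive, trimmed) as "enabled" is an assumption about how the flag is parsed, not something confirmed by the backend source.

```typescript
// Hypothetical helpers; the real wiring lives around src/routes/cpc-csd-compat.mount.ts.
type Env = Record<string, string | undefined>;

/** Assumption: compat routes stay on unless the flag is explicitly "false". */
function legacyCompatEnabled(env: Env): boolean {
  const raw = env.CPC_LEGACY_COMPAT_ROUTES;
  return raw === undefined || raw.trim().toLowerCase() !== "false";
}

/** Returns the mount order: compat routes (when enabled) come before /api/v1. */
function mountPlan(env: Env): string[] {
  const plan: string[] = [];
  if (legacyCompatEnabled(env)) {
    plan.push("cpc-csd-compat"); // must be registered first so /api/v1/ocr/* resolves here
  }
  plan.push("/api/v1");
  return plan;
}
```

Passing the environment in as an argument, rather than reading it globally, keeps the decision easy to unit-test.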

Namespaced API — canonical prefix /api/v1/cpc-csd; legacy alias /api/v1/cpc-cdc (src/routes/cpc-cdc.routes.ts) mounts the same handlers and auth.

| Method | Path (prefix /api, /api/v1/cpc-csd, or legacy /api/v1/cpc-cdc) | Purpose |
| --- | --- | --- |
| POST | /upload | GCS-only: multipart field file; returns { gcsUrl } (compat: /api/upload) |
| POST | /v1/ocr/validate | JSON URL mode; returns 400 with legacy message (use validate-upload) |
| POST | /v1/ocr/validate-upload | Single file: field file + claim_id / msd_payload / … |
| POST | /v1/ocr/upload | Bulk: field files (max 20) + metadata_queue or msd_payload / document_type |
| GET | /documents/analytics | Totals, pass rate, distribution, dailyVolume, topMismatchFields |
| GET | /documents/history | claimId query; attempts grouped |
| GET | /documents/recent | Paginated list; query: page, limit, search, status, type, sortBy, order |
| GET | /documents/:id/file | Authenticated file bytes for preview (browser cannot use gs:// directly) |
| GET | /documents/:id | Document + audit logs + field_results |
| PUT | /documents/:id/status | Manual status / corrected fields |
| DELETE | /documents/:id | Remove document row |
| GET | /v1/ocr/report/:claimId/download | Per-claim Excel |
| GET | /v1/ocr/report/all/download | Master Excel (supports search, status, type) |
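
Because the bulk endpoint accepts at most 20 files per request, a client holding a larger set has to split it first. A minimal sketch of that batching follows; `chunkForBulkUpload` and the constant name are hypothetical, and only the limit of 20 comes from the route list.

```typescript
// Limit stated for POST /v1/ocr/upload (field "files").
const MAX_FILES_PER_BULK_UPLOAD = 20;

/** Splits a list of files into batches no larger than `size`, preserving order. */
function chunkForBulkUpload<T>(files: T[], size: number = MAX_FILES_PER_BULK_UPLOAD): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < files.length; i += size) {
    batches.push(files.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be sent as one multipart request; how metadata_queue entries are paired with files in a batch is not covered here.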

Compat paths are under /api/...; namespaced routes are /api/v1/cpc-csd/... with /api/v1/cpc-cdc/... as an alias (same path suffixes as in the table's second column).
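
To make the prefix rule concrete, here is a hedged sketch of a client-side URL builder. The three prefixes are taken from this document; `cpcUrl`, `PREFIXES`, and the flavor names are hypothetical and exist only for illustration.

```typescript
// Prefixes as documented above; the keys are illustrative names.
const PREFIXES = {
  compat: "/api",
  canonical: "/api/v1/cpc-csd",
  legacyAlias: "/api/v1/cpc-cdc",
} as const;

type Flavor = keyof typeof PREFIXES;
type Query = Record<string, string | number | undefined>;

/** Joins a prefix with a path suffix from the table and appends defined query params. */
function cpcUrl(flavor: Flavor, suffix: string, query: Query = {}): string {
  const path = PREFIXES[flavor] + (suffix.startsWith("/") ? suffix : "/" + suffix);
  const pairs = Object.keys(query)
    .filter((key) => query[key] !== undefined) // skip unset filters
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(query[key])));
  return pairs.length > 0 ? path + "?" + pairs.join("&") : path;
}

// Example: page 2 of the recent-documents list on the canonical prefix.
const recentUrl = cpcUrl("canonical", "/documents/recent", { page: 2, limit: 10 });
```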

Database

Sequelize models: CpcDocument (cpc_documents), CpcAuditLog (cpc_audit_logs). Migration: src/migrations/2026041300-create-cpc-cdc-tables.ts.

Admin viewer list is stored under admin_configurations.config_key = CPC_CSD_ADMIN_CONFIG (migration 20260416120000-rename-cpc-cdc-admin-config-key.ts renames the legacy CPC_CDC_ADMIN_CONFIG row when applied).

On application startup, ensureCpcCdcSchema() (src/services/cpc-cdc/ensureCpcCdcSchema.ts) runs after the DB connection is established, so CREATE TABLE IF NOT EXISTS creates the tables if migrations were skipped. Still run npm run migrate to keep a full schema history.

Notable columns on cpc_documents: booking_id, claim_id, attempt_no, document_type, document_gcp_url, provider, JSONB msd_payload, extracted_fields, field_confidence, validation_status, match_percentage, mismatch_reasons, field_results, ip_address.

Unique index: (claim_id, attempt_no, document_type) — important when migrating legacy data with duplicates.
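
Before importing legacy rows, it can help to find groups that would violate this index. The sketch below groups rows by the three indexed columns in memory; `LegacyRow` and `findUniqueKeyCollisions` are hypothetical names, and it assumes all three columns are non-null on the rows being checked.

```typescript
// Only the columns that participate in the unique index; other columns pass through.
interface LegacyRow {
  claim_id: string;
  attempt_no: number;
  document_type: string;
}

/** Returns only the key groups containing more than one row, keyed by a JSON tuple. */
function findUniqueKeyCollisions<T extends LegacyRow>(rows: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const row of rows) {
    // JSON tuple avoids ambiguity that a plain delimiter-joined key could introduce.
    const key = JSON.stringify([row.claim_id, row.attempt_no, row.document_type]);
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(key, [row]);
    }
  }
  const collisions = new Map<string, T[]>();
  groups.forEach((bucket, key) => {
    if (bucket.length > 1) collisions.set(key, bucket);
  });
  return collisions;
}
```

How collisions should be resolved (for example, by renumbering attempt_no) is a migration policy decision that this sketch does not make.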

Environment variables

Copy re-workflow-be/.env.example to .env and adjust. Typical keys (see CpcCdcController and src/services/cpc-cdc/*):

  • GCP_PROJECT_ID — GCP project for Vertex / optional Document AI.
  • VERTEX_AI_LOCATION — Vertex region (e.g. asia-south1).
  • DOC_AI_PROCESSOR_ID — Optional; when set and valid, Document AI OCR may run before Gemini.
  • GCP_LOCATION_DOC_AI — Document AI region (default us).
  • GCS — Bucket/credentials as required by CpcGcsService (service account via GOOGLE_APPLICATION_CREDENTIALS or workload identity).
  • CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI (degraded-save policy). true: always allow saving after failed/missing Vertex. false: in production only, disallow degraded saves. Omitted in non-production: degraded saves are allowed so local CPC works without GCP; set to false in dev to force strict Vertex. Omitted in production: strict (Vertex required unless the provider is RULES).
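
The degraded-save rules above reduce to a small decision function. The following is one reading of them, not the controller's actual code: `degradedSaveAllowed` is a hypothetical name, detecting production via `NODE_ENV === "production"` is an assumption, and `false` is treated as strict in every environment, which matches the "set to false in dev" note.

```typescript
type Env = Record<string, string | undefined>;

/**
 * Decides whether a document may be saved when Vertex extraction failed or is unavailable.
 * Assumption: production is detected through NODE_ENV.
 */
function degradedSaveAllowed(env: Env): boolean {
  const raw = env.CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI;
  if (raw === "true") return true; // explicit opt-in, any environment
  if (raw === "false") return false; // explicit strict mode
  // Flag omitted: lenient outside production, strict in production.
  return env.NODE_ENV !== "production";
}
```

The RULES provider is outside this function's scope, since a rules-based extraction does not depend on Vertex in the first place.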

Extraction behaviour (upload response):

  • extraction_source: vertex_gemini — Fields came from the Vertex Gemini API (document bytes + optional Document AI OCR text).
  • extraction_source: rules_engine — Provider was RULES; fields come from CpcRuleExtractService on OCR text only (no Gemini).
  • extraction_source: degraded_empty — Extraction was skipped, failed, or (in non-production) hit a Vertex auth / ADC problem; the row is still stored with empty extracted_fields so you can test DB/history. In production this only happens when CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI=true or missing GCP_PROJECT_ID with degraded policy.
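
A client reading the upload response may want to branch on these three values. The sketch below assumes a response shape containing only `extraction_source` and `extracted_fields`; the `UploadResult` interface and `hasUsableFields` are hypothetical, and the real response likely carries more properties.

```typescript
// The three values documented above.
type ExtractionSource = "vertex_gemini" | "rules_engine" | "degraded_empty";

// Assumed minimal shape of the upload response, for illustration only.
interface UploadResult {
  extraction_source: ExtractionSource;
  extracted_fields: Record<string, unknown>;
}

/** True when the response carries extracted fields worth showing to a user. */
function hasUsableFields(result: UploadResult): boolean {
  switch (result.extraction_source) {
    case "vertex_gemini":
    case "rules_engine":
      return Object.keys(result.extracted_fields).length > 0;
    case "degraded_empty":
      return false; // row was stored, but extracted_fields is empty by design
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new source is added.
      const unreachable: never = result.extraction_source;
      throw new Error("unknown extraction_source: " + String(unreachable));
    }
  }
}
```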

One-off data migration from legacy Prisma DB

If you still have the old Document / AuditLog tables (CPC-CSD Prisma schema) in PostgreSQL, run:

npm run migrate:cpc-csd

Optional CPC_CSD_DATABASE_URL: if set, rows are read from that database and written to the database in DATABASE_URL (re-workflow). If unset, both read and write use DATABASE_URL (same cluster; both table sets must exist).
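
The source/target selection described above can be sketched as follows. `resolveMigrationDatabases` is a hypothetical name, and treating an empty CPC_CSD_DATABASE_URL the same as an unset one is an assumption; the actual script behind npm run migrate:cpc-csd may differ.

```typescript
type Env = Record<string, string | undefined>;

interface MigrationDatabases {
  sourceUrl: string; // where the legacy Document / AuditLog rows are read from
  targetUrl: string; // the re-workflow database that receives the rows
}

/** Picks the read and write connection strings for the one-off migration. */
function resolveMigrationDatabases(env: Env): MigrationDatabases {
  const targetUrl = env.DATABASE_URL;
  if (!targetUrl) {
    throw new Error("DATABASE_URL is required");
  }
  // Fall back to the target database when no separate legacy database is configured.
  const sourceUrl = env.CPC_CSD_DATABASE_URL || targetUrl;
  return { sourceUrl, targetUrl };
}
```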

After migration, spot-check history, document detail, and Excel downloads, then decommission the legacy app.