CPC-CSD module (re-workflow)

This module (formerly referred to as CPC-CDC in code comments) covers CPC/CSD document upload, OCR/extraction, validation against MSD payloads, audit history, dashboards, and Excel reports. It was consolidated from the standalone CPC-CSD app into this backend.

HTTP API

CPC-CSD-compatible URLs (the same routes as CPC-CSD/server/src/routes/index.js and the Postman collection CPC-CSD-Full-Flow): POST /api/upload, GET /api/documents/*, POST /api/v1/ocr/validate, POST /api/v1/ocr/validate-upload (multipart field file), POST /api/v1/ocr/upload (multipart field files, max 20), and report downloads under /api/v1/ocr/report/.... These routes are registered from src/routes/cpc-csd-compat.mount.ts before /api/v1; set CPC_LEGACY_COMPAT_ROUTES=false to disable them.

Namespaced API — canonical prefix /api/v1/cpc-csd; legacy alias /api/v1/cpc-cdc (src/routes/cpc-cdc.routes.ts) mounts the same handlers and auth.

Method	Path (prefix `/api` or `/api/v1/cpc-csd` or legacy `/api/v1/cpc-cdc`)	Purpose
POST	`/upload`	GCS-only: multipart field `file` → `{ gcsUrl }` (compat: `/api/upload`)
POST	`/v1/ocr/validate`	JSON URL mode — returns 400 with legacy message (use validate-upload)
POST	`/v1/ocr/validate-upload`	Single file field `file` + `claim_id` / `msd_payload` / …
POST	`/v1/ocr/upload`	Bulk: field `files` (max 20) + `metadata_queue` or `msd_payload` / `document_type`
GET	`/documents/analytics`	Totals, pass rate, distribution, `dailyVolume`, `topMismatchFields`
GET	`/documents/history`	`claimId` query — attempts grouped
GET	`/documents/recent`	Paginated list; query: `page`, `limit`, `search`, `status`, `type`, `sortBy`, `order`
GET	`/documents/:id/file`	Authenticated file bytes for preview (browser cannot use `gs://` directly)
GET	`/documents/:id`	Document + audit logs + `field_results`
PUT	`/documents/:id/status`	Manual status / corrected fields
DELETE	`/documents/:id`	Remove document row
GET	`/v1/ocr/report/:claimId/download`	Per-claim Excel
GET	`/v1/ocr/report/all/download`	Master Excel (supports `search`, `status`, `type`)

Compat paths are under /api/...; namespaced routes are /api/v1/cpc-csd/... with /api/v1/cpc-cdc/... as an alias (same path suffixes as in the table’s second column).
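As a rough illustration of how the three prefixes map onto one path suffix, the sketch below builds the `/documents/recent` URL with its documented query parameters. The helper name `recentDocumentsPath` and the `RecentQuery` type are hypothetical and not part of the module; only the prefixes and query names come from the table above.

```typescript
// Hypothetical client-side helper; the backend does not export this.
type RecentQuery = {
  page?: number;
  limit?: number;
  search?: string;
  status?: string;
  type?: string;
  sortBy?: string;
  order?: string;
};

// Prefixes as listed in the table header.
const CPC_PREFIXES = {
  compat: "/api",
  canonical: "/api/v1/cpc-csd",
  legacyAlias: "/api/v1/cpc-cdc",
} as const;

const RECENT_QUERY_KEYS: (keyof RecentQuery)[] = [
  "page", "limit", "search", "status", "type", "sortBy", "order",
];

function recentDocumentsPath(
  prefix: keyof typeof CPC_PREFIXES,
  query: RecentQuery = {},
): string {
  const parts: string[] = [];
  for (const key of RECENT_QUERY_KEYS) {
    const value = query[key];
    // Skip parameters the caller did not supply.
    if (value !== undefined) {
      parts.push(`${key}=${encodeURIComponent(String(value))}`);
    }
  }
  const qs = parts.length > 0 ? `?${parts.join("&")}` : "";
  return `${CPC_PREFIXES[prefix]}/documents/recent${qs}`;
}
```

Calling `recentDocumentsPath("canonical", { page: 2, limit: 25 })` and `recentDocumentsPath("legacyAlias", { page: 2, limit: 25 })` yields the same suffix under different prefixes, which matches the aliasing described above.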

Database

Sequelize models: CpcDocument (cpc_documents), CpcAuditLog (cpc_audit_logs). Migration: src/migrations/2026041300-create-cpc-cdc-tables.ts.

Admin viewer list is stored under admin_configurations.config_key = CPC_CSD_ADMIN_CONFIG (migration 20260416120000-rename-cpc-cdc-admin-config-key.ts renames the legacy CPC_CDC_ADMIN_CONFIG row when applied).

On application startup, ensureCpcCdcSchema() (src/services/cpc-cdc/ensureCpcCdcSchema.ts) runs after the DB connection is established and applies CREATE TABLE IF NOT EXISTS, so the tables exist even if migrations were skipped. Still run npm run migrate to keep a full schema history.

Notable columns on cpc_documents: booking_id, claim_id, attempt_no, document_type, document_gcp_url, provider, JSONB msd_payload, extracted_fields, field_confidence, validation_status, match_percentage, mismatch_reasons, field_results, ip_address.

Unique index: (claim_id, attempt_no, document_type) — important when migrating legacy data with duplicates.
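Before migrating legacy data, it can help to check which rows would collide on this index. The following is a minimal sketch, assuming the legacy rows have already been loaded into memory with these three columns; `LegacyRow` and `findDuplicateKeys` are hypothetical names, and the sample `document_type` values in the usage note are illustrative only.

```typescript
// Hypothetical shape: only the three columns that form the unique index.
interface LegacyRow {
  claim_id: string;
  attempt_no: number;
  document_type: string;
}

// Returns one JSON-encoded key per (claim_id, attempt_no, document_type)
// combination that appears more than once.
function findDuplicateKeys(rows: LegacyRow[]): string[] {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    // JSON encoding avoids ambiguity if a value contains a separator character.
    const key = JSON.stringify([row.claim_id, row.attempt_no, row.document_type]);
    counts[key] = (counts[key] || 0) + 1;
  }
  return Object.keys(counts).filter((key) => (counts[key] || 0) > 1);
}
```

For example, two rows with `claim_id: "C1"`, `attempt_no: 1`, `document_type: "CPC"` produce a single reported key, while a third row that differs in `attempt_no` is not reported. How to resolve the duplicates (renumbering attempts, dropping rows) is a decision this sketch does not make.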

Environment variables

Copy re-workflow-be/.env.example to .env and adjust. Typical keys (see CpcCdcController and src/services/cpc-cdc/*):

GCP_PROJECT_ID — GCP project for Vertex / optional Document AI.
VERTEX_AI_LOCATION — Vertex region (e.g. asia-south1).
DOC_AI_PROCESSOR_ID — Optional; when set and valid, Document AI OCR may run before Gemini.
GCP_LOCATION_DOC_AI — Document AI region (default us).
GCS — Bucket/credentials as required by CpcGcsService (service account via GOOGLE_APPLICATION_CREDENTIALS or workload identity).
CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI — Controls whether a document may be saved when Vertex extraction fails or is unavailable. true: always allow degraded saves. false: disallow degraded saves (the production setting; set it in dev as well to force strict Vertex). Omitted in non-production: degraded saves are allowed so local CPC works without GCP. Omitted in production: strict (Vertex is required unless the provider is RULES).
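The policy for CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI can be read as a small decision function. The sketch below is one interpretation of the description above, not the controller's actual code: it assumes production is detected through `NODE_ENV === "production"`, that an explicit `false` is strict in every environment, and it ignores the RULES-provider exception. The name `allowDegradedSave` is hypothetical.

```typescript
// Hypothetical helper; the env map is passed in so the logic is easy to test.
function allowDegradedSave(env: Record<string, string | undefined>): boolean {
  const raw = env["CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI"];

  // Explicit values win regardless of environment.
  if (raw === "true") return true;
  if (raw === "false") return false;

  // Omitted: allowed outside production, strict in production.
  // Assumption: NODE_ENV is how the app distinguishes production.
  return env["NODE_ENV"] !== "production";
}
```

With the variable omitted, `allowDegradedSave({ NODE_ENV: "development" })` returns `true` and `allowDegradedSave({ NODE_ENV: "production" })` returns `false`, mirroring the two "omitted" cases described above.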

Extraction behaviour (upload response):

extraction_source: vertex_gemini — Fields came from the Vertex Gemini API (document bytes + optional Document AI OCR text).
extraction_source: rules_engine — Provider was RULES; fields come from CpcRuleExtractService on OCR text only (no Gemini).
extraction_source: degraded_empty — Extraction was skipped, failed, or (in non-production) hit a Vertex auth / ADC problem; the row is still stored with empty extracted_fields so you can test DB/history. In production this only happens when CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI=true, or when GCP_PROJECT_ID is missing and the degraded policy permits the save.
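A client consuming the upload response may want to branch on extraction_source. The sketch below models the three documented values as a TypeScript union; the helper names and the summary shape (`usedGemini`, `fieldsAttempted`) are hypothetical and only restate the descriptions above.

```typescript
// The three values documented for the upload response.
const EXTRACTION_SOURCES = ["vertex_gemini", "rules_engine", "degraded_empty"] as const;
type ExtractionSource = (typeof EXTRACTION_SOURCES)[number];

// Narrows an untyped response value to a known source, or undefined.
function parseExtractionSource(value: unknown): ExtractionSource | undefined {
  for (const source of EXTRACTION_SOURCES) {
    if (source === value) return source;
  }
  return undefined;
}

// Hypothetical summary: whether Gemini was involved and whether any
// extraction was attempted (degraded rows store empty extracted_fields).
function summarizeExtraction(
  source: ExtractionSource,
): { usedGemini: boolean; fieldsAttempted: boolean } {
  switch (source) {
    case "vertex_gemini":
      return { usedGemini: true, fieldsAttempted: true };
    case "rules_engine":
      return { usedGemini: false, fieldsAttempted: true };
    case "degraded_empty":
      return { usedGemini: false, fieldsAttempted: false };
  }
}
```

A UI could use `fieldsAttempted === false` to flag rows that need re-processing once Vertex is available.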

One-off data migration from legacy Prisma DB

If you still have the old Document / AuditLog tables (CPC-CSD Prisma schema) in PostgreSQL, run:

npm run migrate:cpc-csd

Optional CPC_CSD_DATABASE_URL: if set, rows are read from that database and written to the database in DATABASE_URL (re-workflow). If unset, both reads and writes use DATABASE_URL, so the legacy tables and the new tables must both exist in that same database.
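The source/target selection described above amounts to a simple fallback. This is a minimal sketch of that rule, assuming both variables hold connection URLs; `resolveMigrationDatabases` is a hypothetical name and the migration script may implement it differently.

```typescript
// Hypothetical helper: picks where to read legacy rows and where to write them.
function resolveMigrationDatabases(
  env: Record<string, string | undefined>,
): { sourceUrl: string; targetUrl: string } {
  const targetUrl = env["DATABASE_URL"];
  if (!targetUrl) {
    throw new Error("DATABASE_URL is required");
  }
  // Fall back to the target database when no separate legacy URL is given.
  const sourceUrl = env["CPC_CSD_DATABASE_URL"] || targetUrl;
  return { sourceUrl, targetUrl };
}
```

When only DATABASE_URL is set, `sourceUrl` and `targetUrl` are identical, which is the case where both table sets must live in the same database.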

After migration, spot-check history, document detail, and Excel downloads, then decommission the legacy app.
