# CPC-CSD module (re-workflow)

This module (formerly referred to as CPC-CDC in code comments) covers **CPC/CSD document upload, OCR/extraction, validation against MSD payloads, audit history, dashboards, and Excel reports**. It was consolidated from the standalone **CPC-CSD** app into this backend.

## HTTP API

**CPC-CSD-compatible URLs** (same as `CPC-CSD/server/src/routes/index.js` + Postman `CPC-CSD-Full-Flow`): `POST /api/upload`, `GET /api/documents/*`, `POST /api/v1/ocr/validate`, `POST /api/v1/ocr/validate-upload` (field **`file`**), `POST /api/v1/ocr/upload` (field **`files`**, max 20), report downloads under `/api/v1/ocr/report/...`. Registered from `src/routes/cpc-csd-compat.mount.ts` before `/api/v1`; disable with **`CPC_LEGACY_COMPAT_ROUTES=false`**.

**Namespaced API**: canonical prefix **`/api/v1/cpc-csd`**; legacy alias **`/api/v1/cpc-cdc`** (`src/routes/cpc-cdc.routes.ts`) mounts the same handlers and auth.

| Method | Path (prefix **`/api`** or **`/api/v1/cpc-csd`** or legacy **`/api/v1/cpc-cdc`**) | Purpose |
|--------|------|---------|
| POST | `/upload` | GCS-only: multipart field **`file`** → `{ gcsUrl }` (compat: **`/api/upload`**) |
| POST | `/v1/ocr/validate` | JSON URL mode; returns **400** with legacy message (use validate-upload) |
| POST | `/v1/ocr/validate-upload` | Single file field **`file`** + `claim_id` / `msd_payload` / … |
| POST | `/v1/ocr/upload` | Bulk: field **`files`** (max 20) + `metadata_queue` or `msd_payload` / `document_type` |
| GET | `/documents/analytics` | Totals, pass rate, distribution, `dailyVolume`, `topMismatchFields` |
| GET | `/documents/history` | `claimId` query; attempts grouped |
| GET | `/documents/recent` | Paginated list; query: `page`, `limit`, `search`, `status`, `type`, `sortBy`, `order` |
| GET | `/documents/:id/file` | Authenticated file bytes for preview (browser cannot use `gs://` directly) |
| GET | `/documents/:id` | Document + audit logs + `field_results` |
| PUT | `/documents/:id/status` | Manual status / corrected fields |
| DELETE | `/documents/:id` | Remove document row |
| GET | `/v1/ocr/report/:claimId/download` | Per-claim Excel |
| GET | `/v1/ocr/report/all/download` | Master Excel (supports `search`, `status`, `type`) |

Compat paths are under **`/api/...`**; namespaced routes are **`/api/v1/cpc-csd/...`** with **`/api/v1/cpc-cdc/...`** as an alias (same path suffixes as in the table’s second column).

## Database

Sequelize models: **`CpcDocument`** (`cpc_documents`), **`CpcAuditLog`** (`cpc_audit_logs`). Migration: `src/migrations/2026041300-create-cpc-cdc-tables.ts`.

**Admin viewer list** is stored under `admin_configurations.config_key = CPC_CSD_ADMIN_CONFIG` (migration `20260416120000-rename-cpc-cdc-admin-config-key.ts` renames the legacy `CPC_CDC_ADMIN_CONFIG` row when applied).

On **application startup**, `ensureCpcCdcSchema()` runs after DB connect (`src/services/cpc-cdc/ensureCpcCdcSchema.ts`) so `CREATE TABLE IF NOT EXISTS` applies if migrations were skipped; still run `npm run migrate` for a full schema history.

Notable columns on `cpc_documents`: `booking_id`, `claim_id`, `attempt_no`, `document_type`, `document_gcp_url`, `provider`, JSONB `msd_payload`, `extracted_fields`, `field_confidence`, `validation_status`, `match_percentage`, `mismatch_reasons`, `field_results`, `ip_address`.

Unique index: `(claim_id, attempt_no, document_type)`. This is important when migrating legacy data with duplicates.

## Environment variables

Copy **`re-workflow-be/.env.example`** to `.env` and adjust. Typical keys (see `CpcCdcController` and `src/services/cpc-cdc/*`):

- **`GCP_PROJECT_ID`**: GCP project for Vertex / optional Document AI.
- **`VERTEX_AI_LOCATION`**: Vertex region (e.g. `asia-south1`).
- **`DOC_AI_PROCESSOR_ID`**: Optional; when set and valid, Document AI OCR may run before Gemini.
- **`GCP_LOCATION_DOC_AI`**: Document AI region (default `us`).
- **GCS**: Bucket/credentials as required by `CpcGcsService` (service account via `GOOGLE_APPLICATION_CREDENTIALS` or workload identity).
- **`CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI`**: **`true`**: always allow saving after failed/missing Vertex. **`false`**: in **production** only, disallow degraded saves. **Omitted in non-production**: degraded saves are **allowed** so local CPC works without GCP; set to **`false`** in dev to force strict Vertex. **Omitted in production**: strict (Vertex required unless `RULES` provider).

**Extraction behaviour (upload response):**

- **`extraction_source`: `vertex_gemini`**: Fields came from the Vertex Gemini API (document bytes + optional Document AI OCR text).
- **`extraction_source`: `rules_engine`**: Provider was **`RULES`**; fields come from `CpcRuleExtractService` on OCR text only (no Gemini).
- **`extraction_source`: `degraded_empty`**: Extraction was skipped, failed, or (in **non-production**) hit a **Vertex auth / ADC** problem; the row is still stored with empty `extracted_fields` so you can test DB/history. In production this only happens when **`CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI=true`** or missing `GCP_PROJECT_ID` with degraded policy.

## One-off data migration from legacy Prisma DB

If you still have the old **`Document`** / **`AuditLog`** tables (CPC-CSD Prisma schema) in PostgreSQL, run:

```bash
npm run migrate:cpc-csd
```

Optional **`CPC_CSD_DATABASE_URL`**: if set, rows are read from that database and written to the database in **`DATABASE_URL`** (re-workflow). If unset, both read and write use **`DATABASE_URL`** (same cluster; both table sets must exist).

After migration, spot-check history, document detail, and Excel downloads, then decommission the legacy app.
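
As a reading aid for the `CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI` rules under Environment variables, the decision can be sketched as a small pure function. This is not the backend's actual code: `allowDegradedSave` is a hypothetical name, detecting production through a `NODE_ENV`-style string is an assumption, and the sketch assumes an explicit `false` is strict in every environment (following the "set to `false` in dev to force strict Vertex" note). The `RULES` provider exception is left out.

```typescript
// Hypothetical helper: mirrors the documented policy, not the real implementation.
// flag    = raw value of CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI (undefined when omitted)
// nodeEnv = assumed environment marker, e.g. "production" or "development"
function allowDegradedSave(
  flag: string | undefined,
  nodeEnv: string | undefined
): boolean {
  const normalized = flag === undefined ? undefined : flag.trim().toLowerCase();

  // Explicit "true": always allow saving after failed/missing Vertex.
  if (normalized === "true") return true;

  // Explicit "false": assumed strict everywhere (required in production,
  // opt-in strictness in dev).
  if (normalized === "false") return false;

  // Omitted (or unrecognised): allowed outside production, strict in production.
  return nodeEnv !== "production";
}
```

For example, `allowDegradedSave(undefined, "development")` allows a degraded save, while `allowDegradedSave(undefined, "production")` does not.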
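
To illustrate how the `GET /documents/recent` query parameters listed in the HTTP API table combine with either route prefix, here is a minimal sketch of a client-side path builder. The helper name `buildRecentDocumentsPath` and the `RecentQuery` type are hypothetical; only the parameter names and prefixes come from the table, and the accepted values for `status`, `type`, `sortBy`, and `order` are not specified here.

```typescript
// Hypothetical client helper; parameter names follow the /documents/recent row.
type RecentQuery = {
  page?: number;
  limit?: number;
  search?: string;
  status?: string;
  type?: string;
  sortBy?: string;
  order?: string;
};

function buildRecentDocumentsPath(prefix: string, query: RecentQuery = {}): string {
  const keys: (keyof RecentQuery)[] = [
    "page", "limit", "search", "status", "type", "sortBy", "order",
  ];
  const pairs: string[] = [];
  for (const key of keys) {
    const value = query[key];
    // Skip parameters that were not supplied.
    if (value === undefined || value === "") continue;
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  // Drop any trailing slash on the prefix before appending the route suffix.
  const base = `${prefix.replace(/\/+$/, "")}/documents/recent`;
  return pairs.length > 0 ? `${base}?${pairs.join("&")}` : base;
}

// Works with "/api", "/api/v1/cpc-csd", or the legacy "/api/v1/cpc-cdc" prefix.
const recentPath = buildRecentDocumentsPath("/api/v1/cpc-csd", {
  page: 2,
  limit: 25,
  search: "CLM 42",
});
// recentPath === "/api/v1/cpc-csd/documents/recent?page=2&limit=25&search=CLM%2042"
```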
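
Because `cpc_documents` has a unique index on `(claim_id, attempt_no, document_type)`, legacy rows that share that triple would collide during the one-off migration. One possible pre-check, sketched under assumptions, is to group legacy rows by the triple and list the groups with more than one row. `LegacyRow`, its `id` field, and `findUniqueKeyCollisions` are hypothetical names; the legacy Prisma columns may differ, and `npm run migrate:cpc-csd` may already handle this case.

```typescript
// Hypothetical shape of a legacy row, already mapped to the new column names.
type LegacyRow = {
  id: string;
  claim_id: string;
  attempt_no: number;
  document_type: string;
};

// Returns the groups of row ids that share one (claim_id, attempt_no, document_type).
function findUniqueKeyCollisions(rows: LegacyRow[]): Record<string, string[]> {
  const byKey: Record<string, string[]> = {};
  for (const row of rows) {
    // JSON-encode the triple so values containing separators cannot clash.
    const key = JSON.stringify([row.claim_id, row.attempt_no, row.document_type]);
    const existing = byKey[key];
    if (existing) {
      existing.push(row.id);
    } else {
      byKey[key] = [row.id];
    }
  }

  const collisions: Record<string, string[]> = {};
  for (const key of Object.keys(byKey)) {
    const ids = byKey[key];
    if (ids !== undefined && ids.length > 1) collisions[key] = ids;
  }
  return collisions;
}
```

Any key returned here needs a decision before the rows are written, for example renumbering `attempt_no` or dropping the older duplicate; which option is right depends on the legacy data.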