Re_Backend/docs/CPC-CDC.md
2026-04-17 19:58:45 +05:30

# CPC-CSD module (re-workflow)
This module (formerly referred to as CPC-CDC in code comments) covers **CPC/CSD document upload, OCR/extraction, validation against MSD payloads, audit history, dashboards, and Excel reports**. It was consolidated from the standalone **CPC-CSD** app into this backend.
## HTTP API
**CPC-CSD-compatible URLs** (same as `CPC-CSD/server/src/routes/index.js` and Postman `CPC-CSD-Full-Flow`): `POST /api/upload`, `GET /api/documents/*`, `POST /api/v1/ocr/validate`, `POST /api/v1/ocr/validate-upload` (field **`file`**), `POST /api/v1/ocr/upload` (field **`files`**, max 20), report downloads under `/api/v1/ocr/report/...`. These routes are registered from `src/routes/cpc-csd-compat.mount.ts` before `/api/v1`; set **`CPC_LEGACY_COMPAT_ROUTES=false`** to disable them.
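As a rough client-side illustration of the single-file route, the sketch below builds the multipart body for `POST /api/v1/ocr/validate-upload`. The helper names, the `baseUrl` parameter, and the choice to send `msd_payload` as a JSON string are assumptions made for this example; only the field names `file`, `claim_id`, and `msd_payload` come from this document. Authentication headers are omitted because their format is not described here.

```typescript
// Hypothetical client helpers; they are not part of the backend source.
function buildValidateUploadForm(
  file: Blob,
  fileName: string,
  claimId: string,
  msdPayload: Record<string, unknown>,
): FormData {
  const form = new FormData();
  // The route reads the document from the multipart field "file".
  form.append("file", file, fileName);
  form.append("claim_id", claimId);
  // Assumption: the MSD payload is sent as a JSON string in a text field.
  form.append("msd_payload", JSON.stringify(msdPayload));
  return form;
}

async function validateUpload(baseUrl: string, form: FormData): Promise<unknown> {
  // fetch sets the multipart boundary header itself when the body is FormData.
  const res = await fetch(`${baseUrl}/api/v1/ocr/validate-upload`, {
    method: "POST",
    body: form,
  });
  if (!res.ok) {
    throw new Error(`validate-upload failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping form construction separate from the network call makes the field names easy to check in a unit test without a running server.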
**Namespaced API** — canonical prefix **`/api/v1/cpc-csd`**; legacy alias **`/api/v1/cpc-cdc`** (`src/routes/cpc-cdc.routes.ts`) mounts the same handlers and auth.
| Method | Path (prefix **`/api`** or **`/api/v1/cpc-csd`** or legacy **`/api/v1/cpc-cdc`**) | Purpose |
|--------|------|---------|
| POST | `/upload` | GCS-only: multipart field **`file`** → `{ gcsUrl }` (compat: **`/api/upload`**) |
| POST | `/v1/ocr/validate` | JSON URL mode — returns **400** with legacy message (use validate-upload) |
| POST | `/v1/ocr/validate-upload` | Single file field **`file`** + `claim_id` / `msd_payload` / … |
| POST | `/v1/ocr/upload` | Bulk: field **`files`** (max 20) + `metadata_queue` or `msd_payload` / `document_type` |
| GET | `/documents/analytics` | Totals, pass rate, distribution, `dailyVolume`, `topMismatchFields` |
| GET | `/documents/history` | `claimId` query — attempts grouped |
| GET | `/documents/recent` | Paginated list; query: `page`, `limit`, `search`, `status`, `type`, `sortBy`, `order` |
| GET | `/documents/:id/file` | Authenticated file bytes for preview (browser cannot use `gs://` directly) |
| GET | `/documents/:id` | Document + audit logs + `field_results` |
| PUT | `/documents/:id/status` | Manual status / corrected fields |
| DELETE | `/documents/:id` | Remove document row |
| GET | `/v1/ocr/report/:claimId/download` | Per-claim Excel |
| GET | `/v1/ocr/report/all/download` | Master Excel (supports `search`, `status`, `type`) |
Compat paths are under **`/api/...`**; namespaced routes are **`/api/v1/cpc-csd/...`** with **`/api/v1/cpc-cdc/...`** as an alias (same path suffixes as in the table's second column).
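The prefix rule can be sketched as a small URL builder. The three prefixes and the query keys are taken from the route table; the `cpcUrl` helper and its signature are hypothetical and only illustrate how one path suffix maps onto each mount.

```typescript
// Prefixes as listed in the route table; the helper itself is illustrative.
const CPC_PREFIXES = {
  compat: "/api",
  canonical: "/api/v1/cpc-csd",
  legacyAlias: "/api/v1/cpc-cdc",
} as const;

type CpcMount = keyof typeof CPC_PREFIXES;
type QueryValue = string | number | undefined;

function cpcUrl(
  mount: CpcMount,
  suffix: string,
  query: Record<string, QueryValue> = {},
): string {
  // Accept both "documents/recent" and "/documents/recent".
  const path = suffix.startsWith("/") ? suffix : `/${suffix}`;
  const params = new URLSearchParams();
  for (const key of Object.keys(query)) {
    const value = query[key];
    // Skip unset filters so they do not appear as "key=undefined".
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  const url = `${CPC_PREFIXES[mount]}${path}`;
  return qs ? `${url}?${qs}` : url;
}

console.log(cpcUrl("compat", "/upload"));
// "/api/upload"
console.log(cpcUrl("canonical", "/documents/recent", { page: 2, limit: 20 }));
// "/api/v1/cpc-csd/documents/recent?page=2&limit=20"
```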
## Database
Sequelize models: **`CpcDocument`** (`cpc_documents`), **`CpcAuditLog`** (`cpc_audit_logs`). Migration: `src/migrations/2026041300-create-cpc-cdc-tables.ts`.
**Admin viewer list** is stored under `admin_configurations.config_key = CPC_CSD_ADMIN_CONFIG` (migration `20260416120000-rename-cpc-cdc-admin-config-key.ts` renames the legacy `CPC_CDC_ADMIN_CONFIG` row when applied).
On **application startup**, `ensureCpcCdcSchema()` runs after DB connect (`src/services/cpc-cdc/ensureCpcCdcSchema.ts`) so `CREATE TABLE IF NOT EXISTS` applies if migrations were skipped; still run `npm run migrate` for a full schema history.
Notable columns on `cpc_documents`: `booking_id`, `claim_id`, `attempt_no`, `document_type`, `document_gcp_url`, `provider`, JSONB `msd_payload`, `extracted_fields`, `field_confidence`, `validation_status`, `match_percentage`, `mismatch_reasons`, `field_results`, `ip_address`.
Unique index: `(claim_id, attempt_no, document_type)` — important when migrating legacy data with duplicates.
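Before importing legacy rows, it can help to check for collisions on that key. The following is a minimal sketch, assuming the legacy rows have already been loaded into memory and mapped to the three column names; `LegacyDocRow` and `findUniqueKeyCollisions` are hypothetical names, not part of the migration script.

```typescript
// Only the columns that make up the unique index are needed here.
interface LegacyDocRow {
  claim_id: string;
  attempt_no: number;
  document_type: string;
}

function findUniqueKeyCollisions<T extends LegacyDocRow>(rows: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const row of rows) {
    // JSON-encode the tuple so values containing separators cannot collide.
    const key = JSON.stringify([row.claim_id, row.attempt_no, row.document_type]);
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(key, [row]);
    }
  }
  // Keep only the keys that occur more than once.
  const collisions = new Map<string, T[]>();
  groups.forEach((bucket, key) => {
    if (bucket.length > 1) collisions.set(key, bucket);
  });
  return collisions;
}
```

Each entry in the returned map is a group of rows that would violate the index if inserted unchanged, so they need a new `attempt_no` or need to be deduplicated first.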
## Environment variables
Copy **`re-workflow-be/.env.example`** to `.env` and adjust. Typical keys (see `CpcCdcController` and `src/services/cpc-cdc/*`):
- **`GCP_PROJECT_ID`** — GCP project for Vertex / optional Document AI.
- **`VERTEX_AI_LOCATION`** — Vertex region (e.g. `asia-south1`).
- **`DOC_AI_PROCESSOR_ID`** — Optional; when set and valid, Document AI OCR may run before Gemini.
- **`GCP_LOCATION_DOC_AI`** — Document AI region (default `us`).
- **GCS** — Bucket/credentials as required by `CpcGcsService` (service account via `GOOGLE_APPLICATION_CREDENTIALS` or workload identity).
- **`CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI`** — **`true`**: always allow saving after failed/missing Vertex. **`false`**: in **production** only, disallow degraded saves. **Omitted in non-production**: degraded saves are **allowed** so local CPC works without GCP; set to **`false`** in dev to force strict Vertex. **Omitted in production**: strict (Vertex required unless `RULES` provider).
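
One reading of the `CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI` rules can be written as a small decision function. This is a sketch, not the controller's code: it assumes that "production" means `NODE_ENV === "production"`, that unrecognised values are treated like an omitted flag, and it does not model the `RULES` provider exception.

```typescript
// Hypothetical helper; the real logic lives in the CPC services and may differ.
function allowDegradedSave(
  flag: string | undefined,
  nodeEnv: string | undefined,
): boolean {
  const value = flag?.trim().toLowerCase();
  // Explicit opt-in: degraded saves are allowed in every environment.
  if (value === "true") return true;
  // Explicit strict mode: Vertex extraction must succeed.
  if (value === "false") return false;
  // Flag omitted: allowed outside production, strict in production.
  return nodeEnv !== "production";
}

console.log(allowDegradedSave(undefined, "development")); // true
console.log(allowDegradedSave(undefined, "production"));  // false
console.log(allowDegradedSave("true", "production"));     // true
console.log(allowDegradedSave("false", "development"));   // false
```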
**Extraction behaviour (upload response):**
- **`extraction_source`: `vertex_gemini`** — Fields came from the Vertex Gemini API (document bytes + optional Document AI OCR text).
- **`extraction_source`: `rules_engine`** — Provider was **`RULES`**; fields come from `CpcRuleExtractService` on OCR text only (no Gemini).
- **`extraction_source`: `degraded_empty`** — Extraction was skipped, failed, or (in **non-production**) hit a **Vertex auth / ADC** problem; the row is still stored with empty `extracted_fields` so you can test DB/history. In production this only happens when **`CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI=true`** or missing `GCP_PROJECT_ID` with degraded policy.
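
A consumer of the upload response might branch on `extraction_source` as sketched below. The three string values come from the list above; the `UploadResult` shape (both fields at the top level of the response) and the `describeExtraction` helper are assumptions for illustration.

```typescript
type ExtractionSource = "vertex_gemini" | "rules_engine" | "degraded_empty";

// Assumed response shape; only these two fields are modelled.
interface UploadResult {
  extraction_source: ExtractionSource;
  extracted_fields: Record<string, unknown>;
}

function describeExtraction(result: UploadResult): string {
  const fieldCount = Object.keys(result.extracted_fields).length;
  // The switch is exhaustive over ExtractionSource, so every case returns.
  switch (result.extraction_source) {
    case "vertex_gemini":
      return `Gemini extracted ${fieldCount} field(s)`;
    case "rules_engine":
      return `Rules engine extracted ${fieldCount} field(s) from OCR text`;
    case "degraded_empty":
      return "Stored without extraction; extracted_fields is empty";
  }
}
```

Treating `degraded_empty` as its own case keeps a stored-but-unextracted row from being mistaken for a document that genuinely had no matching fields.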
## One-off data migration from legacy Prisma DB
If you still have the old **`Document`** / **`AuditLog`** tables (CPC-CSD Prisma schema) in PostgreSQL, run:
```bash
npm run migrate:cpc-csd
```
Optional **`CPC_CSD_DATABASE_URL`**: if set, rows are read from that database and written to the database in **`DATABASE_URL`** (re-workflow). If unset, both read and write use **`DATABASE_URL`** (same cluster; both table sets must exist).
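
The source and target selection described above could be resolved roughly as follows. This is a sketch under stated assumptions: `resolveMigrationDatabases` is a hypothetical helper, an empty `CPC_CSD_DATABASE_URL` is treated as unset, and the real migration script may behave differently.

```typescript
interface MigrationDatabases {
  sourceUrl: string;     // where the legacy Document / AuditLog rows are read
  targetUrl: string;     // where cpc_documents / cpc_audit_logs rows are written
  sameDatabase: boolean; // true when both table sets must live side by side
}

function resolveMigrationDatabases(
  env: Record<string, string | undefined>,
): MigrationDatabases {
  const targetUrl = env.DATABASE_URL;
  if (!targetUrl) {
    throw new Error("DATABASE_URL must be set for the migration target");
  }
  // Fall back to the target database when no separate legacy URL is given.
  const sourceUrl = env.CPC_CSD_DATABASE_URL || targetUrl;
  return { sourceUrl, targetUrl, sameDatabase: sourceUrl === targetUrl };
}
```

In a Node script this would typically be called with `process.env`; passing the environment in as a parameter keeps the function easy to test.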
After migration, spot-check history, document detail, and Excel downloads, then decommission the legacy app.