69 lines
5.6 KiB
Markdown
69 lines
5.6 KiB
Markdown
# CPC-CSD module (re-workflow)
|
||
|
||
This module (formerly referred to as CPC-CDC in code comments) covers **CPC/CSD document upload, OCR/extraction, validation against MSD payloads, audit history, dashboards, and Excel reports**. It was consolidated from the standalone **CPC-CSD** app into this backend.
|
||
|
||
## HTTP API
|
||
|
||
**CPC-CSD-compatible URLs** (same as `CPC-CSD/server/src/routes/index.js` + Postman `CPC-CSD-Full-Flow`): `POST /api/upload`, `GET /api/documents/*`, `POST /api/v1/ocr/validate`, `POST /api/v1/ocr/validate-upload` (field **`file`**), `POST /api/v1/ocr/upload` (field **`files`**, max 20), report downloads under `/api/v1/ocr/report/...`. Registered from `src/routes/cpc-csd-compat.mount.ts` before `/api/v1`; disable with **`CPC_LEGACY_COMPAT_ROUTES=false`**.
|
||
|
||
**Namespaced API** — canonical prefix **`/api/v1/cpc-csd`**; legacy alias **`/api/v1/cpc-cdc`** (`src/routes/cpc-cdc.routes.ts`) mounts the same handlers and auth.
|
||
|
||
| Method | Path (prefix **`/api`** or **`/api/v1/cpc-csd`** or legacy **`/api/v1/cpc-cdc`**) | Purpose |
|
||
|--------|------|---------|
|
||
| POST | `/upload` | GCS-only: multipart field **`file`** → `{ gcsUrl }` (compat: **`/api/upload`**) |
|
||
| POST | `/v1/ocr/validate` | JSON URL mode — returns **400** with legacy message (use validate-upload) |
|
||
| POST | `/v1/ocr/validate-upload` | Single file field **`file`** + `claim_id` / `msd_payload` / … |
|
||
| POST | `/v1/ocr/upload` | Bulk: field **`files`** (max 20) + `metadata_queue` or `msd_payload` / `document_type` |
|
||
| GET | `/documents/analytics` | Totals, pass rate, distribution, `dailyVolume`, `topMismatchFields` |
|
||
| GET | `/documents/history` | `claimId` query — attempts grouped |
|
||
| GET | `/documents/recent` | Paginated list; query: `page`, `limit`, `search`, `status`, `type`, `sortBy`, `order` |
|
||
| GET | `/documents/:id/file` | Authenticated file bytes for preview (browser cannot use `gs://` directly) |
|
||
| GET | `/documents/:id` | Document + audit logs + `field_results` |
|
||
| PUT | `/documents/:id/status` | Manual status / corrected fields |
|
||
| DELETE | `/documents/:id` | Remove document row |
|
||
| GET | `/v1/ocr/report/:claimId/download` | Per-claim Excel |
|
||
| GET | `/v1/ocr/report/all/download` | Master Excel (supports `search`, `status`, `type`) |
|
||
|
||
Compat paths are under **`/api/...`**; namespaced routes are **`/api/v1/cpc-csd/...`** with **`/api/v1/cpc-cdc/...`** as an alias (same path suffixes as in the table’s second column).
|
||
|
||
## Database
|
||
|
||
Sequelize models: **`CpcDocument`** (`cpc_documents`), **`CpcAuditLog`** (`cpc_audit_logs`). Migration: `src/migrations/2026041300-create-cpc-cdc-tables.ts`.
|
||
|
||
**Admin viewer list** is stored under `admin_configurations.config_key = CPC_CSD_ADMIN_CONFIG` (migration `20260416120000-rename-cpc-cdc-admin-config-key.ts` renames the legacy `CPC_CDC_ADMIN_CONFIG` row when applied).
|
||
|
||
On **application startup**, `ensureCpcCdcSchema()` runs after DB connect (`src/services/cpc-cdc/ensureCpcCdcSchema.ts`) so `CREATE TABLE IF NOT EXISTS` applies if migrations were skipped; still run `npm run migrate` for a full schema history.
|
||
|
||
Notable columns on `cpc_documents`: `booking_id`, `claim_id`, `attempt_no`, `document_type`, `document_gcp_url`, `provider`, JSONB `msd_payload`, `extracted_fields`, `field_confidence`, `validation_status`, `match_percentage`, `mismatch_reasons`, `field_results`, `ip_address`.
|
||
|
||
Unique index: `(claim_id, attempt_no, document_type)` — important when migrating legacy data with duplicates.
|
||
|
||
## Environment variables
|
||
|
||
Copy **`re-workflow-be/.env.example`** to `.env` and adjust. Typical keys (see `CpcCdcController` and `src/services/cpc-cdc/*`):
|
||
|
||
- **`GCP_PROJECT_ID`** — GCP project for Vertex / optional Document AI.
|
||
- **`VERTEX_AI_LOCATION`** — Vertex region (e.g. `asia-south1`).
|
||
- **`DOC_AI_PROCESSOR_ID`** — Optional; when set and valid, Document AI OCR may run before Gemini.
|
||
- **`GCP_LOCATION_DOC_AI`** — Document AI region (default `us`).
|
||
- **GCS** — Bucket/credentials as required by `CpcGcsService` (service account via `GOOGLE_APPLICATION_CREDENTIALS` or workload identity).
|
||
- **`CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI`** — **`true`**: always allow saving after failed/missing Vertex. **`false`**: in **production** only, disallow degraded saves. **Omitted in non-production**: degraded saves are **allowed** so local CPC works without GCP; set to **`false`** in dev to force strict Vertex. **Omitted in production**: strict (Vertex required unless `RULES` provider).
|
||
|
||
**Extraction behaviour (upload response):**
|
||
|
||
- **`extraction_source`: `vertex_gemini`** — Fields came from the Vertex Gemini API (document bytes + optional Document AI OCR text).
|
||
- **`extraction_source`: `rules_engine`** — Provider was **`RULES`**; fields come from `CpcRuleExtractService` on OCR text only (no Gemini).
|
||
- **`extraction_source`: `degraded_empty`** — Extraction was skipped, failed, or (in **non-production**) hit a **Vertex auth / ADC** problem; the row is still stored with empty `extracted_fields` so you can test DB/history. In production this only happens when **`CPC_ALLOW_DEGRADED_SAVE_WITHOUT_AI=true`** or missing `GCP_PROJECT_ID` with degraded policy.
|
||
|
||
## One-off data migration from legacy Prisma DB
|
||
|
||
If you still have the old **`Document`** / **`AuditLog`** tables (CPC-CSD Prisma schema) in PostgreSQL, run:
|
||
|
||
```bash
|
||
npm run migrate:cpc-csd
|
||
```
|
||
|
||
Optional **`CPC_CSD_DATABASE_URL`**: if set, rows are read from that database and written to the database in **`DATABASE_URL`** (re-workflow). If unset, both read and write use **`DATABASE_URL`** (same cluster; both table sets must exist).
|
||
|
||
After migration, spot-check history, document detail, and Excel downloads, then decommission the legacy app.
|