ZLD_POC/docs/superpowers/specs/2026-04-14-mineru-ai-word-parse-design.md

# MinerU AI Word Parse Design

## Goal

Replace the current AI text extraction pipeline with a MinerU-backed flow:

1. Accept an Illustrator `.ai` file and a Word `.docx` file.
2. Convert or normalize the `.ai` file into a PDF preview artifact.
3. Upload the PDF artifact to MinerU for document parsing.
4. Read MinerU JSON output blocks and their bounding boxes.
5. Compare MinerU text output against the Word document text.
6. Return field results that the existing React preview can highlight on the right side.

## Non-Goals

- Do not keep the old `parse_ai_document(...).fields` text extraction as the source of validation fields.
- Do not expose the MinerU API key to the frontend.
- Do not require a public callback URL; use polling because this is a local backend.
- Do not add a new manual annotation workflow.

## Backend Flow

The `/api/process` endpoint keeps its current two-file upload contract: `ai_file` and `word_file`.

For each job, the backend creates the existing runtime upload/output directories. The `.ai` file is converted into a PDF preview artifact using the existing `backend.app.ai_parser.parse_ai_document` conversion behavior. The resulting `preview.pdf` is copied into the job output directory and returned as the preview URL.

The backend then submits that PDF preview artifact to MinerU using the documented local-file upload flow:

1. `POST https://mineru.net/api/v4/file-urls/batch` with one file entry and `model_version: "vlm"`.
2. `PUT` the generated upload URL with the PDF bytes.
3. Poll `GET https://mineru.net/api/v4/extract-results/batch/{batch_id}` until the single file reaches `done`, `failed`, or a timeout.
4. When `done`, download `full_zip_url` into the job output directory.
5. Extract the zip into the job output directory and locate the structured JSON output.

The API token is read from `MINERU_API_KEY`. If it is missing, the backend returns a clear configuration error instead of attempting the request.

## MinerU JSON Mapping

The primary parser reads MinerU `middle.json`-style output because the sample JSON contains:

- `pdf_info[]`
- `page_idx`
- `page_size`
- `para_blocks[]`
- `discarded_blocks[]`
- block-level `bbox: [x0, y0, x1, y1]`
- nested `lines[].spans[]` with `content`, `html`, and span-level `bbox`

Each top-level `para_blocks` item becomes one validation result. For blocks with nested line/span content, the backend concatenates text-like span content. Table spans with `html` are converted to readable text by stripping tags and HTML entities. If a block has no readable text, it can still be returned as `empty_or_garbled` when useful, but empty decorative blocks should be skipped.

Coordinate mapping:

- MinerU uses pixel-like page coordinates with origin at the top-left.
- The frontend preview expects top-left coordinates named `x0_pt`, `top_pt`, `x1_pt`, and `bottom_pt`.
- The backend returns MinerU coordinates directly as field coordinates and sets preview `pageWidthPt/pageHeightPt` from `page_size`, because the frontend scales both preview and overlay from the same coordinate system.

For multi-page output, `page` is `page_idx + 1`.

## Word Comparison

The Word document remains the validation baseline. The backend uses the existing `extract_word_text` and `validate_field_against_word` behavior:

- MinerU block text is normalized and compared against the Word full text.
- The result status remains `matched`, `unmatched`, or `empty_or_garbled`.
- The response keeps a `fields` array compatible with the current React UI.

This preserves the existing sidebar and highlighter behavior while changing the field source from old AI PDF text extraction to MinerU OCR/layout extraction.

## Frontend Contract

The current `ProcessResponse` shape should remain mostly compatible:

- `preview.type`: `pdf`
- `preview.url`: generated PDF preview URL
- `preview.pageWidthPt`: MinerU page width
- `preview.pageHeightPt`: MinerU page height
- `fields[]`: validation blocks with text, status, reason, matched excerpt, page, and coordinates

Small frontend changes may be needed to make optional typography metadata safe because MinerU blocks do not provide Illustrator font names or font sizes.

The right preview continues to render `preview.pdf` and draw overlay rectangles from `fields[]`.

## Error Handling

Return actionable API errors for:

- Unsupported upload types.
- `.ai` to PDF conversion failure.
- Missing `MINERU_API_KEY`.
- MinerU upload URL request failure.
- MinerU upload PUT failure.
- MinerU polling timeout.
- MinerU task failure, including `err_msg` when present.
- Missing structured JSON in the downloaded zip.

The API key must not be logged or included in response payloads.

## Testing

Backend tests should cover:

- MinerU JSON block extraction from a sample local JSON file.
- HTML table text conversion.
- Coordinate mapping from MinerU bbox into field coordinates.
- Word comparison integration using mocked MinerU results.
- MinerU client control flow with mocked HTTP responses.

Manual verification should run the backend and frontend locally with `MINERU_API_KEY` set, upload the sample `.ai` and `.docx`, and confirm that result cards appear and corresponding boxes highlight on the right preview.