Initial commit: 包装审核 POC、Docker 与前后端

Made-with: Cursor
2026-04-15 17:18:49 +08:00
commit bbb4dd43b3
74 changed files with 297415 additions and 0 deletions
--- a/docs/superpowers/specs/2026-04-14-mineru-ai-word-parse-design.md
+++ b/docs/superpowers/specs/2026-04-14-mineru-ai-word-parse-design.md
@@ -0,0 +1,108 @@
+# MinerU AI Word Parse Design
+
+## Goal
+
+Replace the current AI text extraction pipeline with a MinerU-backed flow:
+
+1. Accept an Illustrator `.ai` file and a Word `.docx` file.
+2. Convert or normalize the `.ai` file into a PDF preview artifact.
+3. Upload the PDF artifact to MinerU for document parsing.
+4. Read MinerU JSON output blocks and their bounding boxes.
+5. Compare MinerU text output against the Word document text.
+6. Return field results that the existing React preview can highlight on the right side.
+
+## Non-Goals
+
+- Do not keep the old `parse_ai_document(...).fields` text extraction as the source of validation fields.
+- Do not expose the MinerU API key to the frontend.
+- Do not require a public callback URL; use polling because this is a local backend.
+- Do not add a new manual annotation workflow.
+
+## Backend Flow
+
+The `/api/process` endpoint keeps its current two-file upload contract: `ai_file` and `word_file`.
+
+For each job, the backend creates the existing runtime upload/output directories. The `.ai` file is converted into a PDF preview artifact using the existing `backend.app.ai_parser.parse_ai_document` conversion behavior. The resulting `preview.pdf` is copied into the job output directory and returned as the preview URL.
+
+The backend then submits that PDF preview artifact to MinerU using the documented local-file upload flow:
+
+1. `POST https://mineru.net/api/v4/file-urls/batch` with one file entry and `model_version: "vlm"`.
+2. `PUT` the generated upload URL with the PDF bytes.
+3. Poll `GET https://mineru.net/api/v4/extract-results/batch/{batch_id}` until the single file reaches `done`, `failed`, or a timeout.
+4. When `done`, download `full_zip_url` into the job output directory.
+5. Extract the zip into the job output directory and locate the structured JSON output.
+
+The API token is read from `MINERU_API_KEY`. If it is missing, the backend returns a clear configuration error instead of attempting the request.
+
+## MinerU JSON Mapping
+
+The primary parser reads MinerU `middle.json`-style output because the sample JSON contains:
+
+- `pdf_info[]`
+- `page_idx`
+- `page_size`
+- `para_blocks[]`
+- `discarded_blocks[]`
+- block-level `bbox: [x0, y0, x1, y1]`
+- nested `lines[].spans[]` with `content`, `html`, and span-level `bbox`
+
+Each top-level `para_blocks` item becomes one validation result. For blocks with nested line/span content, the backend concatenates text-like span content. Table spans with `html` are converted to readable text by stripping tags and HTML entities. If a block has no readable text, it can still be returned as `empty_or_garbled` when useful, but empty decorative blocks should be skipped.
+
+Coordinate mapping:
+
+- MinerU uses pixel-like page coordinates with origin at the top-left.
+- The frontend preview expects top-left coordinates named `x0_pt`, `top_pt`, `x1_pt`, and `bottom_pt`.
+- The backend returns MinerU coordinates directly as field coordinates and sets preview `pageWidthPt/pageHeightPt` from `page_size`, because the frontend scales both preview and overlay from the same coordinate system.
+
+For multi-page output, `page` is `page_idx + 1`.
+
+## Word Comparison
+
+The Word document remains the validation baseline. The backend uses the existing `extract_word_text` and `validate_field_against_word` behavior:
+
+- MinerU block text is normalized and compared against the Word full text.
+- The result status remains `matched`, `unmatched`, or `empty_or_garbled`.
+- The response keeps a `fields` array compatible with the current React UI.
+
+This preserves the existing sidebar and highlighter behavior while changing the field source from old AI PDF text extraction to MinerU OCR/layout extraction.
+
+## Frontend Contract
+
+The current `ProcessResponse` shape should remain mostly compatible:
+
+- `preview.type`: `pdf`
+- `preview.url`: generated PDF preview URL
+- `preview.pageWidthPt`: MinerU page width
+- `preview.pageHeightPt`: MinerU page height
+- `fields[]`: validation blocks with text, status, reason, matched excerpt, page, and coordinates
+
+Small frontend changes may be needed to make optional typography metadata safe because MinerU blocks do not provide Illustrator font names or font sizes.
+
+The right preview continues to render `preview.pdf` and draw overlay rectangles from `fields[]`.
+
+## Error Handling
+
+Return actionable API errors for:
+
+- Unsupported upload types.
+- `.ai` to PDF conversion failure.
+- Missing `MINERU_API_KEY`.
+- MinerU upload URL request failure.
+- MinerU upload PUT failure.
+- MinerU polling timeout.
+- MinerU task failure, including `err_msg` when present.
+- Missing structured JSON in the downloaded zip.
+
+The API key must not be logged or included in response payloads.
+
+## Testing
+
+Backend tests should cover:
+
+- MinerU JSON block extraction from a sample local JSON file.
+- HTML table text conversion.
+- Coordinate mapping from MinerU bbox into field coordinates.
+- Word comparison integration using mocked MinerU results.
+- MinerU client control flow with mocked HTTP responses.
+
+Manual verification should run the backend and frontend locally with `MINERU_API_KEY` set, upload the sample `.ai` and `.docx`, and confirm that result cards appear and corresponding boxes highlight on the right preview.