更新readme

This commit is contained in:
xunbu
2025-08-18 20:19:03 +08:00
parent befbca41c8
commit c645fc142a
2 changed files with 348 additions and 334 deletions

337
README.md
View File

@@ -12,28 +12,19 @@
[**简体中文**](/README_ZH.md) / [**English**](/README.md) / [**日本語**](/README_JP.md)
**DocuTranslate** is a file translation tool that combines advanced document parsing engines (such
as [docling](https://github.com/docling-project/docling) and [minerU](https://mineru.net/)) with large language models (
LLMs) to accurately translate documents in various formats.
**DocuTranslate** is a file translation tool that combines advanced document analysis engines (such as [docling](https://github.com/docling-project/docling) and [minerU](https://mineru.net/)) with large language models (LLMs). It can accurately translate documents in a wide variety of formats.
The new version adopts a **Workflow-centric** architecture, providing highly configurable and scalable solutions for
various types of translation tasks.
The new version's architecture adopts **Workflow** as its core, providing a highly configurable and extensible solution for various types of translation tasks.
-**Support for Diverse Formats**: Capable of translating various file formats such as `pdf`, `docx`, `xlsx`, `md`,
`txt`, `json`, `epub`, `srt`, etc.
-**Table, Formula, and Code Recognition**: Utilizes `docling` and `minerU` to recognize and translate tables,
formulas, and code frequently found in academic papers.
-**JSON Translation**: Allows specifying translatable values within JSON using jsonpath-ng syntax.
-**High-Fidelity Word/Excel Translation**: Preserves the formatting of `docx` and `xlsx` files (note: `doc` and `xls`
are not supported).
-**Multiple AI Platform Support**: Covers major AI platforms and enables high-parallel AI translation with custom
prompts.
-**Asynchronous Support**: Designed for high-performance scenarios, offering full asynchronous support and multi-task
parallel processing APIs.
-**Interactive Web Interface**: Equipped with a ready-to-use Web UI and RESTful API.
-**Supports a wide variety of formats**: Capable of translating files such as `pdf`, `docx`, `xlsx`, `md`, `txt`, `json`, `epub`, `srt`, etc.
-**Recognition of tables, formulas, and code**: Utilizes `docling` and `mineru` to recognize and translate tables, formulas, and code commonly found in academic papers.
-**JSON translation**: Supports specifying values within JSON that need translation through JSON paths (following the `jsonpath-ng` syntax specification).
-**High-fidelity translation for Word/Excel**: Supports translation of `docx` and `xlsx` files (currently does not support `doc` and `xls` files) while preserving the original formatting.
-**Support for multiple AI platforms**: Compatible with most AI platforms, enabling high-performance parallel AI translation with custom prompts.
-**Asynchronous support**: Designed for high-performance scenarios, offering full asynchronous support and implementing a service interface capable of multitask parallel processing.
-**Interactive web interface**: Provides an out-of-the-box Web UI and RESTful API for easy integration and use.
> When translating `pdf` files, they are converted to markdown, **resulting in loss of the original layout**. Please be
> cautious if layout preservation is a priority.
> When translating `pdf` files, they are first converted to markdown, so the original typesetting will be **lost**. Users with typesetting requirements should note this.
> QQ Discussion Group: 1047781902
@@ -46,15 +37,12 @@ various types of translation tasks.
**Novel Translation**:
![翻译效果](/images/小说翻译.png)
## Bundled Version
## Integrated Packages
For users who want to get started quickly, we provide a bundled version
on [GitHub Releases](https://github.com/xunbu/docutranslate/releases). Simply download, extract, and input the API key
of your preferred AI platform to start using it.
For users who want to get started quickly, we provide integrated packages on [GitHub Releases](https://github.com/xunbu/docutranslate/releases). Simply download, unzip, and enter your AI platform's API key to start using.
- **DocuTranslate**: Standard version, uses the online `minerU` engine.
- **DocuTranslate_full**: Full version, includes the local `docling` parsing engine, ideal for offline environments or
scenarios prioritizing data privacy.
- **DocuTranslate**: The standard version, which uses the online `minerU` engine to parse documents. Recommended for most users.
- **DocuTranslate_full**: The full version, which includes the `docling` local parsing engine. Suitable for offline scenarios or those with higher data privacy requirements.
## Installation
@@ -64,97 +52,92 @@ of your preferred AI platform to start using it.
# Basic installation
pip install docutranslate
# When using the docling local analysis engine
# If using the docling local parsing engine
pip install docutranslate[docling]
```
### Using uv
```bash
# Environment initialization
# Initialize the environment
uv init
# Basic installation
uv add docutranslate
# Extended installation with docling
# Install docling extension
uv add docutranslate[docling]
```
### Using git
```bash
# Environment initialization
# Initialize the environment
git clone https://github.com/xunbu/docutranslate.git
cd docutranslate
uv sync
```
## Core Concept: Workflow
The heart of the new version of DocuTranslate is the **Workflow**. Each workflow is a complete end-to-end translation
pipeline designed for a specific file type. Instead of interacting with large classes, you select and configure the
appropriate workflow based on the file type.
The core of the new version of DocuTranslate is the **Workflow**. Each workflow is a complete end-to-end translation pipeline designed for a specific file type. Instead of interacting with large classes as before, you will select and configure the appropriate workflow according to the file type.
**The basic usage steps are as follows:**
1. **Select a Workflow**: Choose a workflow such as `MarkdownBasedWorkflow` or `TXTWorkflow` based on the input file
type (e.g., PDF/Word or TXT).
2. **Build the Configuration**: Create a configuration object (e.g., `MarkdownBasedWorkflowConfig`) corresponding to the
selected workflow. This configuration object includes all necessary sub-configurations, such as:
* **Converter Config**: Defines how to convert the original file (e.g., PDF) into Markdown.
* **Translator Config**: Defines the LLM to use, API keys, target language, etc.
1. **Select a Workflow**: Choose a workflow based on the input file type (e.g., PDF/Word or TXT). For example, `MarkdownBasedWorkflow` or `TXTWorkflow`.
2. **Build Configuration**: Create a configuration object corresponding to the selected workflow (such as `MarkdownBasedWorkflowConfig`). This configuration object contains all the necessary sub-configurations, such as:
* **Converter Config**: Defines how to convert the original file (e.g., PDF) to Markdown.
* **Translator Config**: Defines the LLM to use, API-Key, target language, etc.
* **Exporter Config**: Defines specific options for the output format (e.g., HTML).
3. **Instantiate the Workflow**: Use the configuration object to create an instance of the workflow.
4. **Execute the Translation**: Call the workflow's `.read_*()` and `.translate()` / `.translate_async()` methods.
5. **Export/Save the Results**: Call the `.export_to_*()` or `.save_as_*()` methods to retrieve or save the translated
results.
3. **Instantiate the Workflow**: Create an instance of the workflow using the configuration object.
4. **Execute Translation**: Call the workflow's `.read_*()` method and `.translate()` / `.translate_async()` method.
5. **Export/Save Results**: Call the `.export_to_*()` method or `.save_as_*()` method to retrieve or save the translation results.
## Available Workflows
| Workflow | Applicable Scenarios | Input Formats | Output Formats | Core Configuration Class |
|:----------------------------|:----------------------------------------------------------------------------------------------------------------------|:---------------------------------------------|:-----------------------|:------------------------------|
| **`MarkdownBasedWorkflow`** | Processes rich-text documents like PDF, Word, and images. Follows the flow: "File Markdown Translation Export". | `.pdf`, `.docx`, `.md`, `.png`, `.jpg`, etc. | `.md`, `.zip`, `.html` | `MarkdownBasedWorkflowConfig` |
| **`TXTWorkflow`** | Processes plain text documents. Follows the flow: "txt Translation Export". | `.txt` and other plain text formats | `.txt`, `.html` | `TXTWorkflowConfig` |
| **`JsonWorkflow`** | Processes JSON files. Follows the flow: "json Translation Export". | `.json` | `.json`, `.html` | `JsonWorkflowConfig` |
| **`DocxWorkflow`** | Processes DOCX files. Follows the flow: "docx Translation Export". | `.docx` | `.docx`, `.html` | `docxWorkflowConfig` |
| **`XlsxWorkflow`** | Processes XLSX files. Follows the flow: "xlsx Translation Export". | `.xlsx` | `.xlsx`, `.html` | `XlsxWorkflowConfig` |
| **`SrtWorkflow`** | Processes SRT files. Follows the flow: "srt Translation Export". | `.srt` | `.srt`, `.html` | `SrtWorkflowConfig` |
| **`EpubWorkflow`** | Processes EPUB files. Follows the flow: "epub Translation Export". | `.epub` | `.epub`, `.html` | `EpubWorkflowConfig` |
| **`HtmlWorkflow`** | Processes HTML files. Follows the flow: "html Translation Export". | `.html`, `.htm` | `.html` | `HtmlWorkflowConfig` |
| Workflow | Application Scenario | Input Format | Output Format | Core Configuration Class |
|:---------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------|:-----------------------|:----------------------------------|
| **`MarkdownBasedWorkflow`** | Process rich text documents such as PDF, Word, and images. Flow: `File -> Markdown -> Translation -> Export`. | `.pdf`, `.docx`, `.md`, `.png`, `.jpg`, etc. | `.md`, `.zip`, `.html` | `MarkdownBasedWorkflowConfig` |
| **`TXTWorkflow`** | Process plain text documents. Flow: `txt -> Translation -> Export`. | `.txt` and other plain text formats | `.txt`, `.html` | `TXTWorkflowConfig` |
| **`JsonWorkflow`** | Process json files. Flow: `json -> Translation -> Export`. | `.json` | `.json`, `.html` | `JsonWorkflowConfig` |
| **`DocxWorkflow`** | Process docx files. Flow: `docx -> Translation -> Export`. | `.docx` | `.docx`, `.html` | `docxWorkflowConfig` |
| **`XlsxWorkflow`** | Process xlsx files. Flow: `xlsx -> Translation -> Export`. | `.xlsx` | `.xlsx`, `.html` | `XlsxWorkflowConfig` |
| **`SrtWorkflow`** | Process srt files. Flow: `srt -> Translation -> Export`. | `.srt` | `.srt`, `.html` | `SrtWorkflowConfig` |
| **`EpubWorkflow`** | Process epub files. Flow: `epub -> Translation -> Export`. | `.epub` | `.epub`, `.html` | `EpubWorkflowConfig` |
| **`HtmlWorkflow`** | Process html files. Flow: `html -> Translation -> Export`. | `.html`, `.htm` | `.html` | `HtmlWorkflowConfig` |
> The interactive interface supports exporting in PDF format.
> The interactive interface allows export in pdf format.
## Launching Web UI and API Services
## Starting the Web UI and API Service
For convenience, DocuTranslate provides a feature-rich web interface and RESTful API.
For ease of use, DocuTranslate provides a feature-rich web interface and RESTful API.
**Starting the Service:**
```bash
# Start the service (default port: 8010)
# Start the service, which monitors port 8010 by default
docutranslate -i
# Start with a specified port
docutranslate -i -p 8011
# Alternatively, specify the port via environment variable
# You can also specify the port using an environment variable
export DOCUTRANSLATE_PORT=8011
docutranslate -i
```
- **Interactive Interface**: After starting the service, access `http://127.0.0.1:8010` (or the specified port) in your
browser.
- **API Documentation**: Complete API documentation (Swagger UI) is available at `http://127.0.0.1:8010/docs`.
- **Interactive Interface**: After starting the service, access `http://127.0.0.1:8010` (or the specified port) in your browser.
- **API Documentation**: The complete API documentation (Swagger UI) is available at `http://127.0.0.1:8010/docs`.
## Usage
### Example 1: Translating PDF Files (Using `MarkdownBasedWorkflow`)
### Example 1: Translating a PDF File (Using `MarkdownBasedWorkflow`)
This is the most common use case. The `minerU` engine is used to convert PDFs to Markdown, followed by translation via
LLM. Here, an asynchronous approach is demonstrated.
This is the most common use case. Convert the PDF to Markdown using the `minerU` engine and translate it with an LLM. Here, we use the asynchronous method as an example.
```python
import asyncio
@@ -168,45 +151,45 @@ async def main():
# 1. Build translator configuration
translator_config = MDTranslatorConfig(
base_url="https://open.bigmodel.cn/api/paas/v4", # Base URL of the AI platform
api_key="YOUR_ZHIPU_API_KEY", # API Key for the AI platform
api_key="YOUR_ZHIPU_API_KEY", # API Key of the AI platform
model_id="glm-4-air", # Model ID
to_lang="English", # Target language
chunk_size=3000, # Text chunk size
concurrent=10 # Number of concurrent processes
concurrent=10 # Number of concurrent executions
)
# 2. Build converter configuration (using minerU)
converter_config = ConverterMineruConfig(
mineru_token="YOUR_MINERU_TOKEN", # minerU token
mineru_token="YOUR_MINERU_TOKEN", # Your minerU Token
formula_ocr=True # Enable formula recognition
)
# 3. Build main workflow configuration
workflow_config = MarkdownBasedWorkflowConfig(
convert_engine="mineru", # Specify the parsing engine
converter_config=converter_config, # Apply converter configuration
translator_config=translator_config, # Apply translator configuration
converter_config=converter_config, # Pass the converter configuration
translator_config=translator_config, # Pass the translator configuration
html_exporter_config=MD2HTMLExporterConfig(cdn=True) # HTML export configuration
)
# 4. Instantiate the workflow
workflow = MarkdownBasedWorkflow(config=workflow_config)
# 5. Load file and execute translation
# 5. Load the file and execute translation
print("Starting file loading and translation...")
workflow.read_path("path/to/your/document.pdf")
await workflow.translate_async()
# Or use synchronous method
# Or use the synchronous method
# workflow.translate()
print("Translation completed!")
# 6. Save results
# 6. Save the results
workflow.save_as_html(name="translated_document.html")
workflow.save_as_markdown_zip(name="translated_document.zip")
workflow.save_as_markdown(name="translated_document.md") # Image-embedded Markdown
print("Files have been saved in the ./output folder.")
workflow.save_as_markdown(name="translated_document.md") # Markdown with embedded images
print("Files saved to the ./output folder.")
# Or directly retrieve content strings
# Or directly get the content string
html_content = workflow.export_to_html()
html_content = workflow.export_to_markdown()
# print(html_content)
@@ -216,6 +199,10 @@ if __name__ == "__main__":
asyncio.run(main())
```
### Example 2: Translating TXT Files (Using `TXTWorkflow`)
For pure text files, the process is simpler as there is no need for document parsing (conversion). Here is an example using the asynchronous method.
```python
import asyncio
from docutranslate.workflow.txt_workflow import TXTWorkflow, TXTWorkflowConfig
@@ -224,15 +211,15 @@ from docutranslate.exporter.txt.txt2html_exporter import TXT2HTMLExporterConfig
async def main():
# 1. Build translator configuration
# 1. Build the translator configuration
translator_config = TXTTranslatorConfig(
base_url="https://api.openai.com/v1/",
api_key="YOUR_OPENAI_API_KEY",
model_id="gpt-4o",
to_lang="Japanese",
to_lang="中文",
)
# 2. Build main workflow configuration
# 2. Build the main workflow configuration
workflow_config = TXTWorkflowConfig(
translator_config=translator_config,
html_exporter_config=TXT2HTMLExporterConfig(cdn=True)
@@ -241,17 +228,17 @@ async def main():
# 3. Instantiate the workflow
workflow = TXTWorkflow(config=workflow_config)
# 4. Load the file and execute translation
# 4. Read the file and execute translation
workflow.read_path("path/to/your/notes.txt")
await workflow.translate_async()
# Alternatively, use the synchronous method
# Or use the synchronous method
# workflow.translate()
# 5. Save the results
# 5. Save the result
workflow.save_as_txt(name="translated_notes.txt")
print("TXT file has been saved.")
print("TXT file saved.")
# It's also possible to export the translated plain text
# You can also export the translated plain text
text = workflow.export_to_txt()
@@ -259,6 +246,13 @@ if __name__ == "__main__":
asyncio.run(main())
```
### Example 3: Translating a JSON file (using `JsonWorkflow`)
Here, we show an example using the asynchronous method. In the `json_paths` item of `JsonTranslatorConfig`, you need to specify the JSON paths to be translated (following the jsonpath-ng syntax rules).
Only the values matching the JSON paths will be translated.
```python
import asyncio
@@ -268,16 +262,16 @@ from docutranslate.workflow.json_workflow import JsonWorkflowConfig, JsonWorkflo
async def main():
# 1. Configure the translator
# 1. Build the translator configuration
translator_config = JsonTranslatorConfig(
base_url="https://api.openai.com/v1/",
api_key="YOUR_OPENAI_API_KEY",
model_id="gpt-4o",
to_lang="Japanese",
json_paths=["$.*", "$.name"] # Complies with jsonpath-ng syntax; values matching these paths will be translated
to_lang="Chinese",
json_paths=["$.*", "$.name"] # Compliant with the jsonpath-ng path syntax; all values matching the path will be translated
)
# 2. Configure the main workflow
# 2. Build the main workflow configuration
workflow_config = JsonWorkflowConfig(
translator_config=translator_config,
html_exporter_config=Json2HTMLExporterConfig(cdn=True)
@@ -286,17 +280,17 @@ async def main():
# 3. Instantiate the workflow
workflow = JsonWorkflow(config=workflow_config)
# 4. Load the file and execute the translation
# 4. Read the file and execute translation
workflow.read_path("path/to/your/notes.json")
await workflow.translate_async()
# Alternatively, use the synchronous method
# Or use the synchronous method
# workflow.translate()
# 5. Save the results
workflow.save_as_json(name="translated_notes.json")
print("JSON file has been saved.")
print("The JSON file has been saved.")
# The translated JSON text can also be exported
# You can also export the translated json text
text = workflow.export_to_json()
@@ -304,6 +298,12 @@ if __name__ == "__main__":
asyncio.run(main())
```
### Example 4: Translating a docx File (Using `DocxWorkflow`)
Here, the asynchronous method is shown as an example.
```python
import asyncio
@@ -313,17 +313,17 @@ from docutranslate.workflow.docx_workflow import DocxWorkflowConfig, DocxWorkflo
async def main():
# 1. Configure the translator
# 1. Build the translator configuration
translator_config = DocxTranslatorConfig(
base_url="https://api.openai.com/v1/",
api_key="YOUR_OPENAI_API_KEY",
model_id="gpt-4o",
to_lang="Japanese",
insert_mode="replace", # Options: "replace", "append", "prepend"
separator="\n", # Separator used in "append" or "prepend" mode
to_lang="日本語",
insert_mode="replace", # Optional: "replace", "append", "prepend"
separator="\n", # Separator used in "append" and "prepend" modes
)
# 2. Configure the main workflow
# 2. Build the main workflow configuration
workflow_config = DocxWorkflowConfig(
translator_config=translator_config,
html_exporter_config=Docx2HTMLExporterConfig(cdn=True)
@@ -335,14 +335,14 @@ async def main():
# 4. Load the file and execute translation
workflow.read_path("path/to/your/notes.docx")
await workflow.translate_async()
# Alternatively, use the synchronous method
# Or use the synchronous method
# workflow.translate()
# 5. Save the results
# 5. Save the result
workflow.save_as_docx(name="translated_notes.docx")
print("The docx file has been saved.")
# The translated docx can also be exported as binary
# You can also export the translated docx as binary
text_bytes = workflow.export_to_docx()
@@ -350,6 +350,12 @@ if __name__ == "__main__":
asyncio.run(main())
```
### Example 5: Translating an xlsx file (using `XlsxWorkflow`)
Here, we will use the asynchronous method as an example.
```python
import asyncio
@@ -359,17 +365,17 @@ from docutranslate.workflow.xlsx_workflow import XlsxWorkflowConfig, XlsxWorkflo
async def main():
# 1. Build translator configuration
# 1. Build the translator configuration
translator_config = XlsxTranslatorConfig(
base_url="https://api.openai.com/v1/",
api_key="YOUR_OPENAI_API_KEY",
model_id="gpt-4o",
to_lang="Japanese",
insert_mode="replace", # Options: "replace", "append", "prepend"
separator="\n", # Separator used in "append" or "prepend" mode
to_lang="日本語",
insert_mode="replace", # Optional: "replace", "append", "prepend"
separator="\n", # Separator used in "append" and "prepend" modes
)
# 2. Build main workflow configuration
# 2. Build the main workflow configuration
workflow_config = XlsxWorkflowConfig(
translator_config=translator_config,
html_exporter_config=Xlsx2HTMLExporterConfig(cdn=True)
@@ -381,14 +387,14 @@ async def main():
# 4. Load the file and execute translation
workflow.read_path("path/to/your/notes.xlsx")
await workflow.translate_async()
# Alternatively, use the synchronous method
# Or use the synchronous method
# workflow.translate()
# 5. Save the results
# 5. Save the result
workflow.save_as_xlsx(name="translated_notes.xlsx")
print("The xlsx file has been saved.")
print("The XLSX file has been saved.")
# It's also possible to export the translated xlsx as binary
# You can also export the binary data of the translated XLSX
text_bytes = workflow.export_to_xlsx()
@@ -396,56 +402,59 @@ if __name__ == "__main__":
asyncio.run(main())
```
### 1. Obtaining API Keys for Large-Scale Language Models
The translation functionality relies on large-scale language models, requiring the retrieval of `base_url`, `api_key`,
and `model_id` from the corresponding AI platform.
> Recommended models: Volcano Engine's `doubao-seed-1-6-250615`, `doubao-seed-1-6-flash-250715`, Zhipu's `glm-4-flash`,
> Alibaba Cloud's `qwen-plus`,
> `qwen-turbo`, DeepSeek's `deepseek-chat`, etc.
## Detailed Explanation of Prerequisites and Settings
| Platform Name | API Key Retrieval Method | Base URL |
|-----------------------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| ollama | | http://127.0.0.1:11434/v1 |
| lm studio | | http://127.0.0.1:1234/v1 |
| openrouter | [Click to retrieve](https://openrouter.ai/settings/keys) | https://openrouter.ai/api/v1 |
| openai | [Click to retrieve](https://platform.openai.com/api-keys) | https://api.openai.com/v1/ |
| gemini | [Click to retrieve](https://aistudio.google.com/u/0/apikey) | https://generativelanguage.googleapis.com/v1beta/openai/ |
| deepseek | [Click to retrieve](https://platform.deepseek.com/api_keys) | https://api.deepseek.com/v1 |
| Zhipu AI | [Click to retrieve](https://open.bigmodel.cn/usercenter/apikeys) | https://open.bigmodel.cn/api/paas/v4 |
| Tencent Hunyuan | [Click to retrieve](https://console.cloud.tencent.com/hunyuan/api-key) | https://api.hunyuan.cloud.tencent.com/v1 |
| Alibaba Cloud Bailian | [Click to retrieve](https://bailian.console.aliyun.com/?tab=model#/api-key) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| Volcano Engine | [Click to retrieve](https://console.volcengine.com/ark/region:ark+cn-beijing/apiKey?apikey=%7B%7D) | https://ark.cn-beijing.volces.com/api/v3 |
| Silicon Flow | [Click to retrieve](https://cloud.siliconflow.cn/account/ak) | https://api.siliconflow.cn/v1 |
| DMXAPI | [Click to retrieve](https://www.dmxapi.cn/token) | https://www.dmxapi.cn/v1 |
### 1. Obtaining a Large Language Model API Key
### 2. Obtaining minerU Tokens (Online Parsing)
The translation function relies on a large language model, and you need to obtain the `base_url`, `api_key`, and `model_id` from the corresponding AI platform.
When selecting `mineru` as the document parsing engine (`convert_engine="mineru"`), you need to apply for a free token.
> Recommended models: Volcano Engine's `doubao-seed-1-6-250615`, `doubao-seed-1-6-flash-250715`, Zhipu's `glm-4-flash`, Alibaba Cloud's `qwen-plus`, `qwen-turbo`, DeepSeek's `deepseek-chat`, etc.
| Platform Name | Method to Obtain API Key | baseurl |
|------------|-----------------------------------------------------------------------------------|----------------------------------------------------------|
| ollama | | http://127.0.0.1:11434/v1 |
| lm studio | | http://127.0.0.1:1234/v1 |
| openrouter | [Click to Obtain](https://openrouter.ai/settings/keys) | https://openrouter.ai/api/v1 |
| openai | [Click to Obtain](https://platform.openai.com/api-keys) | https://api.openai.com/v1/ |
| gemini | [Click to Obtain](https://aistudio.google.com/u/0/apikey) | https://generativelanguage.googleapis.com/v1beta/openai/ |
| deepseek | [Click to Obtain](https://platform.deepseek.com/api_keys) | https://api.deepseek.com/v1 |
| 智譜ai | [Click to Obtain](https://open.bigmodel.cn/usercenter/apikeys) | https://open.bigmodel.cn/api/paas/v4 |
| 騰訊混元 | [Click to Obtain](https://console.cloud.tencent.com/hunyuan/api-key) | https://api.hunyuan.cloud.tencent.com/v1 |
| 阿里云百煉 | [Click to Obtain](https://bailian.console.aliyun.com/?tab=model#/api-key) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| 火山引擎 | [Click to Obtain](https://console.volcengine.com/ark/region:ark+cn-beijing/apiKey?apikey=%7B%7D) | https://ark.cn-beijing.volces.com/api/v3 |
| 硅基流動 | [Click to Obtain](https://cloud.siliconflow.cn/account/ak) | https://api.siliconflow.cn/v1 |
| DMXAPI | [Click to Obtain](https://www.dmxapi.cn/token) | https://www.dmxapi.cn/v1 |
### 2. Obtaining minerU Token (Online Parsing)
If you select `mineru` as the document parsing engine (`convert_engine="mineru"`), you need to apply for a free Token.
1. Visit the [minerU official website](https://mineru.net/apiManage/docs), register, and apply for the API.
2. Create a new API token in the [API Token Management page](https://mineru.net/apiManage/token).
2. Create a new API Token on the [API Token management page](https://mineru.net/apiManage/token).
> **Note**: minerU tokens are valid for 14 days. If expired, recreate them.
> **Note**: The minerU Token is valid for 14 days. If it expires, please recreate it.
### 3. Configuring the docling Engine (Local Parsing)
When selecting `docling` as the document parsing engine (`convert_engine="docling"`), the required models will be
downloaded from Hugging Face upon first use.
If you select `docling` as the document parsing engine (`convert_engine="docling"`), the required models will be downloaded from Hugging Face during the first use.
**Solutions for Network Issues:**
1. **Setting Up Hugging Face Mirror (Recommended)**:
1. **Setting up a Hugging Face Mirror (Recommended)**:
* **Method A (Environment Variable)**: Set the system environment variable `HF_ENDPOINT` and restart the IDE or
terminal.
* **Method A (Environment Variable)**: Set the system environment variable `HF_ENDPOINT` and restart your IDE or terminal.
```
HF_ENDPOINT=https://hf-mirror.com
```
* **Method B (In-Code Configuration)**: Add the following code at the beginning of your Python script.
* **Method B (Setting in Code)**: Add the following code at the beginning of your Python script.
```python
import os
@@ -453,12 +462,16 @@ import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
```
2. **Offline Usage (Pre-Downloading Model Packages)**:
2. **Offline Use (Download Model Packages in Advance)**:
* Download `docling_artifact.zip` from [GitHub Releases](https://github.com/xunbu/docutranslate/releases).
* Extract and place it in the project directory.
* Extract it to your project directory.
* Specify the model path in the configuration:
```python
from docutranslate.converter.x2md.converter_docling import ConverterDoclingConfig
@@ -471,39 +484,33 @@ converter_config = ConverterDoclingConfig(
## FAQ
**Q: What should I do if port 8010 is already in use?**
**Q: What should I do if port 8010 is occupied?**
A: Specify a new port using the `-p` parameter or set the `DOCUTRANSLATE_PORT` environment variable.
**Q: Is scanned document translation supported?**
A: Yes, it is supported. Use the `mineru` parsing engine, which features powerful OCR capabilities.
**Q: Is translation of scanned documents supported?**
A: Yes, it is supported. Please use the `mineru` parsing engine, which is equipped with powerful OCR capabilities.
**Q: Why is it slow during the first use?**
A: When using the `docling` engine, the model needs to be downloaded from Hugging Face during the first run. Refer to
the "Network Issue Solutions" section above to speed up this process.
**Q: Why is it slow on first use?**
A: When using the `docling` engine, the model needs to be downloaded from Hugging Face during the first run. To speed up this process, refer to the "Solutions for Network Issues" section above.
**Q: How can I use it in an intranet (offline) environment?**
A: It is entirely possible. You need to meet the following two conditions:
**Q: How can it be used in an intranet (offline) environment?**
A: It is completely possible. The following two conditions need to be met:
1. **Local Parsing Engine**: Use the `docling` engine and follow the "Offline Usage" steps above to download the model
package in advance.
2. **Local LLM**: Deploy a local language model using tools like [Ollama](https://ollama.com/)
or [LM Studio](https://lmstudio.ai/), then input the local model's `base_url` in `TranslatorConfig`.
1. **Local Parsing Engine**: Use the `docling` engine and download the model package in advance according to the "Offline Use" guide above.
2. **Local LLM**: Deploy a language model locally using tools such as [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/), and enter the `base_url` of the local model in `TranslatorConfig`.
**Q: How does the caching mechanism work?**
A: `MarkdownBasedWorkflow` automatically caches the results of document parsing (conversion from files to Markdown),
saving time and resources. By default, the cache is stored in memory, recording the last 10 parsing operations. You can
adjust the cache size using the `DOCUTRANSLATE_CACHE_NUM` environment variable.
**Q: How does the caching mechanism work?**
A: `MarkdownBasedWorkflow` automatically caches the results of document parsing (conversion from files to Markdown) to avoid wasting time and resources on repeated parsing. The cache is stored in memory by default and records the most recent 10 parsing operations. The number of cached items can be changed via the `DOCUTRANSLATE_CACHE_NUM` environment variable.
**Q: How can I use the software via a proxy?**
A: The software does not use a proxy by default. You can enable proxy usage by setting the `DOCUTRANSLATE_USE_PROXY`
environment variable to `true`.
**Q: How can I use the software via a proxy?**
A: The software does not use a proxy by default. Set the `DOCUTRANSLATE_USE_PROXY` environment variable to `true` to enable communication via a proxy.
## Star History
<a href="https://www.star-history.com/#xunbu/docutranslate&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=xunbu/docutranslate&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=xunbu/docutranslate&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=xunbu/docutranslate&type=Date" />
</picture>
<a href="https://www.star-history.com/#xunbu/docutranslate&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=xunbu/docutranslate&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=xunbu/docutranslate&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=xunbu/docutranslate&type=Date" />
</picture>
</a>