增加epub、html工作流的示例说明
This commit is contained in:
153
README.md
153
README.md
@@ -20,18 +20,26 @@
|
|||||||
A lightweight local file translation tool based on Large Language Models
|
A lightweight local file translation tool based on Large Language Models
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
- ✅ **Multiple Format Support**: Translates various files including `pdf`, `docx`, `xlsx`, `md`, `txt`, `json`, `epub`, `srt`, `ass`, and more.
|
- ✅ **Multiple Format Support**: Translates various files including `pdf`, `docx`, `xlsx`, `md`, `txt`, `json`, `epub`,
|
||||||
|
`srt`, `ass`, and more.
|
||||||
- ✅ **Automatic Glossary Generation**: Supports automatic generation of glossaries for term alignment.
|
- ✅ **Automatic Glossary Generation**: Supports automatic generation of glossaries for term alignment.
|
||||||
- ✅ **PDF Table, Formula, and Code Recognition**: Recognizes and translates tables, formulas, and code often found in academic papers, powered by `docling` and `mineru` PDF parsing engines.
|
- ✅ **PDF Table, Formula, and Code Recognition**: Recognizes and translates tables, formulas, and code often found in
|
||||||
- ✅ **JSON Translation**: Supports specifying values to be translated in JSON using JSON paths (following `jsonpath-ng` syntax).
|
academic papers, powered by `docling` and `mineru` PDF parsing engines.
|
||||||
- ✅ **Word/Excel Format Preservation**: Translates `docx` and `xlsx` files while preserving their original formatting (does not yet support `doc` or `xls` files).
|
- ✅ **JSON Translation**: Supports specifying values to be translated in JSON using JSON paths (following `jsonpath-ng`
|
||||||
- ✅ **Multi-AI Platform Support**: Compatible with most AI platforms, enabling high-performance, concurrent AI translation with custom prompts.
|
syntax).
|
||||||
- ✅ **Asynchronous Support**: Designed for high-performance scenarios with full asynchronous support, offering service interfaces for parallel tasks.
|
- ✅ **Word/Excel Format Preservation**: Translates `docx` and `xlsx` files while preserving their original formatting (
|
||||||
|
does not yet support `doc` or `xls` files).
|
||||||
|
- ✅ **Multi-AI Platform Support**: Compatible with most AI platforms, enabling high-performance, concurrent AI
|
||||||
|
translation with custom prompts.
|
||||||
|
- ✅ **Asynchronous Support**: Designed for high-performance scenarios with full asynchronous support, offering service
|
||||||
|
interfaces for parallel tasks.
|
||||||
- ✅ **LAN and Multi-user Support**: Can be used by multiple people simultaneously on a local area network.
|
- ✅ **LAN and Multi-user Support**: Can be used by multiple people simultaneously on a local area network.
|
||||||
- ✅ **Interactive Web Interface**: Provides an out-of-the-box Web UI and RESTful API for easy integration and use.
|
- ✅ **Interactive Web Interface**: Provides an out-of-the-box Web UI and RESTful API for easy integration and use.
|
||||||
- ✅ **Small, Multi-platform Standalone Packages**: Windows and Mac standalone packages under 40MB (for versions not using the `docling` local PDF parser).
|
- ✅ **Small, Multi-platform Standalone Packages**: Windows and Mac standalone packages under 40MB (for versions not
|
||||||
|
using the `docling` local PDF parser).
|
||||||
|
|
||||||
> When translating `pdf` files, they are first converted to Markdown, which will **cause the original layout to be lost**. Users with strict layout requirements should take note.
|
> When translating `pdf` files, they are first converted to Markdown, which will **cause the original layout to be lost
|
||||||
|
**. Users with strict layout requirements should take note.
|
||||||
|
|
||||||
> QQ Discussion Group: 1047781902
|
> QQ Discussion Group: 1047781902
|
||||||
|
|
||||||
@@ -46,10 +54,14 @@
|
|||||||
|
|
||||||
## All-in-One Packages
|
## All-in-One Packages
|
||||||
|
|
||||||
For users who want to get started quickly, we provide all-in-one packages on [GitHub Releases](https://github.com/xunbu/docutranslate/releases). Simply download, unzip, and enter your AI platform API Key to begin.
|
For users who want to get started quickly, we provide all-in-one packages
|
||||||
|
on [GitHub Releases](https://github.com/xunbu/docutranslate/releases). Simply download, unzip, and enter your AI
|
||||||
|
platform API Key to begin.
|
||||||
|
|
||||||
- **DocuTranslate**: Standard version, uses the online `minerU` engine to parse PDF documents. Choose this version if you don't need local PDF parsing (recommended).
|
- **DocuTranslate**: Standard version, uses the online `minerU` engine to parse PDF documents. Choose this version if
|
||||||
- **DocuTranslate_full**: Full version, includes the built-in `docling` local PDF parsing engine. Choose this version if you need local PDF parsing.
|
you don't need local PDF parsing (recommended).
|
||||||
|
- **DocuTranslate_full**: Full version, includes the built-in `docling` local PDF parsing engine. Choose this version if
|
||||||
|
you need local PDF parsing.
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
@@ -90,12 +102,16 @@ uv sync
|
|||||||
|
|
||||||
## Core Concept: Workflow
|
## Core Concept: Workflow
|
||||||
|
|
||||||
The core of the new DocuTranslate is the **Workflow**. Each workflow is a complete, end-to-end translation pipeline designed for a specific file type. Instead of interacting with a single large class, you select and configure a workflow based on your file type.
|
The core of the new DocuTranslate is the **Workflow**. Each workflow is a complete, end-to-end translation pipeline
|
||||||
|
designed for a specific file type. Instead of interacting with a single large class, you select and configure a workflow
|
||||||
|
based on your file type.
|
||||||
|
|
||||||
**The basic usage flow is as follows:**
|
**The basic usage flow is as follows:**
|
||||||
|
|
||||||
1. **Select a Workflow**: Choose a workflow based on your input file type (e.g., PDF/Word or TXT), such as `MarkdownBasedWorkflow` or `TXTWorkflow`.
|
1. **Select a Workflow**: Choose a workflow based on your input file type (e.g., PDF/Word or TXT), such as
|
||||||
2. **Build Configuration**: Create the corresponding configuration object for the selected workflow (e.g., `MarkdownBasedWorkflowConfig`). This object contains all necessary sub-configurations, such as:
|
`MarkdownBasedWorkflow` or `TXTWorkflow`.
|
||||||
|
2. **Build Configuration**: Create the corresponding configuration object for the selected workflow (e.g.,
|
||||||
|
`MarkdownBasedWorkflowConfig`). This object contains all necessary sub-configurations, such as:
|
||||||
* **Converter Config**: Defines how to convert the original file (like a PDF) to Markdown.
|
* **Converter Config**: Defines how to convert the original file (like a PDF) to Markdown.
|
||||||
* **Translator Config**: Defines which LLM, API-Key, target language, etc., to use.
|
* **Translator Config**: Defines which LLM, API-Key, target language, etc., to use.
|
||||||
* **Exporter Config**: Defines specific options for the output format (like HTML).
|
* **Exporter Config**: Defines specific options for the output format (like HTML).
|
||||||
@@ -106,7 +122,7 @@ The core of the new DocuTranslate is the **Workflow**. Each workflow is a comple
|
|||||||
## Available Workflows
|
## Available Workflows
|
||||||
|
|
||||||
| Workflow | Use Case | Input Formats | Output Formats | Core Config Class |
|
| Workflow | Use Case | Input Formats | Output Formats | Core Config Class |
|
||||||
|:----------------------------|:----------------------------------------------------------------|:---------------------------------------------|:---------------------------|:------------------------------|
|
|:----------------------------|:-------------------------------------------------------------------------------------------------------|:---------------------------------------------|:-----------------------|:------------------------------|
|
||||||
| **`MarkdownBasedWorkflow`** | Processes rich text documents like PDF, Word, images. Flow: `File -> Markdown -> Translate -> Export`. | `.pdf`, `.docx`, `.md`, `.png`, `.jpg`, etc. | `.md`, `.zip`, `.html` | `MarkdownBasedWorkflowConfig` |
|
| **`MarkdownBasedWorkflow`** | Processes rich text documents like PDF, Word, images. Flow: `File -> Markdown -> Translate -> Export`. | `.pdf`, `.docx`, `.md`, `.png`, `.jpg`, etc. | `.md`, `.zip`, `.html` | `MarkdownBasedWorkflowConfig` |
|
||||||
| **`TXTWorkflow`** | Processes plain text documents. Flow: `txt -> Translate -> Export`. | `.txt` and other plain text formats | `.txt`, `.html` | `TXTWorkflowConfig` |
|
| **`TXTWorkflow`** | Processes plain text documents. Flow: `txt -> Translate -> Export`. | `.txt` and other plain text formats | `.txt`, `.html` | `TXTWorkflowConfig` |
|
||||||
| **`JsonWorkflow`** | Processes JSON files. Flow: `json -> Translate -> Export`. | `.json` | `.json`, `.html` | `JsonWorkflowConfig` |
|
| **`JsonWorkflow`** | Processes JSON files. Flow: `json -> Translate -> Export`. | `.json` | `.json`, `.html` | `JsonWorkflowConfig` |
|
||||||
@@ -136,14 +152,16 @@ export DOCUTRANSLATE_PORT=8011
|
|||||||
docutranslate -i
|
docutranslate -i
|
||||||
```
|
```
|
||||||
|
|
||||||
- **Interactive Interface**: After starting the service, visit `http://127.0.0.1:8010` (or your specified port) in your browser.
|
- **Interactive Interface**: After starting the service, visit `http://127.0.0.1:8010` (or your specified port) in your
|
||||||
|
browser.
|
||||||
- **API Documentation**: The complete API documentation (Swagger UI) is available at `http://127.0.0.1:8010/docs`.
|
- **API Documentation**: The complete API documentation (Swagger UI) is available at `http://127.0.0.1:8010/docs`.
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
### Example 1: Translate a PDF file (using `MarkdownBasedWorkflow`)
|
### Example 1: Translate a PDF file (using `MarkdownBasedWorkflow`)
|
||||||
|
|
||||||
This is the most common use case. We will use the `minerU` engine to convert the PDF to Markdown and then use an LLM for translation. This example uses the asynchronous method.
|
This is the most common use case. We will use the `minerU` engine to convert the PDF to Markdown and then use an LLM for
|
||||||
|
translation. This example uses the asynchronous method.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import asyncio
|
import asyncio
|
||||||
@@ -210,7 +228,8 @@ if __name__ == "__main__":
|
|||||||
|
|
||||||
### Example 2: Translate a TXT file (using `TXTWorkflow`)
|
### Example 2: Translate a TXT file (using `TXTWorkflow`)
|
||||||
|
|
||||||
For plain text files, the process is simpler as it doesn't require a document parsing (conversion) step. This example uses the asynchronous method.
|
For plain text files, the process is simpler as it doesn't require a document parsing (conversion) step. This example
|
||||||
|
uses the asynchronous method.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import asyncio
|
import asyncio
|
||||||
@@ -257,7 +276,8 @@ if __name__ == "__main__":
|
|||||||
|
|
||||||
### Example 3: Translate a JSON file (using `JsonWorkflow`)
|
### Example 3: Translate a JSON file (using `JsonWorkflow`)
|
||||||
|
|
||||||
This example uses the asynchronous method. The `json_paths` item in `JsonTranslatorConfig` needs to specify the JSON paths to be translated (conforming to the `jsonpath-ng` syntax). Only values matching these paths will be translated.
|
This example uses the asynchronous method. The `json_paths` item in `JsonTranslatorConfig` needs to specify the JSON
|
||||||
|
paths to be translated (conforming to the `jsonpath-ng` syntax). Only values matching these paths will be translated.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import asyncio
|
import asyncio
|
||||||
@@ -404,18 +424,73 @@ if __name__ == "__main__":
|
|||||||
asyncio.run(main())
|
asyncio.run(main())
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Example 5: Configuration Items for Other Workflows (Using `HtmlWorkflow`, `EpubWorkflow`)
|
||||||
|
|
||||||
|
Here is an example using asynchronous mode.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# HtmlWorkflow
|
||||||
|
from docutranslate.translator.ai_translator.html_translator import HtmlTranslatorConfig
|
||||||
|
from docutranslate.workflow.html_workflow import HtmlWorkflowConfig, HtmlWorkflow
|
||||||
|
|
||||||
|
|
||||||
|
async def html():
|
||||||
|
# 1. Create translator configuration
|
||||||
|
translator_config = HtmlTranslatorConfig(
|
||||||
|
base_url="https://api.openai.com/v1/",
|
||||||
|
api_key="YOUR_OPENAI_API_KEY",
|
||||||
|
model_id="gpt-4o",
|
||||||
|
to_lang="Chinese",
|
||||||
|
insert_mode="replace", # Options: "replace", "append", "prepend"
|
||||||
|
separator="\n", # Separator used for "append" and "prepend" modes
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. Create main workflow configuration
|
||||||
|
workflow_config = HtmlWorkflowConfig(
|
||||||
|
translator_config=translator_config,
|
||||||
|
)
|
||||||
|
workflow_html = HtmlWorkflow(config=workflow_config)
|
||||||
|
|
||||||
|
|
||||||
|
# EpubWorkflow
|
||||||
|
from docutranslate.exporter.epub.epub2html_exporter import Epub2HTMLExporterConfig
|
||||||
|
from docutranslate.translator.ai_translator.epub_translator import EpubTranslatorConfig
|
||||||
|
from docutranslate.workflow.epub_workflow import EpubWorkflowConfig, EpubWorkflow
|
||||||
|
|
||||||
|
|
||||||
|
async def epub():
|
||||||
|
# 1. Create translator configuration
|
||||||
|
translator_config = EpubTranslatorConfig(
|
||||||
|
base_url="https://api.openai.com/v1/",
|
||||||
|
api_key="YOUR_OPENAI_API_KEY",
|
||||||
|
model_id="gpt-4o",
|
||||||
|
to_lang="Chinese",
|
||||||
|
insert_mode="replace", # Options: "replace", "append", "prepend"
|
||||||
|
separator="\n", # Separator used for "append" and "prepend" modes
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. Create main workflow configuration
|
||||||
|
workflow_config = EpubWorkflowConfig(
|
||||||
|
translator_config=translator_config,
|
||||||
|
html_exporter_config=Epub2HTMLExporterConfig(cdn=True),
|
||||||
|
)
|
||||||
|
workflow_epub = EpubWorkflow(config=workflow_config)
|
||||||
|
```
|
||||||
|
|
||||||
## Prerequisites and Configuration Details
|
## Prerequisites and Configuration Details
|
||||||
|
|
||||||
### 1. Get a Large Model API Key
|
### 1. Get a Large Model API Key
|
||||||
|
|
||||||
The translation feature relies on large language models. You need to obtain a `base_url`, `api_key`, and `model_id` from the respective AI platform.
|
The translation feature relies on large language models. You need to obtain a `base_url`, `api_key`, and `model_id` from
|
||||||
|
the respective AI platform.
|
||||||
|
|
||||||
> Recommended models: Volcengine's `doubao-seed-1-6-flash` and `doubao-seed-1-6` series, Zhipu's `glm-4-flash`, Alibaba Cloud's `qwen-plus` and `qwen-flash`, Deepseek's `deepseek-chat`, etc.
|
> Recommended models: Volcengine's `doubao-seed-1-6-flash` and `doubao-seed-1-6` series, Zhipu's `glm-4-flash`, Alibaba
|
||||||
|
> Cloud's `qwen-plus` and `qwen-flash`, Deepseek's `deepseek-chat`, etc.
|
||||||
|
|
||||||
> [302.AI](https://share.302.ai/BgRLAe)👈 Register through this link to enjoy a $1 free credit
|
> [302.AI](https://share.302.ai/BgRLAe)👈 Register through this link to enjoy a $1 free credit
|
||||||
|
|
||||||
| Platform Name | Get API Key | Base URL |
|
| Platform Name | Get API Key | Base URL |
|
||||||
|:--------------------|:-----------------------------------------------------------------------------------------|:---------------------------------------------------------|
|
|:------------------------------|:----------------------------------------------------------------------------------------------|:-----------------------------------------------------------|
|
||||||
| ollama | | `http://127.0.0.1:11434/v1` |
|
| ollama | | `http://127.0.0.1:11434/v1` |
|
||||||
| lm studio | | `http://127.0.0.1:1234/v1` |
|
| lm studio | | `http://127.0.0.1:1234/v1` |
|
||||||
| 302.AI | [Click to get](https://share.302.ai/BgRLAe) | `https://api.302.ai/v1` |
|
| 302.AI | [Click to get](https://share.302.ai/BgRLAe) | `https://api.302.ai/v1` |
|
||||||
@@ -444,27 +519,35 @@ If you choose `mineru` as your document parsing engine (`convert_engine="mineru"
|
|||||||
|
|
||||||
#### 2.2. docling Engine Configuration (Local PDF parsing)
|
#### 2.2. docling Engine Configuration (Local PDF parsing)
|
||||||
|
|
||||||
If you choose `docling` as your document parsing engine (`convert_engine="docling"`), it will download the required models from Hugging Face upon first use.
|
If you choose `docling` as your document parsing engine (`convert_engine="docling"`), it will download the required
|
||||||
|
models from Hugging Face upon first use.
|
||||||
|
|
||||||
> A better option is to download `docling_artifact.zip` from [GitHub Releases](https://github.com/xunbu/docutranslate/releases) and extract it to your working directory.
|
> A better option is to download `docling_artifact.zip`
|
||||||
|
> from [GitHub Releases](https://github.com/xunbu/docutranslate/releases) and extract it to your working directory.
|
||||||
|
|
||||||
**Solutions for network issues when downloading `docling` models:**
|
**Solutions for network issues when downloading `docling` models:**
|
||||||
|
|
||||||
1. **Set a Hugging Face mirror (Recommended)**:
|
1. **Set a Hugging Face mirror (Recommended)**:
|
||||||
* **Method A (Environment Variable)**: Set the system environment variable `HF_ENDPOINT` and restart your IDE or terminal.
|
* **Method A (Environment Variable)**: Set the system environment variable `HF_ENDPOINT` and restart your IDE or
|
||||||
|
terminal.
|
||||||
```
|
```
|
||||||
HF_ENDPOINT=https://hf-mirror.com
|
HF_ENDPOINT=https://hf-mirror.com
|
||||||
```
|
```
|
||||||
|
|
||||||
* **Method B (Set in code)**: Add the following code at the beginning of your Python script.
|
* **Method B (Set in code)**: Add the following code at the beginning of your Python script.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import os
|
import os
|
||||||
|
|
||||||
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
|
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
|
||||||
```
|
```
|
||||||
|
|
||||||
2. **Offline Usage (Download the model package in advance)**:
|
2. **Offline Usage (Download the model package in advance)**:
|
||||||
* Download `docling_artifact.zip` from [GitHub Releases](https://github.com/xunbu/docutranslate/releases).
|
* Download `docling_artifact.zip` from [GitHub Releases](https://github.com/xunbu/docutranslate/releases).
|
||||||
* Extract it into your project directory.
|
* Extract it into your project directory.
|
||||||
|
|
||||||
* Specify the model path in your configuration (if the model is not in the same directory as the script):
|
* Specify the model path in your configuration (if the model is not in the same directory as the script):
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from docutranslate.converter.x2md.converter_docling import ConverterDoclingConfig
|
from docutranslate.converter.x2md.converter_docling import ConverterDoclingConfig
|
||||||
|
|
||||||
@@ -478,7 +561,8 @@ converter_config = ConverterDoclingConfig(
|
|||||||
## FAQ
|
## FAQ
|
||||||
|
|
||||||
**Q: Why is the translated text still in the original language?**
|
**Q: Why is the translated text still in the original language?**
|
||||||
A: Check the logs for errors. It's usually due to an overdue payment on the AI platform or network issues (check if you need to enable the system proxy).
|
A: Check the logs for errors. It's usually due to an overdue payment on the AI platform or network issues (check if you
|
||||||
|
need to enable the system proxy).
|
||||||
|
|
||||||
**Q: Port 8010 is already in use. What should I do?**
|
**Q: Port 8010 is already in use. What should I do?**
|
||||||
A: Use the `-p` parameter to specify a new port, or set the `DOCUTRANSLATE_PORT` environment variable.
|
A: Use the `-p` parameter to specify a new port, or set the `DOCUTRANSLATE_PORT` environment variable.
|
||||||
@@ -487,18 +571,25 @@ A: Use the `-p` parameter to specify a new port, or set the `DOCUTRANSLATE_PORT`
|
|||||||
A: Yes. Please use the `mineru` parsing engine, which has powerful OCR capabilities.
|
A: Yes. Please use the `mineru` parsing engine, which has powerful OCR capabilities.
|
||||||
|
|
||||||
**Q: Why is the first PDF translation very slow?**
|
**Q: Why is the first PDF translation very slow?**
|
||||||
A: If you are using the `docling` engine, it needs to download models from Hugging Face on its first run. Please refer to the "Network Issues Solutions" section above to speed up this process.
|
A: If you are using the `docling` engine, it needs to download models from Hugging Face on its first run. Please refer
|
||||||
|
to the "Network Issues Solutions" section above to speed up this process.
|
||||||
|
|
||||||
**Q: How can I use it in an intranet (offline) environment?**
|
**Q: How can I use it in an intranet (offline) environment?**
|
||||||
A: Absolutely. You need to meet the following conditions:
|
A: Absolutely. You need to meet the following conditions:
|
||||||
1. **Local LLM**: Deploy a language model locally using tools like [Ollama](https://ollama.com/) or [LM Studio](https://lmstudio.ai/), and fill in the local model's `base_url` in `TranslatorConfig`.
|
|
||||||
2. **Local PDF Parsing Engine** (only for parsing PDFs): Use the `docling` engine and download the model package in advance as described in the "Offline Usage" section above.
|
1. **Local LLM**: Deploy a language model locally using tools like [Ollama](https://ollama.com/)
|
||||||
|
or [LM Studio](https://lmstudio.ai/), and fill in the local model's `base_url` in `TranslatorConfig`.
|
||||||
|
2. **Local PDF Parsing Engine** (only for parsing PDFs): Use the `docling` engine and download the model package in
|
||||||
|
advance as described in the "Offline Usage" section above.
|
||||||
|
|
||||||
**Q: How does the PDF parsing cache mechanism work?**
|
**Q: How does the PDF parsing cache mechanism work?**
|
||||||
A: `MarkdownBasedWorkflow` automatically caches the results of document parsing (file-to-Markdown conversion) to avoid repetitive, time-consuming parsing. The cache is stored in memory by default and records the last 10 parses. You can change the cache size using the `DOCUTRANSLATE_CACHE_NUM` environment variable.
|
A: `MarkdownBasedWorkflow` automatically caches the results of document parsing (file-to-Markdown conversion) to avoid
|
||||||
|
repetitive, time-consuming parsing. The cache is stored in memory by default and records the last 10 parses. You can
|
||||||
|
change the cache size using the `DOCUTRANSLATE_CACHE_NUM` environment variable.
|
||||||
|
|
||||||
**Q: How can I make the software use a proxy?**
|
**Q: How can I make the software use a proxy?**
|
||||||
A: By default, the software does not use the system proxy. You can enable it by setting `system_proxy_enable=True` in `TranslatorConfig`.
|
A: By default, the software does not use the system proxy. You can enable it by setting `system_proxy_enable=True` in
|
||||||
|
`TranslatorConfig`.
|
||||||
|
|
||||||
## Star History
|
## Star History
|
||||||
|
|
||||||
|
|||||||
56
README_JP.md
56
README_JP.md
@@ -411,13 +411,67 @@ if __name__ == "__main__":
|
|||||||
asyncio.run(main())
|
asyncio.run(main())
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 例 5: その他のワークフローの設定項目 (`HtmlWorkflow`、`EpubWorkflow` の使用)
|
||||||
|
|
||||||
|
以下は非同期モードの使用例です。
|
||||||
|
|
||||||
|
```python
|
||||||
|
# HtmlWorkflow
|
||||||
|
from docutranslate.translator.ai_translator.html_translator import HtmlTranslatorConfig
|
||||||
|
from docutranslate.workflow.html_workflow import HtmlWorkflowConfig, HtmlWorkflow
|
||||||
|
|
||||||
|
|
||||||
|
async def html():
|
||||||
|
# 1. 翻訳機の設定を作成
|
||||||
|
translator_config = HtmlTranslatorConfig(
|
||||||
|
base_url="https://api.openai.com/v1/",
|
||||||
|
api_key="YOUR_OPENAI_API_KEY",
|
||||||
|
model_id="gpt-4o",
|
||||||
|
to_lang="中国語",
|
||||||
|
insert_mode="replace", # 選択肢: "replace", "append", "prepend"
|
||||||
|
separator="\n", # "append", "prepend" モードで使用される区切り文字
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. メインワークフローの設定を作成
|
||||||
|
workflow_config = HtmlWorkflowConfig(
|
||||||
|
translator_config=translator_config,
|
||||||
|
)
|
||||||
|
workflow_html = HtmlWorkflow(config=workflow_config)
|
||||||
|
|
||||||
|
|
||||||
|
# EpubWorkflow
|
||||||
|
from docutranslate.exporter.epub.epub2html_exporter import Epub2HTMLExporterConfig
|
||||||
|
from docutranslate.translator.ai_translator.epub_translator import EpubTranslatorConfig
|
||||||
|
from docutranslate.workflow.epub_workflow import EpubWorkflowConfig, EpubWorkflow
|
||||||
|
|
||||||
|
|
||||||
|
async def epub():
|
||||||
|
# 1. 翻訳機の設定を作成
|
||||||
|
translator_config = EpubTranslatorConfig(
|
||||||
|
base_url="https://api.openai.com/v1/",
|
||||||
|
api_key="YOUR_OPENAI_API_KEY",
|
||||||
|
model_id="gpt-4o",
|
||||||
|
to_lang="中国語",
|
||||||
|
insert_mode="replace", # 選択肢: "replace", "append", "prepend"
|
||||||
|
separator="\n", # "append", "prepend" モードで使用される区切り文字
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. メインワークフローの設定を作成
|
||||||
|
workflow_config = EpubWorkflowConfig(
|
||||||
|
translator_config=translator_config,
|
||||||
|
html_exporter_config=Epub2HTMLExporterConfig(cdn=True),
|
||||||
|
)
|
||||||
|
workflow_epub = EpubWorkflow(config=workflow_config)
|
||||||
|
```
|
||||||
|
|
||||||
## 前提条件と設定詳細
|
## 前提条件と設定詳細
|
||||||
|
|
||||||
### 1. 大規模モデルAPIキーの取得
|
### 1. 大規模モデルAPIキーの取得
|
||||||
|
|
||||||
翻訳機能は大規模言語モデルに依存しているため、対応するAIプラットフォームから`base_url`、`api_key`、`model_id`を取得する必要があります。
|
翻訳機能は大規模言語モデルに依存しているため、対応するAIプラットフォームから`base_url`、`api_key`、`model_id`を取得する必要があります。
|
||||||
|
|
||||||
> 推奨モデル:火山引擎の`doubao-seed-1-6-flash`、`doubao-seed-1-6`シリーズ、智譜の`glm-4-flash`、阿里雲の`qwen-plus`、 `qwen-flash`、deepseekの`deepseek-chat`など。
|
> 推奨モデル:火山引擎の`doubao-seed-1-6-flash`、`doubao-seed-1-6`シリーズ、智譜の`glm-4-flash`、阿里雲の`qwen-plus`、
|
||||||
|
`qwen-flash`、deepseekの`deepseek-chat`など。
|
||||||
|
|
||||||
> [302.AI](https://share.302.ai/BgRLAe)👈 このリンクから登録で1ドル分の無料クレジットを提供
|
> [302.AI](https://share.302.ai/BgRLAe)👈 このリンクから登録で1ドル分の無料クレジットを提供
|
||||||
|
|
||||||
|
|||||||
56
README_ZH.md
56
README_ZH.md
@@ -406,13 +406,67 @@ if __name__ == "__main__":
|
|||||||
asyncio.run(main())
|
asyncio.run(main())
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 示例 5: 其它workflow的配置项(使用 `HtmlWorkflow`、`EpubWorkflow`)
|
||||||
|
|
||||||
|
这里以异步方式为例。
|
||||||
|
|
||||||
|
```python
|
||||||
|
# HtmlWorkflow
|
||||||
|
from docutranslate.translator.ai_translator.html_translator import HtmlTranslatorConfig
|
||||||
|
from docutranslate.workflow.html_workflow import HtmlWorkflowConfig, HtmlWorkflow
|
||||||
|
|
||||||
|
|
||||||
|
async def html():
|
||||||
|
# 1. 构建翻译器配置
|
||||||
|
translator_config = HtmlTranslatorConfig(
|
||||||
|
base_url="https://api.openai.com/v1/",
|
||||||
|
api_key="YOUR_OPENAI_API_KEY",
|
||||||
|
model_id="gpt-4o",
|
||||||
|
to_lang="中文",
|
||||||
|
insert_mode="replace", # 备选项 "replace", "append", "prepend"
|
||||||
|
separator="\n", # "append", "prepend"模式时使用的分隔符
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. 构建主工作流配置
|
||||||
|
workflow_config = HtmlWorkflowConfig(
|
||||||
|
translator_config=translator_config,
|
||||||
|
)
|
||||||
|
workflow_html = HtmlWorkflow(config=workflow_config)
|
||||||
|
|
||||||
|
|
||||||
|
# EpubWorkflow
|
||||||
|
from docutranslate.exporter.epub.epub2html_exporter import Epub2HTMLExporterConfig
|
||||||
|
from docutranslate.translator.ai_translator.epub_translator import EpubTranslatorConfig
|
||||||
|
from docutranslate.workflow.epub_workflow import EpubWorkflowConfig, EpubWorkflow
|
||||||
|
|
||||||
|
|
||||||
|
async def epub():
|
||||||
|
# 1. 构建翻译器配置
|
||||||
|
translator_config = EpubTranslatorConfig(
|
||||||
|
base_url="https://api.openai.com/v1/",
|
||||||
|
api_key="YOUR_OPENAI_API_KEY",
|
||||||
|
model_id="gpt-4o",
|
||||||
|
to_lang="中文",
|
||||||
|
insert_mode="replace", # 备选项 "replace", "append", "prepend"
|
||||||
|
separator="\n", # "append", "prepend"模式时使用的分隔符
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. 构建主工作流配置
|
||||||
|
workflow_config = EpubWorkflowConfig(
|
||||||
|
translator_config=translator_config,
|
||||||
|
html_exporter_config=Epub2HTMLExporterConfig(cdn=True),
|
||||||
|
)
|
||||||
|
workflow_epub = EpubWorkflow(config=workflow_config)
|
||||||
|
```
|
||||||
|
|
||||||
## 前置条件与配置详解
|
## 前置条件与配置详解
|
||||||
|
|
||||||
### 1. 获取大模型 API Key
|
### 1. 获取大模型 API Key
|
||||||
|
|
||||||
翻译功能依赖于大型语言模型,您需要从相应的 AI 平台获取 `base_url`, `api_key` 和 `model_id`。
|
翻译功能依赖于大型语言模型,您需要从相应的 AI 平台获取 `base_url`, `api_key` 和 `model_id`。
|
||||||
|
|
||||||
> 推荐模型:火山引擎的`doubao-seed-1-6-flash`、`doubao-seed-1-6`系列、智谱的`glm-4-flash`,阿里云的 `qwen-plus`、`qwen-flash`,deepseek的`deepseek-chat`等。
|
> 推荐模型:火山引擎的`doubao-seed-1-6-flash`、`doubao-seed-1-6`系列、智谱的`glm-4-flash`,阿里云的 `qwen-plus`、`qwen-flash`
|
||||||
|
> ,deepseek的`deepseek-chat`等。
|
||||||
|
|
||||||
> [302.AI](https://share.302.ai/BgRLAe)👈从该链接注册可享1美元免费额度
|
> [302.AI](https://share.302.ai/BgRLAe)👈从该链接注册可享1美元免费额度
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user