Mirror of https://www.modelscope.cn/PaddlePaddle/PaddleOCR-VL.git, synced 2026-04-02 21:42:54 +08:00
update readme
39
README.md
@@ -3,7 +3,6 @@ license: apache-2.0
|
|||||||
pipeline_tag: image-text-to-text
|
pipeline_tag: image-text-to-text
|
||||||
tags:
|
tags:
|
||||||
- ERNIE4.5
|
- ERNIE4.5
|
||||||
- PaddleOCR
|
|
||||||
- PaddlePaddle
|
- PaddlePaddle
|
||||||
- image-to-text
|
- image-to-text
|
||||||
- ocr
|
- ocr
|
||||||
@@ -17,7 +16,6 @@ language:
|
|||||||
- en
|
- en
|
||||||
- zh
|
- zh
|
||||||
- multilingual
|
- multilingual
|
||||||
library_name: PaddleOCR
|
|
||||||
---
|
---
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
@@ -39,7 +37,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
|
|||||||
[](./LICENSE)
|
[](./LICENSE)
|
||||||
|
|
||||||
**🔥 Official Demo**: [Baidu AI Studio](https://aistudio.baidu.com/application/detail/98365) |
|
**🔥 Official Demo**: [Baidu AI Studio](https://aistudio.baidu.com/application/detail/98365) |
|
||||||
**📝 Blog**: [Technical Report](https://ernie.baidu.com/blog/publication/PaddleOCR-VL_Technical_Report.pdf)
|
**📝 arXiv**: [Technical Report](https://arxiv.org/pdf/2510.14528)
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
@@ -84,14 +82,18 @@ Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick) and [Paddl
|
|||||||
```bash
|
```bash
|
||||||
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
|
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
|
||||||
python -m pip install -U "paddleocr[doc-parser]"
|
python -m pip install -U "paddleocr[doc-parser]"
|
||||||
|
python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
|
||||||
```
|
```
|
||||||
|
|
||||||
|
> For Windows users, please use WSL or a Docker container.
|
||||||
|
|
||||||
|
|
||||||
### Basic Usage
|
### Basic Usage
|
||||||
|
|
||||||
CLI usage:
|
CLI usage:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_ocr_vl_demo.png
|
paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png
|
||||||
```
|
```
|
||||||
|
|
||||||
Python API usage:
|
Python API usage:
|
||||||
@@ -100,7 +102,7 @@ Python API usage:
|
|||||||
from paddleocr import PaddleOCRVL
|
from paddleocr import PaddleOCRVL
|
||||||
|
|
||||||
pipeline = PaddleOCRVL()
|
pipeline = PaddleOCRVL()
|
||||||
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_ocr_vl_demo.png")
|
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
|
||||||
for res in output:
|
for res in output:
|
||||||
res.print()
|
res.print()
|
||||||
res.save_to_json(save_path="output")
|
res.save_to_json(save_path="output")
|
||||||
@@ -117,23 +119,22 @@ for res in output:
|
|||||||
--gpus all \
|
--gpus all \
|
||||||
--network host \
|
--network host \
|
||||||
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server
|
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server
|
||||||
# You can also use ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server for the SGLang server
|
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Call the PaddleOCR CLI or Python API:
|
2. Call the PaddleOCR CLI or Python API:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
paddleocr doc_parser \
|
paddleocr doc_parser \
|
||||||
-i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_ocr_vl_demo.png \
|
-i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png \
|
||||||
--vl_rec_backend vllm-server \
|
--vl_rec_backend vllm-server \
|
||||||
--vl_rec_server_url http://127.0.0.1:8080
|
--vl_rec_server_url http://127.0.0.1:8080/v1
|
||||||
```
|
```
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from paddleocr import PaddleOCRVL
|
from paddleocr import PaddleOCRVL
|
||||||
|
|
||||||
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8080")
|
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8080/v1")
|
||||||
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_ocr_vl_demo.png")
|
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
|
||||||
for res in output:
|
for res in output:
|
||||||
res.print()
|
res.print()
|
||||||
res.save_to_json(save_path="output")
|
res.save_to_json(save_path="output")
|
||||||
@@ -246,31 +247,32 @@ The evaluation set is broadly categorized into 11 chart categories, including ba
|
|||||||
### Text
|
### Text
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<img src="./imgs/text_english_arabic.jpg" width="300"/>
|
<img src="./imgs/text_english_arabic.jpg" width="300" style="display: inline-block;"/>
|
||||||
<img src="./imgs/text_handwriting_02.jpg" width="300"/>
|
<img src="./imgs/text_handwriting_02.jpg" width="300" style="display: inline-block;"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|
||||||
### Table
|
### Table
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<img src="./imgs/table_01.jpg" width="300"/>
|
<img src="./imgs/table_01.jpg" width="300" style="display: inline-block;"/>
|
||||||
<img src="./imgs/table_02.jpg" width="300"/>
|
<img src="./imgs/table_02.jpg" width="300" style="display: inline-block;"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|
||||||
### Formula
|
### Formula
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<img src="./imgs/formula_EN.jpg" width="300"/>
|
<img src="./imgs/formula_EN.jpg" width="300" style="display: inline-block;"/>
|
||||||
<img src="./imgs/formula_EN.jpg" width="300"/>
|
<img src="./imgs/formula_ZH.jpg" width="300" style="display: inline-block;"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|
||||||
### Chart
|
### Chart
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<img src="./imgs/chart_01.jpg" width="300"/>
|
<img src="./imgs/chart_01.jpg" width="300" style="display: inline-block;"/>
|
||||||
<img src="./imgs/chart_02.jpg" width="300"/>
|
<img src="./imgs/chart_02.jpg" width="300" style="display: inline-block;"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|
||||||
@@ -291,4 +293,3 @@ If you find PaddleOCR-VL helpful, feel free to give us a star and citation.
|
|||||||
howpublished={\url{https://ernie.baidu.com/blog/publication/PaddleOCR-VL_Technical_Report.pdf}}
|
howpublished={\url{https://ernie.baidu.com/blog/publication/PaddleOCR-VL_Technical_Report.pdf}}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|||||||
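One change in this commit moves the example `vl_rec_server_url` from `http://127.0.0.1:8080` to `http://127.0.0.1:8080/v1`. Callers who keep the server address in configuration may want to normalize it before passing it on. The following is a minimal sketch using only the Python standard library; the helper name `with_v1_suffix` is hypothetical and not part of PaddleOCR, and it assumes the `/v1` path shown in the updated README is the one your server exposes.

```python
from urllib.parse import urlsplit, urlunsplit


def with_v1_suffix(server_url: str) -> str:
    """Return server_url with a single trailing '/v1' path segment.

    Hypothetical helper (not a PaddleOCR API): it only rewrites the URL
    string and does not contact the server.
    """
    parts = urlsplit(server_url)
    # Drop trailing slashes so 'host/' and 'host/v1/' are handled uniformly.
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path = path + "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for url in ("http://127.0.0.1:8080", "http://127.0.0.1:8080/v1/"):
        print(with_v1_suffix(url))
```

The returned string can then be supplied as the `vl_rec_server_url` argument in the same way as the README's `PaddleOCRVL(vl_rec_backend="vllm-server", ...)` example.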