Update README.md

ai-modelscope
2025-06-24 00:11:20 +08:00
parent 170477de24
commit 6fbed61b08

@@ -37,36 +37,37 @@ This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moon
 ## 2. Performance
-Comparison with efficient models and two previous versions of Kimi-VL:
+Comparison with efficient models and two previous versions of Kimi-VL (*Results of GPT-4o is for reference here, and shown in <i>italics</i>):
 <div align="center">
 | Benchmark (Metric) | GPT-4o | Qwen2.5-VL-7B | Gemma3-12B-IT | Kimi-VL-A3B-Instruct | Kimi-VL-A3B-Thinking | Kimi-VL-A3B-Thinking-2506 |
 |----------------------------|--------|---------------|---------------|----------------------|----------------------|--------------------------|
 | **General Multimodal** | | | | | | |
-| MMBench-EN-v1.1 (Acc) | 83.1 | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
-| RealWorldQA (Acc) | 75.4 | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
-| OCRBench (Acc) | 815 | 864 | 702 | 864 | 864 | **869** |
-| MMStar (Acc) | 64.7 | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
-| MMVet (Acc) | 69.1 | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
+| MMBench-EN-v1.1 (Acc) | *83.1* | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
+| RealWorldQA (Acc) | *75.4* | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
+| OCRBench (Acc) | *815* | 864 | 702 | 864 | 864 | **869** |
+| MMStar (Acc) | *64.7* | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
+| MMVet (Acc) | *69.1* | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
 | **Reasoning** | | | | | | |
-| MMMU (val, Pass@1) | 69.1 | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
-| MMMU-Pro (Pass@1) | 51.7 | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
+| MMMU (val, Pass@1) | *69.1* | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
+| MMMU-Pro (Pass@1) | *51.7* | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
 | **Math** | | | | | | |
-| MATH-Vision (Pass@1) | 30.4 | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
-| MathVista_MINI (Pass@1) | 63.8 | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
+| MATH-Vision (Pass@1) | *30.4* | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
+| MathVista_MINI (Pass@1) | *63.8* | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
 | **Video** | | | | | | |
-| VideoMMMU (Pass@1) | 61.2 | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
-| MMVU (Pass@1) | 67.4 | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
-| Video-MME (w/ sub.) | 77.2 | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
+| VideoMMMU (Pass@1) | *61.2* | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
+| MMVU (Pass@1) | *67.4* | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
+| Video-MME (w/ sub.) | *77.2* | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
 | **Agent Grounding** | | | | | | |
-| ScreenSpot-Pro (Acc) | 0.8 | 29.0 | — | 35.4 | — | **52.8** |
-| ScreenSpot-V2 (Acc) | 18.1 | 84.2 | — | **92.8** | — | 91.4 |
-| OSWorld-G (Acc) | - | 31.5 | — | 41.6 | — | **52.5** |
+| ScreenSpot-Pro (Acc) | *0.8* | 29.0 | — | 35.4 | — | **52.8** |
+| ScreenSpot-V2 (Acc) | *18.1* | 84.2 | — | **92.8** | — | 91.4 |
+| OSWorld-G (Acc) | - | *31.5* | — | 41.6 | — | **52.5** |
 | **Long Document** | | | | | | |
-| MMLongBench-DOC (Acc) | 42.8 | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
+| MMLongBench-DOC (Acc) | *42.8* | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
 </div>
 Comparison with 30B-70B open-source models:
 <div align="center">