diff --git a/README.md b/README.md
index bde011c..630977c 100644
--- a/README.md
+++ b/README.md
@@ -37,36 +37,37 @@ This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moon
 ## 2. Performance
-Comparison with efficient models and two previous versions of Kimi-VL:
+Comparison with efficient models and two previous versions of Kimi-VL (*Results of GPT-4o is for reference here, and shown in italics):
 | Benchmark (Metric) | GPT-4o | Qwen2.5-VL-7B | Gemma3-12B-IT | Kimi-VL-A3B-Instruct | Kimi-VL-A3B-Thinking | Kimi-VL-A3B-Thinking-2506 |
 |----------------------------|--------|---------------|---------------|----------------------|----------------------|--------------------------|
 | **General Multimodal** | | | | | | |
-| MMBench-EN-v1.1 (Acc) | 83.1 | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
-| RealWorldQA (Acc) | 75.4 | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
-| OCRBench (Acc) | 815 | 864 | 702 | 864 | 864 | **869** |
-| MMStar (Acc) | 64.7 | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
-| MMVet (Acc) | 69.1 | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
+| MMBench-EN-v1.1 (Acc) | *83.1* | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
+| RealWorldQA (Acc) | *75.4* | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
+| OCRBench (Acc) | *815* | 864 | 702 | 864 | 864 | **869** |
+| MMStar (Acc) | *64.7* | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
+| MMVet (Acc) | *69.1* | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
 | **Reasoning** | | | | | | |
-| MMMU (val, Pass@1) | 69.1 | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
-| MMMU-Pro (Pass@1) | 51.7 | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
+| MMMU (val, Pass@1) | *69.1* | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
+| MMMU-Pro (Pass@1) | *51.7* | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
 | **Math** | | | | | | |
-| MATH-Vision (Pass@1) | 30.4 | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
-| MathVista_MINI (Pass@1) | 63.8 | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
+| MATH-Vision (Pass@1) | *30.4* | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
+| MathVista_MINI (Pass@1) | *63.8* | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
 | **Video** | | | | | | |
-| VideoMMMU (Pass@1) | 61.2 | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
-| MMVU (Pass@1) | 67.4 | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
-| Video-MME (w/ sub.) | 77.2 | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
+| VideoMMMU (Pass@1) | *61.2* | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
+| MMVU (Pass@1) | *67.4* | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
+| Video-MME (w/ sub.) | *77.2* | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
 | **Agent Grounding** | | | | | | |
-| ScreenSpot-Pro (Acc) | 0.8 | 29.0 | — | 35.4 | — | **52.8** |
-| ScreenSpot-V2 (Acc) | 18.1 | 84.2 | — | **92.8** | — | 91.4 |
-| OSWorld-G (Acc) | - | 31.5 | — | 41.6 | — | **52.5** |
+| ScreenSpot-Pro (Acc) | *0.8* | 29.0 | — | 35.4 | — | **52.8** |
+| ScreenSpot-V2 (Acc) | *18.1* | 84.2 | — | **92.8** | — | 91.4 |
+| OSWorld-G (Acc) | - | *31.5* | — | 41.6 | — | **52.5** |
 | **Long Document** | | | | | | |
-| MMLongBench-DOC (Acc) | 42.8 | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
+| MMLongBench-DOC (Acc) | *42.8* | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
+ Comparison with 30B-70B open-source models: