📋 About

Existing financial benchmarks are limited by their reliance on static datasets, their narrow task scope, and their inability to capture the dynamic, multi-faceted nature of real-world financial workflows. To address these issues, we present FinMaster, a holistic benchmark for evaluating how well LLMs master full-pipeline financial workflows.
To the best of our knowledge, FinMaster is the first benchmark that comprehensively covers full-pipeline financial workflows with challenging and realistic tasks.
Dataset Access
The complete FinMaster public dataset is available for download from our Kaggle repository. It includes all the data needed to reproduce our results.
📊 Leaderboard
| Model | Rank | Average | Date | Financial Literacy | Accounting | Auditing | Consulting | Open-source | Reasoning | Link |
|---|---|---|---|---|---|---|---|---|---|---|
| o3-mini | 1 | 0.73 | 2025-01 | 1.00 | 0.35 | 0.85 | 0.66 | proprietary | reasoning | |
| Claude-3.7-Sonnet | 2 | 0.72 | 2025-01 | 1.00 | 0.30 | 0.70 | 0.80 | proprietary | non-reasoning | |
| DeepSeek-V3-2503 | 3 | 0.70 | 2025-03 | 0.99 | 0.21 | 0.69 | 0.80 | open-source | non-reasoning | |
| GPT-4.1 | 4 | 0.62 | 2024-07 | 1.00 | 0.33 | 0.41 | 0.56 | proprietary | non-reasoning | |
| GPT-4.1-mini | 5 | 0.58 | 2024-07 | 0.90 | 0.20 | 0.29 | 0.66 | proprietary | non-reasoning | |
| GPT-4o-mini | 6 | 0.46 | 2024-07 | 0.89 | 0.08 | 0.27 | 0.37 | proprietary | non-reasoning | |
| GPT-4.1-nano | 7 | 0.40 | 2024-07 | 0.85 | 0.04 | 0.00 | 0.42 | proprietary | non-reasoning | |