FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs

Junzhe Jiang *1, Chang Yang *1, Aixin Cui 2, Sihan Jin 3, Ruiyu Wang 4, Bo Li 1, Xiao Huang 1, Dongning Sun †5, Xinrun Wang †6

1 The Hong Kong Polytechnic University; 2 The Chinese University of Hong Kong; 3 The Hong Kong University of Science and Technology; 4 KTH Royal Institute of Technology; 5 Peng Cheng Laboratory; 6 Singapore Management University

*Equal contribution | †Corresponding author

📋 About

FinMaster Overview

Existing benchmarks are limited by their reliance on static datasets, narrow task scope, and inability to capture the dynamic, multi-faceted nature of real-world financial workflows. To address these issues, we present FinMaster, a holistic benchmark for mastering full-pipeline financial workflows with LLMs.

To the best of our knowledge, FinMaster is the first benchmark that comprehensively covers full-pipeline financial workflows with challenging and realistic tasks.

Dataset Access

The complete FinMaster public dataset is available for download from our Kaggle repository. It includes all the data needed to reproduce our results.


📊 Leaderboard

| Model | Rank | Average | Date | Financial Literacy | Accounting | Auditing | Consulting | Open-source | Reasoning | Link |
|---|---|---|---|---|---|---|---|---|---|---|
| o3-mini | 1 | 0.73 | 2025-01 | 1.00 | 0.35 | 0.85 | 0.66 | proprietary | reasoning | 🔗 |
| Claude-3.7-Sonnet | 2 | 0.72 | 2025-01 | 1.00 | 0.30 | 0.70 | 0.80 | proprietary | non-reasoning | 🔗 |
| DeepSeek-V3-2503 | 3 | 0.70 | 2025-03 | 0.99 | 0.21 | 0.69 | 0.80 | open-source | non-reasoning | 🔗 |
| GPT-4.1 | 4 | 0.62 | 2024-07 | 1.00 | 0.33 | 0.41 | 0.56 | proprietary | non-reasoning | 🔗 |
| GPT-4.1-mini | 5 | 0.58 | 2024-07 | 0.90 | 0.20 | 0.29 | 0.66 | proprietary | non-reasoning | 🔗 |
| GPT-4o-mini | 6 | 0.46 | 2024-07 | 0.89 | 0.08 | 0.27 | 0.37 | proprietary | non-reasoning | 🔗 |
| GPT-4.1-nano | 7 | 0.40 | 2024-07 | 0.85 | 0.04 | 0.00 | 0.42 | proprietary | non-reasoning | 🔗 |
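
For readers who want to compare models per domain rather than by overall rank, the leaderboard scores above can be queried with a few lines of Python. This is only an illustrative sketch: the `LEADERBOARD` dictionary, the `DOMAINS` tuple, and the `top_models` helper are hypothetical names introduced here, not part of the FinMaster codebase, and the numbers are copied by hand from the leaderboard rather than loaded from the dataset.

```python
# Per-domain scores copied from the leaderboard above, in the order:
# (Financial Literacy, Accounting, Auditing, Consulting).
# Hypothetical structure for illustration; not the benchmark's own format.
LEADERBOARD = {
    "o3-mini": (1.00, 0.35, 0.85, 0.66),
    "Claude-3.7-Sonnet": (1.00, 0.30, 0.70, 0.80),
    "DeepSeek-V3-2503": (0.99, 0.21, 0.69, 0.80),
    "GPT-4.1": (1.00, 0.33, 0.41, 0.56),
    "GPT-4.1-mini": (0.90, 0.20, 0.29, 0.66),
    "GPT-4o-mini": (0.89, 0.08, 0.27, 0.37),
    "GPT-4.1-nano": (0.85, 0.04, 0.00, 0.42),
}
DOMAINS = ("Financial Literacy", "Accounting", "Auditing", "Consulting")


def top_models(domain):
    """Return (best_score, [models tied at that score]) for one domain."""
    idx = DOMAINS.index(domain)
    best = max(scores[idx] for scores in LEADERBOARD.values())
    tied = [model for model, scores in LEADERBOARD.items() if scores[idx] == best]
    return best, tied


if __name__ == "__main__":
    for domain in DOMAINS:
        score, models = top_models(domain)
        print(f"{domain}: {score:.2f} ({', '.join(models)})")
```

On these numbers, the best Accounting score is 0.35, well below the best score in any other domain, while three models tie at 1.00 on Financial Literacy.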