Gemini 2.5 Flash

BIRD Mini-Dev benchmark results for Gemini 2.5 Flash via Google Cloud Vertex AI.

Summary

Metric	Overall	Simple (148)	Moderate (250)	Challenging (102)
Execution Accuracy (EX)	60.6%	76.4%	53.6%	54.9%
Soft F1	62.1%	75.1%	56.8%	55.9%
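
As a sanity check on the table, the overall figures can be recomputed from the per-tier values. The sketch below assumes the overall score is a mean weighted by question count, which the page does not state explicitly. The names `TIERS` and `weighted_overall` are illustrative and not part of any benchmark tooling. Because the per-tier percentages are already rounded, the recomputed Soft F1 comes out near 62.0 rather than the reported 62.1.

```python
# Per-tier results copied from the summary table: (questions, EX %, Soft F1 %).
TIERS = {
    "simple": (148, 76.4, 75.1),
    "moderate": (250, 53.6, 56.8),
    "challenging": (102, 54.9, 55.9),
}


def weighted_overall(metric_index: int) -> float:
    """Question-count-weighted mean of one metric across the three tiers.

    metric_index is 1 for EX and 2 for Soft F1 (assumed aggregation method).
    """
    total = sum(row[0] for row in TIERS.values())
    return sum(row[0] * row[metric_index] for row in TIERS.values()) / total


total_questions = sum(row[0] for row in TIERS.values())
overall_ex = weighted_overall(1)
overall_f1 = weighted_overall(2)

print(f"questions={total_questions} EX={overall_ex:.1f}% SoftF1={overall_f1:.1f}%")
```

The tier counts sum to 500 questions, so the 12 errors reported for this model correspond to 12 / 500 = 2.4%.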

Figure: Gemini 2.5 Flash, EX vs Soft F1 breakdown

Best on challenging questions: 54.9% EX, the highest among all models and 1.0 point ahead of Gemini 2.5 Pro (53.9%)
High reliability: only 12 errors across 500 questions (2.4%), the second-lowest error rate
Good balance of speed and accuracy: 3x faster than Gemini 2.5 Pro while trailing it by only 3.8 points
Soft F1 exceeds EX by 1.5 points overall (62.1% vs 60.6%), suggesting some partially correct SQL; the gap is largest on moderate questions (56.8% vs 53.6%) and reverses on simple ones (75.1% vs 76.4%)
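
The relationship between Soft F1 and EX differs by difficulty tier, which the overall 1.5-point gap hides. A minimal sketch using only the values from the summary table (the dictionary names are illustrative):

```python
# EX and Soft F1 percentages per tier, copied from the summary table.
EX = {"simple": 76.4, "moderate": 53.6, "challenging": 54.9}
SOFT_F1 = {"simple": 75.1, "moderate": 56.8, "challenging": 55.9}

# Positive gap: Soft F1 is higher than EX, i.e. partial credit beyond exact matches.
gaps = {tier: round(SOFT_F1[tier] - EX[tier], 1) for tier in EX}

for tier, gap in gaps.items():
    print(f"{tier:12s} Soft F1 - EX = {gap:+.1f} points")
```

On these numbers the gap is positive for moderate and challenging questions and negative for simple ones, so the partial-credit effect is concentrated in the moderate tier.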

Weak on moderate questions: 53.6% EX, 7.6 points behind Gemini 2.5 Pro and below its own score on challenging questions (54.9%)
Not the fastest — GPT-5.4 Mini runs at roughly half the latency (3.6s vs 6.7s)

Gemini 2.5 Flash offers the best accuracy-to-speed ratio among Google models. Ideal for: