
Gemini 2.5 Flash-Lite

BIRD Mini-Dev benchmark results for Gemini 2.5 Flash-Lite via Google Cloud Vertex AI.



Summary

Provider Google Cloud Vertex AI
Model gemini-2.5-flash-lite
Overall EX Accuracy 39.4%
Overall Soft F1 39.0%
Error Rate 41.8% (209 / 500)
Avg Latency 7.2s per question
Total Benchmark Time 60.4 minutes
Rank #6 overall (last)
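The summary figures can be cross-checked with a little arithmetic. The sketch below assumes something the page does not state: a query counted as an error never counts toward EX. Under that assumption the 500 questions split into three disjoint outcomes: correct, errored, and executed but wrong.

```python
# Cross-check of the summary numbers for gemini-2.5-flash-lite.
# Assumption (not stated on the page): errored queries and EX-correct
# queries are disjoint, so the three outcomes add up to the total.

TOTAL_QUESTIONS = 500
ERRORED = 209          # "Error Rate 41.8% (209 / 500)"
EX_ACCURACY = 0.394    # "Overall EX Accuracy 39.4%"

error_rate_pct = round(ERRORED / TOTAL_QUESTIONS * 100, 1)
correct = round(EX_ACCURACY * TOTAL_QUESTIONS)
wrong_result = TOTAL_QUESTIONS - correct - ERRORED  # ran, but wrong answer

print(f"error rate:   {error_rate_pct}%")
print(f"correct:      {correct}")
print(f"wrong result: {wrong_result} ({wrong_result / TOTAL_QUESTIONS:.1%})")
```

Under that assumption, about 94 questions (18.8%) executed but returned the wrong result, so failures to produce valid SQL outnumber wrong answers by more than two to one.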

Detailed Scores

Metric                    Overall   Simple (148)   Moderate (250)   Challenging (102)
Execution Accuracy (EX)   39.4%     56.1%          33.2%            30.4%
Soft F1                   39.0%     55.8%          33.4%            28.6%

Numbers in parentheses are question counts per difficulty level.
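The Overall column should be the question-weighted average of the three difficulty levels. A short check using only the figures in the table above; the per-level scores are already rounded, so the recomputed overall can drift by about 0.1 point:

```python
# Question-weighted average of the per-difficulty scores from the table.
BUCKETS = {
    # name: (question count, EX %, Soft F1 %)
    "simple": (148, 56.1, 55.8),
    "moderate": (250, 33.2, 33.4),
    "challenging": (102, 30.4, 28.6),
}

def weighted_overall(metric_index: int) -> float:
    """Weight each bucket's score by its question count."""
    total = sum(count for count, *_ in BUCKETS.values())
    weighted = sum(row[0] * row[metric_index] for row in BUCKETS.values())
    return weighted / total

overall_ex = weighted_overall(1)       # about 39.41
overall_soft_f1 = weighted_overall(2)  # about 39.05

print(f"EX: {overall_ex:.2f}%  Soft F1: {overall_soft_f1:.2f}%")
```

Both recomputed values agree with the reported 39.4% and 39.0% to within about 0.1 point. The breakdown also shows where the Soft F1 deficit comes from: it is concentrated in the Challenging questions (28.6% vs 30.4% EX), while Moderate questions gain 0.2 points.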

[Chart: Gemini 2.5 Flash-Lite, EX vs Soft F1 breakdown by difficulty]

Analysis

Strengths

  • Reasonable on simple questions at 56.1% EX, 2.7 points above GPT-5.4 Nano (53.4%)
  • Lowest-cost Gemini model for experimentation and prototyping

Weaknesses

  • Very high error rate: 41.8% of queries (209 of 500) fail to produce valid SQL, more than two in every five
  • Soft F1 is 0.4 points below EX, unlike every other model: partial credit does not help because the model frequently generates completely wrong or broken SQL, which earns nothing under either metric
  • Slowest budget model at 7.2s, about 1.8x slower than GPT-5.4 Nano (4.1s) while scoring lower
  • No advantage over GPT-5.4 Nano: similar accuracy (0.6 points lower EX) but a higher error rate, higher latency, and no cost benefit
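To see why broken SQL pulls both metrics down equally, consider a simplified scorer. This is a sketch under stated assumptions, not the official BIRD Mini-Dev evaluation code: EX is modelled as exact equality of result sets, and partial credit as a plain row-level F1 standing in for Soft F1. The `orders` table and the queries are hypothetical.

```python
import sqlite3
from collections import Counter

def run(conn, sql):
    """Execute SQL; return the result rows, or None if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None

def ex_score(pred, gold):
    # EX-style check: 1.0 only if the result sets match (row order ignored).
    if pred is None:
        return 0.0
    return 1.0 if set(pred) == set(gold) else 0.0

def row_f1(pred, gold):
    # Simplified partial credit (an assumption, not BIRD's exact Soft F1):
    # F1 over result rows, counted as multisets.
    if pred is None:
        return 0.0
    if not pred and not gold:
        return 1.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "EU"), (2, "EU"), (3, "US")])

gold = run(conn, "SELECT id FROM orders")
partial = run(conn, "SELECT id FROM orders WHERE region = 'EU'")  # 2 of 3 rows
broken = run(conn, "SELEC id FROM orders")  # syntax error, returns None

for name, pred in [("partial", partial), ("broken", broken)]:
    print(name, ex_score(pred, gold), round(row_f1(pred, gold), 2))
```

The partial query scores 0 under EX but 0.8 under the row-level F1, which is the kind of lift partial credit normally gives. The broken query never executes and scores 0 under both, so a model whose misses are mostly execution failures gains little from partial credit.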

When to Use

Gemini 2.5 Flash-Lite is not recommended for text-to-SQL workloads. The 41.8% error rate makes it unreliable for any production or development scenario.

Potential use cases are limited to:

  • Rough experimentation with Vertex AI pricing tiers
  • Workloads where text-to-SQL is not the primary task

Avoid it for:

  • Any text-to-SQL application (production or development)
  • Batch processing (high error rate wastes compute)
  • Interactive applications (7.2s latency with no accuracy payoff)
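The batch-processing concern can be put in rough numbers. The sketch below assumes, hypothetically, that a failed query takes as long as the 7.2s average; the page does not report latency separately for failed queries.

```python
# Rough estimate of time spent on failed queries in one 500-question run.
# Assumption: errored queries take the reported average latency.
TOTAL_QUESTIONS = 500
ERRORED = 209
AVG_LATENCY_S = 7.2

total_minutes = TOTAL_QUESTIONS * AVG_LATENCY_S / 60
wasted_minutes = ERRORED * AVG_LATENCY_S / 60

print(f"total: {total_minutes:.1f} min, on failed queries: {wasted_minutes:.1f} min")
```

The 60.0-minute estimate is close to the reported 60.4 minutes of total benchmark time, so the average latency is a reasonable basis; under the assumption above, about 25 minutes of each run go to queries that return no usable SQL.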

Comparison with Peers

Compared Model         EX Difference   Latency Ratio
vs GPT-5.4 Nano        -0.6 points     1.8x slower
vs GPT-5.4 Mini        -13.8 points    2.0x slower
vs Gemini 2.5 Flash    -21.2 points    1.1x slower
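The comparison table is expressed relative to Flash-Lite. Adding the differences back gives each peer's implied EX score; these are back-calculations from this page's numbers, not separately reported figures. Only the GPT-5.4 Nano latency (4.1s) appears on this page, so it is the only latency ratio checked.

```python
FLASH_LITE_EX = 39.4
FLASH_LITE_LATENCY_S = 7.2
NANO_LATENCY_S = 4.1  # from the Weaknesses section

EX_DIFFERENCE = {
    "GPT-5.4 Nano": -0.6,
    "GPT-5.4 Mini": -13.8,
    "Gemini 2.5 Flash": -21.2,
}

# difference = Flash-Lite EX - peer EX, so peer EX = Flash-Lite EX - difference.
implied_peer_ex = {m: round(FLASH_LITE_EX - d, 1) for m, d in EX_DIFFERENCE.items()}
nano_latency_ratio = round(FLASH_LITE_LATENCY_S / NANO_LATENCY_S, 1)

for model, ex in implied_peer_ex.items():
    print(f"{model}: {ex}% EX")
print(f"Flash-Lite vs GPT-5.4 Nano latency: {nano_latency_ratio}x")
```

GPT-5.4 Nano comes out at about 40.0% EX, which is why the two budget models are described as similar in accuracy, and 7.2s / 4.1s rounds to the 1.8x ratio shown in the table.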
