GPT-5.4 Nano

BIRD Mini-Dev benchmark results for GPT-5.4 Nano via OpenAI.

Summary

Metric	Overall	Simple (148)	Moderate (250)	Challenging (102)
Execution Accuracy (EX)	40.0%	53.4%	36.0%	30.4%
Soft F1	43.2%	56.3%	40.0%	31.9%
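
As a quick consistency check on the table, the Overall column should equal the question-count-weighted mean of the three difficulty tiers. The sketch below assumes the numbers in parentheses (148, 250, 102) are per-tier question counts; the dictionary layout and the function name `weighted_overall` are illustrative only and not part of any benchmark tooling.

```python
# Per-tier question counts and scores, copied from the summary table.
# Assumption: the figures in parentheses in the header are question counts.
tiers = {
    "simple": {"n": 148, "ex": 53.4, "soft_f1": 56.3},
    "moderate": {"n": 250, "ex": 36.0, "soft_f1": 40.0},
    "challenging": {"n": 102, "ex": 30.4, "soft_f1": 31.9},
}


def weighted_overall(metric: str) -> float:
    """Return the question-count-weighted mean of a per-tier percentage."""
    total = sum(t["n"] for t in tiers.values())
    return sum(t["n"] * t[metric] for t in tiers.values()) / total


total_questions = sum(t["n"] for t in tiers.values())
overall_ex = weighted_overall("ex")
overall_soft_f1 = weighted_overall("soft_f1")

print(total_questions)            # 500
print(round(overall_ex, 1))       # 40.0
print(round(overall_soft_f1, 1))  # 43.2
```

Both rounded values match the Overall column. Moderate questions make up half of the set, so the 36.0% EX on that tier carries the most weight in the headline number.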

Figure: GPT-5.4 Nano, EX vs Soft F1 breakdown

Strengths:

Fast at 4.1s average latency, second only to GPT-5.4 Mini
Reasonable on simple questions at 53.4% EX: more than half of straightforward queries produce correct results
Lower error rate than Flash-Lite (6.8% vs 41.8%), indicating more reliable SQL generation

Weaknesses:

Below 40% EX on moderate and challenging questions (36.0% and 30.4% respectively), making it unsuitable for complex SQL
Higher error rate than the rest of the GPT-5.4 family (6.8% vs 0.6-2.2%)
No speed advantage over GPT-5.4 Mini: Nano is 0.5s slower (4.1s vs 3.6s) while scoring 13.2 points lower

GPT-5.4 Nano is a budget-tier model suited only to limited scenarios. It is not recommended for:

Production text-to-SQL applications
Complex multi-table joins or subqueries
Any scenario where GPT-5.4 Mini is available (Mini is better on every metric except marginal cost)