Best Free LLM APIs for Coding

18 free models available for coding. How to choose a free LLM for coding →

Coding Chat Vision Audio Reasoning Embedding

For AI coding, prioritize large context windows (to process entire codebases), tool calling support, and strong instruction following. The best free coding models include Codestral (Mistral, purpose-built for code), DeepSeek V4, Qwen3-Coder, and Gemini 2.5 Flash (1M context). Models are ranked below by context window and rate limit.

What to Look for in a Coding Model

Not all LLMs are equally good at coding. Here's what separates a coding model from a general-purpose one:

Context window — The single most important spec for coding. Modern codebases easily exceed 50K tokens. A model with less than 32K context will struggle with multi-file edits, code review, or understanding project structure. Look for at least 128K tokens; 256K+ is ideal for monorepo work.
Fill-in-the-Middle (FIM) — A specialized training objective where the model learns to fill a gap between prefix and suffix code. Essential for inline code completion in IDEs. Codestral and DeepSeek Coder variants are trained with FIM.
Tool calling / function calling — Required for agentic coding workflows: "find all files that import X, then refactor them to use Y." Without tool calling, the model can only suggest code, not execute actions. Most OpenAI-compatible endpoints support tool calling if the underlying model does.
Instruction following — Coding requires precise, unambiguous outputs. Models that drift or hallucinate will introduce bugs. DeepSeek V4 and Qwen3 score particularly well on instruction-following benchmarks.
Max output tokens — Generating a full file or multiple functions in one shot requires high output limits. 8K output is the practical minimum; 16K+ lets the model generate entire modules at once.

How to Choose a Free Coding Model

Your pick depends on how you code:

Using Claude Code or Cursor? → Prioritize context window and tool calling. Gemini 2.5 Flash (1M ctx) or DeepSeek V4 (256K) let the agent see your whole project. Both support tool calling via OpenAI-compatible endpoints.
Inline completion in VS Code / JetBrains? → Look for FIM support. Codestral (Mistral) is purpose-built for this. DeepSeek Coder variants also support FIM.
Code review / PR review? → Large context is critical — the diff + surrounding code + review guidelines all need to fit in one prompt. Gemini 2.5 Flash's 1M context handles this with room to spare.
Learning to code? → Prioritize helpfulness and explanation quality. Qwen3 and Llama 3.3 70B are known for clear, educational code explanations.
Rate limit sensitive? → NVIDIA NIM has 40 RPM with no daily cap, ideal for heavy coding sessions. Groq has 30 RPM / 14,400 RPD — enough for most solo developers.

Try models in the Playground with a real coding task before committing — the same benchmark scores don't always match your specific language or framework.

Top Picks for Coding

Google: Gemini 2.5 Flash Google

1M context, multimodal, strong all-round coding. Free tier: 10 RPM, 250 RPD.

DeepSeek: DeepSeek V4 Flash (free) OpenRouter

256K context, latest-gen coding model. Strong instruction following, FIM support.

Codestral Mistral AI

Purpose-built for code. 256K context, FIM support, no credit card required.

Qwen: Qwen3 Coder 480B A35B (free) OpenRouter

Massive 480B MoE model specialized for code. 262K context.

All Free Coding Models

Provider	Model	Context	Max Output	Modality	Rate Limit	Released
OpenRouter	OpenAI: gpt-oss-safeguard-20b	131K	66K	text	See provider page	Oct 29, 2025	Details
OpenRouter	OpenAI: gpt-oss-120b (free)	131K	131K	text	See provider page	Aug 5, 2025	Details
OpenRouter	OpenAI: gpt-oss-20b (free)	131K	8K	text	See provider page	Aug 5, 2025	Details
OpenRouter	Qwen: Qwen3 Coder 480B A35B (free)	1.0M	262K	textcode	See provider page	Feb 4, 2026	Details
Mistral AI	Codestral	256K	256K	textcode	~1 RPS, 500K TPM	—	Details
Cerebras	gpt-oss-120b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	—	Details
Kilo Code	x-ai/grok-code-fast-1:optimized:free	131K	131K	textcode	~200 req/hr	—	Details
LLM7.io	qwen2.5-coder-32b	131K	131K	textcode	30 RPM (120 with token)	—	Details
OVHcloud AI Endpoints	Qwen3-Coder-30B-A3B-Instruct	262K	32K	textcode	2 RPM (anonymous)	—	Details
Chutes.ai	Llama 3.1 70B	131K	0	text	Community-powered, no hard cap	—	Details
Glhf.chat	Llama 3.1 70B	131K	0	text	Unlimited for free models	—	Details
Glhf.chat	Mixtral 8x7B	33K	0	text	Unlimited for free models	—	Details
Groq	Moonshot Kimi K2	131K	0	text	See provider page	—	Details
Groq	Moonshot Kimi K2 0905	131K	0	text	See provider page	—	Details
Groq	GPT-OSS 120B	131K	0	text	See provider page	—	Details
GitHub Models	Mistral Large (24.11)	131K	0	text	See provider page	—	Details
Cerebras	Llama 3.1 70B	131K	0	text	See provider page	—	Details
Mistral AI	Mixtral 8x7B	33K	0	text	See provider page	—	Details

See our FAQ for common questions about free LLM APIs