Best Free LLM APIs for Chat

147 free models available for chat.


For general conversation, look for low latency, strong instruction following, and a helpful personality. Gemini 2.5 Flash offers one of the largest free context windows (1M tokens) with multimodal support. Llama 3.3 70B via Groq delivers the highest throughput (100+ tok/s). Qwen3.5 models on NVIDIA NIM balance quality and speed.

What to Look for in a Chat Model

Chat models are the most common type of LLM, but their conversational quality varies significantly:

Latency / tokens per second: Real-time conversation needs fast responses. Groq's LPU hardware delivers the fastest inference (Llama 3.3 70B hits 100+ tok/s). NVIDIA NIM and OpenRouter are slower but offer more model variety.
Context window: Long conversations and document Q&A need a large context window. Gemini 2.5 Flash (1M-token context) can hold an entire book in context. Most chat models offer 32K–128K tokens, which easily handles typical back-and-forth conversations.
Instruction following: A good chat model stays on topic, follows system prompts, and avoids hallucinating. Llama 3.3 70B and Qwen3 are known for strong instruction adherence.
Multilingual support: If you chat in languages other than English, check the model's training data. Qwen3 has strong Chinese/English bilingual performance. Gemini and Llama support 30+ languages.
Multimodal input: Want to share images or audio in chat? Gemini 2.5 Flash accepts text, image, audio, and video. Most chat models are text-only.
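
To make the context-window criterion concrete, here is a minimal sketch that estimates whether a conversation fits a given window. The 4-characters-per-token ratio, the 1,024-token reply reserve, and the function names are illustrative assumptions; real token counts depend on each model's tokenizer.

```python
from typing import Iterable

# Assumption: roughly 4 characters per token for English text.
# This is a rule of thumb, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (ceiling of len/4, at least 1)."""
    return max(1, -(-len(text) // CHARS_PER_TOKEN))


def fits_in_context(
    messages: Iterable[str],
    context_tokens: int,
    reserve_for_reply: int = 1024,  # hypothetical head-room for the answer
) -> bool:
    """Return True if the messages plus the reply reserve fit the window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserve_for_reply <= context_tokens


history = [
    "You are a helpful assistant.",
    "Summarize this chapter: " + "word " * 50_000,  # about 250,000 characters
]
print(fits_in_context(history, context_tokens=32_000))     # False
print(fits_in_context(history, context_tokens=1_000_000))  # True
```

A long document like this overflows a 32K-class model but fits comfortably in a 1M-class one, which is the trade-off the context-window item describes.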

How to Choose a Free Chat Model

Match the model to your chat use case:

Casual conversation / chatbot? → Prioritize latency and personality. Llama 3.3 70B via Groq (fastest) or Gemini 2.5 Flash via Google AI Studio (most capable).
Long-form Q&A / document chat? → Maximize context window. Gemini 2.5 Flash (1M) or Qwen3.5 122B (262K via NVIDIA NIM).
Multilingual chat? → Qwen3.5 excels in Chinese-English. Gemini supports 30+ languages. Llama covers major European and Asian languages.
Roleplay / creative conversation? → Look for models with strong creative writing. Llama 3.3 70B and Mistral models tend to have more varied output styles.
Customer support bot? → Instruction following and safety are critical. Gemini and Qwen3 are well-aligned. Avoid unmoderated open models unless you add guardrails.
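
The decision list above can also be written as a small lookup table, which is handy if you route requests in code. This is only a sketch: the dictionary keys and the function name are hypothetical labels, and the picks are copied from the list rather than measured.

```python
# Picks copied from the list above; the keys are hypothetical labels.
PICKS = {
    "casual": ["Llama 3.3 70B via Groq", "Gemini 2.5 Flash via Google AI Studio"],
    "document_chat": ["Gemini 2.5 Flash (1M)", "Qwen3.5 122B (262K via NVIDIA NIM)"],
    "multilingual": ["Qwen3.5", "Gemini", "Llama"],
    "roleplay": ["Llama 3.3 70B", "Mistral models"],
    "customer_support": ["Gemini", "Qwen3"],
}


def recommend(use_case: str) -> list[str]:
    """Return the suggested models for a use case, first choice first."""
    try:
        return PICKS[use_case]
    except KeyError:
        options = ", ".join(sorted(PICKS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {options}") from None


print(recommend("document_chat"))
# ['Gemini 2.5 Flash (1M)', 'Qwen3.5 122B (262K via NVIDIA NIM)']
```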

Top Picks for Chat

Google: Gemini 2.5 Flash, via Google

1M context, multimodal, free tier with 10 RPM / 250 RPD. Best all-round chat model.

Meta: Llama 3.3 70B Instruct, via Groq

Fastest inference via Groq LPU, strong instruction following, no credit card required.

Qwen: Qwen3.5 122B A10B, via NVIDIA NIM

262K context, strong bilingual (Chinese-English), 40 RPM with no daily cap.

NVIDIA: Nemotron 3 Super (free), via OpenRouter

262K context, strong reasoning, solid chat performance.
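
Free tiers are usually bounded by requests per minute (RPM) and requests per day (RPD), so a chat app may want to throttle itself before the provider rejects a call. The sketch below is a client-side sliding-window guard. The class name is hypothetical, and it assumes rolling 60-second and 24-hour windows; providers may count quotas differently (for example, with fixed daily resets), so treat it as an approximation.

```python
import time
from collections import deque


class FreeTierLimiter:
    """Client-side guard for an RPM and RPD quota (rolling windows assumed)."""

    def __init__(self, rpm: int, rpd: int, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock   # injectable so the limiter can be tested
        self.sent = deque()  # timestamps of requests in the last 24 hours

    def try_acquire(self) -> bool:
        """Record a request and return True if both quotas allow it now."""
        now = self.clock()
        # Forget requests that are a day old or older.
        while self.sent and now - self.sent[0] >= 86_400:
            self.sent.popleft()
        last_minute = sum(1 for t in self.sent if now - t < 60)
        if last_minute >= self.rpm or len(self.sent) >= self.rpd:
            return False
        self.sent.append(now)
        return True


# 10 RPM / 250 RPD are the figures quoted above for Gemini 2.5 Flash.
limiter = FreeTierLimiter(rpm=10, rpd=250)
allowed = [limiter.try_acquire() for _ in range(12)]
print(allowed.count(True))  # 10
```

When `try_acquire()` returns False, the caller can wait, queue the message, or fall back to another provider.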

All Free Chat Models

Provider	Model	Context	Max Output	Modality	Rate Limit	Released
OpenRouter	inclusionAI: Ring-2.6-1T	262K	66K	text	See provider page	May 8, 2026
OpenRouter	Baidu Qianfan: CoBuddy (free)	131K	66K	text	See provider page	May 6, 2026
OpenRouter	Owl Alpha	1.0M	262K	text	See provider page	Apr 28, 2026
OpenRouter	NVIDIA: Nemotron 3 Nano Omni (free)	256K	66K	text, image, audio	See provider page	Apr 28, 2026
OpenRouter	Poolside: Laguna XS.2 (free)	131K	8K	text	See provider page	Apr 28, 2026
OpenRouter	Poolside: Laguna M.1 (free)	131K	8K	text	See provider page	Apr 28, 2026
OpenRouter	DeepSeek: DeepSeek V4 Flash (free)	1.0M	384K	text	See provider page	Apr 24, 2026
OpenRouter	Baidu: Qianfan-OCR-Fast	66K	29K	text, image	See provider page	Apr 20, 2026
OpenRouter	Z.ai: GLM 5.1	203K	203K	text	See provider page	Apr 7, 2026
OpenRouter	Google: Gemma 4 26B A4B (free)	262K	33K	text, image	See provider page	Apr 3, 2026
OpenRouter	Google: Gemma 4 31B (free)	262K	33K	text, image	See provider page	Apr 2, 2026
OpenRouter	Arcee AI: Trinity Large Thinking (free)	262K	80K	text, reasoning	See provider page	Apr 1, 2026
OpenRouter	Google: Lyria 3 Pro Preview	1.0M	66K	text, image	See provider page	Mar 30, 2026
OpenRouter	Google: Lyria 3 Clip Preview	1.0M	66K	text, image	See provider page	Mar 30, 2026
OpenRouter	NVIDIA: Nemotron 3 Super (free)	1.0M	262K	text	See provider page	Mar 11, 2026
OpenRouter	MiniMax: MiniMax M2.5 (free)	205K	8K	text	See provider page	Feb 12, 2026
OpenRouter	Free Models Router	200K	8K	text, image	See provider page	Feb 1, 2026
OpenRouter	LiquidAI: LFM2.5-1.2B-Thinking (free)	33K	8K	text, reasoning	See provider page	Jan 20, 2026
OpenRouter	LiquidAI: LFM2.5-1.2B-Instruct (free)	33K	8K	text	See provider page	Jan 20, 2026
OpenRouter	NVIDIA: Nemotron 3 Nano 30B A3B (free)	256K	8K	text	See provider page	Dec 14, 2025
OpenRouter	OpenAI: gpt-oss-safeguard-20b	131K	66K	text	See provider page	Oct 29, 2025
OpenRouter	NVIDIA: Nemotron Nano 12B 2 VL (free)	128K	128K	text, image	See provider page	Oct 28, 2025
OpenRouter	Qwen: Qwen3 Next 80B A3B Instruct (free)	262K	8K	text	See provider page	Sep 11, 2025
OpenRouter	NVIDIA: Nemotron Nano 9B V2 (free)	128K	8K	text	See provider page	Sep 5, 2025
OpenRouter	OpenAI: gpt-oss-120b (free)	131K	131K	text	See provider page	Aug 5, 2025
OpenRouter	OpenAI: gpt-oss-20b (free)	131K	8K	text	See provider page	Aug 5, 2025
OpenRouter	Z.ai: GLM 4.5 Air (free)	131K	96K	text	See provider page	Jul 25, 2025
OpenRouter	Qwen: Qwen3 Coder 480B A35B (free)	1.0M	262K	text, code	See provider page	Feb 4, 2026
OpenRouter	Venice: Uncensored (free)	33K	8K	text	See provider page	—
OpenRouter	Meta: Llama 3.3 70B Instruct (free)	131K	8K	text	See provider page	Dec 6, 2024
OpenRouter	Meta: Llama 3.2 3B Instruct (free)	131K	8K	text	See provider page	Sep 25, 2024
OpenRouter	Nous: Hermes 3 405B Instruct (free)	131K	8K	text	See provider page	Aug 16, 2024
Cohere	Command A (111B)	256K	4K	text	20 RPM	—
Cohere	Command R+	128K	4K	text	20 RPM	—
Cohere	Command R7B	128K	4K	text	20 RPM	—
Cohere	Embed 4	131K	131K	text	2,000 inputs/min	—
Cohere	Rerank 3.5	131K	131K	text	10 RPM	—
Google Gemini	Gemini 2.5 Flash	1.0M	65K	text	10 RPM, 250 RPD	—
Google Gemini	Gemini 2.5 Flash-Lite	1.0M	65K	text	15 RPM, 1,000 RPD	—
Mistral AI	Mistral Small 4	256K	256K	text	~1 RPS, 500K TPM	—
Mistral AI	Mistral Medium 3	128K	128K	text	~1 RPS, 500K TPM	—
Mistral AI	Mistral Large 3	256K	256K	text	~1 RPS, 500K TPM	—
Mistral AI	Mistral Nemo (12B)	128K	128K	text	~1 RPS, 500K TPM	—
Mistral AI	Codestral	256K	256K	text, code	~1 RPS, 500K TPM	—
Mistral AI	Pixtral Large	128K	128K	text, image	~1 RPS, 500K TPM	—
Z AI (Zhipu AI)	GLM-4.7-Flash	200K	128K	text	1 concurrent request	—
Z AI (Zhipu AI)	GLM-4.5-Flash	128K	8K	text	1 concurrent request	—
Z AI (Zhipu AI)	GLM-4.6V-Flash	128K	4K	text	1 concurrent request	—
Cerebras	llama3.1-8b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	—
Cerebras	gpt-oss-120b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	—
Cerebras	qwen-3-235b-a22b-instruct-2507	131K	8K	text	30 RPM, 14,400 RPD, 1M TPD	—
Cerebras	zai-glm-4.7	128K	8K	text	10 RPM, 100 RPD, 1M TPD	—
Cloudflare Workers AI	@cf/meta/llama-3.3-70b-instruct-fp8-fast	131K	131K	text	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/meta/llama-3.1-8b-instruct-fp8-fast	131K	131K	text	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/meta/llama-3.2-11b-vision-instruct	131K	131K	text, image	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/meta/llama-4-scout-17b-16e-instruct	10.0M	131K	text	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/mistralai/mistral-small-3.1-24b-instruct	128K	131K	text	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/google/gemma-4-26b-a4b-it	256K	131K	text	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/qwen/qwq-32b	32K	131K	text	10K neurons/day (shared)	—
Cloudflare Workers AI	@cf/deepseek-ai/deepseek-r1-distill-qwen-32b	32K	131K	text	10K neurons/day (shared)	—
GitHub Models	gpt-4.1	1.0M	32K	text	10 RPM, 50 RPD	—
GitHub Models	gpt-4.1-mini	1.0M	32K	text	15 RPM, 150 RPD	—
GitHub Models	gpt-4o	128K	16K	text	10 RPM, 50 RPD	—
GitHub Models	o3-mini	200K	100K	text	10 RPM, 50 RPD	—
GitHub Models	o4-mini	200K	100K	text	10 RPM, 50 RPD	—
GitHub Models	Llama-4-Scout-17B-16E	512K	4K	text	15 RPM, 150 RPD	—
GitHub Models	Llama-4-Maverick-17B-128E	256K	4K	text	10 RPM, 50 RPD	—
GitHub Models	Meta-Llama-3.3-70B	131K	4K	text	15 RPM, 150 RPD	—
GitHub Models	DeepSeek-R1	64K	8K	text	15 RPM, 150 RPD	—
GitHub Models	Mistral-Small-3.1	128K	4K	text	15 RPM, 150 RPD	—
Groq	llama-3.3-70b-versatile	131K	32K	text	30 RPM, 14,400 RPD	—
Groq	llama-3.1-8b-instant	131K	131K	text	30 RPM, 14,400 RPD	—
Groq	llama-4-scout-17b-16e-instruct	131K	8K	text	30 RPM, 14,400 RPD	—
Groq	llama-4-maverick-17b-128e-instruct	131K	8K	text	15 RPM, 500 RPD	—
Groq	qwen3-32b	131K	131K	text	30 RPM, 14,400 RPD	—
Groq	kimi-k2-instruct	262K	262K	text	30 RPM, 14,400 RPD	—
Groq	deepseek-r1-distill-70b	131K	8K	text	30 RPM, 14,400 RPD	—
Groq	whisper-large-v3	131K	131K	text	20 RPM, 2,000 RPD	—
Groq	whisper-large-v3-turbo	131K	131K	text	20 RPM, 2,000 RPD	—
Hugging Face	Meta-Llama-3.1-8B-Instruct	128K	4K	text	~1,000 RPD	—
Hugging Face	Mistral-7B-Instruct-v0.3	32K	4K	text	~1,000 RPD	—
Hugging Face	Mixtral-8x7B-Instruct-v0.1	32K	4K	text	~1,000 RPD	—
Hugging Face	Phi-3.5-mini-instruct	128K	4K	text	~1,000 RPD	—
Hugging Face	Qwen2.5-7B-Instruct	131K	4K	text	~1,000 RPD	—
Kilo Code	bytedance-seed/dola-seed-2.0-pro:free	131K	131K	text	~200 req/hr	—
Kilo Code	x-ai/grok-code-fast-1:optimized:free	131K	131K	text, code	~200 req/hr	—
Kilo Code	nvidia/nemotron-3-super-120b-a12b:free	262K	32K	text	~200 req/hr	—
Kilo Code	arcee-ai/trinity-large-thinking:free	131K	131K	text	~200 req/hr	—
LLM7.io	deepseek-r1-0528	131K	131K	text	30 RPM (120 with token)	—
LLM7.io	deepseek-v3-0324	131K	131K	text	30 RPM (120 with token)	—
LLM7.io	gpt-4o-mini	131K	131K	text	30 RPM (120 with token)	—
LLM7.io	mistral-small-3.1-24b	32K	131K	text	30 RPM (120 with token)	—
LLM7.io	qwen2.5-coder-32b	131K	131K	text, code	30 RPM (120 with token)	—
ModelScope	Qwen/Qwen3.5-35B-A3B	131K	131K	text	2,000 RPD total; <=500 RPD/model (dynamic)	—
ModelScope	Qwen/Qwen3.5-27B	131K	131K	text	2,000 RPD total; <=500 RPD/model (dynamic)	—
ModelScope	Qwen/Qwen-Image	131K	131K	text	2,000 RPD total; model/AIGC-specific caps	—
Ollama Cloud	llama3.1:cloud	128K	131K	text	Session/weekly limits (unpublished)	—
Ollama Cloud	deepseek-r1:cloud	128K	131K	text	Session/weekly limits (unpublished)	—
Ollama Cloud	qwen2.5:cloud	128K	131K	text	Session/weekly limits (unpublished)	—
Ollama Cloud	gemma2:cloud	8K	131K	text	Session/weekly limits (unpublished)	—
Ollama Cloud	mistral:cloud	32K	131K	text	Session/weekly limits (unpublished)	—
OVHcloud AI Endpoints	Meta-Llama-3_3-70B-Instruct	131K	4K	text	2 RPM (anonymous)	—
OVHcloud AI Endpoints	DeepSeek-R1-Distill-Llama-70B	131K	32K	text	2 RPM (anonymous)	—
OVHcloud AI Endpoints	Qwen3-Coder-30B-A3B-Instruct	262K	32K	text, code	2 RPM (anonymous)	—
OVHcloud AI Endpoints	Qwen2.5-VL-72B-Instruct	128K	8K	text, image	2 RPM (anonymous)	—
OVHcloud AI Endpoints	Mistral-Nemo-Instruct-2407	128K	4K	text	2 RPM (anonymous)	—
OVHcloud AI Endpoints	Qwen3Guard-Gen-8B	32K	4K	text	2 RPM (anonymous)	—
OVHcloud AI Endpoints	Qwen3Guard-Gen-0.6B	32K	4K	text	2 RPM (anonymous)	—
SiliconFlow	deepseek-ai/DeepSeek-R1-0528-Qwen3-8B	33K	16K	text	1,000 RPM, 50K TPM	—
SiliconFlow	deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	131K	131K	text	1,000 RPM, 50K TPM	—
SiliconFlow	THUDM/glm-4-9b-chat	32K	32K	text	1,000 RPM, 50K TPM	—
SiliconFlow	THUDM/GLM-4.1V-9B-Thinking	66K	66K	text	1,000 RPM, 50K TPM	—
SiliconFlow	deepseek-ai/DeepSeek-OCR	131K	8K	text	1,000 RPM, 50K TPM	—
SiliconFlow	Abbreviation	131K	8K	text	See provider page	—
NVIDIA NIM	deepseek-ai/deepseek-v4-flash	1.0M	384K	text	Up to 40 RPM	—
NVIDIA NIM	deepseek-ai/deepseek-v4-pro	131K	8K	text	Up to 40 RPM	—
NVIDIA NIM	meta/llama-3.1-70b-instruct	131K	16K	text	Up to 40 RPM	—
NVIDIA NIM	meta/llama-3.2-11b-vision-instruct	131K	16K	text, image	Up to 40 RPM	—
NVIDIA NIM	meta/llama-3.2-1b-instruct	131K	60K	text	Up to 40 RPM	—
NVIDIA NIM	meta/llama-3.2-3b-instruct	131K	8K	text	Up to 40 RPM	—
NVIDIA NIM	meta/llama-guard-4-12b	164K	16K	text, image	Up to 40 RPM	—
NVIDIA NIM	minimaxai/minimax-m2.7	205K	131K	text	Up to 40 RPM	—
NVIDIA NIM	mistralai/mistral-large-2-instruct	131K	8K	text	Up to 40 RPM	Feb 26, 2024
NVIDIA NIM	moonshotai/kimi-k2.6	262K	8K	text	Up to 40 RPM	Apr 20, 2026
NVIDIA NIM	nvidia/llama-3.1-nemotron-ultra-253b-v1	131K	8K	text	Up to 40 RPM	—
NVIDIA NIM	nvidia/llama-3.3-nemotron-super-49b-v1.5	131K	16K	text	Up to 40 RPM	Oct 10, 2025
NVIDIA NIM	qwen/qwen3.5-122b-a10b	262K	66K	text, image	Up to 40 RPM	Feb 25, 2026
NVIDIA NIM	qwen/qwen3.5-397b-a17b	262K	66K	text, image	Up to 40 RPM	Feb 16, 2026
NVIDIA NIM	stepfun-ai/step-3.5-flash	262K	66K	text	Up to 40 RPM	—
NVIDIA NIM	z-ai/glm-5.1	203K	8K	text	Up to 40 RPM	Apr 7, 2026
OpenRouter	NVIDIA: Llama Nemotron Embed VL 1B V2 (free)	131K	8K	text, image, embeddings	See provider page	Feb 25, 2026
Chutes.ai	Llama 3.1 70B	131K	—	text	Community-powered, no hard cap	—
Glhf.chat	Llama 3.1 70B	131K	—	text	Unlimited for free models	—
Glhf.chat	Mixtral 8x7B	33K	—	text	Unlimited for free models	—
Grok (xAI)	Grok-2	131K	—	text	$25/month free credits, resets monthly	—
Grok (xAI)	Grok-2 Mini	131K	—	text	$25/month free credits, resets monthly	—
Groq	Moonshot Kimi K2	131K	—	text	See provider page	—
Groq	Moonshot Kimi K2 0905	131K	—	text	See provider page	—
Groq	GPT-OSS 120B	131K	—	text	See provider page	—
Groq	GPT-OSS 20B	131K	—	text	See provider page	—
GitHub Models	Mistral Large (24.11)	131K	—	text	See provider page	—
GitHub Models	AI21 Jamba 1.5 Large	256K	—	text	See provider page	—
Cerebras	Llama 3.1 70B	131K	—	text	See provider page	—
Mistral AI	Mistral 7B	33K	—	text	See provider page	—
Mistral AI	Mixtral 8x7B	33K	—	text	See provider page	—
Cloudflare Workers AI	Mistral 7B	33K	—	text	See provider page	—
Cloudflare Workers AI	Qwen 1.5 7B	33K	—	text	See provider page	—
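
With this many rows, filtering in code can be easier than scanning by eye. The sketch below parses a few tab-separated rows copied from the table and keeps the models whose context window meets a minimum. The function names are hypothetical, and the size parser treats "K" as 1,000 and "M" as 1,000,000, which only approximates the exact limits providers publish.

```python
def parse_size(label: str) -> int:
    """Convert labels such as '262K' or '1.0M' to an approximate token count."""
    units = {"K": 1_000, "M": 1_000_000}
    suffix = label[-1].upper()
    if suffix in units:
        return int(round(float(label[:-1]) * units[suffix]))
    return int(label)


def models_with_context(table: str, minimum_tokens: int) -> list[tuple[str, str]]:
    """Return (provider, model) pairs whose Context column meets the minimum."""
    matches = []
    for line in table.strip().splitlines():
        provider, model, context, *_ = line.split("\t")
        if parse_size(context) >= minimum_tokens:
            matches.append((provider, model))
    return matches


# Sample rows (Provider, Model, Context, Max Output) copied from the table.
ROWS = (
    "Google Gemini\tGemini 2.5 Flash\t1.0M\t65K\n"
    "Groq\tllama-3.3-70b-versatile\t131K\t32K\n"
    "NVIDIA NIM\tqwen/qwen3.5-122b-a10b\t262K\t66K\n"
    "Hugging Face\tMistral-7B-Instruct-v0.3\t32K\t4K\n"
)

print(models_with_context(ROWS, 200_000))
# [('Google Gemini', 'Gemini 2.5 Flash'), ('NVIDIA NIM', 'qwen/qwen3.5-122b-a10b')]
```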

See our FAQ for common questions about free LLM APIs