Best Free LLM APIs for Chat

147 free models available for chat. How to choose a free LLM for chat →

For general conversation, look for low latency, strong instruction following, and a helpful personality. Gemini 2.5 Flash offers the largest free context window (1M tokens) with multimodal support. Llama 3.3 70B via Groq delivers the fastest tokens-per-second. Qwen3.5 models on NVIDIA NIM strike a balance of quality and speed.

What to Look for in a Chat Model

Chat models are the most common type of LLM, but they vary significantly in quality for conversation use:

  • Latency / tokens per second — Real-time conversation needs fast responses. Groq's LPU hardware delivers the fastest inference (Llama 3.3 70B hits 100+ tok/s). NVIDIA NIM and OpenRouter are slower but offer more model variety.
  • Context window — Long conversations or document Q&A need a large context window. Gemini 2.5 Flash (1M ctx) can hold an entire book in memory. Most chat models have 32K–128K, which handles typical back-and-forth conversations easily.
  • Instruction following — A good chat model stays on-topic, follows system prompts, and avoids hallucinating. Llama 3.3 70B and Qwen3 are known for strong instruction adherence.
  • Multilingual support — If you chat in non-English languages, check the model's training data. Qwen3 has strong Chinese/English bilingual performance. Gemini and Llama support 30+ languages.
  • Multimodal input — Want to share images or audio in chat? Gemini 2.5 Flash accepts text, image, audio, and video. Most chat models are text-only.

How to Choose a Free Chat Model

Match the model to your chat use case:

  • Casual conversation / chatbot? → Prioritize latency and personality. Llama 3.3 70B via Groq (fastest) or Gemini 2.5 Flash via Google AI Studio (most capable).
  • Long-form Q&A / document chat? → Maximize context window. Gemini 2.5 Flash (1M) or Qwen3.5 122B (262K via NVIDIA NIM).
  • Multilingual chat? → Qwen3.5 excels in Chinese-English. Gemini supports 30+ languages. Llama covers major European and Asian languages.
  • Roleplay / creative conversation? → Look for models with strong creative writing. Llama 3.3 70B and Mistral models tend to have more varied output styles.
  • Customer support bot? → Instruction following and safety are critical. Gemini and Qwen3 are well-aligned. Avoid unmoderated open models unless you add guardrails.

Top Picks for Chat

All Free Chat Models

Provider Model Context Max Output Modality Rate Limit Released
OpenRouter inclusionAI: Ring-2.6-1T 262K 66K text See provider page May 8, 2026 Details
OpenRouter Baidu Qianfan: CoBuddy (free) 131K 66K text See provider page May 6, 2026 Details
OpenRouter Owl Alpha 1.0M 262K text See provider page Apr 28, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Nano Omni (free) 256K 66K textimageaudio See provider page Apr 28, 2026 Details
OpenRouter Poolside: Laguna XS.2 (free) 131K 8K text See provider page Apr 28, 2026 Details
OpenRouter Poolside: Laguna M.1 (free) 131K 8K text See provider page Apr 28, 2026 Details
OpenRouter DeepSeek: DeepSeek V4 Flash (free) 1.0M 384K text See provider page Apr 24, 2026 Details
OpenRouter Baidu: Qianfan-OCR-Fast 66K 29K textimage See provider page Apr 20, 2026 Details
OpenRouter Z.ai: GLM 5.1 203K 203K text See provider page Apr 7, 2026 Details
OpenRouter Google: Gemma 4 26B A4B (free) 262K 33K textimage See provider page Apr 3, 2026 Details
OpenRouter Google: Gemma 4 31B (free) 262K 33K textimage See provider page Apr 2, 2026 Details
OpenRouter Arcee AI: Trinity Large Thinking (free) 262K 80K textreasoning See provider page Apr 1, 2026 Details
OpenRouter Google: Lyria 3 Pro Preview 1.0M 66K textimage See provider page Mar 30, 2026 Details
OpenRouter Google: Lyria 3 Clip Preview 1.0M 66K textimage See provider page Mar 30, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Super (free) 1.0M 262K text See provider page Mar 11, 2026 Details
OpenRouter MiniMax: MiniMax M2.5 (free) 205K 8K text See provider page Feb 12, 2026 Details
OpenRouter Free Models Router 200K 8K textimage See provider page Feb 1, 2026 Details
OpenRouter LiquidAI: LFM2.5-1.2B-Thinking (free) 33K 8K textreasoning See provider page Jan 20, 2026 Details
OpenRouter LiquidAI: LFM2.5-1.2B-Instruct (free) 33K 8K text See provider page Jan 20, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Nano 30B A3B (free) 256K 8K text See provider page Dec 14, 2025 Details
OpenRouter OpenAI: gpt-oss-safeguard-20b 131K 66K text See provider page Oct 29, 2025 Details
OpenRouter NVIDIA: Nemotron Nano 12B 2 VL (free) 128K 128K textimage See provider page Oct 28, 2025 Details
OpenRouter Qwen: Qwen3 Next 80B A3B Instruct (free) 262K 8K text See provider page Sep 11, 2025 Details
OpenRouter NVIDIA: Nemotron Nano 9B V2 (free) 128K 8K text See provider page Sep 5, 2025 Details
OpenRouter OpenAI: gpt-oss-120b (free) 131K 131K text See provider page Aug 5, 2025 Details
OpenRouter OpenAI: gpt-oss-20b (free) 131K 8K text See provider page Aug 5, 2025 Details
OpenRouter Z.ai: GLM 4.5 Air (free) 131K 96K text See provider page Jul 25, 2025 Details
OpenRouter Qwen: Qwen3 Coder 480B A35B (free) 1.0M 262K textcode See provider page Feb 4, 2026 Details
OpenRouter Venice: Uncensored (free) 33K 8K text See provider page Details
OpenRouter Meta: Llama 3.3 70B Instruct (free) 131K 8K text See provider page Dec 6, 2024 Details
OpenRouter Meta: Llama 3.2 3B Instruct (free) 131K 8K text See provider page Sep 25, 2024 Details
OpenRouter Nous: Hermes 3 405B Instruct (free) 131K 8K text See provider page Aug 16, 2024 Details
Cohere Command A (111B) 256K 4K text 20 RPM Details
Cohere Command R+ 128K 4K text 20 RPM Details
Cohere Command R7B 128K 4K text 20 RPM Details
Cohere Embed 4 131K 131K text 2,000 inputs/min Details
Cohere Rerank 3.5 131K 131K text 10 RPM Details
Google Gemini Gemini 2.5 Flash 1.0M 65K text 10 RPM, 250 RPD Details
Google Gemini Gemini 2.5 Flash-Lite 1.0M 65K text 15 RPM, 1,000 RPD Details
Mistral AI Mistral Small 4 256K 256K text ~1 RPS, 500K TPM Details
Mistral AI Mistral Medium 3 128K 128K text ~1 RPS, 500K TPM Details
Mistral AI Mistral Large 3 256K 256K text ~1 RPS, 500K TPM Details
Mistral AI Mistral Nemo (12B) 128K 128K text ~1 RPS, 500K TPM Details
Mistral AI Codestral 256K 256K textcode ~1 RPS, 500K TPM Details
Mistral AI Pixtral Large 128K 128K textimage ~1 RPS, 500K TPM Details
Z AI (Zhipu AI) GLM-4.7-Flash 200K 128K text 1 concurrent request Details
Z AI (Zhipu AI) GLM-4.5-Flash 128K 8K text 1 concurrent request Details
Z AI (Zhipu AI) GLM-4.6V-Flash 128K 4K text 1 concurrent request Details
Cerebras llama3.1-8b 128K 8K text 30 RPM, 14,400 RPD, 1M TPD Details
Cerebras gpt-oss-120b 128K 8K text 30 RPM, 14,400 RPD, 1M TPD Details
Cerebras qwen-3-235b-a22b-instruct-2507 131K 8K text 30 RPM, 14,400 RPD, 1M TPD Details
Cerebras zai-glm-4.7 128K 8K text 10 RPM, 100 RPD, 1M TPD Details
Cloudflare Workers AI @cf/meta/llama-3.3-70b-instruct-fp8-fast 131K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/meta/llama-3.1-8b-instruct-fp8-fast 131K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/meta/llama-3.2-11b-vision-instruct 131K 131K textimage 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/meta/llama-4-scout-17b-16e-instruct 10.0M 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/mistralai/mistral-small-3.1-24b-instruct 128K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/google/gemma-4-26b-a4b-it 256K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/qwen/qwq-32b 32K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/deepseek-ai/deepseek-r1-distill-qwen-32b 32K 131K text 10K neurons/day (shared) Details
GitHub Models gpt-4.1 1.0M 32K text 10 RPM, 50 RPD Details
GitHub Models gpt-4.1-mini 1.0M 32K text 15 RPM, 150 RPD Details
GitHub Models gpt-4o 128K 16K text 10 RPM, 50 RPD Details
GitHub Models o3-mini 200K 100K text 10 RPM, 50 RPD Details
GitHub Models o4-mini 200K 100K text 10 RPM, 50 RPD Details
GitHub Models Llama-4-Scout-17B-16E 512K 4K text 15 RPM, 150 RPD Details
GitHub Models Llama-4-Maverick-17B-128E 256K 4K text 10 RPM, 50 RPD Details
GitHub Models Meta-Llama-3.3-70B 131K 4K text 15 RPM, 150 RPD Details
GitHub Models DeepSeek-R1 64K 8K text 15 RPM, 150 RPD Details
GitHub Models Mistral-Small-3.1 128K 4K text 15 RPM, 150 RPD Details
Groq llama-3.3-70b-versatile 131K 32K text 30 RPM, 14,400 RPD Details
Groq llama-3.1-8b-instant 131K 131K text 30 RPM, 14,400 RPD Details
Groq llama-4-scout-17b-16e-instruct 131K 8K text 30 RPM, 14,400 RPD Details
Groq llama-4-maverick-17b-128e-instruct 131K 8K text 15 RPM, 500 RPD Details
Groq qwen3-32b 131K 131K text 30 RPM, 14,400 RPD Details
Groq kimi-k2-instruct 262K 262K text 30 RPM, 14,400 RPD Details
Groq deepseek-r1-distill-70b 131K 8K text 30 RPM, 14,400 RPD Details
Groq whisper-large-v3 131K 131K text 20 RPM, 2,000 RPD Details
Groq whisper-large-v3-turbo 131K 131K text 20 RPM, 2,000 RPD Details
Hugging Face Meta-Llama-3.1-8B-Instruct 128K 4K text ~1,000 RPD Details
Hugging Face Mistral-7B-Instruct-v0.3 32K 4K text ~1,000 RPD Details
Hugging Face Mixtral-8x7B-Instruct-v0.1 32K 4K text ~1,000 RPD Details
Hugging Face Phi-3.5-mini-instruct 128K 4K text ~1,000 RPD Details
Hugging Face Qwen2.5-7B-Instruct 131K 4K text ~1,000 RPD Details
Kilo Code bytedance-seed/dola-seed-2.0-pro:free 131K 131K text ~200 req/hr Details
Kilo Code x-ai/grok-code-fast-1:optimized:free 131K 131K textcode ~200 req/hr Details
Kilo Code nvidia/nemotron-3-super-120b-a12b:free 262K 32K text ~200 req/hr Details
Kilo Code arcee-ai/trinity-large-thinking:free 131K 131K text ~200 req/hr Details
LLM7.io deepseek-r1-0528 131K 131K text 30 RPM (120 with token) Details
LLM7.io deepseek-v3-0324 131K 131K text 30 RPM (120 with token) Details
LLM7.io gpt-4o-mini 131K 131K text 30 RPM (120 with token) Details
LLM7.io mistral-small-3.1-24b 32K 131K text 30 RPM (120 with token) Details
LLM7.io qwen2.5-coder-32b 131K 131K textcode 30 RPM (120 with token) Details
ModelScope Qwen/Qwen3.5-35B-A3B 131K 131K text 2,000 RPD total; <=500 RPD/model (dynamic) Details
ModelScope Qwen/Qwen3.5-27B 131K 131K text 2,000 RPD total; <=500 RPD/model (dynamic) Details
ModelScope Qwen/Qwen-Image 131K 131K text 2,000 RPD total; model/AIGC-specific caps Details
Ollama Cloud llama3.1:cloud 128K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud deepseek-r1:cloud 128K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud qwen2.5:cloud 128K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud gemma2:cloud 8K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud mistral:cloud 32K 131K text Session/weekly limits (unpublished) Details
OVHcloud AI Endpoints Meta-Llama-3_3-70B-Instruct 131K 4K text 2 RPM (anonymous) Details
OVHcloud AI Endpoints DeepSeek-R1-Distill-Llama-70B 131K 32K text 2 RPM (anonymous) Details
OVHcloud AI Endpoints Qwen3-Coder-30B-A3B-Instruct 262K 32K textcode 2 RPM (anonymous) Details
OVHcloud AI Endpoints Qwen2.5-VL-72B-Instruct 128K 8K textimage 2 RPM (anonymous) Details
OVHcloud AI Endpoints Mistral-Nemo-Instruct-2407 128K 4K text 2 RPM (anonymous) Details
OVHcloud AI Endpoints Qwen3Guard-Gen-8B 32K 4K text 2 RPM (anonymous) Details
OVHcloud AI Endpoints Qwen3Guard-Gen-0.6B 32K 4K text 2 RPM (anonymous) Details
SiliconFlow deepseek-ai/DeepSeek-R1-0528-Qwen3-8B 33K 16K text 1,000 RPM, 50K TPM Details
SiliconFlow deepseek-ai/DeepSeek-R1-Distill-Qwen-7B 131K 131K text 1,000 RPM, 50K TPM Details
SiliconFlow THUDM/glm-4-9b-chat 32K 32K text 1,000 RPM, 50K TPM Details
SiliconFlow THUDM/GLM-4.1V-9B-Thinking 66K 66K text 1,000 RPM, 50K TPM Details
SiliconFlow deepseek-ai/DeepSeek-OCR 131K 8K text 1,000 RPM, 50K TPM Details
SiliconFlow Abbreviation 131K 8K text See provider page Details
NVIDIA NIM deepseek-ai/deepseek-v4-flash 1.0M 384K text Up to 40 RPM Details
NVIDIA NIM deepseek-ai/deepseek-v4-pro 131K 8K text Up to 40 RPM Details
NVIDIA NIM meta/llama-3.1-70b-instruct 131K 16K text Up to 40 RPM Details
NVIDIA NIM meta/llama-3.2-11b-vision-instruct 131K 16K textimage Up to 40 RPM Details
NVIDIA NIM meta/llama-3.2-1b-instruct 131K 60K text Up to 40 RPM Details
NVIDIA NIM meta/llama-3.2-3b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM meta/llama-guard-4-12b 164K 16K textimage Up to 40 RPM Details
NVIDIA NIM minimaxai/minimax-m2.7 205K 131K text Up to 40 RPM Details
NVIDIA NIM mistralai/mistral-large-2-instruct 131K 8K text Up to 40 RPM Feb 26, 2024 Details
NVIDIA NIM moonshotai/kimi-k2.6 262K 8K text Up to 40 RPM Apr 20, 2026 Details
NVIDIA NIM nvidia/llama-3.1-nemotron-ultra-253b-v1 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/llama-3.3-nemotron-super-49b-v1.5 131K 16K text Up to 40 RPM Oct 10, 2025 Details
NVIDIA NIM qwen/qwen3.5-122b-a10b 262K 66K textimage Up to 40 RPM Feb 25, 2026 Details
NVIDIA NIM qwen/qwen3.5-397b-a17b 262K 66K textimage Up to 40 RPM Feb 16, 2026 Details
NVIDIA NIM stepfun-ai/step-3.5-flash 262K 66K text Up to 40 RPM Details
NVIDIA NIM z-ai/glm-5.1 203K 8K text Up to 40 RPM Apr 7, 2026 Details
OpenRouter NVIDIA: Llama Nemotron Embed VL 1B V2 (free) 131K 8K textimageembeddings See provider page Feb 25, 2026 Details
Chutes.ai Llama 3.1 70B 131K 0 text Community-powered, no hard cap Details
Glhf.chat Llama 3.1 70B 131K 0 text Unlimited for free models Details
Glhf.chat Mixtral 8x7B 33K 0 text Unlimited for free models Details
Grok (xAI) Grok-2 131K 0 text $25/month free credits, resets monthly Details
Grok (xAI) Grok-2 Mini 131K 0 text $25/month free credits, resets monthly Details
Groq Moonshot Kimi K2 131K 0 text See provider page Details
Groq Moonshot Kimi K2 0905 131K 0 text See provider page Details
Groq GPT-OSS 120B 131K 0 text See provider page Details
Groq GPT-OSS 20B 131K 0 text See provider page Details
GitHub Models Mistral Large (24.11) 131K 0 text See provider page Details
GitHub Models AI21 Jamba 1.5 Large 256K 0 text See provider page Details
Cerebras Llama 3.1 70B 131K 0 text See provider page Details
Mistral AI Mistral 7B 33K 0 text See provider page Details
Mistral AI Mixtral 8x7B 33K 0 text See provider page Details
Cloudflare Workers AI Mistral 7B 33K 0 text See provider page Details
Cloudflare Workers AI Qwen 1.5 7B 33K 0 text See provider page Details