Quick Reference: All 20 Free LLM API Providers
If you just want the signup link, here's every provider with free models, sorted by model count. Bookmark this table.
| Provider | Free Models | Credit Card? | Get Your Key |
|---|---|---|---|
| OpenRouter | 30 | Not required | Sign up → |
| Groq | 14 | Not required | Sign up → |
| GitHub Models | 13 | Not required | Sign up → |
| NVIDIA NIM | 11 | Not required | Sign up → |
| Cloudflare Workers AI | 10 | Not required | Sign up → |
| Mistral AI | 8 | Not required | Sign up → |
| OVHcloud AI Endpoints | 7 | Not required | Sign up → |
| SiliconFlow | 6 | Not required | Sign up → |
| Cohere | 5 | Not required | Sign up → |
| Cerebras | 5 | Not required | Sign up → |
| Hugging Face | 5 | Not required | Sign up → |
| LLM7.io | 5 | Not required | Sign up → |
| Ollama Cloud | 5 | Not required | Sign up → |
| Kilo Code | 4 | Not required | Sign up → |
| Z AI (Zhipu AI) | 3 | Not required | Sign up → |
| ModelScope | 3 | Not required | Sign up → |
| Google Gemini | 2 | Not required | Sign up → |
| Chutes.ai | 2 | Not required | Sign up → |
| Glhf.chat | 2 | Not required | Sign up → |
| Grok (xAI) | 2 | Required | Sign up → |
The Fastest Path: Get a Working API Key in 2 Minutes
If you just want to start building right now, here's the shortest path to a working free API key:
- Go to aistudio.google.com/app/apikey
- Sign in with your Google account (or create one — takes 30 seconds)
- Click "Create API Key"
- Copy the key. Done. No credit card. No phone number. No wait.
This gets you Gemini 2.5 Flash — 1 million token context, multimodal (text + image + audio + video), 10 requests/minute, 250 requests/day. Enough to build and test a full application.
Want more options? Read on for every provider's signup process, step by step.
Tier 1: No Credit Card — Instant Key with Just an Email
These 7 providers give you an API key the moment you sign up. No credit card. No phone. No waiting.
1. Google AI Studio — Free Gemini API Key
How to get the key:
- Go to aistudio.google.com
- Sign in with a Google account
- Click "Get API Key" in the left sidebar
- Click "Create API Key" → copy it
What you get: Access to Gemini 2.5 Flash (1M context), Gemini 2.5 Flash-Lite, Gemma 3 models. Full multimodal — text, image, audio, video input. The 1M context window on Flash is unmatched at any price.
Fine print: Google may use free-tier prompts for product improvement. If you're building a commercial app, read their data policy. The API format is Google's own (not OpenAI-compatible), but wrappers exist.
Endpoint: https://generativelanguage.googleapis.com/v1beta
2. Groq — Fastest Free LLM Inference
How to get the key:
- Go to console.groq.com
- Sign up with email or Google SSO
- Go to "API Keys" in the left menu
- Click "Create API Key" → name it → copy
What you get: Ultra-fast inference (~2,600 tok/s) on Llama 4 Scout/Maverick, Qwen3, DeepSeek-R1-Distill, Mistral, and Whisper models. Groq's LPU chips deliver the lowest latency of any free provider — great for interactive agents and real-time chat.
Fine print: Groq does not train on your data. Rate limits reset daily. Models are limited to ~8K context on the free tier. OpenAI-compatible endpoint.
Endpoint: https://api.groq.com/openai/v1
3. OpenRouter — One Key for 35+ Free Models
How to get the key:
- Go to openrouter.ai
- Sign in with Google or GitHub
- Go to "Keys" → create a key
- That's it. One key works for 35+ free models across Google, Meta, Mistral, DeepSeek, and more.
What you get: The widest free model selection of any provider — Llama 4, Qwen3, DeepSeek-V3, Gemma 3, Mistral Large, and 30+ others. All through a single OpenAI-compatible endpoint. Free models are marked with :free suffix in the model ID.
Fine print: 200 requests/day on the free tier (or 1,000/day with a one-time $10 top-up — the credit never expires). Some :free models return 403 in certain regions. OpenRouter pools multiple upstream providers, so model availability can fluctuate.
Endpoint: https://openrouter.ai/api/v1
4. NVIDIA NIM — 100+ Models, No Daily Token Cap
How to get the key:
- Go to build.nvidia.com
- Sign up for a NVIDIA Developer Program account (free)
- Verify your phone number (required — they send an SMS code)
- Go to any model page → "Get API Key"
What you get: The largest selection of open-weight models — 100+ including DeepSeek-V3, DeepSeek-R1, Llama 3.1 70B, Qwen3.5 397B, Nemotron-Super, Mistral Large 2, and domain-specific models. The key advantage: no daily token cap. You're limited only by RPM (~40), not by total daily usage.
Fine print: Phone verification is mandatory — the one friction point. Some models listed in the catalog aren't actually callable (we've tested and marked them). Model IDs use slash format — deepseek-ai/deepseek-v3 not deepseek-ai-deepseek-v3. See our NVIDIA NIM provider page for the full list.
Endpoint: https://integrate.api.nvidia.com/v1
5. GitHub Models — Free GPT-4.1, o3, Llama 4
How to get the key:
- Make sure you have a GitHub account (free to create)
- Go to github.com/marketplace/models
- Browse to any model → click "Get API Key" or use the "Try in Playground" option
- The API key is your GitHub personal access token (or use the built-in key in the playground)
What you get: This is the holy grail for many developers — free access to GPT-5, GPT-4.1, o3, o4-mini, and Grok 3 — models that are paid everywhere else. Plus Llama 4, DeepSeek-R1, Mistral Large, and Cohere Command A. 45+ models total. The catch is rate limits tied to your Copilot tier.
Fine print: Token limits per request are small (8K in / 4K out on free tier). Rate limits are unpublished and tied to your Copilot subscription. This is great for prototyping and testing, not for production workloads. Azure-hosted (endpoint is models.inference.ai.azure.com).
Endpoint: https://models.inference.ai.azure.com
6. Cerebras — Ultra-Fast on WSE Chips
How to get the key:
- Go to cloud.cerebras.ai
- Sign up with email or Google
- Go to "API Keys" → create a key
What you get: Llama 3.1 8B and GPT-OSS 120B running on Cerebras WSE (Wafer-Scale Engine) chips — some of the fastest AI hardware available. Good for high-throughput batch inference where latency matters.
Fine print: Context limited to 8K on the free tier. Smaller model selection than Groq or OpenRouter. OpenAI-compatible endpoint.
Endpoint: https://api.cerebras.ai/v1
7. Z AI (Zhipu) — GLM Series, No Credit Card
How to get the key:
- Go to open.bigmodel.cn
- Sign up (may require Chinese phone number for full access)
- Go to "API Keys" in the user center
- Create a key
What you get: GLM-4.7-Flash — Zhipu's latest model with 200K context. Bilingual Chinese/English. Good for Chinese-language applications and document processing. GLM-4.7 handles reasoning tasks well.
Fine print: 1 concurrent request limit is the tightest of any provider — expect queueing if multiple clients share the same key. Interface is primarily in Chinese. OpenAI-compatible endpoint.
Endpoint: https://open.bigmodel.cn/api/paas/v4
Tier 2: No Credit Card — But Some Verification Required
These providers don't charge your card, but they want a phone number, a Cloudflare account, or other verification before issuing a key.
8. Mistral AI — ~1B Tokens/Month Free
How to get the key:
- Go to console.mistral.ai
- Sign up with email
- Verify your phone number (they send an SMS)
- Select the "Experiment" plan (free)
- Go to "API Keys" → create a key
What you get: Access to Mistral Small, Medium, Large, and Codestral — Mistral's code-optimized model. ~1 billion tokens/month on the Experiment plan. All models are OpenAI-compatible. Mistral's models are strong for European languages and coding.
Fine print: Phone verification is required — no way around it. Free-tier data may be used for model improvement unless you explicitly opt out in settings. 1 RPS is the lowest concurrency of any major provider — not suitable for multi-user apps.
Endpoint: https://api.mistral.ai/v1
9. Cloudflare Workers AI — Edge Inference, 10K Neurons/Day
How to get the key:
- Go to dash.cloudflare.com
- Create a Cloudflare account (free)
- Go to "AI" → "Workers AI" in the dashboard
- Use the REST API endpoint with your Cloudflare API token
What you get: 50+ models running on Cloudflare's global edge network — low latency from anywhere in the world. Llama, Mistral, Gemma, DeepSeek-R1-Distill, Qwen, and BGE embedding models. Text, image, audio, and embedding modalities.
Fine print: Billing is based on "Neurons" (Cloudflare's compute unit), not tokens — harder to predict costs/limits. Some models require a paid Workers plan ($5/month). The API format is Cloudflare-specific, not OpenAI-compatible by default.
Endpoint: https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run
10. Cohere — Command A + Embed, 1,000 Calls/Month
How to get the key:
- Go to dashboard.cohere.com
- Sign up with email or Google
- Go to "API Keys" → create a Trial key
What you get: Command A (111B parameters), Command R, Command R+, Aya Expanse (multilingual), Embed v4 (embedding), and Rerank models. Cohere's models are particularly strong for RAG (retrieval-augmented generation) workflows — their Embed and Rerank APIs are industry-leading.
Fine print: Trial key is non-commercial only. Limited to 1,000 API calls/month across all models — one of the tighter limits. The key expires after a trial period (typically 3 months).
Endpoint: https://api.cohere.com/v2
11. Hugging Face Inference API — Rotating Open Models
How to get the key:
- Go to huggingface.co
- Create a free account
- Go to Settings → Access Tokens
- Create a token (read-only is sufficient for inference)
What you get: Access to a rotating selection of open-weight models through the Hugging Face Serverless Inference API — Qwen, Llama, Gemma, SmolLM. The model lineup changes as Hugging Face updates their free tier.
Fine print: Not OpenAI-compatible — uses the Hugging Face Inference API format. Shared infrastructure means latency can vary significantly. Rate limits are not precisely documented (~1,000 requests/day approximate). Better for experimentation than production.
Endpoint: https://api-inference.huggingface.co/models
12. OVHcloud AI Endpoints — No Registration at All
How to get the key:
- Option A (anonymous): No key needed. Just send requests to the endpoint. 2 RPM limit.
- Option B (registered): Create an account at OVHcloud AI Endpoints for higher rate limits. Free registration.
What you get: EU-hosted inference for Qwen3-Coder, Mistral, Llama, DeepSeek, and other open models. GDPR compliant. The anonymous tier is unique — no other provider lets you call models with zero registration.
Fine print: 2 RPM on the anonymous tier is very low. Registered tier has better limits but OVHcloud doesn't publish exact numbers. EU latency is excellent; outside Europe may be slower. OpenAI-compatible.
Endpoint: https://qwen3-coder-30b-a3b-instruct.endpoints.ovh.net/v1 (model-specific)
Tier 3: Regional Providers — Best for China / Asia-Pacific
These Chinese providers offer generous free tiers but are optimized for users in China and nearby regions. If you're building for the Chinese market, these are your best options.
13. SiliconFlow — 1,000 RPM Free
How to get the key:
- Go to cloud.siliconflow.cn
- Register (may require Chinese phone number)
- Go to "API Keys" → create a key
What you get: DeepSeek-V3, DeepSeek-R1, Qwen3, Qwen2.5, BGE embeddings, and Stable Diffusion models. The 1,000 RPM rate limit is the most generous of any free tier. Perfect for high-throughput applications.
Fine print: Registration may require a Chinese phone number. Latency is excellent in China/Asia but higher in US/Europe. Interface is in Chinese. OpenAI-compatible endpoint.
Endpoint: https://api.siliconflow.cn/v1
14. ModelScope — Alibaba's Model Hub
How to get the key:
- Go to modelscope.cn
- Register for an account
- Go to "Access Token" in user settings
- Create a token
What you get: Alibaba's Qwen3.5 models (including 35B-A3B MoE), DeepSeek models, and other Chinese LLMs. 2,000 requests/day across all models. The Qwen family is particularly strong for coding and Chinese-language tasks.
Fine print: 500 RPD cap per individual model. Registration may require Chinese credentials. Interface in Chinese. OpenAI-compatible endpoint.
Endpoint: https://api-inference.modelscope.cn/v1
Tier 4: Niche Providers — Specialized Use Cases
These providers don't compete on breadth. They solve specific problems — coding gateways, rare free models, or specialized APIs.
15. Kilo Code — Coding-Optimized Gateway
How to get the key:
- Go to kilo.ai
- Sign up — the API key works for all models on the gateway
What you get: Kilo Code is a coding-specific model router — it directs your request to the best coding model available (ByteDance Seed, Grok Code Fast, NVIDIA Nemotron, Arcee Trinity). Purpose-built for AI code editors like its VS Code extension.
Fine print: Models are routed dynamically — you don't control which model handles your request. ~200 req/hr is per-IP. OpenAI-compatible.
Endpoint: https://api.kilo.ai/api/gateway
16. LLM7.io — Free GPT-4o-mini
How to get the key:
- Go to token.llm7.io
- Follow the token registration process
- Use the token as your API key
What you get: Free GPT-4o-mini — one of the only places to get OpenAI's own models on a free tier. Also DeepSeek-R1, Qwen, and Llama. The free GPT-4o-mini access is the main draw here.
Fine print: Independent aggregator — reliability and uptime are not guaranteed. Token registration process is less polished than bigger providers. OpenAI-compatible.
Endpoint: https://api.llm7.io/v1
17. Ollama Cloud — Hosted Ollama Runtime
How to get the key:
- Go to ollama.com
- Create an account
- Go to Settings → API Keys
- Generate a key
What you get: Llama, Qwen, and Gemma models through the Ollama-native API. If you already use Ollama locally, the cloud version works with the same tooling. OpenAI-compatible wrapper available.
Fine print: Rate limits are unpublished — expect them to be lower than major providers. Best for developers already in the Ollama ecosystem who want a zero-config cloud fallback.
Endpoint: https://api.ollama.com
Head-to-Head: Rate Limits Comparison
| Provider | RPM | RPD | Tokens/Day | Max Context | Credit Card |
|---|---|---|---|---|---|
| SiliconFlow | 1,000 | — | 50K TPM | 131K | No |
| NVIDIA NIM | ~40 | No cap | No cap | 1M | No |
| Groq | 30 | 14,400 | — | 8K | No |
| LLM7.io | 30 | — | — | 131K | No |
| Cohere | 20 | 1,000/mo | — | 128K | No |
| Google AI Studio | 10 | 250 | — | 1M | No |
| GitHub Models | Varies | Varies | ~8K/req | 128K | No |
| OpenRouter | — | 200 | — | 1M | No |
| Cerebras | — | 14,400 | 1M | 8K | No |
| ModelScope | — | 2,000 | — | 131K | No |
| Mistral AI | 1 RPS | — | ~1B/mo | 128K | No |
| OVHcloud | — | — | — | 262K | No |
| Cloudflare | — | 10K Neurons | — | 128K | No |
| Hugging Face | — | ~1,000 | — | 131K | No |
| Kilo Code | — | ~200/hr | — | 262K | No |
| Z AI | — | — | — | 200K | No |
| Ollama Cloud | — | Unpublished | — | 128K | No |
Which Free API Key Should You Get? Decision Guide
Still not sure? Here's which one to pick based on what you're building:
- "I just want to try an LLM API, fast" → Google AI Studio. Sign up in 30 seconds. Gemini 2.5 Flash is one of the best free models available.
- "I need the fastest inference for coding agents" → Groq. Their LPU chips are dramatically faster than GPU providers.
- "I want access to GPT-4.1 / o3 / Grok 3 for free" → GitHub Models. The only place to get OpenAI's premium models free.
- "I want one key that works with everything" → OpenRouter. Single key, 35+ free models, OpenAI-compatible.
- "I need no daily token cap" → NVIDIA NIM. 40 RPM unlimited. Phone verification required.
- "I'm building for users in China" → SiliconFlow or ModelScope. Best latency in Asia.
- "I need GDPR-compliant EU hosting" → OVHcloud AI Endpoints. No registration needed.
- "I need embedding + rerank for RAG" → Cohere. Best-in-class Embed and Rerank APIs.
Frequently Asked Questions
How do I get a free LLM API key in 2026?
The fastest path: sign up for Google AI Studio (Gemini), Groq, or NVIDIA NIM. All three issue an API key instantly with just an email — no credit card. For the widest model selection, get an OpenRouter key (one key for 35+ free models). If you want GPT-4.1 or o3 for free, use GitHub Models (free for any GitHub account). Full step-by-step guides for all 20 providers are in this article.
Which free LLM APIs require no credit card?
19 of 20 providers on free-model.com require no credit card: OpenRouter, Groq, GitHub Models, NVIDIA NIM, Cloudflare Workers AI, Mistral AI, OVHcloud AI Endpoints, SiliconFlow, Cohere, Cerebras, Hugging Face, LLM7.io, Ollama Cloud, Kilo Code, Z AI (Zhipu AI), ModelScope, Google Gemini, Chutes.ai, Glhf.chat. Some need phone verification (Mistral, NVIDIA NIM) but never a credit card. You can filter for "no credit card" on our models page.
How do I get a free API key for GPT-4 or GPT-5?
OpenAI no longer offers free API credits. Use GitHub Models for free GPT-4.1, GPT-5, o3, and o4-mini (free for any GitHub account). Groq and Cerebras host free GPT-OSS models. LLM7.io offers free GPT-4o-mini. See the GitHub Models and LLM7.io sections in this guide.
How do I get a free Claude API key?
Anthropic does not offer a free Claude API tier. The only legitimate free access is via GitHub Models (limited monthly Claude budget). Alternatively, point Claude Code at a free OpenAI-compatible backend (Groq, NVIDIA NIM, OpenRouter) using our config generator.
How do I get a free Gemini API key?
Go to aistudio.google.com, sign in with a Google account, click "Get API Key" in the left menu. Instantly get a key — no credit card, no phone verification. Works for Gemini 2.5 Flash (1M context, multimodal) and Gemma models. 10 RPM, 250 RPD free.
Do free LLM API keys expire?
It depends. Google AI Studio keys are permanent. Groq keys are permanent on the free tier. OpenRouter free tier is permanent (200 RPD). Cohere Trial keys last 3 months. GitHub Models is tied to your GitHub account — permanent but rate-limited. Check each provider section below for specifics.
After You Get Your Key: What's Next?
Once you have an API key, here's how to start using it:
- Test it in the browser: Go to our playground, paste your key, pick your model, and chat. No install needed.
- Generate a config snippet: Use our config generator to get ready-to-copy config for Claude Code, Cursor, Codex, Aider, and more.
- Compare models side by side: Use the comparison tool to stack up context windows, rate limits, and modalities.
- Browse all models: Search, filter, and discover all 150+ free models on the models directory.
Something missing or outdated? This guide is maintained alongside our . Open an issue or PR if a provider changed their free tier. All model data is refreshed daily.