2026 Guide: How to Get a Free LLM API Key — Every Provider, Step by Step

Quick Reference: All 20 Free LLM API Providers

If you just want the signup link, here's every provider with free models, sorted by model count. Bookmark this table.

Provider	Free Models	Credit Card?	Get Your Key
OpenRouter	30	Not required	Sign up →
Groq	14	Not required	Sign up →
GitHub Models	13	Not required	Sign up →
NVIDIA NIM	11	Not required	Sign up →
Cloudflare Workers AI	10	Not required	Sign up →
Mistral AI	8	Not required	Sign up →
OVHcloud AI Endpoints	7	Not required	Sign up →
SiliconFlow	6	Not required	Sign up →
Cohere	5	Not required	Sign up →
Cerebras	5	Not required	Sign up →
Hugging Face	5	Not required	Sign up →
LLM7.io	5	Not required	Sign up →
Ollama Cloud	5	Not required	Sign up →
Kilo Code	4	Not required	Sign up →
Z AI (Zhipu AI)	3	Not required	Sign up →
ModelScope	3	Not required	Sign up →
Google Gemini	2	Not required	Sign up →
Chutes.ai	2	Not required	Sign up →
Glhf.chat	2	Not required	Sign up →
Grok (xAI)	2	Required	Sign up →

The Fastest Path: Get a Working API Key in 2 Minutes

If you just want to start building right now, here's the shortest path to a working free API key:

Go to aistudio.google.com/app/apikey
Sign in with your Google account (or create one — takes 30 seconds)
Click "Create API Key"
Copy the key. Done. No credit card. No phone number. No wait.

This gets you Gemini 2.5 Flash — 1 million token context, multimodal (text + image + audio + video), 10 requests/minute, 250 requests/day. Enough to build and test a full application.

Want more options? Read on for every provider's signup process, step by step.

Tier 1: No Credit Card — Instant Key with Just an Email

These 7 providers give you an API key the moment you sign up. No credit card. No phone. No waiting.

1. Google AI Studio — Free Gemini API Key

Sign up: aistudio.google.com/app/apikey → Rate limit: 10 RPM, 250 RPD (Gemini 2.5 Flash) Credit card: No Phone: No

How to get the key:

Go to aistudio.google.com
Sign in with a Google account
Click "Get API Key" in the left sidebar
Click "Create API Key" → copy it

What you get: Access to Gemini 2.5 Flash (1M context), Gemini 2.5 Flash-Lite, Gemma 3 models. Full multimodal — text, image, audio, video input. The 1M context window on Flash is unmatched at any price.

Fine print: Google may use free-tier prompts for product improvement. If you're building a commercial app, read their data policy. The API format is Google's own (not OpenAI-compatible), but wrappers exist.

Endpoint: https://generativelanguage.googleapis.com/v1beta

2. Groq — Fastest Free LLM Inference

Sign up: console.groq.com/keys → Rate limit: 30 RPM, 14,400 RPD for most models Credit card: No Phone: No

How to get the key:

Go to console.groq.com
Sign up with email or Google SSO
Go to "API Keys" in the left menu
Click "Create API Key" → name it → copy

What you get: Ultra-fast inference (~2,600 tok/s) on Llama 4 Scout/Maverick, Qwen3, DeepSeek-R1-Distill, Mistral, and Whisper models. Groq's LPU chips deliver the lowest latency of any free provider — great for interactive agents and real-time chat.

Fine print: Groq does not train on your data. Rate limits reset daily. Models are limited to ~8K context on the free tier. OpenAI-compatible endpoint.

Endpoint: https://api.groq.com/openai/v1

3. OpenRouter — One Key for 35+ Free Models

Sign up: openrouter.ai/keys → Rate limit: 200 RPD (free), 1,000 RPD with $10 lifetime credit Credit card: No (free tier) Phone: No

How to get the key:

Go to openrouter.ai
Sign in with Google or GitHub
Go to "Keys" → create a key
That's it. One key works for 35+ free models across Google, Meta, Mistral, DeepSeek, and more.

What you get: The widest free model selection of any provider — Llama 4, Qwen3, DeepSeek-V3, Gemma 3, Mistral Large, and 30+ others. All through a single OpenAI-compatible endpoint. Free models are marked with :free suffix in the model ID.

Fine print: 200 requests/day on the free tier (or 1,000/day with a one-time $10 top-up — the credit never expires). Some :free models return 403 in certain regions. OpenRouter pools multiple upstream providers, so model availability can fluctuate.

Endpoint: https://openrouter.ai/api/v1

4. NVIDIA NIM — 100+ Models, No Daily Token Cap

Sign up: build.nvidia.com → Rate limit: ~40 RPM, no daily token cap Credit card: No Phone: Yes (phone verification)

How to get the key:

Go to build.nvidia.com
Sign up for a NVIDIA Developer Program account (free)
Verify your phone number (required — they send an SMS code)
Go to any model page → "Get API Key"

What you get: The largest selection of open-weight models — 100+ including DeepSeek-V3, DeepSeek-R1, Llama 3.1 70B, Qwen3.5 397B, Nemotron-Super, Mistral Large 2, and domain-specific models. The key advantage: no daily token cap. You're limited only by RPM (~40), not by total daily usage.

Fine print: Phone verification is mandatory — the one friction point. Some models listed in the catalog aren't actually callable (we've tested and marked them). Model IDs use slash format — deepseek-ai/deepseek-v3 not deepseek-ai-deepseek-v3. See our NVIDIA NIM provider page for the full list.

Endpoint: https://integrate.api.nvidia.com/v1

5. GitHub Models — Free GPT-4.1, o3, Llama 4

Sign up: github.com/marketplace/models → Rate limit: Depends on Copilot tier (Free/Pro/Pro+/Business) Credit card: No Phone: No Requirement: GitHub account

How to get the key:

Make sure you have a GitHub account (free to create)
Go to github.com/marketplace/models
Browse to any model → click "Get API Key" or use the "Try in Playground" option
The API key is your GitHub personal access token (or use the built-in key in the playground)

What you get: This is the holy grail for many developers — free access to GPT-5, GPT-4.1, o3, o4-mini, and Grok 3 — models that are paid everywhere else. Plus Llama 4, DeepSeek-R1, Mistral Large, and Cohere Command A. 45+ models total. The catch is rate limits tied to your Copilot tier.

Fine print: Token limits per request are small (8K in / 4K out on free tier). Rate limits are unpublished and tied to your Copilot subscription. This is great for prototyping and testing, not for production workloads. Azure-hosted (endpoint is models.inference.ai.azure.com).

Endpoint: https://models.inference.ai.azure.com

6. Cerebras — Ultra-Fast on WSE Chips

Sign up: cloud.cerebras.ai → Rate limit: 1M tokens/day, 14,400 RPD Credit card: No Phone: No

How to get the key:

Go to cloud.cerebras.ai
Sign up with email or Google
Go to "API Keys" → create a key

What you get: Llama 3.1 8B and GPT-OSS 120B running on Cerebras WSE (Wafer-Scale Engine) chips — some of the fastest AI hardware available. Good for high-throughput batch inference where latency matters.

Fine print: Context limited to 8K on the free tier. Smaller model selection than Groq or OpenRouter. OpenAI-compatible endpoint.

Endpoint: https://api.cerebras.ai/v1

7. Z AI (Zhipu) — GLM Series, No Credit Card

Sign up: open.bigmodel.cn → Rate limit: 1 concurrent request Credit card: No Phone: Chinese phone may be required

How to get the key:

Go to open.bigmodel.cn
Sign up (may require Chinese phone number for full access)
Go to "API Keys" in the user center
Create a key

What you get: GLM-4.7-Flash — Zhipu's latest model with 200K context. Bilingual Chinese/English. Good for Chinese-language applications and document processing. GLM-4.7 handles reasoning tasks well.

Fine print: 1 concurrent request limit is the tightest of any provider — expect queueing if multiple clients share the same key. Interface is primarily in Chinese. OpenAI-compatible endpoint.

Endpoint: https://open.bigmodel.cn/api/paas/v4

Tier 2: No Credit Card — But Some Verification Required

These providers don't charge your card, but they want a phone number, a Cloudflare account, or other verification before issuing a key.

8. Mistral AI — ~1B Tokens/Month Free

Sign up: console.mistral.ai → Rate limit: ~1 RPS, ~1B tokens/month Credit card: No Phone: Yes (phone verification)

How to get the key:

Go to console.mistral.ai
Sign up with email
Verify your phone number (they send an SMS)
Select the "Experiment" plan (free)
Go to "API Keys" → create a key

What you get: Access to Mistral Small, Medium, Large, and Codestral — Mistral's code-optimized model. ~1 billion tokens/month on the Experiment plan. All models are OpenAI-compatible. Mistral's models are strong for European languages and coding.

Fine print: Phone verification is required — no way around it. Free-tier data may be used for model improvement unless you explicitly opt out in settings. 1 RPS is the lowest concurrency of any major provider — not suitable for multi-user apps.

Endpoint: https://api.mistral.ai/v1

9. Cloudflare Workers AI — Edge Inference, 10K Neurons/Day

Sign up: dash.cloudflare.com → Rate limit: 10,000 Neurons/day Credit card: No Phone: No Requirement: Cloudflare account

How to get the key:

Go to dash.cloudflare.com
Create a Cloudflare account (free)
Go to "AI" → "Workers AI" in the dashboard
Use the REST API endpoint with your Cloudflare API token

What you get: 50+ models running on Cloudflare's global edge network — low latency from anywhere in the world. Llama, Mistral, Gemma, DeepSeek-R1-Distill, Qwen, and BGE embedding models. Text, image, audio, and embedding modalities.

Fine print: Billing is based on "Neurons" (Cloudflare's compute unit), not tokens — harder to predict costs/limits. Some models require a paid Workers plan ($5/month). The API format is Cloudflare-specific, not OpenAI-compatible by default.

Endpoint: https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run

10. Cohere — Command A + Embed, 1,000 Calls/Month

Sign up: dashboard.cohere.com → Rate limit: 20 RPM, 1,000 API calls/month Credit card: No Phone: No

How to get the key:

Go to dashboard.cohere.com
Sign up with email or Google
Go to "API Keys" → create a Trial key

What you get: Command A (111B parameters), Command R, Command R+, Aya Expanse (multilingual), Embed v4 (embedding), and Rerank models. Cohere's models are particularly strong for RAG (retrieval-augmented generation) workflows — their Embed and Rerank APIs are industry-leading.

Fine print: Trial key is non-commercial only. Limited to 1,000 API calls/month across all models — one of the tighter limits. The key expires after a trial period (typically 3 months).

Endpoint: https://api.cohere.com/v2

11. Hugging Face Inference API — Rotating Open Models

Sign up: huggingface.co/settings/tokens → Rate limit: ~1,000 RPD Credit card: No Phone: No

How to get the key:

Go to huggingface.co
Create a free account
Go to Settings → Access Tokens
Create a token (read-only is sufficient for inference)

What you get: Access to a rotating selection of open-weight models through the Hugging Face Serverless Inference API — Qwen, Llama, Gemma, SmolLM. The model lineup changes as Hugging Face updates their free tier.

Fine print: Not OpenAI-compatible — uses the Hugging Face Inference API format. Shared infrastructure means latency can vary significantly. Rate limits are not precisely documented (~1,000 requests/day approximate). Better for experimentation than production.

Endpoint: https://api-inference.huggingface.co/models

12. OVHcloud AI Endpoints — No Registration at All

Sign up: endpoints.ai.cloud.ovh.net → Rate limit: 2 RPM (anonymous), higher with registration Credit card: No Phone: No Unique: Anonymous tier — no account needed

How to get the key:

Option A (anonymous): No key needed. Just send requests to the endpoint. 2 RPM limit.
Option B (registered): Create an account at OVHcloud AI Endpoints for higher rate limits. Free registration.

What you get: EU-hosted inference for Qwen3-Coder, Mistral, Llama, DeepSeek, and other open models. GDPR compliant. The anonymous tier is unique — no other provider lets you call models with zero registration.

Fine print: 2 RPM on the anonymous tier is very low. Registered tier has better limits but OVHcloud doesn't publish exact numbers. EU latency is excellent; outside Europe may be slower. OpenAI-compatible.

Endpoint: https://qwen3-coder-30b-a3b-instruct.endpoints.ovh.net/v1 (model-specific)

Tier 3: Regional Providers — Best for China / Asia-Pacific

These Chinese providers offer generous free tiers but are optimized for users in China and nearby regions. If you're building for the Chinese market, these are your best options.

13. SiliconFlow — 1,000 RPM Free

Sign up: cloud.siliconflow.cn → Rate limit: 1,000 RPM, 50K TPM Credit card: No Phone: Chinese phone may be required

How to get the key:

Go to cloud.siliconflow.cn
Register (may require Chinese phone number)
Go to "API Keys" → create a key

What you get: DeepSeek-V3, DeepSeek-R1, Qwen3, Qwen2.5, BGE embeddings, and Stable Diffusion models. The 1,000 RPM rate limit is the most generous of any free tier. Perfect for high-throughput applications.

Fine print: Registration may require a Chinese phone number. Latency is excellent in China/Asia but higher in US/Europe. Interface is in Chinese. OpenAI-compatible endpoint.

Endpoint: https://api.siliconflow.cn/v1

14. ModelScope — Alibaba's Model Hub

Sign up: modelscope.cn → Rate limit: 2,000 RPD total, 500 RPD per model Credit card: No Phone: Chinese phone may be required

How to get the key:

Go to modelscope.cn
Register for an account
Go to "Access Token" in user settings
Create a token

What you get: Alibaba's Qwen3.5 models (including 35B-A3B MoE), DeepSeek models, and other Chinese LLMs. 2,000 requests/day across all models. The Qwen family is particularly strong for coding and Chinese-language tasks.

Fine print: 500 RPD cap per individual model. Registration may require Chinese credentials. Interface in Chinese. OpenAI-compatible endpoint.

Endpoint: https://api-inference.modelscope.cn/v1

Tier 4: Niche Providers — Specialized Use Cases

These providers don't compete on breadth. They solve specific problems — coding gateways, rare free models, or specialized APIs.

15. Kilo Code — Coding-Optimized Gateway

Sign up: kilo.ai → Rate limit: ~200 req/hr Credit card: No Phone: No

How to get the key:

Go to kilo.ai
Sign up — the API key works for all models on the gateway

What you get: Kilo Code is a coding-specific model router — it directs your request to the best coding model available (ByteDance Seed, Grok Code Fast, NVIDIA Nemotron, Arcee Trinity). Purpose-built for AI code editors like its VS Code extension.

Fine print: Models are routed dynamically — you don't control which model handles your request. ~200 req/hr is per-IP. OpenAI-compatible.

Endpoint: https://api.kilo.ai/api/gateway

16. LLM7.io — Free GPT-4o-mini

Sign up: token.llm7.io → Rate limit: 30 RPM (120 RPM with token registration) Credit card: No Phone: No

How to get the key:

Go to token.llm7.io
Follow the token registration process
Use the token as your API key

What you get: Free GPT-4o-mini — one of the only places to get OpenAI's own models on a free tier. Also DeepSeek-R1, Qwen, and Llama. The free GPT-4o-mini access is the main draw here.

Fine print: Independent aggregator — reliability and uptime are not guaranteed. Token registration process is less polished than bigger providers. OpenAI-compatible.

Endpoint: https://api.llm7.io/v1

17. Ollama Cloud — Hosted Ollama Runtime

Sign up: ollama.com/settings/keys → Rate limit: Session/weekly limits (unpublished) Credit card: No Phone: No

How to get the key:

Go to ollama.com
Create an account
Go to Settings → API Keys
Generate a key

What you get: Llama, Qwen, and Gemma models through the Ollama-native API. If you already use Ollama locally, the cloud version works with the same tooling. OpenAI-compatible wrapper available.

Fine print: Rate limits are unpublished — expect them to be lower than major providers. Best for developers already in the Ollama ecosystem who want a zero-config cloud fallback.

Endpoint: https://api.ollama.com

Head-to-Head: Rate Limits Comparison

Provider	RPM	RPD	Tokens/Day	Max Context	Credit Card
SiliconFlow	1,000	—	50K TPM	131K	No
NVIDIA NIM	~40	No cap	No cap	1M	No
Groq	30	14,400	—	8K	No
LLM7.io	30	—	—	131K	No
Cohere	20	1,000/mo	—	128K	No
Google AI Studio	10	250	—	1M	No
GitHub Models	Varies	Varies	~8K/req	128K	No
OpenRouter	—	200	—	1M	No
Cerebras	—	14,400	1M	8K	No
ModelScope	—	2,000	—	131K	No
Mistral AI	1 RPS	—	~1B/mo	128K	No
OVHcloud	—	—	—	262K	No
Cloudflare	—	10K Neurons	—	128K	No
Hugging Face	—	~1,000	—	131K	No
Kilo Code	—	~200/hr	—	262K	No
Z AI	—	—	—	200K	No
Ollama Cloud	—	Unpublished	—	128K	No

Which Free API Key Should You Get? Decision Guide

Still not sure? Here's which one to pick based on what you're building:

"I just want to try an LLM API, fast" → Google AI Studio. Sign up in 30 seconds. Gemini 2.5 Flash is one of the best free models available.
"I need the fastest inference for coding agents" → Groq. Their LPU chips are dramatically faster than GPU providers.
"I want access to GPT-4.1 / o3 / Grok 3 for free" → GitHub Models. The only place to get OpenAI's premium models free.
"I want one key that works with everything" → OpenRouter. Single key, 35+ free models, OpenAI-compatible.
"I need no daily token cap" → NVIDIA NIM. 40 RPM unlimited. Phone verification required.
"I'm building for users in China" → SiliconFlow or ModelScope. Best latency in Asia.
"I need GDPR-compliant EU hosting" → OVHcloud AI Endpoints. No registration needed.
"I need embedding + rerank for RAG" → Cohere. Best-in-class Embed and Rerank APIs.

Frequently Asked Questions

How do I get a free LLM API key in 2026?

The fastest path: sign up for Google AI Studio (Gemini), Groq, or NVIDIA NIM. All three issue an API key instantly with just an email — no credit card. For the widest model selection, get an OpenRouter key (one key for 35+ free models). If you want GPT-4.1 or o3 for free, use GitHub Models (free for any GitHub account). Full step-by-step guides for all 20 providers are in this article.

Which free LLM APIs require no credit card?

19 of 20 providers on free-model.com require no credit card: OpenRouter, Groq, GitHub Models, NVIDIA NIM, Cloudflare Workers AI, Mistral AI, OVHcloud AI Endpoints, SiliconFlow, Cohere, Cerebras, Hugging Face, LLM7.io, Ollama Cloud, Kilo Code, Z AI (Zhipu AI), ModelScope, Google Gemini, Chutes.ai, Glhf.chat. Some need phone verification (Mistral, NVIDIA NIM) but never a credit card. You can filter for "no credit card" on our models page.

How do I get a free API key for GPT-4 or GPT-5?

OpenAI no longer offers free API credits. Use GitHub Models for free GPT-4.1, GPT-5, o3, and o4-mini (free for any GitHub account). Groq and Cerebras host free GPT-OSS models. LLM7.io offers free GPT-4o-mini. See the GitHub Models and LLM7.io sections in this guide.

How do I get a free Claude API key?

Anthropic does not offer a free Claude API tier. The only legitimate free access is via GitHub Models (limited monthly Claude budget). Alternatively, point Claude Code at a free OpenAI-compatible backend (Groq, NVIDIA NIM, OpenRouter) using our config generator.

How do I get a free Gemini API key?

Go to aistudio.google.com, sign in with a Google account, click "Get API Key" in the left menu. Instantly get a key — no credit card, no phone verification. Works for Gemini 2.5 Flash (1M context, multimodal) and Gemma models. 10 RPM, 250 RPD free.

Do free LLM API keys expire?

It depends. Google AI Studio keys are permanent. Groq keys are permanent on the free tier. OpenRouter free tier is permanent (200 RPD). Cohere Trial keys last 3 months. GitHub Models is tied to your GitHub account — permanent but rate-limited. Check each provider section below for specifics.

After You Get Your Key: What's Next?

Once you have an API key, here's how to start using it:

Test it in the browser: Go to our playground, paste your key, pick your model, and chat. No install needed.
Generate a config snippet: Use our config generator to get ready-to-copy config for Claude Code, Cursor, Codex, Aider, and more.
Compare models side by side: Use the comparison tool to stack up context windows, rate limits, and modalities.
Browse all models: Search, filter, and discover all 150+ free models on the models directory.

Something missing or outdated? This guide is maintained alongside our . Open an issue or PR if a provider changed their free tier. All model data is refreshed daily.