How to Get a Free Cerebras API Key (2026)
5 free models available — no credit card required. Get your Cerebras API key →
Overview
Ultra-fast inference on Cerebras WSE chips — 1M tokens/day.
Cerebras Cloud offers free API access to Llama and GPT-OSS models running on the Cerebras Wafer-Scale Engine, one of the fastest AI accelerators available. The free tier provides 1 million tokens/day and 14,400 requests/day per model with no credit card required. Context window is limited to 8K on the free tier.
- Ultra-fast inference on WSE chips
- 1M tokens/day free
- No credit card required
- Llama 3.1 8B + GPT-OSS 120B available
API Compatibility: OpenAI SDK-compatible (Chat Completions)
Quick Start Guide
- 1 Sign up at cloud.cerebras.ai Email or GitHub. No credit card.
- 2 Go to API Keys
- 3 Generate an API key
- 4 Choose a model Llama 3.3 70B or GPT-OSS 120B available for free.
- 5 Configure OpenAI client Base URL: https://api.cerebras.ai/v1
All Free Cerebras Models — Context Windows & Rate Limits
| Model | Context | Max Output | Modality | Rate Limit | Released | Status | |
|---|---|---|---|---|---|---|---|
| llama3.1-8b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | — | Online | Details | |
| gpt-oss-120b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | — | Online | Details | |
| qwen-3-235b-a22b-instruct-2507 | 131K | 8K | 30 RPM, 14,400 RPD, 1M TPD | — | Online | Details | |
| zai-glm-4.7 | 128K | 8K | 10 RPM, 100 RPD, 1M TPD | — | Online | Details | |
| Llama 3.1 70B | 131K | 0 | See provider page | — | Online | Details |
Pricing & Limits
Credit Card Not required
Free Tier Permanently free
Context Range 128K – 131K
Total Models 5 free
Rate Limits 30 RPM, 14,400 RPD, 1M TPD · 10 RPM, 100 RPD, 1M TPD
API Compatibility OpenAI SDK-compatible (Chat Completions)
Use Cases
What Cerebras's free models are best for, based on aggregated model capabilities:
Limitations & Caveats
- 8K context window on free tier (vs 128K on paid)
- Limited model selection — Llama and GPT-OSS only
- 1M tokens/day shared across models