Llama-4-Maverick-17B-128E — Free AI Model & API

github-models/llama-4-maverick-17b-128e
chat
Context Window 256K
Max Output 4K
Rate Limit 10 RPM, 50 RPD
Cost $0.00 FREE
Free Period Since May 10, 2026
Credit Card Not required
Status Online

Overview

Llama 4 Maverick 17B is Meta's mixture-of-experts (MoE) model with 128 experts and 17B active parameters per token, available for free on GitHub Models. Only a fraction of its weights is used for each token, while the large expert pool gives it broad knowledge coverage. This makes it a strong general-purpose chat and instruction-following model that can rival much larger dense models. The free tier caps each request at 4K output tokens (and, per the platform limits below, 8K input tokens), with 10 requests per minute (RPM) and 50 requests per day (RPD). It is therefore best suited to interactive chat and short-form generation rather than long-form writing. It is OpenAI SDK-compatible and requires only a GitHub account to start using.

Model ID
llama-4-maverick-17b-128e
Base URL
https://models.inference.ai.azure.com
Specifications
Context: 256K · Output: 4K · Modality: text · OpenAI Compat: Yes

Quick Start

Integrate Llama-4-Maverick-17B-128E with a few lines of code using the OpenAI SDK (Python or JavaScript) or a plain curl request.

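# Python (openai package)
from openai import OpenAI

# GitHub Models exposes an OpenAI-compatible endpoint; only base_url and api_key change.
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key="YOUR_API_KEY",  # replace with your GitHub Models API key
)

response = client.chat.completions.create(
    model="llama-4-maverick-17b-128e",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)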

// JavaScript (openai npm package); run as an ES module so top-level await works.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com",
  apiKey: "YOUR_API_KEY", // replace with your GitHub Models API key
});

const completion = await openai.chat.completions.create({
  model: "llama-4-maverick-17b-128e",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

# cURL: the same request as raw HTTP
curl https://models.inference.ai.azure.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-4-maverick-17b-128e",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
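
The curl request above can also be built with Python's standard library alone, which helps where the OpenAI SDK is not installed. This is a minimal sketch, assuming the endpoint accepts the same JSON body as the curl example. The helper names build_chat_request and send and the max_tokens=512 default are illustrative choices, not part of this page. max_tokens is a standard OpenAI chat-completions field, kept here well under the 4K output cap.

```python
import json
import urllib.request

BASE_URL = "https://models.inference.ai.azure.com"
MODEL_ID = "llama-4-maverick-17b-128e"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        # Illustrative default; keep it under the 4K-token output cap.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    request = build_chat_request("YOUR_API_KEY", "Hello!")
    print(request.get_method(), request.full_url)
    # print(send(request))  # uncomment once a real key is set
```

Splitting construction from sending keeps the request shape easy to inspect without spending any of the daily quota.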

Rate Limits & Constraints

Rate Limit 10 RPM, 50 RPD
Context Window 256K
Max Output Tokens 4K
Cost Free — since May 10, 2026
Credit Card Not required
OpenAI Compatible Yes — drop-in replacement
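
To stay inside the 10 RPM / 50 RPD quota listed above, a client can throttle itself before calling the API. This is a minimal sketch. SlidingWindowLimiter is a hypothetical helper, not part of any SDK, and it assumes rolling 60-second and 24-hour windows. GitHub's actual counter reset behavior may differ.

```python
import time
from collections import deque
from typing import Callable, Deque

MINUTE = 60.0
DAY = 86_400.0


class SlidingWindowLimiter:
    """Hypothetical client-side throttle for a per-minute and per-day quota.

    Defaults mirror the free-tier limits (10 RPM, 50 RPD). Assumes rolling
    windows; the server may reset its counters differently, so this only
    reduces the chance of a rate-limit error, it cannot rule one out.
    """

    def __init__(
        self,
        per_minute: int = 10,
        per_day: int = 50,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.minute: Deque[float] = deque()  # request times within the last minute
        self.day: Deque[float] = deque()  # request times within the last 24 hours

    def _wait_at(self, now: float) -> float:
        # Forget requests that have aged out of each window.
        while self.minute and now - self.minute[0] >= MINUTE:
            self.minute.popleft()
        while self.day and now - self.day[0] >= DAY:
            self.day.popleft()
        wait = 0.0
        if len(self.minute) >= self.per_minute:
            wait = max(wait, MINUTE - (now - self.minute[0]))
        if len(self.day) >= self.per_day:
            wait = max(wait, DAY - (now - self.day[0]))
        return wait

    def wait_time(self) -> float:
        """Seconds until the next request fits both windows (0.0 if it fits now)."""
        return self._wait_at(self.clock())

    def try_acquire(self) -> bool:
        """Record a request and return True only if both windows have room."""
        now = self.clock()
        if self._wait_at(now) > 0:
            return False
        self.minute.append(now)
        self.day.append(now)
        return True
```

A caller would typically sleep for limiter.wait_time() seconds until limiter.try_acquire() returns True, then send the request. The injectable clock exists only so the limiter can be tested without real waiting.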

GitHub Models Platform Limitations

  • Low per-request token limits on the free tier (8K input / 4K output), far below the model's 256K context window
  • Rate limits tied to GitHub Copilot subscription tier
  • Not suitable for large-context or long-generation tasks
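
Because the free tier caps input at 8K tokens per request, long chat sessions need their history trimmed before each call. This is a minimal sketch, assuming "8K" means 8,000 tokens and that text averages roughly 4 characters per token. Both are rough guesses; real counts depend on the model's tokenizer and per-message overhead. trim_history and estimate_tokens are hypothetical helpers, not part of the OpenAI SDK or GitHub Models.

```python
from typing import Dict, List

Message = Dict[str, str]

# Assumptions: "8K input" read as 8,000 tokens, and ~4 characters per token.
INPUT_BUDGET_TOKENS = 8_000
CHARS_PER_TOKEN = 4


def estimate_tokens(message: Message) -> int:
    """Very rough token estimate for one chat message (ceiling division)."""
    return -(-len(message["content"]) // CHARS_PER_TOKEN)


def trim_history(messages: List[Message], budget: int = INPUT_BUDGET_TOKENS) -> List[Message]:
    """Drop the oldest non-system turns until the estimate fits the budget.

    A leading system message, if present, is always kept. The latest message
    is always kept too, even if it alone exceeds the budget.
    """
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    while len(rest) > 1 and sum(map(estimate_tokens, system + rest)) > budget:
        rest = rest[1:]  # discard the oldest remaining turn
    return system + rest
```

Dropping whole turns from the front keeps the most recent context intact. A more careful version would use a real tokenizer and leave headroom for message formatting.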

Features & Use Cases

Best For

Chat

Modality Support

text

GitHub Models Highlights

  • 45+ models including GPT-4.1 and o3
  • Free for all GitHub accounts
  • Includes Llama 4, DeepSeek-R1, Mistral
  • Base URL: models.inference.ai.azure.com

Frequently Asked Questions

How do I get an API key for Llama-4-Maverick-17B-128E?

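Sign in with (or create) a free GitHub account, then generate a GitHub personal access token to use as your API key; fine-grained tokens need the models:read permission. No credit card is required. Once you have the token, use the code snippets in the Quick Start section above.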

Is Llama-4-Maverick-17B-128E really free?

Yes. Llama-4-Maverick-17B-128E is available on the GitHub Models free tier and has been free since May 10, 2026. Rate limits apply: 10 RPM, 50 RPD. Always check the provider's terms for any changes to the free tier.

What are Llama-4-Maverick-17B-128E's rate limits?

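10 requests per minute (RPM) and 50 requests per day (RPD). Context window: 256K tokens, although the free tier caps input at 8K tokens per request. Max output: 4K tokens. No credit card required.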
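
When a request is rejected for exceeding the rate limit (typically HTTP 429, Too Many Requests), retrying with jittered exponential backoff avoids hammering the 10 RPM limit. This is a minimal sketch. RateLimited is a hypothetical placeholder exception. With the OpenAI Python SDK (v1+) you would catch openai.RateLimitError instead, and the client's max_retries option already retries some failures on its own. Backoff cannot help once the 50 RPD quota is used up; at that point the helper gives up and re-raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical placeholder for the error your client raises on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited with full-jitter exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts (or the daily quota is exhausted)
            # Wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

For example, wrapping the Quick Start call as with_backoff(lambda: client.chat.completions.create(...)) would retry it a few times before surfacing the error. Full jitter spreads retries out so parallel clients do not all retry at the same instant.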

What are the best free alternatives to Llama-4-Maverick-17B-128E?

Popular free alternatives include inclusionAI: Ring-2.6-1T; Baidu Qianfan: CoBuddy (free); and Owl Alpha. You can also browse all 147+ free models on our site.
