AI API Cost Calculator

What Is Batch API Pricing? OpenAI and Anthropic Batch Explained


Batch APIs: 50% Off for Async Workloads

Both OpenAI and Anthropic offer batch processing APIs that price asynchronous requests at 50% off standard rates. Instead of sending individual real-time API calls that return results immediately, you submit many requests as one batch (an uploaded JSONL file for OpenAI, a list of requests in a single API call for Anthropic). The provider then processes them in the background, typically within 24 hours.

For developers running large workloads where real-time response isn't required, batch APIs are one of the most straightforward cost reductions available. No prompt changes, no model changes, no quality trade-off — just half the price for the same output, delivered on a delay.

// Batch savings example

100,000 GPT-4o calls per month at 500 input tokens and 500 output tokens each = $625/month at standard pricing ($125 for 50M input tokens plus $500 for 50M output tokens). Same volume via the Batch API = $312.50/month. Annual savings: $3,750 for this single workload. For high-volume pipelines running multiple models, the savings add up quickly.
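The arithmetic behind the example can be reproduced with a short calculator. This is a minimal sketch that assumes GPT-4o prices of $2.50 (input) and $10.00 (output) per 1M tokens, as in the comparison table in this article, and a flat 50% batch discount; `monthly_cost` is a hypothetical helper name.

```python
# Per-1M-token prices in USD (assumed from the comparison table in this article).
GPT_4O_INPUT_PER_M = 2.50
GPT_4O_OUTPUT_PER_M = 10.00
BATCH_DISCOUNT = 0.50  # batch requests are billed at 50% of standard rates


def monthly_cost(calls, input_tokens, output_tokens, input_per_m, output_per_m, discount=0.0):
    """Return the monthly cost in USD for `calls` requests of the given size."""
    input_cost = calls * input_tokens / 1_000_000 * input_per_m
    output_cost = calls * output_tokens / 1_000_000 * output_per_m
    return (input_cost + output_cost) * (1 - discount)


standard = monthly_cost(100_000, 500, 500, GPT_4O_INPUT_PER_M, GPT_4O_OUTPUT_PER_M)
batch = monthly_cost(100_000, 500, 500, GPT_4O_INPUT_PER_M, GPT_4O_OUTPUT_PER_M, BATCH_DISCOUNT)
annual_savings = (standard - batch) * 12

print(f"standard: ${standard:,.2f}/month")     # standard: $625.00/month
print(f"batch:    ${batch:,.2f}/month")        # batch:    $312.50/month
print(f"annual savings: ${annual_savings:,.2f}")  # annual savings: $3,750.00
```

Swapping in other token counts or per-model prices shows how quickly the savings grow with volume.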

OpenAI Batch API

OpenAI's Batch API launched in 2024 and supports GPT-4o and GPT-4o mini, among other models.

Each line of the batch input file is one request: a custom_id for tracking, the HTTP method, the endpoint URL, and a body in the standard Chat Completions format. The output file maps each custom_id back to its response, making it straightforward to process results programmatically.
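OpenAI does not guarantee that output lines come back in input order, so results are best matched by custom_id rather than by position. The sketch below assumes each output line has the shape `{"custom_id": ..., "response": {"status_code": ..., "body": {...}}, "error": ...}`, with the body being a normal chat completion. The helper name `parse_batch_output` and the trimmed sample lines are hypothetical illustrations, not real API output.

```python
import json


def parse_batch_output(jsonl_text):
    """Map each custom_id to the assistant's reply text, or None if the request failed."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        response = record.get("response")
        # Treat an error object or a non-200 response as a failed request.
        if record.get("error") or not response or response.get("status_code") != 200:
            results[record["custom_id"]] = None
            continue
        choices = response["body"]["choices"]
        results[record["custom_id"]] = choices[0]["message"]["content"]
    return results


# Trimmed, hand-written sample: note req-002 arrives before req-001.
sample_output = "\n".join([
    json.dumps({"custom_id": "req-002", "error": None,
                "response": {"status_code": 200, "body": {"choices": [
                    {"index": 0, "message": {"role": "assistant", "content": "negative"}}]}}}),
    json.dumps({"custom_id": "req-001", "error": None,
                "response": {"status_code": 200, "body": {"choices": [
                    {"index": 0, "message": {"role": "assistant", "content": "positive"}}]}}}),
])

print(parse_batch_output(sample_output))
```

Keying on custom_id also makes it easy to join results back to the source records (reviews, documents, rows) that generated each request.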

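Limits: each batch input file can contain up to 50,000 requests and 100 MB. The completion window is 24 hours. A batch that doesn't finish in time expires: results for requests that did complete are still returned (and billed), and the unfinished requests are cancelled without charge.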
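Workloads larger than these limits have to be split across several batch files. This is a minimal sketch with a hypothetical `chunk_requests` helper. It assumes "100 MB" means 100 × 1024 × 1024 bytes and keeps both limits as parameters, since published limits can change.

```python
import json

MAX_REQUESTS = 50_000          # per-batch request limit stated above
MAX_BYTES = 100 * 1024 * 1024  # assumed interpretation of the 100 MB file limit


def chunk_requests(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Split request dicts into JSONL file bodies that each respect both limits."""
    chunks, lines, size = [], [], 0
    for req in requests:
        line = json.dumps(req) + "\n"
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError(f"request {req.get('custom_id')} alone exceeds max_bytes")
        # Close the current chunk if adding this line would break either limit.
        if lines and (len(lines) >= max_requests or size + line_bytes > max_bytes):
            chunks.append("".join(lines))
            lines, size = [], 0
        lines.append(line)
        size += line_bytes
    if lines:
        chunks.append("".join(lines))
    return chunks


reqs = [
    {"custom_id": f"req-{i:03d}", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}],
              "max_tokens": 10}}
    for i in range(5)
]
files = chunk_requests(reqs, max_requests=2)
print(len(files))  # 3 files holding 2, 2 and 1 requests
```

Each returned string can be written to its own `.jsonl` file and submitted as a separate batch.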

# Example batch input JSONL (one line per request)
{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Classify this review: Great product!"}], "max_tokens": 10}}
{"custom_id": "req-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Classify this review: Terrible, broke immediately."}], "max_tokens": 10}}
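Submitting lines like the ones above takes two calls: upload the JSONL with `purpose="batch"`, then create the batch. This sketch assumes the official `openai` Python SDK (v1.x), where those steps are `client.files.create(...)` and `client.batches.create(...)`. The client is passed in so the function can be exercised with a stub. `submit_openai_batch` is a hypothetical helper name.

```python
import io
import json


def submit_openai_batch(client, requests):
    """Upload requests as a JSONL file and start a batch; return the batch ID.

    `client` is assumed to be an `openai.OpenAI()` instance (openai SDK v1.x).
    """
    jsonl = "".join(json.dumps(r) + "\n" for r in requests)
    # Batch input files must be uploaded with purpose="batch".
    input_file = client.files.create(
        file=("batch_input.jsonl", io.BytesIO(jsonl.encode("utf-8"))),
        purpose="batch",
    )
    batch = client.batches.create(
        input_file_id=input_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",  # the only window discussed in this article
    )
    return batch.id
```

With a real client, you would then poll `client.batches.retrieve(batch_id)` until its `status` is `"completed"` and download results with `client.files.content(batch.output_file_id).text`.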

Anthropic Batch API

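Anthropic's Message Batches API is available for all current Claude models. It follows a similar pattern to OpenAI's implementation, with one difference: instead of uploading a file, you send the list of requests directly in the body of a single API call, each with its own custom_id and standard Messages API parameters.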

Anthropic processes batches within 24 hours, often much sooner; requests that haven't completed by then expire. Pricing is 50% off standard model rates for all Claude models. The batch API also supports prompt caching, so if your requests share a common system prompt, caching savings stack on top of the batch discount.
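To see how the two discounts combine, here is a rough input-cost comparison. It is a sketch under stated assumptions: cache reads are billed at 10% of the base input price, the 50% batch discount multiplies on top, every request hits the cache for the shared prefix, and the one-time cache-write charge is ignored. The workload (10,000 requests, a 2,000-token shared system prompt, 200 unique tokens each) and the helper `input_cost_usd` are hypothetical.

```python
def input_cost_usd(requests, shared_tokens, unique_tokens, base_input_per_m,
                   use_batch, use_cache, cache_read_multiplier=0.10, batch_multiplier=0.50):
    """Input-token cost for `requests` calls that share a `shared_tokens`-long prefix.

    Assumes every request is a cache hit on the shared prefix when use_cache is True
    and ignores the one-time cache-write charge.
    """
    shared_rate = cache_read_multiplier if use_cache else 1.0
    # Tokens per request, expressed in units of the base input price.
    billable = unique_tokens + shared_tokens * shared_rate
    cost = requests * billable * base_input_per_m / 1_000_000
    return cost * (batch_multiplier if use_batch else 1.0)


HAIKU_INPUT_PER_M = 0.80  # Claude 3.5 Haiku standard input price from the table in this article

for label, batch, cache in [("standard", False, False), ("batch", True, False),
                            ("batch + cache", True, True)]:
    cost = input_cost_usd(10_000, 2_000, 200, HAIKU_INPUT_PER_M, batch, cache)
    print(f"{label:14} ${cost:,.2f}")
```

Under these assumptions the shared prefix goes from dominating the bill to costing about as much as the unique tokens, which is why long shared system prompts benefit most from combining the two.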

# Anthropic batch request format
{
  "requests": [
    {
      "custom_id": "req-001",
      "params": {
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 10,
        "messages": [{"role": "user", "content": "Classify: positive or negative? 'Great product!'"}]
      }
    }
  ]
}
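The same request list can be built and submitted from code. This sketch assumes the official `anthropic` Python SDK, where batches are created with `client.messages.batches.create(requests=...)` and, once `client.messages.batches.retrieve(batch_id).processing_status` is `"ended"`, read back with `client.messages.batches.results(batch_id)`. The helpers `submit_anthropic_batch` and `collect_results` are hypothetical; the model ID is Anthropic's published ID for Claude 3.5 Haiku.

```python
def submit_anthropic_batch(client, prompts, model="claude-3-5-haiku-20241022"):
    """Create a Message Batch from {custom_id: prompt} and return its ID.

    `client` is assumed to be an `anthropic.Anthropic()` instance.
    """
    requests = [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 10,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]
    batch = client.messages.batches.create(requests=requests)
    return batch.id


def collect_results(client, batch_id):
    """Map custom_id to reply text; None for errored, canceled, or expired requests."""
    out = {}
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type == "succeeded":
            out[entry.custom_id] = entry.result.message.content[0].text
        else:
            out[entry.custom_id] = None
    return out
```

As with OpenAI, results are keyed by custom_id, so they can be joined back to the inputs regardless of the order they are returned in.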

When Batch APIs Make Sense

Batch processing is ideal for any workload where you can tolerate results arriving hours rather than seconds after submission, such as classifying a large backlog of reviews like the requests in the examples above.

A useful rule of thumb: if a workload can wait until morning, it can probably use batch pricing. Many production AI pipelines that feel real-time are actually triggered by user events but don't need to return results within seconds — these are prime batch candidates.

When Batch APIs Don't Make Sense

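Batch processing is not appropriate for interactive or latency-sensitive workloads, such as chat interfaces or any request a user is actively waiting on.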

The 24-hour window is the key constraint. If your SLA or user expectation requires faster turnaround than that, batch isn't viable. For these workloads, focus on model selection and prompt optimization instead.

Comparison: Standard vs Batch Pricing

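Prices in USD per 1M tokens.

| Model | Standard input | Batch input | Standard output | Batch output |
|---|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | $10.00 | $5.00 |
| GPT-4o mini | $0.15 | $0.075 | $0.60 | $0.30 |
| Claude 3.7 Sonnet | $3.00 | $1.50 | $15.00 | $7.50 |
| Claude 3.5 Haiku | $0.80 | $0.40 | $4.00 | $2.00 |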

Prices shown are approximate and subject to change. Verify current rates before modeling costs.
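To compare models for a specific workload, the standard prices from the table can drive a small script that derives the batch cost as 50% of standard. This is a sketch using the approximate prices above; `workload_costs` is a hypothetical helper, and the prices should be refreshed before relying on the numbers.

```python
# Standard per-1M-token prices in USD (input, output), from the table above.
STANDARD_PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude 3.7 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
}
BATCH_MULTIPLIER = 0.5  # batch pricing is 50% of standard for every model listed


def workload_costs(calls, input_tokens, output_tokens):
    """Return {model: (standard_cost, batch_cost)} in USD for one month of traffic."""
    costs = {}
    for model, (in_per_m, out_per_m) in STANDARD_PRICES.items():
        standard = calls * (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000
        costs[model] = (standard, standard * BATCH_MULTIPLIER)
    return costs


for model, (std, bat) in workload_costs(100_000, 500, 500).items():
    print(f"{model:18} standard ${std:>9,.2f}   batch ${bat:>9,.2f}")
```

For the 100,000-call workload used earlier, the GPT-4o row reproduces the $625 vs $312.50 figures, and the other rows show how model choice and batching compound as separate levers.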