What Is Batch API Pricing? OpenAI and Anthropic Batch Explained
Batch APIs: 50% Off for Async Workloads
Both OpenAI and Anthropic offer batch processing APIs that price asynchronous requests at 50% off standard rates. Instead of sending individual real-time API calls that return results immediately, you submit a batch of requests as a file, and the provider processes them in the background — typically within 24 hours.
For developers running large workloads where real-time response isn't required, batch APIs are one of the most straightforward cost reductions available. No prompt changes, no model changes, no quality trade-off — just half the price for the same output, delivered on a delay.
// Batch savings example
100,000 GPT-4o calls at 500 tokens input / 500 tokens output = $625/month at standard pricing. Same volume via Batch API = $312.50/month. Annual savings: $3,750 for this single workload. For high-volume pipelines running multiple models, savings compound quickly.
OpenAI Batch API
OpenAI's Batch API was launched in 2024 and is available for all GPT-4o and GPT-4o mini models. The mechanics:
- Format your requests as a JSONL file, one request object per line
- Upload the file via the Files API
- Create a batch job referencing the file ID
- Poll for completion (or use webhooks when available)
- Download the output JSONL file and parse results
Each request in the batch follows the standard Chat Completions format with an additional custom_id field for tracking. The output file maps each custom_id back to its response, making it straightforward to process results programmatically.
Limits: each batch file can contain up to 50,000 requests and 100MB. Completion window is 24 hours — OpenAI guarantees results within that window or refunds the batch.
Anthropic Batch API
Anthropic's Message Batches API is available for all Claude models. It uses a similar pattern to OpenAI's implementation:
- Submit a list of request objects (up to 10,000 per batch, 32MB limit)
- Each request includes a
custom_idand a standard Messages API body - Poll the batch endpoint for status or implement a polling loop
- Stream results as they complete, or download the full result file
Anthropic's batch API has a 24-hour processing guarantee. Pricing is 50% off standard model rates for all Claude models. The batch API also supports prompt caching, so if your requests share a common system prompt, caching savings stack on top of the batch discount.
When Batch APIs Make Sense
Batch processing is ideal for any workload where you can tolerate results arriving hours rather than milliseconds after submission:
- Data enrichment pipelines: Classifying, tagging, or summarizing large datasets overnight
- Content generation at scale: Generating product descriptions, SEO content, or email variants in bulk
- Embedding generation: Creating vector embeddings for large document libraries
- Evaluation and testing: Running eval suites against your prompt library on a schedule
- Scheduled analytics: Nightly summarization of logs, support tickets, or user feedback
- Offline annotation: Labeling training data for model fine-tuning
A useful rule of thumb: if a workload can wait until morning, it can probably use batch pricing. Many production AI pipelines that feel real-time are actually triggered by user events but don't need to return results within seconds — these are prime batch candidates.
When Batch APIs Don't Make Sense
Batch processing is not appropriate for:
- Conversational AI and chatbots (real-time by definition)
- Any user-facing feature where the user is waiting for a response
- Low-latency pipelines where processing must complete in seconds
- Interactive tools where iteration depends on seeing the previous result
The 24-hour window is the key constraint. If your SLA or user expectation requires faster turnaround than that, batch isn't viable. For these workloads, focus on model selection and prompt optimization instead.
Comparison: Standard vs Batch Pricing
| Model | Standard Input (1M) | Batch Input (1M) | Standard Output (1M) | Batch Output (1M) |
|---|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | $10.00 | $5.00 |
| GPT-4o mini | $0.15 | $0.075 | $0.60 | $0.30 |
| Claude Sonnet 3.7 | $3.00 | $1.50 | $15.00 | $7.50 |
| Claude Haiku 3.5 | $0.80 | $0.40 | $4.00 | $2.00 |
Prices shown are approximate and subject to change. Verify current rates before modeling costs.