What Is Batch API Pricing? OpenAI and Anthropic Batch Explained

For developers and AI builders · 7 min read

Batch APIs: 50% Off for Async Workloads

Both OpenAI and Anthropic offer batch processing APIs that price asynchronous requests at 50% off standard rates. Instead of sending individual real-time API calls that return results immediately, you submit a batch of requests as a file, and the provider processes them in the background — typically within 24 hours.

For developers running large workloads where real-time response isn't required, batch APIs are one of the most straightforward cost reductions available. No prompt changes, no model changes, no quality trade-off — just half the price for the same output, delivered on a delay.

// Batch savings example

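100,000 GPT-4o calls per month at 500 input tokens and 500 output tokens each come to $625/month at standard pricing: $125 for 50M input tokens at $2.50 per 1M, plus $500 for 50M output tokens at $10.00 per 1M. The same volume via the Batch API costs $312.50/month, an annual saving of $3,750 for this single workload. For high-volume pipelines running several such workloads, the savings add up quickly.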
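
The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch: the rates are the GPT-4o figures from the comparison table in this article, and the function and variable names are made up for illustration.

```python
# Prices in USD per 1M tokens, taken from the comparison table in this article.
GPT4O_INPUT_PER_M = 2.50
GPT4O_OUTPUT_PER_M = 10.00
BATCH_DISCOUNT = 0.50  # batch requests are billed at 50% of the standard rate


def monthly_cost(calls, input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the standard-rate cost in USD for one month of calls."""
    input_cost = calls * input_tokens / 1_000_000 * input_per_m
    output_cost = calls * output_tokens / 1_000_000 * output_per_m
    return input_cost + output_cost


standard = monthly_cost(100_000, 500, 500, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M)
batch = standard * (1 - BATCH_DISCOUNT)
annual_savings = (standard - batch) * 12

print(f"standard: ${standard:,.2f}/month")
print(f"batch:    ${batch:,.2f}/month")
print(f"savings:  ${annual_savings:,.2f}/year")
```

Swap in your own call volume and token counts to see whether the delay is worth it for a given workload.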

OpenAI Batch API

OpenAI's Batch API was launched in 2024 and is available for all GPT-4o and GPT-4o mini models. The mechanics:

Format your requests as a JSONL file, one request object per line
Upload the file via the Files API
Create a batch job referencing the file ID
Poll for completion (or use webhooks when available)
Download the output JSONL file and parse results
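
The five steps above can be sketched as a single function. This is an illustration under assumptions, not a reference implementation: `run_openai_batch` is a hypothetical helper, and `client` is assumed to behave like the `OpenAI()` client from the official Python SDK (`files.create`, `batches.create`, `batches.retrieve`, `files.content`). Check the method names and status values against the SDK version you have installed.

```python
import time


def run_openai_batch(client, jsonl_path, poll_seconds=60):
    """Upload a JSONL file, run it as a batch, and return the raw output text.

    `client` is assumed to behave like `openai.OpenAI()`.
    """
    # Steps 1-2: upload the prepared JSONL file with purpose="batch".
    with open(jsonl_path, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")

    # Step 3: create the batch job that references the uploaded file.
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

    # Step 4: poll until the job reaches a terminal state.
    terminal = {"completed", "failed", "expired", "cancelled"}
    while batch.status not in terminal:
        time.sleep(poll_seconds)
        batch = client.batches.retrieve(batch.id)

    if batch.status != "completed" or not batch.output_file_id:
        raise RuntimeError(f"batch {batch.id} ended with status {batch.status!r}")

    # Step 5: download the output JSONL for parsing.
    return client.files.content(batch.output_file_id).text
```

With the SDK installed, usage would look like `run_openai_batch(OpenAI(), "requests.jsonl")`. A long polling interval is fine here, since nothing about a batch job is latency-sensitive.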

Each request in the batch follows the standard Chat Completions format with an additional custom_id field for tracking. The output file maps each custom_id back to its response, making it straightforward to process results programmatically.
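
To illustrate the mapping, here is a hedged sketch of an output parser. It assumes each output line carries `custom_id`, `error`, and a `response` object with `status_code` and `body`, which is how the output format is commonly documented. `parse_batch_output` is a hypothetical helper, and the sample lines are invented for the example rather than real API output.

```python
import json


def parse_batch_output(jsonl_text):
    """Return {custom_id: {"ok": bool, "text" or "error": ...}} from output JSONL."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            results[record["custom_id"]] = {
                "ok": False,
                "error": record.get("error") or response.get("body"),
            }
            continue
        message = response["body"]["choices"][0]["message"]
        results[record["custom_id"]] = {"ok": True, "text": message["content"]}
    return results


def _sample_line(custom_id, content):
    """Build one illustrative output line (made up, not real API output)."""
    body = {"choices": [{"message": {"role": "assistant", "content": content}}]}
    return json.dumps({
        "custom_id": custom_id,
        "error": None,
        "response": {"status_code": 200, "body": body},
    })


# Deliberately out of order: match on custom_id, never on line position.
sample_output = "\n".join([
    _sample_line("req-002", "negative"),
    _sample_line("req-001", "positive"),
])

parsed = parse_batch_output(sample_output)
print(parsed["req-001"]["text"], parsed["req-002"]["text"])
```

Keying on `custom_id` matters because the order of lines in the output file is not guaranteed to match the input file.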

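Limits: each batch can contain up to 50,000 requests, and the input file is capped in size (100 MB at launch; check the current documentation, since limits change). The completion window is 24 hours. A batch that does not finish in time expires: unfinished requests are cancelled and not billed, while requests that did complete are still charged and their results remain available in the output file.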
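
If a job exceeds these limits, it has to be split across several batch files. The following is a minimal sketch of one way to do that; `split_into_batches` is a hypothetical helper, and its default limits are simply the figures quoted in this section, so confirm them against the current documentation before relying on them.

```python
import json


def split_into_batches(requests, max_requests=50_000, max_bytes=100 * 1024 * 1024):
    """Yield lists of JSONL lines, each list staying within both limits."""
    current, current_bytes = [], 0
    for request in requests:
        line = json.dumps(request)
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if line_bytes > max_bytes:
            raise ValueError("a single request exceeds the file size limit")
        too_many = len(current) >= max_requests
        too_big = current_bytes + line_bytes > max_bytes
        if current and (too_many or too_big):
            yield current  # close the current file and start a new one
            current, current_bytes = [], 0
        current.append(line)
        current_bytes += line_bytes
    if current:
        yield current


# Tiny limits so the splitting is visible in a small example.
demo = [{"custom_id": f"req-{i:03d}"} for i in range(1, 6)]
chunks = list(split_into_batches(demo, max_requests=2))
print([len(chunk) for chunk in chunks])
```

Each yielded list can be written to its own file with `"\n".join(chunk)` and submitted as a separate batch.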

# Example batch input JSONL (one line per request)
{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Classify this review: Great product!"}], "max_tokens": 10}}
{"custom_id": "req-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Classify this review: Terrible, broke immediately."}], "max_tokens": 10}}
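
Writing these lines by hand gets error-prone once the text contains quotes or newlines, so it is usually safer to generate them. A small sketch, assuming the same request shape as the two lines above (`build_batch_lines` is a hypothetical helper name):

```python
import json


def build_batch_lines(reviews, model="gpt-4o-mini", max_tokens=10):
    """Return one JSONL line per review, in the batch input format shown above."""
    lines = []
    for i, review in enumerate(reviews, start=1):
        request = {
            "custom_id": f"req-{i:03d}",  # must be unique within the batch
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "user", "content": f"Classify this review: {review}"}
                ],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes quotes and newlines, so each request stays on one line.
        lines.append(json.dumps(request))
    return lines


lines = build_batch_lines(['Great "product"!', "Terrible,\nbroke immediately."])
print("\n".join(lines))
```

Write the joined lines to a `.jsonl` file and upload that file as the batch input.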
    

Anthropic Batch API

Anthropic's Message Batches API is available for all Claude models. It uses a similar pattern to OpenAI's implementation:

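Submit a list of request objects in a single API call (the launch limit of 10,000 requests or 32MB per batch has since been raised to 100,000 requests or 256MB; verify against the current documentation)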
Each request includes a custom_id and a standard Messages API body
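Poll the batch endpoint until its processing_status is ended
Once the batch has ended, stream the results as JSONL or download the full result file, matching each result to its request by custom_id (result order is not guaranteed)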
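
The same flow can be sketched in Python. Treat this as an illustration under assumptions: `run_anthropic_batch` is a hypothetical helper, and `client` is assumed to behave like the `Anthropic()` client from the official Python SDK, exposing `messages.batches.create`, `.retrieve`, and `.results`. This version reads results only after the whole batch has ended.

```python
import time


def run_anthropic_batch(client, requests, poll_seconds=60):
    """Submit requests as one Message Batch and return {custom_id: text or None}.

    `client` is assumed to behave like `anthropic.Anthropic()`.
    """
    # Submit every request in a single call; no file upload is involved.
    batch = client.messages.batches.create(requests=requests)

    # Poll until processing has ended for the whole batch.
    while batch.processing_status != "ended":
        time.sleep(poll_seconds)
        batch = client.messages.batches.retrieve(batch.id)

    # Iterate over results and key them by custom_id, since order may differ.
    outcomes = {}
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            outcomes[entry.custom_id] = entry.result.message.content[0].text
        else:
            # errored, canceled, or expired: left as None for the caller to retry
            outcomes[entry.custom_id] = None
    return outcomes
```

Requests that come back as `None` can be collected and resubmitted in a follow-up batch.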

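Anthropic's batch API works to a 24-hour processing window: most batches finish well within it, and a batch that has not finished by then expires rather than running on, with expired requests left unbilled. Pricing is 50% off standard model rates for all Claude models. The batch API also supports prompt caching, so if your requests share a common system prompt, caching savings can stack on top of the batch discount, although cache hits within a batch are best-effort rather than guaranteed.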
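
To make the caching point concrete, here is a hedged sketch of building batch requests that share one cacheable system prompt. It assumes the Messages API convention of marking a system text block with `cache_control` of type `ephemeral`. `build_cached_requests` is a hypothetical helper, and the model ID is passed in rather than hard-coded so you can supply whichever current ID you use.

```python
def build_cached_requests(system_prompt, texts, model, max_tokens=10):
    """Return batch request dicts that all reuse one cache-marked system prompt."""
    # Identical system blocks across requests are what make cache hits possible.
    shared_system = [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return [
        {
            "custom_id": f"req-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "system": shared_system,
                "messages": [{"role": "user", "content": text}],
            },
        }
        for i, text in enumerate(texts, start=1)
    ]


requests = build_cached_requests(
    system_prompt="Classify each review as positive or negative.",
    texts=["Great product!", "Terrible, broke immediately."],
    model="claude-3-5-haiku-20241022",
)
print(len(requests), requests[0]["custom_id"])
```

Caching only applies above a minimum prompt length, so a one-line system prompt like this one would not actually be cached. The pattern pays off with long shared instructions or reference documents.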

# Anthropic batch request format (JSON body sent to the Message Batches endpoint)
{
  "requests": [
    {
      "custom_id": "req-001",
      "params": {
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 10,
        "messages": [{"role": "user", "content": "Classify: positive or negative? 'Great product!'"}]
      }
    }
  ]
}
    

When Batch APIs Make Sense

Batch processing is ideal for any workload where you can tolerate results arriving hours rather than milliseconds after submission:

Data enrichment pipelines: Classifying, tagging, or summarizing large datasets overnight
Content generation at scale: Generating product descriptions, SEO content, or email variants in bulk
Embedding generation: Creating vector embeddings for large document libraries (OpenAI's Batch API accepts the embeddings endpoint as well as chat completions)
Evaluation and testing: Running eval suites against your prompt library on a schedule
Scheduled analytics: Nightly summarization of logs, support tickets, or user feedback
Offline annotation: Labeling training data for model fine-tuning

A useful rule of thumb: if a workload can wait until morning, it can probably use batch pricing. Many production AI pipelines feel real-time because a user event triggers them, yet nothing downstream needs the result within seconds. These are prime batch candidates.

When Batch APIs Don't Make Sense

Batch processing is not appropriate for:

Conversational AI and chatbots (real-time by definition)
Any user-facing feature where the user is waiting for a response
Low-latency pipelines where processing must complete in seconds
Interactive tools where iteration depends on seeing the previous result

The 24-hour window is the key constraint. Batches often finish sooner, but neither provider commits to anything shorter, so if your SLA or user expectation requires faster turnaround, batch isn't viable. For these workloads, focus on model selection and prompt optimization instead.

Comparison: Standard vs Batch Pricing

Model	Standard input ($ per 1M tokens)	Batch input ($ per 1M tokens)	Standard output ($ per 1M tokens)	Batch output ($ per 1M tokens)
GPT-4o	$2.50	$1.25	$10.00	$5.00
GPT-4o mini	$0.15	$0.075	$0.60	$0.30
Claude Sonnet 3.7	$3.00	$1.50	$15.00	$7.50
Claude Haiku 3.5	$0.80	$0.40	$4.00	$2.00

Prices shown are approximate and subject to change. Verify current rates before modeling costs.
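
For cost modeling, the table can be turned into a small lookup. This is a sketch only: the rates are copied from the table above and carry the same caveat, and `workload_cost` is a hypothetical helper name.

```python
# USD per 1M tokens as (input, output) at standard rates, copied from the table.
STANDARD_PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet 3.7": (3.00, 15.00),
    "Claude Haiku 3.5": (0.80, 4.00),
}
BATCH_MULTIPLIER = 0.5  # batch requests are billed at half the standard rate


def workload_cost(model, input_tokens, output_tokens, batch=False):
    """Return the cost in USD for a workload on the given model."""
    input_rate, output_rate = STANDARD_PRICES[model]
    cost = (
        input_tokens / 1_000_000 * input_rate
        + output_tokens / 1_000_000 * output_rate
    )
    return cost * BATCH_MULTIPLIER if batch else cost


# Compare every model on the same 50M input / 50M output token workload.
for model in STANDARD_PRICES:
    standard_cost = workload_cost(model, 50_000_000, 50_000_000)
    batch_cost = workload_cost(model, 50_000_000, 50_000_000, batch=True)
    print(f"{model:<18} standard ${standard_cost:>8.2f}   batch ${batch_cost:>8.2f}")
```

Keeping the rates in one dictionary means a price change is a one-line edit, with every downstream estimate updating automatically.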