Best LLM for Document Summarisation (2026)
Bottom line up front: For document summarisation, Claude Sonnet 4.6 produces the most faithful, well-structured summaries with the lowest hallucination rate. Gemini 2.5 Pro is the better choice for very long documents that exceed 200K tokens. GPT-4o is a strong alternative for teams that need structured JSON output from their summaries.
What makes a good summarisation LLM
Summarisation seems simple, but it exposes model weaknesses quickly:
- Faithfulness — the model must summarise what the document says, not what it thinks the document probably says. Hallucination in summaries is often subtle and difficult to catch
- Context window — you need the model to read the full document in a single pass. Chunking and stitching summaries degrades quality and introduces inconsistencies
- Instruction following — you need to specify format, length, tone, and focus area. Models that drift from these instructions produce summaries that require editing
- Compression ratio — the ability to distil a 50-page document into 3 coherent paragraphs without losing critical information. Not all models do this equally well
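The context-window criterion above can be checked before you pick a model. The sketch below uses the common rule of thumb that English prose runs at roughly 0.75 words per token; the ratio and the 10% headroom figure are assumptions, not tokenizer output.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Estimate token count from a word count using a heuristic ratio."""
    return int(word_count / words_per_token)

def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """Check whether a document should fit in a context window of
    `context_tokens`, leaving 10% headroom for the instruction prompt
    and the generated summary."""
    return estimate_tokens(word_count) <= int(context_tokens * 0.9)

# A 150,000-word document is ~200K tokens, so it only just misses a
# 200K window once headroom is reserved:
print(fits_in_context(150_000, 200_000))    # → False
print(fits_in_context(150_000, 1_000_000))  # → True
```

Run the check against your largest expected document, not your average one, since the longest documents are the ones that silently get truncated or chunked.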
Top recommendations
1. Claude Sonnet 4.6 — Best overall for summarisation
Claude Sonnet 4.6 is the best summarisation model available for most document lengths. Its training produces unusually clean compression — it identifies the most important content reliably and presents it in well-formed prose that requires minimal editing. Critically, it has one of the lowest hallucination rates of any frontier model, which is non-negotiable for summaries that will be acted upon.
The 200K context window handles most real-world documents: it accommodates approximately 150,000 words of input, or roughly 250 pages at the tokens-per-page rate assumed in the cost table below.
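A single-pass summarisation request against the Anthropic Messages API can be sketched as below. The model identifier `claude-sonnet-4-6` and the prompt wording are assumptions for illustration; check Anthropic's documentation for current model ids.

```python
def build_summary_request(document: str,
                          model: str = "claude-sonnet-4-6",
                          max_tokens: int = 1024) -> dict:
    """Build a Messages API request body for a faithful summary.
    Pinning format and length in the instruction reduces drift."""
    instruction = (
        "Summarise the document below in exactly 3 paragraphs. "
        "Include only claims made in the document itself."
    )
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": f"{instruction}\n\n<document>\n{document}\n</document>",
            }
        ],
    }

# To send it (requires the anthropic package and ANTHROPIC_API_KEY):
#   import anthropic
#   reply = anthropic.Anthropic().messages.create(**build_summary_request(doc))
#   print(reply.content[0].text)
```

Wrapping the document in explicit delimiter tags helps the model distinguish your instructions from the document's own content, which matters when summarising documents that themselves contain instructions.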
View Anthropic API docs →
2. Gemini 2.5 Pro — Best for very long documents
Gemini 2.5 Pro's 1M token context window is the primary reason to choose it over Claude for summarisation. If you are processing documents that exceed 200K tokens — full books, complete legal discovery files, large database exports — Gemini 2.5 Pro is the only frontier model that can handle the entire document in a single pass.
Its summarisation quality is slightly below Claude Sonnet 4.6, particularly on documents that require subtle inference or nuanced synthesis. But for documents where the primary challenge is length rather than complexity, it is the correct choice.
It is also significantly cheaper than Claude Sonnet 4.6 on input tokens ($1.25 vs $3.00/M), which matters substantially for very large documents.
View Google AI docs →
3. GPT-4o — Best for structured output summarisation
GPT-4o is the best choice when your summarisation pipeline needs structured output — extracting specific fields, producing JSON with defined keys, or populating a database schema from document content. Its structured output capability is the most reliable of the three models for this pattern.
Its 128K context window is a real limitation for long documents. A 128K context handles approximately 95,000 words — sufficient for most business documents but inadequate for lengthy legal or technical documents.
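The structured-output pattern can be sketched as a request body for OpenAI's Chat Completions API with a JSON schema attached. The schema fields here (`title`, `key_points`, `risk_flags`) are illustrative, not a standard; only the `response_format` shape follows the API.

```python
def build_structured_summary_request(document: str,
                                     model: str = "gpt-4o") -> dict:
    """Build a Chat Completions request that forces the summary into a
    fixed JSON schema via structured outputs."""
    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "key_points": {"type": "array", "items": {"type": "string"}},
            "risk_flags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "key_points", "risk_flags"],
        "additionalProperties": False,
    }
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": f"Summarise this document into the given schema:\n\n{document}",
            }
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "doc_summary", "strict": True, "schema": schema},
        },
    }

# To send (requires the openai package and OPENAI_API_KEY):
#   from openai import OpenAI
#   resp = OpenAI().chat.completions.create(**build_structured_summary_request(doc))
#   import json
#   summary = json.loads(resp.choices[0].message.content)
```

With `strict: True`, the response is constrained to the schema, so downstream code can parse it without defensive key checks.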
View OpenAI API docs →
4. Gemini 2.0 Flash — Best budget option
Gemini 2.0 Flash produces surprisingly good summaries for its price point. For internal summarisation pipelines where summaries are used as inputs to downstream processes (rather than presented directly to users), its quality is often sufficient.
At $0.10/M input, summarising a 50,000-token document costs $0.005 — essentially free at moderate volumes. For high-volume batch summarisation jobs, it is the clear cost winner.
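The input-cost arithmetic behind these figures is straightforward; the sketch below reproduces it using the per-million-token prices quoted in this article.

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Input-token cost for a document, given a price per million tokens."""
    return tokens / 1_000_000 * price_per_million

# 50,000-token document on Gemini 2.0 Flash at $0.10/M input:
print(round(input_cost(50_000, 0.10), 4))  # → 0.005
# The same document on Claude Sonnet 4.6 at $3.00/M input:
print(round(input_cost(50_000, 3.00), 3))  # → 0.15
```

Note this covers input tokens only; output tokens add a small amount on top, but summaries are short relative to their source documents, so input cost dominates.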
Side-by-side comparison
| Model | Input $/M | Output $/M | Context | Faithfulness | Compression |
|---|---|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | ★★★☆☆ | ★★★☆☆ |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | ★★★★☆ | ★★★★☆ |
| GPT-4o | $2.50 | $10.00 | 128K | ★★★★☆ | ★★★★☆ |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | ★★★★★ | ★★★★★ |
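The comparison table can double as a selection rule: filter to models whose context window holds the document in one pass, then rank the survivors by input price. The context sizes and prices below are the ones quoted in the table; the helper itself is an illustrative sketch.

```python
# Context windows and input prices from the comparison table above.
MODELS = {
    "Gemini 2.0 Flash":  {"context": 1_000_000, "input_per_m": 0.10},
    "Gemini 2.5 Pro":    {"context": 1_000_000, "input_per_m": 1.25},
    "GPT-4o":            {"context": 128_000,   "input_per_m": 2.50},
    "Claude Sonnet 4.6": {"context": 200_000,   "input_per_m": 3.00},
}

def single_pass_candidates(doc_tokens: int) -> list[str]:
    """Models that can read `doc_tokens` in one pass, cheapest first."""
    fits = [(spec["input_per_m"], name)
            for name, spec in MODELS.items()
            if spec["context"] >= doc_tokens]
    return [name for _, name in sorted(fits)]

print(single_pass_candidates(75_000))   # all four models, cheapest first
print(single_pass_candidates(300_000))  # only the 1M-context Gemini models
```

Cheapest-first ranking is deliberately naive: it ignores the faithfulness and compression ratings, which is why the article recommends Claude Sonnet 4.6 despite it being the most expensive row.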
Cost per document — real-world estimates
| Document size | Model | Approx. cost (input tokens only) |
|---|---|---|
| 10-page report (~7,500 tokens) | Gemini 2.0 Flash | $0.001 |
| 10-page report (~7,500 tokens) | Claude Sonnet 4.6 | $0.023 |
| 100-page report (~75,000 tokens) | Gemini 2.0 Flash | $0.008 |
| 100-page report (~75,000 tokens) | Claude Sonnet 4.6 | $0.23 |
| 400-page book (~300,000 tokens) | Gemini 2.5 Pro | $0.38 |
| 400-page book (~300,000 tokens) | Claude Sonnet 4.6 | Not possible (exceeds context) |
FAQ
What is the best LLM for summarising long documents?
For documents up to 200K tokens, Claude Sonnet 4.6 produces the best summaries. For documents exceeding 200K tokens, Gemini 2.5 Pro is the only frontier model with a context window large enough to process them in a single pass.
Which LLM hallucinates least in summaries?
Claude Sonnet 4.6 has the lowest measured hallucination rate for summarisation tasks. This is particularly important for legal, financial, and medical document summarisation where factual accuracy is critical.
Can I summarise a full book with an LLM?
Yes, with the right model. Gemini 2.5 Pro and Gemini 2.0 Flash both offer 1M token context windows, which can accommodate approximately 750,000 words — sufficient for most books. Claude Sonnet 4.6 handles up to approximately 150,000 words in a single pass.
Is chunking documents and combining summaries a good approach?
It is a workaround, not a solution. Chunked summarisation loses cross-section relationships and produces inconsistent output. Where possible, use a model with a context window large enough to process the full document.
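If you must chunk, the workaround's structure makes its weakness visible: each chunk is summarised without seeing the others, so cross-chunk relationships are lost at the combine step. The chunk size and overlap values below are arbitrary illustration choices.

```python
def chunk_words(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping word-count chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)]

def chunked_summary(text: str, summarise) -> str:
    """Map-reduce summarisation: summarise each chunk independently,
    then summarise the concatenated chunk summaries. `summarise` is any
    callable mapping text to shorter text (e.g. an LLM call)."""
    partials = [summarise(chunk) for chunk in chunk_words(text)]
    return summarise("\n\n".join(partials))
```

The overlap softens boundary effects but cannot recover a relationship between, say, a definition in chunk 1 and a caveat in chunk 9 — the combine step only sees each chunk's lossy summary, which is the inconsistency this FAQ answer warns about.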
Last verified: April 2026