What is a token in AI?

Tokens are the fundamental unit of text in large language models. Understanding them helps you write better prompts, control costs, and avoid hitting context limits.

A token is a chunk of text that an AI language model reads and generates one piece at a time. Tokens are not the same as words, characters, or syllables: they are sub-word units determined by the model's tokenizer, which maps text onto a vocabulary of roughly 50,000–200,000 common fragments learned during training.

For most English text, one token is roughly four characters or three-quarters of a word. Common words like “the”, “is”, and “and” are each a single token. Longer or rarer words split into multiple tokens: for example, “tokenization” may become “token” + “ization”, depending on the tokenizer.

Different model families use different tokenizers with distinct vocabularies. This is why the same sentence can produce a different token count on GPT-4o (using OpenAI's o200k_base) versus Claude (using Anthropic's tokenizer) versus Gemini. The difference is not rounding — it reflects genuinely different vocabularies, and it matters when you are budgeting API costs.

How are tokens counted? (with examples)

Tokenizers convert raw text into integer IDs from a fixed vocabulary using byte-pair encoding (BPE) or similar algorithms. The count of those IDs is the token count. Here are five practical benchmarks across different text types:

Text	Approximate tokens	Notes
“Hello world”	~2 tokens	Two common words; the space is folded into the second token
100-word paragraph	~133 tokens	English prose averages ~1.33 tokens per word (~0.75 words per token)
Python function (20 lines)	~120 tokens	Varies with line length; keywords are usually single tokens, while symbols and indentation add more
JSON object (10 fields)	~50 tokens	Short string values; structural punctuation adds overhead
1,000-word article	~1,333 tokens	Consistent with the ~1.33 tokens-per-word English average

Why model matters: These figures are typical for OpenAI's tokenizers. Claude's tokenizer produces counts that can differ by 5–15% for the same English text. For non-English scripts, code, or structured data, the divergence can be larger. Always tokenize against the target model for accurate cost estimates.
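As a rough illustration of the byte-pair encoding step described above, the sketch below learns a few merges from a tiny made-up corpus and then turns a word into integer IDs. The corpus, the merge count, and all function names are assumptions for this example. Production tokenizers work on bytes, apply pre-tokenization rules, and learn far larger vocabularies, so their output will not match this toy.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merges, most frequent adjacent pair first."""
    corpus = Counter(tuple(word) for word in words)  # start from characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        updated = Counter()
        for symbols, freq in corpus.items():
            updated[tuple(apply_merge(symbols, best))] += freq
        corpus = updated
    return merges


def build_vocab(words, merges):
    """Assign an integer ID to every character and every merged symbol."""
    pieces = sorted({ch for word in words for ch in word})
    pieces += [a + b for a, b in merges]
    return {piece: idx for idx, piece in enumerate(pieces)}


def encode(word, merges, vocab):
    """Apply the learned merges in order, then map each piece to its ID."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols, [vocab[s] for s in symbols]


# Hypothetical toy corpus; real vocabularies are trained on vastly more text.
corpus = ["token", "tokens", "tokenize", "tokenization",
          "organization", "realization"]
merges = learn_merges(corpus, 12)
vocab = build_vocab(corpus, merges)
pieces, ids = encode("tokenization", merges, vocab)
print(pieces, ids, "token count:", len(ids))
```

The token count is simply the length of the ID list. Changing the corpus or the number of merges changes the vocabulary, which is the same reason two model families can report different counts for identical text.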

Why do tokens cost money?

Language models are compute-intensive. Each token in your prompt must be attended to by every layer of the transformer on every forward pass — and generating each output token requires a separate forward pass through the entire model. Running these operations at scale requires thousands of high-end GPUs or TPUs.

API providers therefore charge per million tokens processed, split into two components:

Input (prompt) tokens — the text you send, including any system prompt and conversation history. These are cheaper because they can be processed in a single batched forward pass.
Output (completion) tokens — the text the model generates, one token at a time in an autoregressive loop. Generating tokens is typically 3–10 times more expensive per token than reading them.

Some models also produce thinking tokens (internal reasoning chains) before the visible answer. OpenAI's o-series models bill these reasoning tokens at the output rate even though they are not shown in the response, and DeepSeek R1 likewise counts its reasoning in the output token total. This can double or triple the cost of a reasoning-heavy request if you are not accounting for it.
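The pricing arithmetic above can be sketched in a few lines. The function name and the per-million prices below are hypothetical placeholders, not any provider's actual rates, and the sketch assumes thinking tokens are billed at the output rate.

```python
def request_cost_usd(input_tokens, output_tokens, thinking_tokens,
                     input_price_per_m, output_price_per_m):
    """Estimate the USD cost of one request.

    Assumption: thinking tokens are billed at the output rate.
    Prices are expressed in USD per million tokens.
    """
    billed_output = output_tokens + thinking_tokens
    total = (input_tokens * input_price_per_m
             + billed_output * output_price_per_m)
    return total / 1_000_000


# Hypothetical prices: $2.00 per million input, $8.00 per million output.
with_thinking = request_cost_usd(2_000, 500, 1_500, 2.00, 8.00)
without_thinking = request_cost_usd(2_000, 500, 0, 2.00, 8.00)
print(f"with thinking:    ${with_thinking:.4f}")
print(f"without thinking: ${without_thinking:.4f}")
```

With these made-up numbers, 1,500 thinking tokens raise the request from $0.008 to $0.02, which shows how reasoning can dominate the bill even when the visible answer is short.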

How many tokens is my text?

The fastest way to find out is to paste your text into the Calculate Tokens calculator. It runs each model's actual tokenizer in your browser using WebAssembly — your text never leaves your device.

If you need a quick mental estimate:

English prose: divide word count by 0.75 (or multiply by 1.33) to get a rough token count.
Characters: divide character count by 4 for a rough heuristic. This is what models without a dedicated tokenizer report (marked with a ~ prefix in the calculator).
Code: expect slightly more tokens per character than prose — indentation, punctuation, and operator symbols each consume tokens.

Remember that the heuristic significantly underestimates token counts for non-Latin scripts (Chinese, Arabic, Hindi) and is unreliable for code, where symbols and indentation shift the characters-per-token ratio. Only an exact tokenizer call gives you a reliable number for cost estimation.
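The two mental estimates above translate directly into code. This is a minimal sketch of the rules of thumb only; the function names are invented for the example, and the results are rough guesses for English prose rather than real tokenizer output.

```python
import math


def estimate_tokens_from_words(text):
    """Rule of thumb: word count divided by 0.75, rounded up."""
    return math.ceil(len(text.split()) / 0.75)


def estimate_tokens_from_chars(text):
    """Rule of thumb: character count divided by 4, rounded up."""
    return math.ceil(len(text) / 4)


sample = "The quick brown fox jumps over the lazy dog"
print(estimate_tokens_from_words(sample))  # 9 words  -> 12
print(estimate_tokens_from_chars(sample))  # 43 chars -> 11
```

The two estimates disagree slightly even on this short sentence, which is a reminder that both are approximations.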

Token limits by model

Every model has a context window — the maximum number of tokens it can process in a single request, counting both input and output together. Exceeding this limit produces an error.
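Because input and output share the same window, it can help to check a request before sending it. The sketch below uses hypothetical helper names and treats a 128K window as 128,000 tokens, which is an assumption; check the provider's documentation for the exact figure and for any separate cap on output length.

```python
def fits_context(input_tokens, max_output_tokens, context_window):
    """True if the prompt plus the reserved output fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def remaining_output_budget(input_tokens, context_window):
    """Tokens left for the response after the prompt is counted."""
    return max(context_window - input_tokens, 0)


WINDOW = 128_000  # assumed size of a "128K" context window

print(fits_context(120_000, 4_000, WINDOW))
print(fits_context(126_000, 4_000, WINDOW))
print(remaining_output_budget(126_000, WINDOW))
```

The first request fits, the second does not, and the last call shows that only 2,000 tokens remain for the reply in the second case.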

Model	Provider	Context window	Tokenizer
GPT-5.6 Sol	OpenAI	1.05M	o200k_base (tiktoken)
GPT-5.6 Terra	OpenAI	1.05M	o200k_base (tiktoken)
GPT-5.6 Luna	OpenAI	1.05M	o200k_base (tiktoken)
GPT-5.5	OpenAI	1.05M	o200k_base (tiktoken)
GPT-5.4	OpenAI	1.05M	o200k_base (tiktoken)
GPT-5.4 mini	OpenAI	400K	o200k_base (tiktoken)
GPT-5.4 nano	OpenAI	400K	o200k_base (tiktoken)
GPT-4o	OpenAI	128K	o200k_base (tiktoken)
GPT-4.1	OpenAI	1M	o200k_base (tiktoken)
o4-mini	OpenAI	200K	o200k_base (tiktoken)
Claude Fable 5	Anthropic	1M	claude-new
Claude Opus 5	Anthropic	1M	claude-new
Claude Opus 4.8	Anthropic	1M	claude-new
Claude Sonnet 4.6	Anthropic	1M	Anthropic tokenizer
Claude Sonnet 5	Anthropic	1M	claude-new
Claude Haiku 4.5	Anthropic	200K	Anthropic tokenizer
Gemini 3.6 Flash	Google	1M	Gemini tokenizer
Gemini 3.5 Flash	Google	1M	Gemini tokenizer
Gemini 3.5 Flash-Lite	Google	1M	Gemini tokenizer
Gemini 2.5 Flash	Google	1M	Gemini tokenizer
Gemini 2.5 Flash-Lite	Google	1M	Gemini tokenizer
Gemini 2.5 Pro	Google	1M	Gemini tokenizer
DeepSeek V4 Pro	DeepSeek	1M	SentencePiece (LLaMA)
DeepSeek V4 Flash	DeepSeek	1M	SentencePiece (LLaMA)
DeepSeek V3	DeepSeek	128K	SentencePiece (LLaMA)
DeepSeek R1	DeepSeek	128K	SentencePiece (LLaMA)
Llama 4 Scout	Meta	1M	Heuristic (~chars ÷ 4)

Context windows grow as models are updated. Check the calculator for the latest verified figures.

Token cost calculator

Paste any text into the calculator to see token counts and USD costs across all major models simultaneously. Models with a dedicated tokenizer run it in your browser and report exact counts; models without one fall back to the character heuristic and are marked with a ~ prefix. No server, no data collection.

Open the token calculator