Understanding LLM Tokens: Why Characters Aren't Enough
May 11, 2026 | 4 min read | By TokenCalc Editor
Many developers estimate LLM costs from character counts. That is a mistake: LLMs "see" text as tokens, not characters, and usage is typically billed and limited per token. This distinction is vital for budget planning.
What is a Token?
A token is a chunk of text: a whole word, part of a word, or a punctuation mark. For English, 1,000 tokens is approximately 750 words. This ratio is only a rule of thumb, and it shifts depending on the model's tokenizer.
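As a rough illustration, the 750-words-per-1,000-tokens rule of thumb can be turned into a quick estimator. This is a minimal sketch for English prose only. The function names and the price passed to `estimate_cost` are hypothetical placeholders, not real API names or real prices.

```python
import math

# Rule of thumb from the text: 1,000 tokens is roughly 750 English words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English prose, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_cost(word_count: int, price_per_1k_tokens: float) -> float:
    """Estimated cost for a text of word_count words.

    price_per_1k_tokens is a hypothetical rate supplied by the caller;
    check your provider's pricing page for real numbers.
    """
    return estimate_tokens(word_count) / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    words = 750
    print(f"{words} words is about {estimate_tokens(words)} tokens")
```

Treat the result as a planning figure. For billing-grade numbers, count tokens with the tokenizer of the model you actually call.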
OpenAI vs. Others
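OpenAI's cl100k_base tokenizer is efficient for Western languages. Other models use different vocabularies, so the same text can produce a different token count on each. For code or non-English languages, the number of characters per token can differ significantly from the English average, so measure with the target model's tokenizer instead of reusing an English rule of thumb.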
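One way to see how the ratio moves is to measure characters per token on your own samples. The sketch below accepts any tokenizer as a plain function. The helper names are hypothetical, and `toy_tokenize` is a whitespace-splitting stand-in used only to keep the example self-contained. It does not behave like a real LLM tokenizer.

```python
from typing import Callable, List, Sequence


def chars_per_token(text: str, tokenize: Callable[[str], Sequence]) -> float:
    """Average number of characters per token for one text sample."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer returned no tokens")
    return len(text) / len(tokens)


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: splits on whitespace. Not a real LLM tokenizer."""
    return text.split()


if __name__ == "__main__":
    samples = {
        "english": "The quick brown fox jumps over the lazy dog",
        "code": "for i in range(10): print(i)",
    }
    for name, sample in samples.items():
        ratio = chars_per_token(sample, toy_tokenize)
        print(f"{name}: {ratio:.2f} characters per token")
```

To measure a real model, pass its tokenizer's encode function in place of `toy_tokenize`. With OpenAI's tiktoken package installed, for example, that would be `tiktoken.get_encoding("cl100k_base").encode`.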