Why Base64 Makes Data 33% Bigger — and When That Matters
Base64 encoding inflates data by roughly 33%, plus padding and line breaks. Here is exactly why the overhead exists, where it bites, and when it is worth paying.
Base64 is a tax. It takes binary data and makes it 33% larger so that it can survive systems that only handle text. Most of the time the cost is invisible — a few extra bytes in a JSON payload, a slightly longer data URL. Sometimes it's the difference between a page that loads instantly and one that doesn't. Knowing exactly where the 33% comes from makes it easier to decide when to pay it.
The Math
Base64 (RFC 4648) maps every 3 bytes of input to 4 characters of output. Each output character represents 6 bits, drawn from a 64-character alphabet (A-Z, a-z, 0-9, +, /). Three bytes is 24 bits; four 6-bit characters is also 24 bits. The math works.
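To see the 3-to-4 mapping concretely, here is a minimal Python sketch that encodes one 3-byte group by hand and checks it against the standard library's `base64` module. `encode_group` is a hypothetical helper written for illustration. It only handles complete 3-byte groups, not the padded final group.

```python
import base64

# Standard alphabet from RFC 4648: index 0 -> "A", ..., 62 -> "+", 63 -> "/".
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the three bytes into one 24-bit integer, most significant byte first.
    bits = (group[0] << 16) | (group[1] << 8) | group[2]
    # Slice the 24 bits into four 6-bit alphabet indices, high bits first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))                      # TWFu
print(base64.b64encode(b"Man").decode("ascii"))  # TWFu (the standard library agrees)
```

"Man" is 0x4D 0x61 0x6E; its 24 bits split into the indices 19, 22, 5, and 46, which map to T, W, F, and u.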
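Output size is therefore 4 × ceil(N / 3) characters for an N-byte input. That formula already includes padding, so the ratio approaches 4/3 ≈ 1.333 as N grows, and is exactly 4/3 whenever N is a multiple of 3. A 1 MB file (1,000,000 bytes) becomes 1,333,336 bytes, about 1.333 MB, before any line wrapping or HTTP overhead.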
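The size formula is easy to check directly. This Python sketch uses a hypothetical helper, `encoded_len`, that computes 4 × ceil(N / 3) with integer arithmetic and compares it with the actual length of `base64.b64encode` output.

```python
import base64


def encoded_len(n: int) -> int:
    """Padded, unwrapped Base64 length for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)  # integer ceiling division, no floats involved


# The formula agrees with the standard library, padding included.
for n in (0, 1, 2, 3, 4, 5, 1_000_000):
    assert encoded_len(n) == len(base64.b64encode(bytes(n)))

print(encoded_len(1_000_000))              # 1333336
print(encoded_len(1_000_000) / 1_000_000)  # 1.333336
```

Integer ceiling division (`(n + 2) // 3`) avoids any floating-point rounding for very large inputs.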
Padding and Line Breaks
When the input length isn't divisible by 3, Base64 pads with = characters so the output is always a multiple of 4. That adds 1 or 2 characters (two = when N mod 3 is 1, one when it is 2): negligible for big payloads, but enough to break naive parsers that expect only characters from the 64-character alphabet.
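A quick Python check of the padding rule follows. It also includes two hypothetical helpers for a common chore: formats that strip the padding (unpadded base64url, as used in JWTs) must have it restored before a strict decoder such as `base64.b64decode` will accept the string. Without that step, it raises a padding error.

```python
import base64

# One, two, and three input bytes: two "=", one "=", and no padding.
for chunk in (b"M", b"Ma", b"Man"):
    print(chunk, base64.b64encode(chunk))
# b'M' b'TQ=='
# b'Ma' b'TWE='
# b'Man' b'TWFu'


def strip_padding(encoded: str) -> str:
    """Drop trailing '=' characters, as unpadded formats do."""
    return encoded.rstrip("=")


def restore_padding(unpadded: str) -> str:
    """Re-add '=' until the length is a multiple of 4 again."""
    return unpadded + "=" * (-len(unpadded) % 4)


unpadded = strip_padding(base64.b64encode(b"Ma").decode("ascii"))  # "TWE"
print(base64.b64decode(restore_padding(unpadded)))                 # b'Ma'
```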
Classic MIME Base64 (RFC 2045) also wraps output into lines of at most 76 characters, inserting a CRLF after each line. Two extra bytes per 76 characters adds roughly another 2.6% on top of the 33%. Modern uses (data URLs, JWTs, JSON) skip line wrapping, but email attachments still carry it, and PEM-encoded certificates are wrapped too, at 64 characters per line.
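To put a number on the wrapping overhead, here is a Python sketch of RFC 2045-style wrapping. `mime_wrap` is a hypothetical helper that joins lines of up to 76 characters with CRLF. As a simplification, it omits the trailing CRLF a real message body would usually end with. Python's own `base64.encodebytes` also wraps at 76 characters, but it uses a bare `\n` instead of CRLF.

```python
import base64


def mime_wrap(data: bytes, line_len: int = 76) -> bytes:
    """Standard Base64 split into lines of at most line_len chars, joined by CRLF."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return b"\r\n".join(lines)


data = bytes(1_000_000)  # only the length matters here, not the content
plain = base64.b64encode(data)
wrapped = mime_wrap(data)

print(len(plain))                 # 1333336
print(len(wrapped))               # 1368422
print(len(wrapped) / len(plain))  # about 1.026, i.e. roughly 2.6% extra
print(len(wrapped) / len(data))   # about 1.368 overall
```

Passing `line_len=64` produces PEM-style line lengths instead.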
Where the 33% Actually Hurts
Three places where the overhead is worth thinking about.
1. Inlining images in HTML/CSS
Data URLs (data:image/png;base64,...) save an HTTP round trip. They also bloat the HTML. A 50 KB image becomes ~67 KB inline, and that 67 KB is now in your HTML — which means it can't be cached separately, can't be lazy-loaded, and blocks the parser longer. For tiny images (icons, sub-1 KB sprites) the trade is fine. For anything larger, a separate request usually wins.
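Here is a small Python sketch that builds a data URL and shows where the bytes go. The zero-filled "image" stands in for a real 50 KB PNG. `INLINE_LIMIT` is a hypothetical cutoff that mirrors the sub-1 KB rule of thumb, not a standard value.

```python
import base64

INLINE_LIMIT = 1024  # hypothetical cutoff in bytes, echoing the "sub-1 KB" rule of thumb


def to_data_url(data: bytes, mime_type: str) -> str:
    """Build a data: URL that embeds the bytes as standard Base64."""
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"


def should_inline(data: bytes) -> bool:
    """Inline only tiny assets; anything bigger gets its own cacheable request."""
    return len(data) <= INLINE_LIMIT


image = bytes(50 * 1024)  # stand-in for a 50 KB PNG; only the size matters here
url = to_data_url(image, "image/png")

print(len(image))            # 51200
print(len(url))              # 68290: 68268 Base64 characters plus the 22-character prefix
print(should_inline(image))  # False
```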
2. JSON APIs returning binary data
JSON has no binary type, so binary fields get Base64-encoded. A REST API returning a 5 MB PDF in a JSON wrapper is sending about 6.7 MB of JSON, and with an ordinary (non-streaming) JSON parser the receiver has to parse the entire document before it can decode the Base64. Streaming the binary response directly (or serving it from a separate endpoint with application/octet-stream) is almost always faster.
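A minimal Python sketch of the JSON-wrapped pattern and its cost follows. The response shape, including the `filename` and `content` field names, is a hypothetical example, not any particular API's format. The unwrap side shows that the whole document is parsed before the binary can be recovered.

```python
import base64
import json


def wrap_binary_in_json(data: bytes, filename: str) -> str:
    """Hypothetical API response: binary content carried as a Base64 string field."""
    return json.dumps({
        "filename": filename,
        "content": base64.b64encode(data).decode("ascii"),
    })


def unwrap_binary_from_json(body: str) -> bytes:
    # json.loads parses the entire document before "content" can be decoded.
    return base64.b64decode(json.loads(body)["content"])


pdf = bytes(5_000_000)  # stand-in for a 5 MB PDF
body = wrap_binary_in_json(pdf, "report.pdf")

print(len(body) / len(pdf))  # a little over 1.333
assert unwrap_binary_from_json(body) == pdf
```

Base64 output contains only letters, digits, `+`, `/`, and `=`, so `json.dumps` adds no escaping on top of the 4/3 growth. The remaining overhead is the JSON framing.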
3. JWTs and bearer tokens in headers
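JWTs are three base64url-encoded segments (header, claims, signature) joined by dots, with the = padding stripped. A token that carries 500 bytes of claims grows to about 667 characters for that segment alone, before the header and signature segments are added. Sent on every request, that's a measurable bandwidth cost, and many servers and proxies cap request headers at around 8 KB by default (often per header line rather than in total). Token bloat is a common cause of mysterious 431 "Request Header Fields Too Large" errors, or of the generic 400 that some servers return instead.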
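The sketch below assembles an HS256-signed token by hand with Python's standard library to show where the bytes go. Each segment is base64url-encoded without padding, so 500 bytes of claim JSON become 667 characters before the header and signature segments are added. The claim values and secret are hypothetical, and hand-rolled token code is for illustration only. Real services should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """base64url (RFC 4648, section 5) with the '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def compact_json(obj: dict) -> bytes:
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def make_hs256_token(claims: dict, secret: bytes) -> str:
    """Illustrative HS256 token: b64url(header).b64url(claims).b64url(signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{b64url(compact_json(header))}.{b64url(compact_json(claims))}"
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


# 500 bytes of claims -> 667 characters in the claims segment alone.
print(len(b64url(bytes(500))))  # 667

# Hypothetical claims and secret, just to measure a whole token and its header line.
claims = {"sub": "user-123", "roles": ["reader"] * 40}
token = make_hs256_token(claims, secret=b"demo-secret")
header_line = f"Authorization: Bearer {token}"
print(len(compact_json(claims)), len(token), len(header_line))
```

The header and HMAC-SHA256 signature segments add a roughly fixed cost (36 and 43 characters here). The claims segment is the part that grows with every role, scope, or profile field added to the token.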
What Base64 Buys You
The overhead exists for a reason. Early email transport (SMTP) was only guaranteed to carry 7-bit ASCII text, and anything outside that range was at risk of being mangled by intermediate servers. Base64 was designed to survive that. The same robustness applies today to:
- JSON and XML: binary in a text-only format.
- URLs and HTTP headers: only a subset of bytes are legal (URLs usually get the base64url variant, which replaces + and / with - and _).
- Source code: embedding a binary blob as a constant.
- QR codes and barcodes: charset constraints from the encoder.
In all of these, the alternative isn't "send raw bytes" — it's "build a separate channel for binary," which is usually more expensive than 33%.
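For the URL case in particular, RFC 4648 also defines a URL- and filename-safe alphabet, "base64url", that swaps out the two troublesome characters. Here is a short Python comparison using the standard library's `base64.b64encode` and `base64.urlsafe_b64encode`. The input bytes are chosen only so that the output hits alphabet indices 62 and 63.

```python
import base64

raw = b"\xfb\xff"  # chosen so the encoding uses alphabet indices 62 and 63

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)

print(standard)  # b'+/8='  ('+' reads as a space in form-encoded queries, '/' as a path separator)
print(url_safe)  # b'-_8='  (base64url uses '-' and '_' instead)
assert base64.urlsafe_b64decode(url_safe) == raw
```

The trailing `=` survives in both variants. That is why URL-oriented formats often drop the padding entirely and restore it before decoding.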
When to Skip It
If both endpoints can handle binary, skip Base64. multipart/form-data uploads send raw bytes. WebSocket binary frames send raw bytes. Content-Type: application/octet-stream on an HTTP response sends raw bytes. gRPC's protobuf wire format is binary end-to-end. Any of these moves a 1 MB file as 1 MB, not 1.33 MB.
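As a sketch of the binary-safe path, the standard-library server below returns a payload as raw bytes with `Content-Type: application/octet-stream`, and a client on the same machine fetches it back. `Content-Length` equals the payload size because there is no encoding step. The handler name, payload, and loopback setup are illustrative assumptions, not a production server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Illustrative payload: 1 MiB of bytes covering every value 0-255.
PAYLOAD = bytes(range(256)) * 4096


class RawBinaryHandler(BaseHTTPRequestHandler):
    """Serves PAYLOAD as-is: no Base64, no JSON wrapper."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def fetch_raw_payload() -> tuple[str, int, bytes]:
    """Start a loopback server on a free port, fetch PAYLOAD once, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), RawBinaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any proxy configured in the environment for this loopback request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        host, port = server.server_address[:2]
        with opener.open(f"http://{host}:{port}/") as resp:
            return (
                resp.headers["Content-Type"],
                int(resp.headers["Content-Length"]),
                resp.read(),
            )
    finally:
        server.shutdown()
        server.server_close()


content_type, declared_length, body = fetch_raw_payload()
print(content_type, declared_length, len(body))
# application/octet-stream 1048576 1048576
```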
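Compression Doesn't Fully Save You
A common myth: "the gzip on my web server compresses Base64 back down, so the overhead doesn't matter." Compression helps, but only partially, and not where the myth assumes. Each Base64 character carries just 6 bits of information in an 8-bit byte, and gzip's Huffman stage can code a 64-symbol alphabet in roughly 6 bits per character. So gzip recovers most of the 33% even on already-compressed data such as a JPEG, a ZIP, or an encrypted blob. What it can't undo is the damage to compressible data. Base64 regroups bytes in threes, so a repeated pattern only encodes to the same characters when its copies line up the same way relative to the 3-byte groups, and text-like payloads compress worse after encoding than before. Compression also only helps where it is actually applied. A JWT in an HTTP/1.1 header is not gzipped, the Base64 text still takes a third more memory than the bytes it encodes until it is decoded, and both ends pay CPU to encode, compress, decompress, and decode.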
Base64 is the right tool when text is the only path. It's the wrong tool when there's a binary-safe alternative one config flag away. The 33% is a real cost — small enough to ignore for tokens and tiny assets, big enough to matter for media payloads at scale.
References
- Josefsson, S. (2006). RFC 4648: The Base16, Base32, and Base64 Data Encodings. Internet Engineering Task Force.
- Freed, N. & Borenstein, N. (1996). RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. Internet Engineering Task Force.
- Google Developers. (2024). Web Fundamentals — Reduce data sent to clients. Google.
- Ristic, I. (2021). Bulletproof TLS and PKI, 2nd Edition. Feisty Duck.