URL Encoding Explained — What %20 and %3A Actually Mean

Percent-encoding replaces unsafe URL characters with % followed by two hex digits. Learn which characters need encoding, the difference between encodeURI and encodeURIComponent, and how URLs are parsed.

The Structure of a URL

A URL has a defined structure, formalized in RFC 3986: scheme, authority, path, query, and fragment. For example:

https://example.com/search?q=hello+world&lang=en#results

Each component uses different characters as delimiters: / separates path segments, ? starts the query string, & separates query parameters, = separates parameter names from values, and # starts the fragment. These are reserved characters — they have structural meaning in a URL.
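The breakdown above can be checked with the WHATWG URL API, which is available as a global in browsers and Node.js. This is a minimal sketch; the property-to-component mapping in the comments is an informal correspondence, not terminology the API itself uses.

```javascript
// Parse the example URL and read back each component.
const url = new URL("https://example.com/search?q=hello+world&lang=en#results");

const parts = {
  scheme: url.protocol,   // "https:" (the API includes the trailing colon)
  authority: url.host,    // "example.com"
  path: url.pathname,     // "/search"
  query: url.search,      // "?q=hello+world&lang=en"
  fragment: url.hash,     // "#results"
};

// searchParams splits the query on & and =, then decodes each value.
const q = url.searchParams.get("q");       // "hello world"
const lang = url.searchParams.get("lang"); // "en"

console.log(parts, q, lang);
```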

Why Percent-Encoding Exists

If the data you want to include in a URL happens to contain a reserved character — for example, a search query that includes & or = — those characters would be misinterpreted as URL structure, not data.

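Percent-encoding (also called URL encoding) solves this by replacing each byte that needs encoding with a % followed by two hexadecimal digits representing that byte's value. RFC 3986 treats the hex digits as case-insensitive but recommends uppercase, so %3a and %3A are equivalent and %3A is the preferred form. A space becomes %20, a colon becomes %3A, and an ampersand becomes %26.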
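As a small illustration of the problem and the fix, the sketch below puts a search term containing & into a query string with and without encoding. The term "rock&roll" is a made-up example value.

```javascript
// Hypothetical search term that contains a reserved character.
const term = "rock&roll";

// Naive concatenation: the & inside the data is read as a parameter separator.
const naiveParams = new URLSearchParams("q=" + term);
const naiveQ = naiveParams.get("q");       // "rock"
const strayKey = naiveParams.has("roll");  // true: an unintended second parameter

// Percent-encoding the value first keeps it intact.
const encodedQuery = "q=" + encodeURIComponent(term);      // "q=rock%26roll"
const safeQ = new URLSearchParams(encodedQuery).get("q");  // "rock&roll"

console.log({ naiveQ, strayKey, encodedQuery, safeQ });
```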

The mechanism is simple: take the character's byte value (its ASCII code, or each byte of its UTF-8 encoding for a non-ASCII character), write each byte as two hexadecimal digits, and prepend a percent sign to each.
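To make the mechanism concrete, here is a sketch that encodes every byte of a string by hand. The function name percentEncodeAllBytes is hypothetical, and it deliberately encodes everything, including characters that would normally be left alone; production code would typically call encodeURIComponent instead.

```javascript
// Percent-encode every UTF-8 byte of the input (illustration of the mechanism only).
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0") // byte -> %XX
  ).join("");
}

const space = percentEncodeAllBytes(" ");          // "%20"
const colon = percentEncodeAllBytes(":");          // "%3A"
const ampersand = percentEncodeAllBytes("&");      // "%26"
const eAcute = percentEncodeAllBytes("\u00e9");    // "%C3%A9" (two UTF-8 bytes)

console.log(space, colon, ampersand, eAcute);
```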

Unreserved vs Reserved Characters

RFC 3986 defines two categories:

Unreserved characters — safe anywhere in a URL, never need encoding: A–Z a–z 0–9 - _ . ~
Reserved characters — have structural meaning, must be encoded when used as data: : / ? # [ ] @ ! $ & ' ( ) * + , ; =

All other characters, including spaces, non-ASCII characters, control characters, and a literal percent sign (which becomes %25), must also be percent-encoded.
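The three groups can be expressed as a small classifier. This is only a sketch built from the character lists above; the names UNRESERVED, RESERVED, and classify are illustrative, and the function expects a single character.

```javascript
// Character sets taken from the RFC 3986 lists above.
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// Classify one character as "unreserved", "reserved", or "other".
function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved"; // never needs encoding
  if (RESERVED.test(ch)) return "reserved";     // encode when used as data
  return "other";                               // always percent-encode
}

console.log(classify("~"));      // "unreserved"
console.log(classify("&"));      // "reserved"
console.log(classify(" "));      // "other"
console.log(classify("\u00e9")); // "other"
```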

The Space Problem: %20 vs +

Spaces cause a common point of confusion. In URL paths, a space should always be encoded as %20. In query strings submitted by HTML forms (using application/x-www-form-urlencoded format, defined in RFC 1866), spaces are encoded as +. This is a separate, older convention that predates RFC 3986.

The result: ?q=hello+world and ?q=hello%20world are both valid ways to represent "hello world" in a query string — but the + convention only applies to form-encoded query strings, not to URL paths or fragment identifiers. Many web frameworks handle both, but treating + as a space in URL paths is technically incorrect.
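The two conventions are easy to compare side by side in JavaScript. In this sketch, encodeURIComponent stands in for RFC 3986 style encoding and URLSearchParams for form encoding; the variable names are illustrative.

```javascript
const phrase = "hello world";

// RFC 3986 style percent-encoding: the space becomes %20.
const percentStyle = encodeURIComponent(phrase);                 // "hello%20world"

// Form encoding (application/x-www-form-urlencoded): the space becomes +.
const formStyle = new URLSearchParams({ q: phrase }).toString(); // "q=hello+world"

// A form-aware parser accepts both spellings in a query string.
const fromPlus = new URLSearchParams("q=hello+world").get("q");     // "hello world"
const fromPercent = new URLSearchParams("q=hello%20world").get("q"); // "hello world"

// A generic percent-decoder leaves + alone, which is correct for paths.
const plusInPath = decodeURIComponent("hello+world");            // "hello+world"

console.log({ percentStyle, formStyle, fromPlus, fromPercent, plusInPath });
```

If a decoded path segment is expected to contain spaces, it should arrive as %20; a + in a path is a literal plus sign.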

encodeURI vs encodeURIComponent in JavaScript

JavaScript provides two functions for URL encoding, and choosing the wrong one is a frequent bug:

encodeURI() encodes a complete URL. It leaves the characters that give a URL its structure untouched (; / ? : @ & = + $ , #), along with letters, digits, and - _ . ! ~ * ' ( ). It does encode [ and ], which matters for IPv6 host literals. Use this when you have a full URL and want to escape spaces or non-ASCII text without changing its structure.
encodeURIComponent() encodes a single URL component: a query parameter value, a path segment, a fragment. It encodes everything except A–Z a–z 0–9 - _ . ! ~ * ' ( ), so /, ?, &, =, and most other reserved characters are escaped, while ! ' ( ) * are left as-is. Use this when encoding data that will be inserted into a URL component.

The classic mistake: using encodeURI() on a query parameter value. If the value contains & or =, they will not be encoded and will break the query string parsing.
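The mistake is easy to reproduce. The sketch below uses a made-up parameter value, "a&b=c", and compares what a parser sees after each function.

```javascript
// Hypothetical query parameter value containing reserved characters.
const rawValue = "a&b=c";

// Wrong: encodeURI leaves & and = alone, so the value leaks into the query structure.
const wrongUrl = "https://example.com/search?q=" + encodeURI(rawValue);
const wrongQ = new URL(wrongUrl).searchParams.get("q");      // "a"
const leakedB = new URL(wrongUrl).searchParams.get("b");     // "c"

// Right: encodeURIComponent escapes & and =, so the value survives intact.
const rightUrl = "https://example.com/search?q=" + encodeURIComponent(rawValue);
const rightQ = new URL(rightUrl).searchParams.get("q");      // "a&b=c"

// encodeURI is suited to whole URLs: structure is preserved, the space is escaped.
const wholeUrl = encodeURI("https://example.com/a b?x=1&y=2"); // "https://example.com/a%20b?x=1&y=2"

console.log({ wrongQ, leakedB, rightQ, wholeUrl });
```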

International Characters and Punycode

Domain names traditionally allowed only ASCII characters. Internationalized Domain Names (IDNs), which contain accented letters or scripts such as Arabic or Chinese, are handled by converting each non-ASCII label to an ASCII form using Punycode, marked with an xn-- prefix. The domain münchen.de becomes xn--mnchen-3ya.de in DNS.

In URL paths and query strings, non-ASCII characters are handled differently: encode them as UTF-8 bytes, then percent-encode each byte. The string "café" becomes caf%C3%A9 (since the é character is 0xC3 0xA9 in UTF-8).
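Both conversions can be observed from JavaScript. This sketch assumes a runtime whose URL implementation performs IDNA conversion on hostnames (current browsers and Node.js do); the non-ASCII letters are written as \u escapes so the example does not depend on the file's encoding.

```javascript
// Hostname: the URL parser converts the IDN label to its Punycode (xn--) form.
const idnHost = new URL("https://m\u00fcnchen.de/").hostname; // "xn--mnchen-3ya.de"

// Path or query data: UTF-8 bytes, each percent-encoded.
const encodedWord = encodeURIComponent("caf\u00e9");          // "caf%C3%A9"

// Decoding reverses the process and restores the original string.
const decodedWord = decodeURIComponent("caf%C3%A9");          // "caf\u00e9"

console.log(idnHost, encodedWord, decodedWord);
```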

Double-Encoding Bugs

A common bug is encoding a URL twice. If hello world is encoded to hello%20world, and that string is then encoded again, %20 becomes %2520 (the % is encoded as %25). The server receives %2520 and decodes it to the literal string %20, not a space. Always encode data exactly once at the point where it is inserted into a URL.
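The effect is straightforward to reproduce. In this sketch, encodeURIComponent is applied twice to the same string and the result is decoded once, as a server typically would.

```javascript
const once = encodeURIComponent("hello world");  // "hello%20world"
const twice = encodeURIComponent(once);          // "hello%2520world" (% became %25)

// One round of decoding yields the literal text %20, not a space.
const serverSees = decodeURIComponent(twice);    // "hello%20world"

// Only a second decode recovers the original, which the receiver has no reason to do.
const recovered = decodeURIComponent(serverSees); // "hello world"

console.log({ once, twice, serverSees, recovered });
```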

References

Berners-Lee, T., Fielding, R., & Masinter, L. (2005). RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force.
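Berners-Lee, T., & Connolly, D. (1995). RFC 1866: Hypertext Markup Language - 2.0. Internet Engineering Task Force. (Defines form URL encoding and + for space in query strings.)
Connolly, D., & Masinter, L. (2000). RFC 2854: The 'text/html' Media Type. Internet Engineering Task Force.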
WHATWG. (2024). URL Living Standard. Web Hypertext Application Technology Working Group.
W3C. (2014). URL — W3C Working Draft. World Wide Web Consortium.