URL Encoding Explained — What %20 and %3A Actually Mean
Percent-encoding replaces unsafe URL characters with % followed by two hex digits. Learn which characters need encoding, the difference between encodeURI and encodeURIComponent, and how URLs are parsed.
The Structure of a URL
A URL has a defined structure, formalized in RFC 3986: scheme, authority, path, query, and fragment. For example:
https://example.com/search?q=hello+world&lang=en#results
Each component uses different characters as delimiters: / separates path segments, ? starts the query string, & separates query parameters, = separates parameter names from values, and # starts the fragment. These are reserved characters — they have structural meaning in a URL.
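As a rough illustration of these components, the example URL can be split with the WHATWG URL class that browsers and Node.js provide as a global. This is a sketch only: that API follows the WHATWG URL Standard rather than RFC 3986 to the letter, and the property-to-component mapping in the `parts` object is my own labeling.

```javascript
// Split the example URL into its components with the built-in URL class.
const url = new URL("https://example.com/search?q=hello+world&lang=en#results");

const parts = {
  scheme: url.protocol,  // "https:" (the API includes the trailing colon)
  authority: url.host,   // "example.com"
  path: url.pathname,    // "/search"
  query: url.search,     // "?q=hello+world&lang=en"
  fragment: url.hash,    // "#results"
};
console.log(parts);

// The query string is further split on & and = into name/value pairs.
console.log(url.searchParams.get("q"));    // "hello world"
console.log(url.searchParams.get("lang")); // "en"
```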
Why Percent-Encoding Exists
If the data you want to include in a URL happens to contain a reserved character — for example, a search query that includes & or = — those characters would be misinterpreted as URL structure, not data.
Percent-encoding (also called URL encoding) solves this by replacing each byte that needs encoding with a % followed by two hexadecimal digits representing that byte's value. RFC 3986 recommends uppercase digits, although decoders treat %3a and %3A as equivalent. A space becomes %20, a colon becomes %3A, and an ampersand becomes %26.
The mechanism is simple: take the character's UTF-8 byte sequence (a single byte for ASCII characters), convert each byte to two hexadecimal digits, and prepend a percent sign to each.
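The mechanism can be sketched in a few lines. The helper name `percentEncodeAll` is hypothetical, and it deliberately encodes every byte, including ones that would normally be left alone, so that only the byte-to-hex step is on display. It is not a replacement for the built-in encoders.

```javascript
// Percent-encode every byte of a string's UTF-8 representation.
// Hypothetical helper for illustration; real encoders skip unreserved characters.
function percentEncodeAll(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(
    bytes,
    // Two uppercase hex digits per byte, zero-padded, prefixed with "%".
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAll(" ")); // %20
console.log(percentEncodeAll(":")); // %3A
console.log(percentEncodeAll("&")); // %26
```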
Unreserved vs Reserved Characters
RFC 3986 defines two categories:
- Unreserved characters, which are safe anywhere in a URL and never need encoding:
  A–Z a–z 0–9 - _ . ~
- Reserved characters, which have structural meaning and must be encoded when used as data:
  : / ? # [ ] @ ! $ & ' ( ) * + , ; =
All other characters — including spaces, Unicode characters, and control characters — must also be percent-encoded.
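One way to see the two categories in practice is an encoder that leaves only the unreserved set untouched. The sketch below builds on JavaScript's encodeURIComponent, which leaves ! ' ( ) * unescaped, and escapes those five as well. The function name `encodeRFC3986Component` is an assumption chosen for this example.

```javascript
// Encode everything except RFC 3986 unreserved characters (A-Z a-z 0-9 - _ . ~).
// encodeURIComponent already handles most of this; it only skips ! ' ( ) *.
function encodeRFC3986Component(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Unreserved characters pass through unchanged.
console.log(encodeRFC3986Component("Az09-_.~")); // Az09-_.~

// Every reserved character is escaped when treated as data.
console.log(encodeRFC3986Component(":/?#[]@!$&'()*+,;="));
// %3A%2F%3F%23%5B%5D%40%21%24%26%27%28%29%2A%2B%2C%3B%3D
```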
The Space Problem: %20 vs +
Spaces cause a common point of confusion. In URL paths, a space should always be encoded as %20. In query strings submitted by HTML forms (using application/x-www-form-urlencoded format, defined in RFC 1866), spaces are encoded as +. This is a separate, older convention that predates RFC 3986.
The result: ?q=hello+world and ?q=hello%20world are both valid ways to represent "hello world" in a query string — but the + convention only applies to form-encoded query strings, not to URL paths or fragment identifiers. Many web frameworks handle both, but treating + as a space in URL paths is technically incorrect.
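The two conventions are easy to compare using built-in JavaScript APIs. This sketch assumes an environment with the global URLSearchParams class (modern browsers or Node.js), which follows the form-encoding rules, while encodeURIComponent follows generic percent-encoding.

```javascript
// Form-style serialization: URLSearchParams writes spaces as "+".
const formStyle = new URLSearchParams({ q: "hello world" }).toString();
console.log(formStyle); // q=hello+world

// Generic percent-encoding: encodeURIComponent writes spaces as "%20".
const percentStyle = "q=" + encodeURIComponent("hello world");
console.log(percentStyle); // q=hello%20world

// A form-aware parser reads both spellings as a space.
console.log(new URLSearchParams("q=hello+world").get("q"));   // hello world
console.log(new URLSearchParams("q=hello%20world").get("q")); // hello world

// decodeURIComponent knows nothing about the form convention and keeps "+".
console.log(decodeURIComponent("hello+world")); // hello+world
```

The last line is the practical trap: decoding a form-encoded query value with decodeURIComponent alone leaves literal plus signs in the data.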
encodeURI vs encodeURIComponent in JavaScript
JavaScript provides two functions for URL encoding, and choosing the wrong one is a frequent bug:
- encodeURI() encodes a complete URL. It leaves the reserved characters that delimit URL structure (/ ? # & = : @ etc.) untouched. Use this when you have a full URL and want to ensure it is safely transmittable.
- encodeURIComponent() encodes a single URL component: a query parameter value, a path segment, a fragment. It encodes reserved characters such as /, ?, &, and =, leaving only ! ' ( ) * unescaped. Use this when encoding data that will be inserted into a URL component.
The classic mistake: using encodeURI() on a query parameter value. If the value contains & or =, they will not be encoded and will break the query string parsing.
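The mistake is easy to reproduce. In the sketch below, the value `rock&roll=fun` and the example.com search URL are made up for illustration, and the URL class is used only to show how a parser would read each result.

```javascript
// A hypothetical query value containing reserved characters.
const value = "rock&roll=fun";
const base = "https://example.com/search?q=";

// Wrong: encodeURI leaves & and = alone, so they become URL structure.
const wrong = base + encodeURI(value);
console.log(wrong); // https://example.com/search?q=rock&roll=fun

// Right: encodeURIComponent escapes them, so they stay data.
const right = base + encodeURIComponent(value);
console.log(right); // https://example.com/search?q=rock%26roll%3Dfun

// How a parser sees each URL:
console.log(new URL(wrong).searchParams.get("q"));    // rock
console.log(new URL(wrong).searchParams.get("roll")); // fun (an accidental extra parameter)
console.log(new URL(right).searchParams.get("q"));    // rock&roll=fun
```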
International Characters and Punycode
Domain names traditionally allowed only ASCII characters. Internationalized Domain Names (IDN), meaning domains with accents, Arabic, Chinese, or other scripts, are handled by converting them to ASCII using Punycode. The domain münchen.de becomes xn--mnchen-3ya.de in DNS.
In URL paths and query strings, non-ASCII characters are handled differently: encode them as UTF-8 bytes, then percent-encode each byte. The string "café" becomes caf%C3%A9 (since the é character is 0xC3 0xA9 in UTF-8).
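Both treatments can be observed with built-in JavaScript APIs. This is a sketch that assumes a runtime whose URL class performs IDN conversion on hostnames (current browsers and Node.js do). The non-ASCII letters are written as \u escapes so the example does not depend on the file's own encoding.

```javascript
// Hostnames: the URL parser converts the label to its Punycode (xn--) form.
const host = new URL("https://m\u00fcnchen.de/").hostname;
console.log(host); // xn--mnchen-3ya.de

// Paths and query values: UTF-8 bytes, each percent-encoded.
const encoded = encodeURIComponent("caf\u00e9");
console.log(encoded); // caf%C3%A9

// Decoding reverses the process and restores the original string.
const decoded = decodeURIComponent("caf%C3%A9");
console.log(decoded === "caf\u00e9"); // true
```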
Double-Encoding Bugs
A common bug is encoding a URL twice. If hello world is encoded to hello%20world, and that string is then encoded again, %20 becomes %2520 (the % is encoded as %25). The server receives %2520 and decodes it to the literal string %20, not a space. Always encode data exactly once at the point where it is inserted into a URL.
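The effect is quick to demonstrate with encodeURIComponent. The variable names here are illustrative only.

```javascript
// Encoding once gives the expected result.
const once = encodeURIComponent("hello world");
console.log(once); // hello%20world

// Encoding the already-encoded string escapes the "%" itself.
const twice = encodeURIComponent(once);
console.log(twice); // hello%2520world

// A server that decodes once gets the literal text "%20", not a space.
console.log(decodeURIComponent(twice)); // hello%20world

// Only a second decode would recover the original string.
console.log(decodeURIComponent(decodeURIComponent(twice))); // hello world
```

Decoding twice on the receiving side is not a safe fix, because it would corrupt data that legitimately contains a percent sign followed by two hex digits.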
References
- Berners-Lee, T., Fielding, R., & Masinter, L. (2005). RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force.
- Connolly, D., & Masinter, L. (2000). RFC 2854: The 'text/html' Media Type. Internet Engineering Task Force. Obsoletes RFC 1866 (HTML 2.0), which defined form URL encoding and + for space in query strings.
- WHATWG. (2024). URL Living Standard. Web Hypertext Application Technology Working Group.
- W3C. (2014). URL — W3C Working Draft. World Wide Web Consortium.