URL Encoder / Decoder ยท 6 min read
Double-Encoding: The Bug That Breaks Redirects, OAuth, and Search
Double URL-encoding is the most common URL bug in production. Here is why it happens, how to spot it in logs, and the rule for fixing it across redirects and OAuth.
Every backend engineer eventually ships a double-encoded URL. The symptoms vary โ an OAuth callback that redirects to a 404, a search query that returns nothing, a tracking link that resolves to a literal %20in the path โ but the root cause is always the same: somebody encoded a value that was already encoded. Here's why it happens, why it's so easy to miss, and the discipline that prevents it.
What Double-Encoding Looks Like
A space encoded once is %20. A space encoded twice is %2520 โ because the % in %20 got encoded to %25, and now you've got %25 followed by 20.
hello world // raw hello%20world // encoded once (correct) hello%2520world // encoded twice (broken) hello%252520world // encoded three times (very broken)
Once you've seen %2520in a log line, you spot it everywhere. It's the unique fingerprint of a string that traveled through one too many encoders.
Where It Sneaks In
Double-encoding doesn't happen because someone called the encode function twice. It happens because the encoded form is mistaken for raw input by the next layer.
Redirect URLs
A login flow stores the "return to" URL in a query parameter:
https://app.example/login?return=https%3A%2F%2Fapp.example%2Fsearch%3Fq%3Dpizza
After login, the server reads return, gets the decoded URL, and issues a 302. So far so good. The bug appears when the server re-encodes the value before redirecting โ or when the server passes the still-encoded value to a templating system that does its own encoding pass. The user lands at https://app.example/search%3Fq%3Dpizza with a literal %3F in the path.
OAuth state and code parameters
OAuth 2.0 (RFC 6749) sends authorization codes back via a query parameter. Codes routinely contain =, +, and / โ all reserved characters. When the IdP returns:
?code=abc%2Fdef%3D%3D&state=xyz
The application is supposed to URL-decode the parameter once. If the framework does that automatically and the application code does it again, the code becomes abc/def==at the right step and then nonsense at the next. Token exchange fails with a misleading "invalid_grant" โ the actual problem is character mangling, not credential validity.
Search queries through CDNs
A query string traverses a CDN, an edge worker, a load balancer, and a backend. Each hop has its own URL parser. If any one of them decodes-then-re-encodes incorrectly, the query reaches the search engine with %2520in place of spaces. The search returns zero results because nothing matches the literal string "hello%20world."
The Rule
One layer encodes. One layer decodes. Symmetry.
Concretely: the producer of a URL encodes its components exactly once. Every consumer along the way is responsible for not re-encoding when forwarding. If a value needs to be embedded in a new URL, decode it back to raw form first, then encode once for the new context. Never encode an already-encoded string.
Spotting the Bug in Logs
%25followed by hex digits โ the canonical sign of double-encoding.- Literal
%in the visible URL in a browser address bar after the page loads. Browsers display decoded URLs; a visible%means it survived two decode passes. - 404s on URLs that look correct when you read them, but the path component contains
%2For%3Finstead of the structural/or?. - Empty search results for queries that should match โ particularly when the query contains a space,
+, or non-ASCII character.
The Security Angle
OWASP catalogues double-encoding as an attack technique, not just a bug. A WAF that filters ../ won't catch %252e%252e%252f if the application decodes twice. The same applies to <script> filters in XSS-prone parameters and SQL injection signatures. Anywhere user input passes through a security filter and then a decoder, the filter has to operate on the fully-decoded form โ or an attacker can hide the payload in extra encoding layers.
Defence: decode-then-validate, never validate-then-decode. And reject inputs whose decoded form differs from the raw form (or contains %after one decode pass) for fields that shouldn't contain encoded data.
The Pragmatic Fix
When debugging, walk the URL through every hop and check encoding state at each one. Most frameworks have one specific call site where the bug lives โ usually the place that builds a redirect URL by string-concatenating an already-encoded parameter. Replace the concatenation with a proper URL construction API (URL + URLSearchParams in browsers and Node, urllib.parse.urlencode in Python, net/url in Go). These APIs encode raw values for you, exactly once.
Double-encoding is a discipline problem, not a hard problem. The mental model is "values are either raw or encoded; pick a side at every boundary." Once that's internal, %2520 in a log file becomes a one-second diagnosis instead of a one-day investigation.
References
- Berners-Lee, T., Fielding, R., & Masinter, L. (2005). RFC 3986: Uniform Resource Identifier (URI) โ Generic Syntax. Internet Engineering Task Force.
- Hardt, D. (2012). RFC 6749: The OAuth 2.0 Authorization Framework. Internet Engineering Task Force.
- OWASP. (2023). Double Encoding โ OWASP Web Security Testing Guide. Open Worldwide Application Security Project.
- WHATWG. (2024). URL Living Standard. Web Hypertext Application Technology Working Group.