URL Encoding (Percent-Encoding) Explained
URL encoding, also called percent-encoding, is the mechanism defined in RFC 3986 for representing characters in a URL that have special meaning or are not allowed to appear directly. Each such character is replaced by a percent sign (%) followed by two hexadecimal digits representing its byte value; for example, a space becomes %20. It exists so that data placed inside a URL is never confused with the URL's own structural syntax.
What percent-encoding actually is
A URL is not free-form text. It has a strict grammar: a scheme, a host, a path, an optional query string after ?, and an optional fragment after #. Each part is separated by specific delimiter characters. The problem is obvious as soon as your data itself contains one of those delimiters: if a search term contains an ampersand, how does the server know it is part of the term rather than the start of the next parameter?
Percent-encoding solves this by escaping. The character is first represented as one or more bytes (using UTF-8 for non-ASCII characters), and each byte is written as a percent sign followed by its two-digit hexadecimal value. A space is byte 0x20, so it encodes to %20. The ampersand is 0x26, so it becomes %26. RFC 3986 recommends uppercase hex digits (%2F, not %2f), though decoders must accept both.
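The byte-by-byte procedure described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not a replacement for the built-in encodeURIComponent(): the function name percentEncode is hypothetical, and it assumes you want to escape everything outside the RFC 3986 unreserved set.

```javascript
// Hypothetical helper: percent-encode every character that is not in the
// RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~").
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  const encoder = new TextEncoder(); // TextEncoder always emits UTF-8 bytes
  let out = "";
  for (const ch of text) { // for...of walks code points, keeping surrogate pairs together
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    // Write each UTF-8 byte as "%" plus two uppercase hex digits.
    for (const byte of encoder.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("rock & roll")); // rock%20%26%20roll
console.log(percentEncode("\u00E9"));      // %C3%A9 (two UTF-8 bytes for one character)
```

Note how a single non-ASCII character expands to more than one %XX triplet, because it occupies more than one byte in UTF-8.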
The decoding side simply reverses this: it scans for a % sign, reads the next two hex digits, and substitutes the resulting byte. This round trip is what lets arbitrary data, including spaces, slashes, and non-ASCII characters, travel safely inside a structured URL.
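As a rough sketch of that scan-and-substitute loop, the hypothetical percentDecode function below collects bytes and then interprets them as UTF-8. One assumption to flag: a % that is not followed by two hex digits is passed through literally here, whereas the built-in decodeURIComponent() throws a URIError in that case.

```javascript
// Hypothetical helper: reverse percent-encoding by scanning for "%XX" triplets.
function percentDecode(text) {
  const bytes = [];
  const encoder = new TextEncoder();
  for (let i = 0; i < text.length; ) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // accepts both %2F and %2f
      i += 3;
    } else {
      // Literal character: copy its UTF-8 bytes through unchanged.
      const ch = String.fromCodePoint(text.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
    }
  }
  // Interpret the collected bytes as UTF-8 text.
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

console.log(percentDecode("rock%20%26%20roll")); // rock & roll
```

In everyday code decodeURIComponent() does the same job; the sketch only makes the byte-level mechanics visible.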
Reserved vs. unreserved characters
RFC 3986 divides characters into two groups that determine whether encoding is required. The unreserved set never needs encoding: the uppercase and lowercase ASCII letters, the digits 0-9, and four punctuation marks: hyphen (-), period (.), underscore (_), and tilde (~). These characters always represent themselves and can appear literally anywhere in a URL.
The reserved set is made of delimiters that carry structural meaning. RFC 3986 splits these into general delimiters (gen-delims), namely the colon, forward slash, question mark, hash, square brackets, and at sign, and sub-delimiters (sub-delims), namely the exclamation mark, dollar sign, ampersand, apostrophe, parentheses, asterisk, plus, comma, semicolon, and equals sign.
The rule of thumb: a reserved character may appear unencoded only when you intend it as a delimiter. The moment it is part of your data rather than the URL's structure, it must be percent-encoded. Everything outside both sets (for example control characters, the space, and all non-ASCII characters) must always be encoded.
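The three categories can be expressed as a small lookup. The sketch below is illustrative only: classify and the constant names are hypothetical, the category labels are made up for this example, and the function assumes it receives exactly one character.

```javascript
// Character sets as listed in RFC 3986.
const GEN_DELIMS = ":/?#[]@";
const SUB_DELIMS = "!$&'()*+,;=";
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: report which group a single character belongs to.
function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";  // never needs encoding
  if (GEN_DELIMS.includes(ch)) return "gen-delim"; // encode when used as data
  if (SUB_DELIMS.includes(ch)) return "sub-delim"; // encode when used as data
  return "always-encode";                          // space, controls, non-ASCII, ...
}

console.log(classify("~")); // unreserved
console.log(classify("#")); // gen-delim
console.log(classify("&")); // sub-delim
console.log(classify(" ")); // always-encode
```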
Common characters and their encoded forms
The table below lists the characters that trip people up most often, with the percent-encoded form you should produce when these appear inside data such as a query parameter value. Every value is the hexadecimal of the character's ASCII byte.
| Character | Name | ASCII (hex) | Percent-encoded |
|---|---|---|---|
| (space) | Space | 0x20 | %20 |
| & | Ampersand | 0x26 | %26 |
| ? | Question mark | 0x3F | %3F |
| # | Hash / number sign | 0x23 | %23 |
| = | Equals sign | 0x3D | %3D |
| / | Forward slash | 0x2F | %2F |
| + | Plus sign | 0x2B | %2B |
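
If you want to double-check the table yourself, a short script can regenerate it. This is only a sketch: the rows variable is a name chosen for the example, and it relies on the built-in encodeURIComponent(), which escapes all seven of these characters.

```javascript
// Recompute the hex byte and percent-encoded form for each character in the table.
const chars = [" ", "&", "?", "#", "=", "/", "+"];
const rows = chars.map((ch) => ({
  char: ch,
  hex: "0x" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"),
  encoded: encodeURIComponent(ch),
}));

for (const row of rows) {
  console.log(JSON.stringify(row.char), row.hex, row.encoded);
}
```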
When to encode query parameters
The query string is where encoding matters most, because it is where user-supplied data usually lands. A query is a series of key=value pairs joined by &. To keep that structure unambiguous, you must encode every key and every value individually before assembling them; never encode the whole query at once, or you will destroy the & and = that hold it together.
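A minimal sketch of that "encode each piece, then assemble" rule is shown below. The buildQuery helper and the example.com URL are hypothetical; the only real API used is JavaScript's built-in encodeURIComponent().

```javascript
// Hypothetical helper: encode each key and value separately, then join with literal "=" and "&".
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(String(value)))
    .join("&");
}

const url = "https://example.com/search?" + buildQuery({ q: "rock & roll", page: 2 });
console.log(url); // https://example.com/search?q=rock%20%26%20roll&page=2

// Encoding the assembled query in one go escapes the separators as well.
const wrong = encodeURIComponent("q=rock & roll&page=2");
console.log(wrong); // q%3Drock%20%26%20roll%26page%3D2
```

In the second result there is no literal & or = left, so a server would see one long, meaningless key instead of two parameters.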
Concretely: if a value is the phrase "rock & roll", encoding just the value yields rock%20%26%20roll, and the full pair becomes q=rock%20%26%20roll. The single literal & that separates parameters stays intact, while the & inside the data is hidden as %26. In JavaScript this distinction maps to two functions: encodeURIComponent() encodes a single piece of data (it escapes & = ? /), while encodeURI() is meant for an already-complete URL and deliberately leaves delimiters alone.
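The difference between the two functions is easiest to see side by side. The snippet below is a small demonstration with made-up input strings and variable names; the functions themselves are standard JavaScript built-ins.

```javascript
const value = "rock & roll";

// For one piece of data: the ampersand is escaped along with the spaces.
const asComponent = encodeURIComponent(value);
console.log(asComponent); // rock%20%26%20roll

// For a complete URL: spaces are escaped, but delimiters such as & are left alone.
const asWholeUri = encodeURI(value);
console.log(asWholeUri); // rock%20&%20roll

// encodeURI keeps the structure of an already-assembled URL intact.
const full = encodeURI("https://example.com/a b?q=1&r=2#top");
console.log(full); // https://example.com/a%20b?q=1&r=2#top
```

Using encodeURI() on a value is the common mistake here: the result still contains a literal & that will be read as a parameter separator.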
When you are hand-building a URL or debugging one, encoding each component with a dedicated tool removes the guesswork and prevents the structural characters from leaking into your data.
Common gotchas: space, plus, and the fragment
The single biggest source of confusion is the space. In a path, a space is %20. But in the application/x-www-form-urlencoded format (the encoding used by HTML form submissions and most query strings), a space is historically encoded as a plus sign (+). This means + and %20 can both mean 'space' in a query, which has a knock-on effect: a literal plus sign in your data must be encoded as %2B, or it will be silently read as a space on the other end. If you mean 'C++', send C%2B%2B.
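The plus-versus-space behaviour can be observed with the standard URLSearchParams class, which follows the application/x-www-form-urlencoded rules. The parameter names and values below are invented for the demonstration.

```javascript
// Parsing: a form-urlencoded parser turns every literal "+" into a space.
const parsed = new URLSearchParams("lang=C++&q=rock+roll");
console.log(JSON.stringify(parsed.get("lang"))); // "C  " (both plus signs became spaces)
console.log(parsed.get("q"));                    // rock roll

// Sending the plus signs as %2B preserves them.
const safe = new URLSearchParams("lang=C%2B%2B");
console.log(safe.get("lang")); // C++

// Serializing: URLSearchParams writes spaces as "+" and literal plus signs as %2B.
console.log(new URLSearchParams({ q: "rock roll" }).toString()); // q=rock+roll
console.log(new URLSearchParams({ lang: "C++" }).toString());    // lang=C%2B%2B

// decodeURIComponent follows plain percent-decoding and leaves "+" untouched.
console.log(decodeURIComponent("rock+roll")); // rock+roll
```

The last line is worth remembering: decoding a form-encoded query with decodeURIComponent() alone will leave plus signs where spaces were intended.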
The ampersand is the second classic trap. Because & separates parameters, any ampersand inside a value must become %26; otherwise the receiving server splits your value into two parameters and you lose half the data. The same applies to = inside a value, which should be %3D so it is not mistaken for the key-value separator.
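To see the split happen, the sketch below feeds an unencoded and an encoded query to the standard URLSearchParams parser. It stands in for a server-side parser here, which is an assumption: real servers may differ in details, but splitting on & is common to all of them.

```javascript
// Unencoded ampersand: the parser sees two parameters instead of one value.
const broken = new URLSearchParams("q=rock & roll");
console.log(JSON.stringify(broken.get("q")));  // "rock " (the value was cut short)
console.log(JSON.stringify([...broken.keys()])); // ["q"," roll"]

// Encoded ampersand: the value arrives whole.
const fixed = new URLSearchParams("q=rock%20%26%20roll");
console.log(fixed.get("q"));            // rock & roll
console.log([...fixed.keys()].length);  // 1
```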
Finally, the hash (#) marks the start of the fragment identifier, and crucially the fragment is never sent to the server: browsers strip everything from # onward before making the request. If a # appears in data you are passing to a server, it must be encoded as %23, or everything after it simply vanishes from the request. Treat #, &, +, and the space as the four characters most likely to break a URL when left unencoded.
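The standard URL class shows how a client splits a URL at the first #. The example.com address and the tag and page parameters are made up; the sketch only illustrates client-side parsing and assumes the server receives the path and query but not the fragment.

```javascript
// Unencoded "#": everything after it is parsed as the fragment, not the query.
const raw = new URL("https://example.com/search?tag=#1&page=2");
console.log(raw.search);                   // ?tag=
console.log(raw.hash);                     // #1&page=2
console.log(raw.searchParams.get("page")); // null (page never reaches the query)

// Encoded as %23: the hash stays inside the value and the query is intact.
const encoded = new URL("https://example.com/search?tag=%231&page=2");
console.log(encoded.searchParams.get("tag"));  // #1
console.log(encoded.searchParams.get("page")); // 2
console.log(JSON.stringify(encoded.hash));     // "" (no fragment)
```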
When you need to encode or decode a URL component, paste it into the URL encoder. For embedding binary or structured payloads in a URL-safe way, the Base64 encoder is the right tool, and if you are inspecting an API response carried in a URL or request body, the JSON formatter makes the decoded result readable.
Frequently asked questions
Why is a space sometimes %20 and sometimes a plus sign?
In a URL path, a space is always %20. In a query string using the application/x-www-form-urlencoded format (the default for HTML forms), a space is historically encoded as a plus sign (+). Both are valid in their context; the safest universal choice is %20, which is accepted everywhere.
Which characters never need to be encoded?
The unreserved set in RFC 3986 never needs encoding: ASCII letters (A-Z, a-z), digits (0-9), and the four marks hyphen (-), period (.), underscore (_), and tilde (~). Everything else either is a structural delimiter or must be percent-encoded when used as data.
How do I encode a literal plus sign in a URL?
Encode it as %2B. If you leave a literal + in a query string, many servers interpret it as a space, so data like 'C++' must be sent as 'C%2B%2B' to survive decoding intact.
Should I encode the whole URL or each part separately?
Encode each component (each key and value) separately before assembling the URL. Encoding the whole string at once will escape the structural & and = characters and corrupt the URL. In JavaScript, use encodeURIComponent() per component and reserve encodeURI() for a complete URL.
What happens if I put an unencoded # in a query parameter?
The # starts the fragment identifier, and browsers strip everything from # onward before sending the request to the server. So an unencoded # causes the rest of your data to disappear from the request. Encode it as %23 to keep it as literal data.