URL Encoding (Percent-Encoding) Explained

Guide · Updated 2026-06-08

URL encoding, also called percent-encoding, is the mechanism defined in RFC 3986 for representing characters in a URL that have special meaning or are not allowed to appear directly. Each such character is replaced by a percent sign (%) followed by two hexadecimal digits representing its byte value — for example, a space becomes %20. It exists so that data placed inside a URL is never confused with the URL's own structural syntax.

What percent-encoding actually is

A URL is not free-form text. It has a strict grammar — a scheme, a host, a path, an optional query string after ?, and an optional fragment after #. Each part is separated by specific delimiter characters. The problem is obvious as soon as your data itself contains one of those delimiters: if a search term contains an ampersand, how does the server know it is part of the term rather than the start of the next parameter?

Percent-encoding solves this by escaping. The character is first represented as one or more bytes (using UTF-8 for non-ASCII characters), and each byte is written as a percent sign followed by its two-digit hexadecimal value. A space is byte 0x20, so it encodes to %20. The ampersand is 0x26, so it becomes %26. RFC 3986 recommends uppercase hex digits (%2F, not %2f), though decoders must accept both.
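The two steps described above (character to UTF-8 bytes, then each byte to %XX) can be sketched in a few lines of JavaScript. This is a minimal illustration, not a replacement for the built-in functions: the name percentEncode is hypothetical, and the sketch assumes you want to escape everything outside the RFC 3986 unreserved set.

```javascript
// Hypothetical helper (not a built-in): percent-encode every character
// that is not in the RFC 3986 unreserved set.
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  const encoder = new TextEncoder(); // always produces UTF-8
  let out = "";
  for (const ch of text) { // for...of iterates by code point
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    // Step 1: represent the character as UTF-8 bytes.
    // Step 2: write each byte as % plus two uppercase hex digits.
    for (const byte of encoder.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode(" "));  // %20
console.log(percentEncode("&"));  // %26
console.log(percentEncode("é"));  // %C3%A9 (two UTF-8 bytes, two triplets)
```

Note that a single non-ASCII character produces more than one %XX triplet, one for each byte of its UTF-8 form.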

The decoding side simply reverses this: it scans for a % sign, reads the next two hex digits, and substitutes the resulting byte. Once all bytes are recovered, the sequence is interpreted as UTF-8 text. This round trip is what lets arbitrary data, including spaces, slashes, and non-English characters, travel safely inside a structured URL.
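The scan-and-substitute loop can be sketched as follows. The function name percentDecode is hypothetical, and the sketch assumes a lenient policy for malformed input (a % not followed by two hex digits is copied through unchanged). The built-in decodeURIComponent() does the same job but throws a URIError on malformed sequences.

```javascript
// Hypothetical helper (not a built-in): reverse percent-encoding by hand.
function percentDecode(encoded) {
  const encoder = new TextEncoder();
  const bytes = [];
  let i = 0;
  while (i < encoded.length) {
    const hex = encoded.slice(i + 1, i + 3);
    if (encoded[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // A valid %XX triplet: substitute the byte it names.
      bytes.push(parseInt(hex, 16));
      i += 3;
    } else {
      // Anything else is copied through as its own UTF-8 bytes.
      const ch = String.fromCodePoint(encoded.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
    }
  }
  // Interpret the collected bytes as UTF-8 text.
  return new TextDecoder().decode(new Uint8Array(bytes));
}

console.log(percentDecode("rock%20%26%20roll")); // rock & roll
console.log(percentDecode("caf%C3%A9"));         // café
console.log(percentDecode("%2f"));               // / (lowercase hex is accepted)
```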

Reserved vs. unreserved characters

RFC 3986 divides characters into two groups that determine whether encoding is required. The unreserved set never needs encoding: the uppercase and lowercase ASCII letters, the digits 0-9, and four punctuation marks — hyphen (-), period (.), underscore (_), and tilde (~). These characters always represent themselves and can appear literally anywhere in a URL.

The reserved set is made of delimiters that carry structural meaning. RFC 3986 splits these into two groups. The general delimiters (gen-delims) are the colon, forward slash, question mark, hash, square brackets, and at sign. The sub-delimiters (sub-delims) are the exclamation mark, dollar sign, ampersand, apostrophe, parentheses, asterisk, plus, comma, semicolon, and equals sign.

The rule of thumb: a reserved character may appear unencoded only when you intend it as a delimiter. The moment it is part of your data rather than the URL's structure, it must be percent-encoded. Everything outside both sets, including control characters, the space, the percent sign itself (%25), and all non-ASCII characters, must always be encoded.
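One practical consequence in JavaScript: encodeURIComponent() leaves the unreserved set alone, as expected, but it also leaves five sub-delimiters unescaped (! ' ( ) *). If you want every reserved character in your data escaped, a small wrapper can finish the job. The sketch below is one common way to do it; the name encodeRFC3986Component is hypothetical.

```javascript
// Hypothetical wrapper: also escape the sub-delimiters that
// encodeURIComponent() leaves untouched (! ' ( ) *).
function encodeRFC3986Component(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("A-z_0.9~"));          // A-z_0.9~ (unreserved, unchanged)
console.log(encodeURIComponent("it's (ok)!*"));       // it's%20(ok)!*
console.log(encodeRFC3986Component("it's (ok)!*"));   // it%27s%20%28ok%29%21%2A
console.log(encodeRFC3986Component("100%"));          // 100%25
```

Whether you need the stricter form depends on the receiver; most servers decode both outputs to the same value.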

Common characters and their encoded forms

The table below lists the characters that trip people up most often, with the percent-encoded form you should produce when these appear inside data such as a query parameter value. Each encoded form is a percent sign followed by the hexadecimal value of the character's ASCII byte.

Character	Name	ASCII (hex)	Percent-encoded
(space)	Space	0x20	%20
&	Ampersand	0x26	%26
?	Question mark	0x3F	%3F
#	Hash / number sign	0x23	%23
=	Equals sign	0x3D	%3D
/	Forward slash	0x2F	%2F
+	Plus sign	0x2B	%2B
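
If you want to check the table yourself, the following sketch recomputes each row from the character's code and compares it with what encodeURIComponent() produces. The tableRows array is simply the table above copied into code.

```javascript
// The rows of the table above: [character, expected percent-encoded form].
const tableRows = [
  [" ", "%20"],
  ["&", "%26"],
  ["?", "%3F"],
  ["#", "%23"],
  ["=", "%3D"],
  ["/", "%2F"],
  ["+", "%2B"],
];

for (const [ch, expected] of tableRows) {
  // ASCII code of the character, as two uppercase hex digits.
  const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
  const builtIn = encodeURIComponent(ch);
  console.log(JSON.stringify(ch), "0x" + hex, builtIn, builtIn === expected);
}
```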

When to encode query parameters

The query string is where encoding matters most, because it is where user-supplied data usually lands. A query is a series of key=value pairs joined by &. To keep that structure unambiguous, you must encode every key and every value individually before assembling them — never encode the whole query at once, or you will destroy the & and = that hold it together.

Concretely: if a value is the phrase "rock & roll", encoding just the value yields rock%20%26%20roll, and the full pair becomes q=rock%20%26%20roll. The single literal & that separates parameters stays intact, while the & inside the data is hidden as %26. In JavaScript this distinction maps to two functions: encodeURIComponent() encodes a single piece of data (it escapes & = ? /), while encodeURI() is meant for an already-complete URL and deliberately leaves delimiters alone.
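
Here is a short sketch of the difference, using the "rock & roll" example. The host example.com and the parameter names are placeholders chosen for illustration.

```javascript
// Placeholder data: parameter names and values are illustrative only.
const params = { q: "rock & roll", lang: "en" };

// Correct: encode each key and each value on its own, then join.
const query = Object.entries(params)
  .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
  .join("&");
console.log(query);
// q=rock%20%26%20roll&lang=en

// Wrong: encoding the assembled query escapes the delimiters too.
const overEncoded = encodeURIComponent("q=rock & roll&lang=en");
console.log(overEncoded);
// q%3Drock%20%26%20roll%26lang%3Den

// Also wrong for data: encodeURI() leaves the & in the value alone.
const underEncoded = encodeURI("https://example.com/search?q=rock & roll");
console.log(underEncoded);
// https://example.com/search?q=rock%20&%20roll
```

In the last case the server would see a parameter q with the value "rock " followed by a second, unintended parameter.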

When you are hand-building a URL or debugging one, encoding each component with a dedicated tool removes the guesswork and keeps characters in your data from being mistaken for the URL's structural delimiters.

Common gotchas: space, plus, and the fragment

The single biggest source of confusion is the space. In a path, a space is %20. But in the application/x-www-form-urlencoded format — the encoding used by HTML form submissions and most query strings — a space is historically encoded as a plus sign (+). This means + and %20 can both mean 'space' in a query, which has a knock-on effect: a literal plus sign in your data must be encoded as %2B, or it will be silently read as a space on the other end. If you mean 'C++', send C%2B%2B.
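
The following sketch shows both conventions side by side. It relies on URLSearchParams, which follows the application/x-www-form-urlencoded rules, and on decodeURIComponent(), which does not treat + specially. The parameter names are placeholders.

```javascript
// Form-style serialisation: space becomes +, a literal + becomes %2B.
const formQuery = new URLSearchParams({ lang: "C++", q: "hello world" }).toString();
console.log(formQuery);
// lang=C%2B%2B&q=hello+world

// Form-style parsing: an unencoded + is read as a space.
const lost = new URLSearchParams("lang=C++").get("lang");
console.log(JSON.stringify(lost));
// "C  " (the two plus signs became two spaces)

const kept = new URLSearchParams("lang=C%2B%2B").get("lang");
console.log(kept);
// C++

// Generic percent-decoding leaves + as a literal plus.
console.log(decodeURIComponent("hello+world"));
// hello+world

// %20 is understood as a space by the form-style parser as well.
console.log(new URLSearchParams("q=hello%20world").get("q"));
// hello world
```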

The ampersand is the second classic trap. Because & separates parameters, any ampersand inside a value must become %26; otherwise the receiving server splits your value into two parameters and you lose half the data. The same applies to = inside a value, which should be %3D so it is not mistaken for the key-value separator.
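
To see the split happen, the sketch below parses the same data with and without encoding. It uses URLSearchParams as a stand-in for the server's parser, which is an assumption: real servers use their own parsers, though most follow the same splitting rule.

```javascript
// Unencoded ampersand inside the value: the parser splits on it.
const broken = new URLSearchParams("q=rock & roll&lang=en");
console.log([...broken.keys()]);
// [ 'q', ' roll', 'lang' ]
console.log(JSON.stringify(broken.get("q")));
// "rock " (the rest of the value ended up as a separate key)

// Encoded value: the parser sees exactly two parameters.
const safe = new URLSearchParams("q=" + encodeURIComponent("rock & roll") + "&lang=en");
console.log([...safe.keys()]);
// [ 'q', 'lang' ]
console.log(safe.get("q"));
// rock & roll
```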

Finally, the hash (#) marks the start of the fragment identifier, and crucially the fragment is never sent to the server — browsers strip everything from # onward before making the request. If a # appears in data you are passing to a server, it must be encoded as %23, or everything after it simply vanishes from the request. Treat #, &, +, and the space as the four characters most likely to break a URL when left unencoded.
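
You can watch the fragment swallow data by parsing a URL with the built-in URL class. In this sketch, example.com and the parameter names tag and page are placeholders.

```javascript
// Unencoded #: everything after it is treated as the fragment.
const raw = new URL("https://example.com/search?tag=c#sharp&page=2");
console.log(raw.search);                    // ?tag=c
console.log(raw.hash);                      // #sharp&page=2
console.log(raw.searchParams.get("page"));  // null (page never reaches the query)

// Encoded #: the value stays inside the query string.
const safeUrl = new URL(
  "https://example.com/search?tag=" + encodeURIComponent("c#sharp") + "&page=2"
);
console.log(safeUrl.search);                    // ?tag=c%23sharp&page=2
console.log(safeUrl.hash);                      // (empty string)
console.log(safeUrl.searchParams.get("tag"));   // c#sharp
```

Only the part reported in search is sent to the server; the hash stays in the browser.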

When you need to encode or decode a URL component, paste it into the URL encoder. For embedding binary or structured payloads in a URL-safe way, the Base64 encoder is the right tool, and if you are inspecting an API response carried in a URL or request body, the JSON formatter makes the decoded result readable.

Frequently asked questions

Why is a space sometimes %20 and sometimes a plus sign?

In a URL path, a space is always %20. In a query string using the application/x-www-form-urlencoded format (the default for HTML forms), a space is historically encoded as a plus sign (+). Both are valid in their context; the safest universal choice is %20, which is accepted everywhere.

Which characters never need to be encoded?

The unreserved set in RFC 3986 never needs encoding: ASCII letters (A-Z, a-z), digits (0-9), and the four marks hyphen (-), period (.), underscore (_), and tilde (~). Everything else either is a structural delimiter or must be percent-encoded when used as data.

How do I encode a literal plus sign in a URL?

Encode it as %2B. If you leave a literal + in a query string, many servers interpret it as a space, so data like 'C++' must be sent as 'C%2B%2B' to survive decoding intact.

Should I encode the whole URL or each part separately?

Encode each component (each key and value) separately before assembling the URL. Running a component encoder over the whole string at once will escape the structural & and = characters and corrupt the URL. In JavaScript, use encodeURIComponent() per component and reserve encodeURI() for a complete URL whose delimiters should be left alone.

What happens if I put an unencoded # in a query parameter?

The # starts the fragment identifier, and browsers strip everything from # onward before sending the request to the server. So an unencoded # causes the rest of your data to disappear from the request. Encode it as %23 to keep it as literal data.
