UTF-8 Byte Inspector

See how text becomes UTF-8 bytes, character by character.


How to use

1. Type or paste any text into the box.
2. Read the code point, byte count and bytes for each character in the table.
3. Check the totals for code points and UTF-8 bytes.
4. Copy the full UTF-8 hex string if you need it elsewhere.

About UTF-8 Byte Inspector

The UTF-8 Byte Inspector shows you exactly how a piece of text is stored as bytes.

Type or paste anything — plain ASCII, accented letters, currency symbols, CJK characters or emoji — and the tool breaks it down code point by code point.

For each character you see its Unicode code point (in U+ notation and decimal), how many UTF-8 bytes it takes, and those bytes written out three ways: lowercase hex, eight-bit binary and decimal.
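As a rough illustration of what one table row contains, the sketch below builds the same fields for a single character. The `CharRow` shape and the `inspectChar` name are hypothetical and chosen for this example; only `TextEncoder` and `codePointAt` are standard APIs.

```typescript
// Hypothetical row shape; field names are illustrative, not the tool's own.
interface CharRow {
  char: string;
  codePoint: number; // decimal value of the code point
  notation: string;  // U+ notation, at least four hex digits
  hex: string[];     // each byte as lowercase hex
  binary: string[];  // each byte as eight binary digits
  decimal: number[]; // each byte as a decimal number
}

// Expects a string holding exactly one code point.
function inspectChar(char: string): CharRow {
  const codePoint = char.codePointAt(0) ?? 0;
  const bytes = Array.from(new TextEncoder().encode(char));
  return {
    char,
    codePoint,
    notation: "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"),
    hex: bytes.map((b) => b.toString(16).padStart(2, "0")),
    binary: bytes.map((b) => b.toString(2).padStart(8, "0")),
    decimal: bytes,
  };
}

// U+00E9 (e with acute accent) encodes to the two bytes c3 a9.
const row = inspectChar("\u00e9");
console.log(row.notation, row.codePoint, row.hex, row.binary, row.decimal);
```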

A running summary shows the total number of code points and the total byte length, and the full byte stream is offered as one copy-ready hex string.
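The totals and the copy-ready string can be approximated in a few lines. This is a sketch under assumptions: the `summarize` name is hypothetical, and separating the hex bytes with spaces is a guess at the output format rather than something the tool documents.

```typescript
// Hypothetical helper: totals plus one hex string for the whole input.
function summarize(text: string): { codePoints: number; bytes: number; hex: string } {
  const encoded = new TextEncoder().encode(text);
  // Assumption: bytes are joined with single spaces; the tool may format them differently.
  const hex = Array.from(encoded, (b) => b.toString(16).padStart(2, "0")).join(" ");
  return {
    codePoints: Array.from(text).length, // Array.from iterates by code point
    bytes: encoded.length,
    hex,
  };
}

// "h\u00e9llo" has 5 code points but 6 bytes, because U+00E9 needs two bytes.
const summary = summarize("h\u00e9llo");
console.log(summary.codePoints, summary.bytes, summary.hex);
```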

This breakdown helps when you are debugging encoding problems, such as garbled characters or a string that unexpectedly exceeds a byte limit.

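UTF-8 is a variable-length encoding: ASCII characters (U+0000 to U+007F) take one byte; code points from U+0080 to U+07FF, which cover accented Latin letters along with Greek, Cyrillic, Hebrew and Arabic, take two; the rest of the Basic Multilingual Plane, including most CJK characters and symbols such as €, takes three; and supplementary-plane code points above U+FFFF, including most emoji, take four.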
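The byte count follows directly from the numeric range of the code point. The function below is a minimal sketch of that rule; the `utf8ByteLength` name is illustrative, and it assumes its input is a valid Unicode scalar value (not a lone surrogate).

```typescript
// Number of UTF-8 bytes needed for one code point, by range.
function utf8ByteLength(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // e.g. U+00E9, accented Latin letter
  if (codePoint <= 0xffff) return 3; // rest of the Basic Multilingual Plane
  return 4;                          // U+10000 to U+10FFFF
}

console.log(utf8ByteLength(0x41));    // 1 (A)
console.log(utf8ByteLength(0xe9));    // 2 (U+00E9)
console.log(utf8ByteLength(0x20ac));  // 3 (euro sign)
console.log(utf8ByteLength(0x1f600)); // 4 (grinning face emoji)
```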

The inspector walks the string by code point rather than by UTF-16 code unit, so an emoji that JavaScript stores as a surrogate pair still appears as a single row with its four-byte UTF-8 encoding. This makes the difference between characters, code points and bytes concrete.
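The difference between code units and code points is easy to see in a short experiment. This sketch assumes nothing about the tool's internals; it only uses the standard behaviour that `Array.from` iterates a string by code point while `length` counts UTF-16 code units.

```typescript
const sample = "a\u{1F600}"; // "a" followed by U+1F600 (grinning face)

// String length counts UTF-16 code units: 1 for "a", 2 for the surrogate pair.
const utf16Units = sample.length; // 3

// Array.from keeps each surrogate pair together as one element.
const rows = Array.from(sample).map((ch) => ({
  codePoint: ch.codePointAt(0) ?? 0,
  utf16Units: ch.length,
  utf8Bytes: new TextEncoder().encode(ch).length,
}));

console.log(utf16Units);  // 3
console.log(rows.length); // 2
console.log(rows[1]);     // { codePoint: 128512, utf16Units: 2, utf8Bytes: 4 }
```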

It is handy for working out why a string is longer in bytes than in characters, for verifying byte-length limits in databases and protocols, for spotting hidden or unexpected characters, and for learning how Unicode and UTF-8 fit together.
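For the byte-limit case, a check along these lines is one way to confirm whether a value fits before it reaches a database or protocol. The `fitsByteLimit` name and the limit of 4 bytes are hypothetical values picked for the example.

```typescript
// Hypothetical helper: true when the UTF-8 encoding fits within maxBytes.
function fitsByteLimit(text: string, maxBytes: number): boolean {
  return new TextEncoder().encode(text).length <= maxBytes;
}

// "caf\u00e9" is 4 characters but 5 bytes, so it does not fit a 4-byte field.
console.log(fitsByteLimit("cafe", 4));      // true
console.log(fitsByteLimit("caf\u00e9", 4)); // false
console.log(fitsByteLimit("caf\u00e9", 5)); // true
```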

Everything runs locally in your browser, so even sensitive text is never uploaded, and the tool keeps working offline once loaded.

FAQ

Why does my emoji show as one character but four bytes?

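The inspector counts Unicode code points, not UTF-16 code units. A single emoji such as U+1F600 is one code point even though JavaScript stores it as a surrogate pair, and in UTF-8 that one code point encodes to four bytes. Emoji built from several code points, such as flags or skin-tone variants, take one row per code point.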
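To see where the four bytes come from, the bit layout can be worked through by hand. The sketch below is a manual encoder for the four-byte case only; the `encodeFourByte` name is illustrative, and in practice `TextEncoder` does this work.

```typescript
// Manual UTF-8 encoding for code points U+10000 to U+10FFFF.
// Layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (3 + 6 + 6 + 6 payload bits).
function encodeFourByte(codePoint: number): number[] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point is outside the four-byte range");
  }
  return [
    0xf0 | (codePoint >> 18),          // top 3 bits
    0x80 | ((codePoint >> 12) & 0x3f), // next 6 bits
    0x80 | ((codePoint >> 6) & 0x3f),  // next 6 bits
    0x80 | (codePoint & 0x3f),         // low 6 bits
  ];
}

const manual = encodeFourByte(0x1f600).map((b) => b.toString(16));
console.log(manual.join(" "));     // f0 9f 98 80
console.log("\u{1F600}".length);   // 2, the UTF-16 surrogate pair
```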

Why is the byte count larger than the character count?

UTF-8 is variable-length. ASCII is one byte per character, but accented letters, symbols, CJK characters and emoji take two, three or four bytes each, so the byte length exceeds the character count whenever the text contains non-ASCII characters.

Is my text sent anywhere?

No. All analysis happens in your browser using the built-in TextEncoder. Nothing is uploaded, logged or stored, so it is safe for private or sensitive text.