The Contract Behind Every Bit
A bit is the smallest unit of information there is.
That's it, that's the whole definition. Everything else is just what you build with it.
Think of it as a slot. The slot can hold one of two states: on or off, electricity or no electricity, 1 or 0. Nothing in between. And that limitation is exactly what makes it powerful!
It's the simplest possible system, which is why it's the one computers are built on.
Here's a way to feel it instead of just reading it.
Say you and your roommate agree on a signal: your room light on means "don't bother me," off means "come say hi." That's a full bit of information.
One light, two states, one meaning — because you both agreed on what on and off stand for.
That agreement is the whole trick. The light itself is meaningless. What makes it work is the shared context.
Now stack a second signal — say, whether your door is open or closed.
Suddenly you can say four different things: light on and door open (working, but drop by), light on and door closed (seriously, don't), light off and door open (free, come say hi), light off and door closed (free, but the room's dark and empty).
Two bits, four messages.
Let's write that down properly. Say 1 means light on, 0 means light off. And 1 means door open, 0 means door closed:
11: light on, door open
10: light on, door closed
01: light off, door open
00: light off, door closed
That's it. Two bits, four states, and now it's not a metaphor anymore — it's literally how the light and the door would show up if a computer had to store them.
And notice: the order matters too. 10 and 01 mean completely different things. Which bit comes first — light or door — is part of the contract as well.
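Here is a minimal Rust sketch of that two-bit contract. The function names `encode` and `decode` are made up for this example, and the bit order (light first, door second) is simply the convention chosen above, not any kind of standard.

```rust
/// Packs the two signals into two bits: light is the left bit, door the right.
/// (Hypothetical helper; the ordering is the contract assumed in the text.)
fn encode(light_on: bool, door_open: bool) -> u8 {
    ((light_on as u8) << 1) | (door_open as u8)
}

/// Reads the two bits back using the same contract.
fn decode(bits: u8) -> (bool, bool) {
    let light_on = ((bits >> 1) & 1) == 1;
    let door_open = (bits & 1) == 1;
    (light_on, door_open)
}

fn main() {
    // Walk through all four states: 00, 01, 10, 11.
    for bits in 0u8..4 {
        let (light_on, door_open) = decode(bits);
        println!("{:02b}: light on = {}, door open = {}", bits, light_on, door_open);
    }
}
```

Swapping the two shifts would give a different but equally valid contract, which is the point: the bits only mean something once the order is agreed.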
That's the pattern. Each bit you add doesn't just add "a bit more" — it doubles what you can express.
It's how bits stop being boring 0s and 1s and start being able to represent, eventually, text, music, images, anything on your computer.
It's always just a bunch of bits, organized so that someone — or something — knows how to read them.
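The doubling is easy to check for yourself. This is a small sketch, with `combinations` as a made-up helper name, that computes how many distinct messages a given number of bits can express.

```rust
/// Number of distinct messages that `bits` two-state slots can express.
/// (Hypothetical helper; the shift is only valid for bits < 64.)
fn combinations(bits: u32) -> u64 {
    1u64 << bits
}

fn main() {
    // Each extra bit doubles the count: 2, 4, 8, 16, ...
    for bits in 1..=8 {
        println!("{} bit(s) -> {} combinations", bits, combinations(bits));
    }
}
```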
But the contract only works if everyone's using the same one.
Say your roommate moves out, and your new roommate assumes the opposite convention: for them, 0 on the first bit means light on, and 0 on the second bit means door open.
You send 10 — meaning, by your contract, light on and door closed. They read it by their contract: light off, door open.
Same two bits, completely different message — because you never agreed on the standard.
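To see the mismatch in code, here is a rough sketch with two readers for the same two bits. Both function names are invented for the example; `read_mine` follows the original contract and `read_theirs` follows the new roommate's flipped one.

```rust
/// Your contract: 1 = light on, 1 = door open.
fn read_mine(bits: u8) -> (bool, bool) {
    let light_on = ((bits >> 1) & 1) == 1;
    let door_open = (bits & 1) == 1;
    (light_on, door_open)
}

/// The new roommate's contract: 0 = light on, 0 = door open.
fn read_theirs(bits: u8) -> (bool, bool) {
    let light_on = ((bits >> 1) & 1) == 0;
    let door_open = (bits & 1) == 0;
    (light_on, door_open)
}

fn main() {
    let message = 0b10u8; // what you send
    println!("you meant (light on, door open) = {:?}", read_mine(message));
    println!("they read (light on, door open) = {:?}", read_theirs(message));
}
```

Nothing about the bits changed between the two calls. Only the reader did.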
This is also why you sometimes get broken text in an email or a message.
Two sides don't agree on the standard. One side wrote using one encoding, the other side reads with a different one, and the result is garbage where you expected a letter.
Like this: café written in one standard (UTF-8) and read in another (Latin-1) shows up as cafÃ©.
If you write in Portuguese, you've probably seen this.
Accents are usually the first casualty when two sides don't agree on the standard.
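You can reproduce this kind of garbling in a few lines. The sketch below assumes one specific mismatch: the text was written as UTF-8 (more on that further down) and then read as Latin-1, where every byte counts as one character. The helper name `read_as_latin1` is made up for the example.

```rust
/// Misreads raw bytes as Latin-1 (ISO-8859-1), one character per byte.
/// (Hypothetical helper for this example.)
fn read_as_latin1(bytes: &[u8]) -> String {
    // `u8 as char` maps byte values 0..=255 to code points U+0000..=U+00FF,
    // which is the same mapping Latin-1 uses.
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "café", with the é written as an escape so the source file's own
    // encoding cannot interfere. Rust strings are always UTF-8.
    let original = "caf\u{e9}";
    let garbled = read_as_latin1(original.as_bytes());
    println!("{} -> {}", original, garbled);
}
```

The é takes two bytes in UTF-8, and the Latin-1 reader turns each of those bytes into its own character, so four letters become five.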
Bytes: bits in groups
Computers usually don't work with bits one at a time. They work in groups, and the standard group is 8 bits — a byte.
Half a byte, 4 bits, has a name too: a nibble. You'll see it less often, but it's out there.
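If you want to see the two nibbles inside a byte, a shift and a mask are enough. This is only a sketch, and `split_nibbles` is a name invented for it.

```rust
/// Splits a byte into its high and low nibbles (4 bits each).
/// (Hypothetical helper for this example.)
fn split_nibbles(byte: u8) -> (u8, u8) {
    let high = byte >> 4; // top 4 bits, moved down
    let low = byte & 0x0F; // bottom 4 bits, everything else masked off
    (high, low)
}

fn main() {
    let (high, low) = split_nibbles(0b0100_0001);
    println!("high = {:04b}, low = {:04b}", high, low);
}
```

Each nibble has 16 possible values, which is why one hexadecimal digit maps neatly onto one nibble.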
If you've ever used a u8 in Rust — an unsigned integer from 0 to 255 — that's literally one byte.
You can check this in Rust:
println!("{}", std::mem::size_of::<u8>()); // prints 1 (byte)

Now you know why the range stops at 255: that's every combination 8 bits can make.
2⁸ = 256 possible combinations, from 00000000 to 11111111. So it can represent values from 0 to 255.
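A quick way to poke at those limits is to ask Rust for them directly. The sketch below uses only standard-library items (`u8::MIN`, `u8::MAX`, `checked_add`); the wrapper `next_value` is a made-up name.

```rust
/// Returns the next value after `n`, or None if it no longer fits in 8 bits.
/// (Hypothetical helper wrapping the standard `checked_add`.)
fn next_value(n: u8) -> Option<u8> {
    n.checked_add(1)
}

fn main() {
    // The smallest and largest byte, in decimal and in binary.
    println!("{} to {}", u8::MIN, u8::MAX);
    println!("{:08b} to {:08b}", u8::MIN, u8::MAX);

    // There is no 257th combination: adding 1 to 255 has nowhere to go.
    println!("{:?}", next_value(254));
    println!("{:?}", next_value(255));
}
```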
The bool type is 1 byte too — even though it only technically needs 1 bit to represent true or false:
println!("{}", std::mem::size_of::<bool>()); // prints 1 (byte)

ASCII: the first big contract
One of the earliest and most famous of these contracts is ASCII (American Standard Code for Information Interchange).
It's basically a giant lookup table — this byte means this letter, this byte means that symbol. A small taste of it:
| Binary | Decimal | Character |
|---|---|---|
| 01000001 | 65 | A |
| 01100001 | 97 | a |
| 00110000 | 48 | 0 |
| 00100001 | 33 | ! |
| 00100000 | 32 | (space) |
Same idea as the light and the door — a number means nothing until both sides agree it stands for something. ASCII is just that agreement, written down and adopted everywhere.
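You can rebuild rows of that table yourself, since Rust will happily show you the number behind a character. This is a small sketch that assumes ASCII input only; `describe` is a name invented for it.

```rust
/// Formats one table row: binary, decimal, character.
/// (Hypothetical helper; assumes `c` is an ASCII character.)
fn describe(c: char) -> String {
    let code = c as u8; // for ASCII, the character's number fits in one byte
    format!("{:08b} | {} | {}", code, code, c)
}

fn main() {
    for c in "Aa0! ".chars() {
        println!("{}", describe(c));
    }
}
```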
ASCII itself only needs 7 bits to cover the English letters, digits, and symbols it defines, which gives you 128 combinations.
But computers settled on 8 bits, one full byte, per character instead, giving 256 possible values.
That extra room didn't get used the same way twice. Different vendors and regions extended ASCII in their own direction — one code page for French and German accents, another for Cyrillic, another for Greek — each using the 128–255 range differently.
There was no single standard for "extended ASCII," just a pile of incompatible ones, which is part of why the garbled-text problem from earlier (often called mojibake) was so common.
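To make the incompatibility concrete, here is a sketch that looks at one single byte, 0xE9, under three code pages. The enum and function names are invented for the example, and the three mappings are taken, as an assumption, from the ISO-8859-1, ISO-8859-5, and ISO-8859-7 tables; this is not a full decoder.

```rust
/// Three of the many 8-bit code pages. (Hypothetical type for this example.)
#[derive(Clone, Copy, Debug)]
enum CodePage {
    Latin1,   // ISO-8859-1, Western European
    Cyrillic, // ISO-8859-5
    Greek,    // ISO-8859-7
}

/// What the single byte 0xE9 means under each code page.
/// Only this one byte is covered; a real decoder needs the whole table.
fn decode_e9(page: CodePage) -> char {
    match page {
        CodePage::Latin1 => '\u{e9}',    // é
        CodePage::Cyrillic => '\u{449}', // щ
        CodePage::Greek => '\u{3b9}',    // ι
    }
}

fn main() {
    for page in [CodePage::Latin1, CodePage::Cyrillic, CodePage::Greek].iter() {
        println!("0xE9 read as {:?} is {}", page, decode_e9(*page));
    }
}
```

Same byte, three different letters, depending entirely on which table the reader picked.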
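But it still couldn't do everything. Every one of these 8-bit extensions was capped at 256 characters in total, so none of them had room for a script like kanji, which runs to thousands of characters, and no single code page could cover, say, Latin accents, Cyrillic, and Indian scripts at the same time.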
Unicode: the bigger contract
That's the gap Unicode was built to close. It assigns a number to every character in every writing system, plus emojis 🤓, and stores each one in up to 32 bits, or 4 bytes.
The genius part is that Unicode is backwards-compatible with ASCII. The first block of Unicode is a direct copy of ASCII. So any valid ASCII text is automatically valid Unicode too — nothing gets left behind.
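Rust makes this easy to verify, because its strings are always UTF-8, one of the Unicode encodings. The sketch below feeds plain ASCII bytes to the standard `std::str::from_utf8` and gets the same text back; `ascii_as_utf8` is a made-up wrapper name.

```rust
/// Reinterprets ASCII bytes as a UTF-8 string, or returns None if any byte
/// falls outside ASCII. (Hypothetical helper for this example.)
fn ascii_as_utf8(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        std::str::from_utf8(bytes).ok()
    } else {
        None
    }
}

fn main() {
    // Pure ASCII bytes are accepted unchanged as UTF-8 text.
    println!("{:?}", ascii_as_utf8(b"Hello!"));

    // The Unicode number for 'A' is the same 65 that ASCII uses.
    println!("{}", 'A' as u32);
}
```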
But if you're only writing in English, using the full 32 bits of Unicode for every character is wasteful.
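That's why the UTF (Unicode Transformation Format) encodings exist: UTF-8, UTF-16, and UTF-32. UTF-32 always spends 4 bytes per character, while UTF-8 and UTF-16 are variable-width.
If your text is mostly English, UTF-8 gets you the same characters using far fewer bits: 8 bits, or 1 byte, for each ASCII character, and 2 to 4 bytes only for characters outside ASCII.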
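Here is a rough sketch that measures the difference using only standard-library calls (`len_utf8`, `encode_utf16`, `chars`). The helper `sizes` is a name invented for the example, and the UTF-32 figure is computed as 4 bytes per character rather than by actually encoding anything.

```rust
/// Size in bytes of `text` in UTF-8, UTF-16, and UTF-32.
/// (Hypothetical helper for this example.)
fn sizes(text: &str) -> (usize, usize, usize) {
    let utf8 = text.len(); // Rust strings are stored as UTF-8
    let utf16 = text.encode_utf16().count() * 2; // 2 bytes per 16-bit unit
    let utf32 = text.chars().count() * 4; // 4 bytes per character
    (utf8, utf16, utf32)
}

fn main() {
    // How many bytes UTF-8 spends on different kinds of characters.
    for c in ['A', '\u{e9}', '\u{20ac}', '\u{1f913}'].iter() {
        println!("{} takes {} byte(s) in UTF-8", c, c.len_utf8());
    }

    // English text: UTF-8 is a quarter of the size of UTF-32.
    println!("{:?}", sizes("hello"));
}
```

The four sample characters are A, é, €, and 🤓, written as escapes so the example does not depend on how the source file itself is saved.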
Why this is actually useful to know
Once you know that a character in plain English text costs 8 bits, or 1 byte, you can start doing real math.
Look at a plain text file encoded in ASCII or UTF-8, count the characters, and you can estimate the file size in bits, bytes, or whatever unit you want.
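As a back-of-the-envelope sketch, the estimate looks like this in Rust. The two helper names are made up, and the sample strings stand in for the contents of a file.

```rust
/// Size of the text in bytes when stored as UTF-8.
/// (Hypothetical helper; `str::len` already counts UTF-8 bytes.)
fn size_in_bytes(text: &str) -> usize {
    text.len()
}

/// Same size expressed in bits: 8 bits per byte.
fn size_in_bits(text: &str) -> usize {
    size_in_bytes(text) * 8
}

fn main() {
    let text = "Hello, world!";
    println!("characters: {}", text.chars().count());
    println!("bytes:      {}", size_in_bytes(text));
    println!("bits:       {}", size_in_bits(text));
}
```

For pure ASCII text the character count and the byte count match. Once accents or emojis show up, the byte count runs ahead, so counting characters gives a lower bound rather than an exact size.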
That's the moment bits stop being an abstract "computers are just 0s and 1s" idea and become something you can actually reason about.
Not random noise — bits and bytes sitting in the exact position someone agreed they should sit in, so that whoever reads them next knows exactly what they mean.