The story of encoding
Monday, June 11, 2007 by Dave Winer.
Which came first, the platform or the developer?
Before there could be RSS, there had to be XML, a language for expressing data in a way that both computers and humans can read. The great thing about XML is that if the techies are careful, anyone with a little time and intelligence can understand what they're doing.
But XML couldn't have happened until there was a way to encode alphabetic characters, the letters A through Z, numeric characters, 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, and "special" characters like parentheses, commas, question marks, etc. Encoding is how you take something that a human can read and convert it into something a machine can read, a language called "binary." While some humans can read binary, if they try hard, almost no one wants to read it, because it is so cumbersome and verbose.
There are only two letters in the alphabet of the binary language, 0 and 1. So a number like 27 is expressed in binary as 11011. My name Dave is ridiculously complicated in binary: 01000100 01100001 01110110 01100101. (I did the conversion in my head, so there are probably mistakes. And I added the blanks so if you want to check my work, it won't make you go blind. But the blanks aren't part of the binary language.)
Hopefully, you can see why the smart people who invented "encoding" did so. It's much easier to write "Dave" than all those 1s and 0s! How would you remember them? And would your eyes be able to quickly recognize the string of 1s and 0s as the sounds your mouth makes when you say my name? Encoding was invented to make life easier, and it does.
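If you'd rather let a machine check the conversions above, here is a minimal Python sketch. It assumes the ASCII encoding, which is what the bits shown for "Dave" correspond to. The function names (to_binary, text_to_bits, bits_to_text) are made up for this illustration.

```python
def to_binary(number: int) -> str:
    """Write a non-negative whole number using only the digits 0 and 1."""
    return format(number, "b")


def text_to_bits(text: str) -> str:
    """Encode text as ASCII and show each character as 8 bits.

    The blanks between groups are only there for human eyes.
    """
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Reverse the process: turn blank-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


print(to_binary(27))                 # 11011
print(text_to_bits("Dave"))          # 01000100 01100001 01110110 01100101
print(bits_to_text("01000100 01100001 01110110 01100101"))  # Dave
```

The round trip through bits_to_text is the point of the exercise: as long as both sides agree on the encoding, nothing is lost going from letters to bits and back.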
This encoding stuff was invented before I was born, when information for computers was stored on cards made of the same stuff as file folders, and to record a bit of data, you'd punch a hole where you wanted a 1 and not punch one where you wanted a 0. Long before there were iPods, disks, thumb drives or even magnetic tape, there were specialized computers used by the government and business that recorded information on mountains of these punched cards.
And of course, there was more than one way to encode the data. So the cards that could be read by National Cash Register's computers couldn't be read on machines made by Burroughs or UNIVAC. The companies sometimes deliberately set it up this way so their customers couldn't switch. Once they had you they didn't want to give you up. (This is called lock-in. Today's computer companies do it too.)
So there were wars about how to encode data, not wars with guns and people dying, but economic wars, with users caught in the middle. The users would prefer to have choice, so they would have more money to spend on other things, or increase their profits, or do more with the same amount of money. Eventually the wars ended, leaving us with a confusing mishmash of ways to encode bits. It's more complicated than anyone wants it to be, but things work as long as you do them the way we do them in America on IBM-compatible equipment, and of course people in other countries don't like that. That's why sometimes when you display a document that was written on a Mac in Italian on a PC that's used in Korea, you see lots of junk on the screen instead of letters that make sense.
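The junk-on-the-screen effect is easy to reproduce. The sketch below is only an illustration under some assumptions: it picks Python's "mac_roman" codec to stand in for the Mac, "cp1252" for a Western Windows PC and "cp949" for a Korean one, and the Italian word is just a sample. Real documents and machines may use other encodings.

```python
# A sample Italian word with an accented letter (an assumption for this demo).
text = "perché"

# The "Mac" writes the word to disk using its own encoding.
mac_bytes = text.encode("mac_roman")

# A "PC" reads the very same bytes but assumes a different encoding.
# errors="replace" substitutes a placeholder for bytes that make no sense
# in the assumed encoding, instead of raising an exception.
seen_on_western_pc = mac_bytes.decode("cp1252", errors="replace")
seen_on_korean_pc = mac_bytes.decode("cp949", errors="replace")

# Reading with the encoding the writer used gets the word back intact.
seen_on_mac = mac_bytes.decode("mac_roman")

print("written:        ", text)
print("raw bytes:      ", mac_bytes)
print("Western PC sees:", seen_on_western_pc)
print("Korean PC sees: ", seen_on_korean_pc)
print("Mac sees:       ", seen_on_mac)
```

The plain letters survive because all three encodings agree on them. Only the accented letter turns to junk, because that is where the encodings disagree.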