February 22nd, 2007


(no subject)

So, I was just reading Wikipedia's page on binary size prefixes and now I'm cranky.

Ok, I'm always cranky, but here's the thing. Back in 1999, some platinum-bar-waving, blue-helmet-wearing standards organization decided to introduce a whole new confusing (and dorky-sounding) set of binary prefixes.

You're familiar with kilobytes and megabytes and probably gigabytes. At least to know that they're the units of storage, of memory, of data. And a meg of memory is more than a k, and a gig is more than a meg. Sure, everybody knows that.

How much more? Is a megabyte 1000 times bigger than a kilobyte? Is a gigabyte 1,000,000 times bigger than a kilobyte? Or is it 1,048,576 times bigger? It depends. It might also be 1,024,000, if you really want it to be.

I agree, this is a mess, and if you grew up with a McMetric ruler (I think I had somewhere between 8 and 16 of these when I was a kid), I can see that it'd be nice if kilo- would mean 1000x when it comes to bytes, just like it does when it comes to meters and grams. But for almost 40 years, a kilobyte has meant 1024 bytes, which is 2 raised to the power of 10.

Powers of two are the basis for computer engineering - it all comes down to the number of bits available to describe a number in binary. If I write a computer program for you and tell you that it can only handle 1024 records, that's because somewhere, I'm using 10 binary digits to store a number, and 10 binary digits gets you 1024 distinct values (the numbers 0-1023). If you decide you really need 1100 records, I'll have to find room for at least 11 binary digits, but when I do, you'll get 2048 records, which ought to keep you happy for a while. Chances are, I'd give you 16 digits, which is 65536 records, but just because I like you so much.
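If you'd rather see that arithmetic than take my word for it, here's a rough Python sketch. The function names (bits_needed, capacity) are made up for this post, not from any library, and it assumes records are simply numbered starting from 0.

```python
def bits_needed(records):
    """Smallest number of binary digits that can tell `records` items apart.

    Hypothetical helper: assumes records are numbered 0 .. records-1.
    """
    if records < 1:
        raise ValueError("need at least one record")
    # The largest number we must store is records - 1.
    return max(1, (records - 1).bit_length())


def capacity(bits):
    """How many distinct values fit in `bits` binary digits."""
    return 2 ** bits


for wanted in (1024, 1100, 65536):
    bits = bits_needed(wanted)
    print(f"{wanted} records -> {bits} bits -> room for {capacity(bits)}")
```

Ask for 1100 and you land on 11 bits, which is room for 2048 whether you wanted it or not.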

But 65536 doesn't roll off the tongue. How much memory does my Commodore have? Well, it's got 65536 bytes of memory, sir! Nah, it'd be handy to have a way of talking about things that, for engineering reasons, come in powers of two, and for that way to be at once concise and accurate. So we group computer data into kilobytes, slightly abusing the SI prefix, because 1024 isn't exactly 1000, but you know what we mean. So that fancy computer that dad brought home from K-Mart actually holds 64 kilobytes. Hey, let's call it the Commodore 64! That rhymes, even. Ok, we're done.

Er. I guess not. When floppy disk manufacturers wanted to label their 1474560-byte double-sided, high density, 3 1/2" floppy diskette, they blundered. Their previous disk size was the 737280-byte double-sided, (low density), 3 1/2" disk, which was labeled as 720 kilobytes. The high density disk, though, got labeled "1.44 megabytes". Woo - snazzy! Isn't it exciting that we can start using a whole new unit? Slow down there, junior. The prefix "mega" had been a concise, accurate way to describe groups of 1024*1024 bytes long before your floppy disk marketing campaign. And, I guess metric-speaking astronomers who want to talk about one million meters might have used mega as a concise, accurate way to describe groups of 1000*1000 meters. But nobody before your high density disk used "mega" to describe 1024*1000 of anything. Oh, well. Now everybody that talks about floppy disks (and who does anymore?) refers to them as 1.44 megabyte floppies. One battle lost. Good thing floppies are gone, right?
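For the skeptics, here's the floppy math as a quick Python sketch. The byte counts are the ones above; the function name and the dictionary keys are just labels I invented for illustration.

```python
HIGH_DENSITY_BYTES = 1474560
LOW_DENSITY_BYTES = 737280


def as_megabytes(n_bytes):
    """Divide a byte count by each competing idea of a 'megabyte'."""
    return {
        "binary (1024*1024)": n_bytes / (1024 * 1024),
        "decimal (1000*1000)": n_bytes / (1000 * 1000),
        "floppy (1024*1000)": n_bytes / (1024 * 1000),
    }


# The low density disk divides cleanly into binary kilobytes.
print(LOW_DENSITY_BYTES / 1024, "kilobytes")

# The high density disk only comes out to 1.44 under the mixed definition.
for label, value in as_megabytes(HIGH_DENSITY_BYTES).items():
    print(f"{label}: {value}")
```

Only the mixed 1024*1000 divisor gets you the number printed on the box. The other two give 1.40625 and 1.47456.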

Oh, wait, now the hard drive manufacturers are in on it, too. When you buy an 80 gigabyte drive, it's probably 80*1000*1000*1000, not 80*1024*1024*1024 bytes. Oh, and though the CD-ROM manufacturers agree that a 700 megabyte CD holds 700*1024*1024 bytes, a DVD-ROM apparently holds 4.7*1000*1000*1000 bytes.
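Here's roughly how much you "lose" on that drive, as a small Python sketch. It assumes the label means exactly 80*1000*1000*1000 bytes, and the helper names are mine, not anybody's official API.

```python
DECIMAL_GIG = 1000 ** 3  # what the box probably means
BINARY_GIG = 1024 ** 3   # what "gigabyte" meant before the box


def binary_gigs(advertised_bytes):
    """Express a byte count in 1024*1024*1024-byte gigabytes."""
    return advertised_bytes / BINARY_GIG


def shortfall_bytes(advertised_gigs):
    """Bytes missing if the label is decimal but you expected binary."""
    return advertised_gigs * (BINARY_GIG - DECIMAL_GIG)


drive_bytes = 80 * DECIMAL_GIG
print(f"80 'gigabyte' drive: {binary_gigs(drive_bytes):.2f} binary gigabytes")
print(f"missing: {shortfall_bytes(80)} bytes")

dvd_bytes = 4700000000  # 4.7 * 1000 * 1000 * 1000
print(f"4.7 'gigabyte' DVD: {binary_gigs(dvd_bytes):.2f} binary gigabytes")
```

That 80 gig drive works out to about 74.5 of the gigabytes you thought you were buying.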

So, yeah, it's a mess. For a long time, and in a lot of contexts, it's been clear what kilobyte, megabyte, gigabyte, and so on have meant. But the International Electrotechnical Commission (IEC) got sick and tired of the inconsistency and reclaimed the SI prefixes for powers of ten, leaving replacements: the kibibyte, the mebibyte, the gibibyte and so on. These have been around for what, 8 years now, and I don't predict them catching on in circles that would have a reason to use them. A compelling reason is that they sound dorky - "kih-bee-byte, meh-bee-byte, gih-bee-byte". Meh, indeed. Even more compelling is that this is reappropriating useful terms to describe something less useful, a little like redefining a dozen to mean 10, just because it makes counting eggs easier. That'd be an even more compelling example if chickens laid 12 eggs at a time. The bits and bytes just come out of the computer in powers of two, and that might be awkward, but it's not going to change. I expect that Ford would put a seven-cylinder engine into a truck before we abandon binary numbering for computer design.

So, if you haven't fallen asleep by this point, I call on you to take back the units! Reject efforts to advance the exbibyte! Cast off the 1024*1000 travesty! Buy 16 eggs at the supermarket! One Zero One!!111!