FLAC
Categories: technology
Originally posted: 2006-01-12 10:14:11
I have a lot of music. I genuinely don't know how many CDs I own; it's been a long time since I invested the time to count them. I'm sure it's over 2000, and it may be as many as 2500 or even 3000.
Over the years, I've ripped most of it to MP3 at various quality levels, and I've re-encoded my archive a couple of times. Most recently, starting on New Year's Day 2001, I started over using 256kbps-average VBR "quality zero" joint stereo. These music files sound fine, and I estimate they are on average 2x larger than traditional "CD-quality MP3s" (128kbps CBR).
There are two problems with my archive in this state. First, sometimes I want to re-encode something at a lower bit rate—say for putting on a portable player, or perhaps on a web site. Re-encoding an MP3 means decompressing it, then re-compressing it; because the original compression threw away a lot of information, I understand that the resulting re-encoded MP3s are both bigger than they otherwise would be and don't sound quite as good. (OGG Vorbis files are better in this regard; I believe you can reduce the bit rate by simply stripping information away, obviating the need for decoding and recoding.)
Second, even the mild loss of signal one encounters with high-bitrate MP3s introduces deviation from the original signal, aka "noise". My ears are nowhere near sensitive enough to detect the difference, but I imagine it is still there, lurking under the surface, tiring my ears during long listening sessions.
There's also a third point. Nothing in this big wide universe lasts forever, and it's always a good idea to have backups. I'd like to have good backups of my music collection. But, obviously, a lossy MP3 is not a perfect backup of a CD.
So, sometime last year, I was noodling over the idea of re-encoding my archive to an even higher bitrate. Maybe OGG, which would re-encode better. And then I began to dream of something even better: storing the CDs completely losslessly. And that dream began to take hold.
Before I go on, let me talk a little about bit rates and file sizes. A standard CD—what the industry calls "Red Book" format—stores 44,100 16-bit samples per channel per second. Red Book has two channels, Left and Right, for stereo sound. So that means 88,200 16-bit samples per second. Or 176,400 bytes per second, aka 176.4kBps (kilo-bytes per second), or 10.584MB per minute. Compare that with "CD quality" 128kbps MP3; the "b" here means "bits", and there are eight bits per byte, so this is really 16kBps, or 960KB per minute. "CD quality" MP3s are a little under 1/10 the size of the original, or about 90.9% compression. My 256kbps-average VBR MP3s vary in their bitrate, but the goal is an average of 256kbps, which means it's double the size of "CD quality" MP3s; 32kBps, 1920KB per minute, 81.86% compression.
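The arithmetic above can be checked with a short script. This is only a sketch: the function names are made up for illustration, and "compression" here means the percentage of the original size that is saved, which is how the figures above appear to be computed.

```python
# Red Book CD audio parameters, as stated in the paragraph above.
SAMPLE_RATE_HZ = 44_100   # samples per channel per second
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # left and right

def cd_bytes_per_minute() -> int:
    """Uncompressed CD audio, in bytes per minute."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * 60

def mp3_bytes_per_minute(kbps: int) -> int:
    """MP3 size in bytes per minute; kbps is thousands of bits per second."""
    return kbps * 1000 // 8 * 60

def compression_percent(compressed: float, original: float) -> float:
    """Percentage of the original size that compression saves."""
    return (1 - compressed / original) * 100

cd = cd_bytes_per_minute()
print(f"CD audio: {cd / 1e6:.3f} MB/min")
for kbps in (128, 256):
    mp3 = mp3_bytes_per_minute(kbps)
    saved = compression_percent(mp3, cd)
    print(f"{kbps} kbps MP3: {mp3 / 1000:.0f} KB/min, {saved:.2f}% compression")
```

Running it gives 10.584 MB per minute for the CD, and 90.93% and 81.86% compression for the 128kbps and 256kbps cases.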
The maximum length of a Red Book audio CD is about 80 minutes, which is about 846.7MB uncompressed. (Yes, this is more than you can store on a CD-ROM; those max out at about 700MB. Audio CDs are encoded differently than data CDs: they spend less of each sector on error correction, and so achieve more usable space.) Compressed to CD-quality, 80 minutes is only 76.8MB. Using a 250GB hard drive for your MP3 archive already seems like it's more than you'd ever need; you could store over 3000 such albums.
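The gap between audio and data capacity comes from the sector layout, and a rough sketch shows it. The sector figures below are not from this post; they are the commonly documented values from the CD standards (75 sectors per second, 2352 bytes of audio per sector, 2048 bytes of user data per Mode 1 CD-ROM sector), so treat them as assumptions.

```python
# Sector geometry from the CD standards (assumptions, not figures from this post).
SECTORS_PER_SECOND = 75
AUDIO_BYTES_PER_SECTOR = 2352   # Red Book audio: the whole sector is sample data
DATA_BYTES_PER_SECTOR = 2048    # CD-ROM Mode 1: the rest is sync, header, error correction

def disc_capacity_bytes(minutes: int, bytes_per_sector: int) -> int:
    """Usable bytes on a disc of the given length."""
    sectors = minutes * 60 * SECTORS_PER_SECOND
    return sectors * bytes_per_sector

audio = disc_capacity_bytes(80, AUDIO_BYTES_PER_SECTOR)
data = disc_capacity_bytes(80, DATA_BYTES_PER_SECTOR)
print(f"80-minute disc as audio: {audio / 1e6:.1f} MB")
print(f"80-minute disc as data:  {data / 1e6:.1f} MB")
```

As a sanity check, 2352 bytes times 75 sectors is 176,400 bytes per second, the same audio data rate worked out earlier.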
But only a few CDs are 80 minutes. Albums made to be released as 12" LPs ("long-playing" vinyl records, officially known as "gramophone records") can't be longer than a half-hour per side, and 15-20 minutes per side is more common. So older albums are generally between 30 and 60 minutes, and some shorter "double albums" can even fit on one CD. And, though new albums could be 80 minutes, they generally aren't; 70 minutes is still considered a long album.
So let's estimate. How much storage would it take to store my entire collection? Let's assume that I have 2500 CDs, and that an average CD is, oh, 50 minutes.
(50 minutes) * (10.584 megabytes / minute) * 2500 = 1,323,000MB, or about 1.3TB. That's a lot. But not an unapproachable size, in this era of 500GB hard drives and RAID arrays.
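Since the CD count is only a guess, it may help to see how much the estimate moves across the plausible range. A minimal sketch, with the 50-minute average taken as given and the function name chosen just for illustration:

```python
# Uncompressed Red Book audio: 44,100 samples/s * 2 bytes * 2 channels * 60 s.
BYTES_PER_MINUTE = 44_100 * 2 * 2 * 60   # 10,584,000 bytes per minute

def archive_bytes(cd_count: int, avg_minutes: int) -> int:
    """Uncompressed size of a collection, in bytes."""
    return cd_count * avg_minutes * BYTES_PER_MINUTE

# Low, middle, and high guesses at the size of the collection.
for cd_count in (2000, 2500, 3000):
    size = archive_bytes(cd_count, 50)
    print(f"{cd_count} CDs at 50 minutes each: {size / 1e12:.2f} TB")
```

Anywhere in the 2000 to 3000 range, the uncompressed archive lands between roughly 1 and 1.6TB.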
But I have one more ace up my sleeve—lossless compression. MP3s are lossy; when you decompress them you don't get all the bits out. Well, there are plenty of kinds of data compression where you get out exactly what you put in. The most common kind is probably the ZIP file (though GZ files are perhaps more familiar to UNIX hackers). Sadly, simply ZIPping audio data achieves lousy compression; I think it's on the order of 10% compression. 10% compression would get me down to 1.17TB, which is still a lot.
Happily, there are data compressors tailored specifically for audio data. They're designed to work just like MP3s; you can store your data in their format, and play it back with common MP3 players. The only differences are a) they are lossless, and b) the files are much bigger than MP3s.
There are many lossless audio compressors out there, but one stands head-and-shoulders above the rest: FLAC. It's an open-source project, and it's popular, so it's gaining widespread support in many different projects. On average, FLAC gets about 40% compression. That knocks down my estimated archive to about 780GB, an eminently approachable size. It's still 6x larger than "CD-quality" MP3s, but it's only 3x larger than my 256kbps-average VBR MP3s. [Edit: I previously used FLAC's own benchmark estimate of an average of 60% compression. But that's not usually what I see when I watch the compressor go by, and other pages say that in real-world use the average is closer to 40%. So I've updated this to say 40%.]
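To put the options side by side, here is a rough comparison of the whole archive under each scheme. It assumes the same 2500 CDs at 50 minutes each, and it treats the compression percentages quoted above as givens rather than measurements. It works from the unrounded 1,323,000MB figure, so the FLAC result comes out a little above the 780GB quoted above, which starts from the rounded 1.3TB.

```python
TOTAL_MINUTES = 2500 * 50                  # the estimated collection
UNCOMPRESSED_KB = TOTAL_MINUTES * 10_584   # 10,584 KB of CD audio per minute

def after_compression_kb(size_kb: int, percent_saved: int) -> int:
    """Size left over after saving the given percentage."""
    return size_kb * (100 - percent_saved) // 100

def mp3_archive_kb(kbps: int) -> int:
    """Whole archive as MP3: kbps / 8 gives KB per second, times 60 per minute."""
    return TOTAL_MINUTES * kbps * 60 // 8

sizes = {
    "uncompressed": UNCOMPRESSED_KB,
    "ZIP (~10%)": after_compression_kb(UNCOMPRESSED_KB, 10),
    "FLAC (~40%)": after_compression_kb(UNCOMPRESSED_KB, 40),
    "FLAC benchmark (~60%)": after_compression_kb(UNCOMPRESSED_KB, 60),
    "MP3 256kbps": mp3_archive_kb(256),
    "MP3 128kbps": mp3_archive_kb(128),
}
for label, kb in sizes.items():
    print(f"{label}: {kb / 1e6:.1f} GB")
```

With these inputs FLAC at 40% comes to about 794GB, against 240GB for 256kbps MP3 and 120GB for 128kbps MP3, which is where the "about 3x" and "about 6x" comparisons come from.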
So I've started re-encoding my archive in FLAC. It's slow going; encoding an album takes about fifteen minutes, even on my new PC. But this time, once I'm done I'll never have to re-encode my CDs ever again.
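For anyone who wants to try the same thing, a batch encode might look something like the sketch below. It is not the process used for this archive. It assumes the reference `flac` command-line encoder is installed and on the PATH, that each album has already been ripped to a directory of WAV files, and that the highest compression setting (`--best`) with a decode check (`--verify`) is wanted. The function names and directory layout are hypothetical.

```python
import subprocess
from pathlib import Path

def build_flac_command(wav_path: Path, out_dir: Path) -> list[str]:
    """Build the flac command line for one track (assumes the reference encoder)."""
    out_path = out_dir / wav_path.with_suffix(".flac").name
    # --best: highest compression level; --verify: decode and compare while encoding.
    return ["flac", "--best", "--verify", "-o", str(out_path), str(wav_path)]

def encode_album(wav_dir: Path, out_dir: Path) -> None:
    """Encode every WAV file in wav_dir to FLAC in out_dir, stopping on the first error."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(wav_dir.glob("*.wav")):
        subprocess.run(build_flac_command(wav_path, out_dir), check=True)
```

A call such as `encode_album(Path("rips/some-album"), Path("flac/some-album"))` would then encode one album; both paths are placeholders.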