Momentary Fascinations: The Spyro Soundtracks, And The Inscrutable XA Format

Categories: technology. Originally posted: 2006-07-23 07:42:31

Last decade, three fine games came out for the Sony Playstation: Spyro The Dragon, Spyro 2: Ripto's Rage!, and Spyro 3: Year Of The Dragon. They were all well-crafted, with excellent graphics (for the time), a captivating art style, plenty of professional voice acting (including Tom "Spongebob Squarepants" Kenny as Spyro in the second two), and hours of fun—and occasionally challenging—gameplay.

The soundtracks for these three fine games were created by Stewart Copeland. Mr. Copeland was the drummer for The Police; since then he's mostly done soundtrack work, dotted with solo albums (The Rhythmatist, The Equalizer And Other Cliff Hangers, and Orchestralli). I've got a couple of his soundtracks, because now and then he hauls in his pal Stan Ridgway to sing something, and I am a Stan Ridgway completist.

Well, that Spyro music is the stuff. It strikes the right balance between being interesting and sitting in the background providing color. And, having played the hell out of the games, the music reminds me of the games, often evoking the mood of the areas where that music is played.

Years ago I downloaded purported "soundtracks" for all three games in MP3 format, from some shady site that is probably long gone. But, darn it, I'm building an impeccable library of FLAC-encoded music, and spotty MP3s I got from a disreputable site just won't cut it. I own all three games, and the music must be stored on the games somehow, and I meant to extract it for myself—in the highest quality possible. I've done that—more or less—but I had a hell of a time doing it, as you shall soon see. I'll tell you exactly how I did it, in case you ever want to try it for yourself.

Playstation Resources

The first stop on our tour of pitfalls is the CD the game comes on. Yes, Playstation games are stored on CDs. But they're not just plain ol' CDROMs, nossir. If you think it's as simple as "plop it in your computer and nose around on it for WAV files", are you in for a shock!

First, while they're CDROMs, they aren't what's called ISO 9660 format CDROMs. That's the format most computers use, and it's what you and I think of when we think of CDROMs. But Sony Playstation CDROMs are in "CDROM/XA" format. CDROM/XA specially optimizes for multimedia use; you can easily stream audio data and compressed video data at the same time that you're reading other kinds of data (like textures or programs).

Old CDROM drives couldn't cope at all with this funny format; modern ones barely can. More specifically: some files you can read with just a modern drive. But audio and video data is stored in special files called "STR" files, which don't follow the conventional rules. If you use Windows to try copying that file from the CD to your computer, you'll get an error. You need special software to read STR files.

Second, the music isn't just in a conventional WAV format; it's in a funny format called Sony ADPCM. ADPCM is like MP3, sort of; it's a lossy compression format. But, while MP3 uses sophisticated data analysis techniques, ADPCM is shockingly simple. It only works on 16-bit sounds, it operates directly on the WAV data, and it yields a strict compression ratio (which I think is always 4:1). Put simply, ADPCM compresses by storing deltas instead of the full samples (that's the D, for Differential), and these deltas can change based on previous values (that's the A, for Adaptive).
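To make the "adaptive" and "differential" parts concrete, here is a toy sketch in Python. It is emphatically not Sony's real ADPCM, whose block layout and prediction rules aren't covered here; the step-size rule and every function name below are invented for illustration. Each 16-bit sample turns into one 4-bit code, which is where a 4:1 ratio would come from.

```python
# Toy adaptive-delta codec: illustrative only, NOT the real Sony ADPCM.
# The step-size rule below is an assumption made up for this sketch.

def _clamp16(value):
    """Keep a value inside the signed 16-bit sample range."""
    return max(-32768, min(32767, value))

def _adapt(step, code):
    """Grow the step after big deltas, shrink it after small ones."""
    if abs(code) >= 6:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(1, min(4096, step))

def encode(samples):
    """Turn 16-bit samples into 4-bit codes in the range -8..7."""
    predicted, step, codes = 0, 16, []
    for sample in samples:
        code = max(-8, min(7, round((sample - predicted) / step)))
        codes.append(code)
        # Track what the decoder will reconstruct, so both stay in sync.
        predicted = _clamp16(predicted + code * step)
        step = _adapt(step, code)
    return codes

def decode(codes):
    """Rebuild approximate 16-bit samples from the 4-bit codes."""
    predicted, step, samples = 0, 16, []
    for code in codes:
        predicted = _clamp16(predicted + code * step)
        samples.append(predicted)
        step = _adapt(step, code)
    return samples
```

Note that the decoder has to replay the exact same running state (predicted value and step) as the encoder. Feed it the codes in the wrong order or layout and the output turns to mush.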

Just to add insult to injury, audio data isn't stored using the industry-standard sample rate of 44.1KHz. Instead, Sony specified 37.8KHz. Why? I have no idea.

PsxMC

A week or two ago I downloaded a tool called PSX MultiConverter. I gather it's one of the better tools for dealing with PSX data. But when I originally played with it, it complained it couldn't deal with the disk without a special "ASPI driver". I dimly remembered this whole ASPI "thing" being a "pain", so I gave up.

Then, a couple of days ago, by sheer chance I stumbled onto this page. It's a blog entry from 2002 about how this guy extracted the soundtrack straight out of his Spyro The Dragon CD. There's a wiki-like comments area, and just the previous day someone said all one needs to get PsxMC working is a file called "WNASPI32.DLL", and it's easy to find. I searched my drive, and hey! I already had one; it came with Nero. I copied it into the PsxMC directory and bam! it started working.

When you use PsxMC, it scans the drive looking for all STR files, so it can parse them and find all the audio and video data. You can then "Convert" that data, which means extracting it from the drive, decompressing it from ADPCM into WAV files (and something similar for video files), and saving it to your disk (although with the original sample rate). It works like a treat—when it works.

I extracted the audio for all three games. And then I noticed that the results didn't all come out the same. PsxMC worked great for Spyro 1; after it was done I had all the songs from the game, each in its own individual WAV file, ready to go. But the other two games came out different. First, I got songs and speech tracks. But they didn't get their own files; there were four or five songs in each WAV file. And worst of all, each WAV file was chopped off short!

For example, PsxMC showed one of the speech files as being 23 seconds long. But when it extracted it, the resulting WAV file was only 9 seconds long! Some of the song WAV files had only half of the last song, some chopped off the last song entirely! Eek! That won't do!

I tried every tool I could find for dealing with STR files: PSMPlay, the CDXAReader plugin, PSXtuls, XACOPY, XAEX, PSound, and CDXA. (You can see all of them on this page.) Nothing worked by itself. And none of the tools were under active development; the world has moved on from the Sony Playstation. But in the end I pulled it off: I got all the extractable data out, music and speech. But it required a concert of several tools, a whole lot of knowledge about computers and programming and sound formats, and some good-old-fashioned elbow grease.

Channels

First, here's some more I learned about the CDROM/XA format. STR files contain numbered "channels" for audio data. I suppose a Playstation programmer can say "Play channel 0", and the computer will do the rest.

Most people labor under the misconception that there are eight channels of audio in a STR file. Nearly all the extraction tools stop there, only examining channels 0 through 7. But I found out there can actually be thirty-two channels, because two tools could deal with the other twenty-four. Why do most extraction tools stop after the first 25%? It's some kind of historical thing having to do with the XA file format. Suffice to say, Spyro 2 and Spyro 3 both store 32 channels of audio. And both have some songs in channels after the first eight. Which means most tools can't get you some of the songs. (More on number of channels in a moment!)

Also, a channel can contain multiple songs. Since ADPCM gives you strict, regular 25% compression, you can easily jump into the middle of a "channel" and start playing. For Spyro 1, Insomniac used multiple STR files to store all the music (called PETEXA0.STR through PETEXA5.STR... I wonder why). For the other two, they used one STR file for everything (called SPEECH.STR, probably because most of it is the speech audio) and just concatenated the songs into the first couple of channels.

Finally, a channel can have multiple formats. Here's where it gets really strange. The Spyro games store their music in the usual 37.8KHz sample rate, in stereo, and 16-bit as ADPCM requires. This renders into nice high fidelity audio data for the music. But human voices don't need a very high bit rate to sound perfectly clear, and they don't need to be stereo. So the speech is stored at an 18.9KHz sample rate, and mono to boot. (Still 16-bit.) That means speech compresses 4x better than music. But, and like I said here's where it gets weird, the second two Spyro games store music and speech in the first few channels; four or five songs, followed by a few minutes of speech.
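The "4x" figure is just arithmetic: half the sample rate times half the channels. A quick Python sketch (the function name is made up here, and it only counts decoded PCM bytes; it says nothing about how the data is laid out on the disc):

```python
def pcm_bytes_per_second(sample_rate_hz, channels, bits_per_sample=16):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8

music = pcm_bytes_per_second(37_800, channels=2)   # stereo music
speech = pcm_bytes_per_second(18_900, channels=1)  # mono speech

print(music, speech, music // speech)  # prints: 151200 37800 4
```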

PsxMC was the only tool to really understand what was going on. It could somehow see the format changes, and produced separate WAV files for the different formats: the 37.8KHz stereo data went in one file, and the 18.9KHz mono data went into another. All the other tools produced one WAV file with all the data, which meant you got lots of pleasant music followed by some funny-sounding sped-up chipmunk voices at the end.

If PsxMC had simply extracted all the data, instead of truncating it, I'd have been done, and ecstatic. But no!

CDXA

In desperation, I poked around with Google. Eventually I found something I hadn't seen before: CDXA. It's an open-source Linux package for dealing with audio data on Playstation discs. It's not actually one tool; it's a whole suite of tools, or perhaps an "odd grab-bag" of tools, including old command-line tools and a whizzy "new" graphical tool (untouched since 2002). The command-line tools are deprecated, but I couldn't compile the graphical tool (I don't have the X header files and libraries), so I made do with what I had. CDXA contains a command-line tool called xa2wav, which the documentation mentions is a hacked-up version of a DOS program I already had. This tool takes a "channel" argument, and in playing with it I discovered it would accept channels up to 31. Hooray!

Unfortunately, for reasons I don't understand, it thinks that channel audio uses a 38KHz sample rate, instead of a 37.8KHz sample rate. It's actually hard-coded in the file. Booooo! The author knew the right value; I don't know why he used the wrong one here. I changed it to use the right value.

Worse than that, the WAV files it generates don't work. The WAV header is a little messed up. If you edit the file and put the right one in, the data is otherwise fine, but that's kind of a pain, huh?
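For what it's worth, if you already have the raw decoded samples, wrapping them in a well-formed WAV header takes only a few lines with Python's standard wave module. This is a sketch, not what xa2wav does; the function name and defaults are assumptions, and it expects plain 16-bit little-endian PCM. It also shows why the hard-coded rate matters: audio labeled 38KHz instead of 37.8KHz plays about 0.5% fast (38000 / 37800 is roughly 1.005).

```python
import wave

def write_wav(path, pcm_bytes, sample_rate_hz=37_800, channels=2):
    """Wrap raw 16-bit little-endian PCM in a well-formed WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)              # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm_bytes)       # header sizes are filled in on close
```

The wave module is perfectly happy with oddball rates like 37800; it just writes whatever number you give it.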

Chipmunk Banishment

But xa2wav was otherwise very limited. Like all the other tools except PsxMC, it didn't understand the switch in audio format. So it produced the same music-followed-by-chipmunks as all the other tools did. And, having had the thought of extracting the speech data tantalizingly dangled in front of me, I wanted to see it through.

My first try was to directly change the sample rate on the WAV files. I use an old tool called Cool Edit (now Adobe Audition) to edit audio files. Cool Edit lets you set sample rate directly, no resampling necessary. I did that, but the voices still sounded wrong. They were still talking too fast, and there was some weird distortion to boot.

Next, I remembered that the speech-only samples extracted by PsxMC were mono. Cool Edit won't let you "just" switch something from stereo to mono, as it does for sample rate; if you ask it to change from stereo to mono, it'll do a lot of complex work, including resampling the audio, and I didn't want all that. So I hacked up the WAV header using a hex editor and brute-force changed it from 37.8KHz stereo to 18.9KHz mono, then reopened it in Cool Edit. Now it was at the right speed, but there was really heavy distortion.

I kept chipping away at that approach, thinking that some magical configuration of the WAV file would let me hear the audio correctly. And then it hit me: this would never work. The audio was itself corrupted, by the ADPCM decoding step. It had been decoded assuming it was stereo, which meant the decoder viewed the audio as two sets of deltas and maintained two separate running values.
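The hex-editor hack amounts to rewriting four fields in the WAV header. Here is a Python sketch of the same relabeling, assuming the simplest possible layout (a canonical 44-byte PCM header with the "fmt " chunk first); the function name is made up. As described above, this only changes how the bytes are labeled. It cannot repair samples that were already decoded the wrong way.

```python
import struct

def relabel_wav(wav_bytes, sample_rate_hz, channels):
    """Rewrite the format fields of a canonical 44-byte PCM WAV header.

    Assumes 'fmt ' is the first chunk, as in simple PCM files; the audio
    bytes themselves are left exactly as they were.
    """
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:16] != b"WAVEfmt ":
        raise ValueError("not a canonical PCM WAV header")
    bits = struct.unpack_from("<H", wav_bytes, 34)[0]   # bits per sample
    block_align = channels * bits // 8
    patched = bytearray(wav_bytes)
    # Offsets 22..33: channel count, sample rate, byte rate, block align.
    struct.pack_into("<HIIH", patched, 22, channels, sample_rate_hz,
                     sample_rate_hz * block_align, block_align)
    return bytes(patched)
```

For example, `relabel_wav(data, 18_900, 1)` turns a "37.8KHz stereo" label into "18.9KHz mono" without touching a single sample.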

Here's where I lucked out. I quickly hacked up the source to the Linux xa2wav, hard-coding the format so that all channels were 18.9KHz mono. After a little further hacking, to ensure that the headers said that too, it worked like a treat.

And now more on channel count. While I was writing this article, I realized that PsxMC gave me a lot more than just 32 speech files. Why was that? CDXA came with some documentation on the XA file format, and I noticed that "channel" is stored as a full byte. So the format allows you to specify as many as 256 different channels. It was simple to change xa2wav so it'd allow any number of channels. And, boom! I now had fifty channels of speech data pour out of Spyro 2.

And since I was already hacking on it, I saw where the author had hard-coded 38KHz, and changed it in the original "clean" copy to 37.8KHz.

And since I was already recompiling it, I noticed that the source was nice generic portable code. I copied it over to my Windows box, and bam! it compiled and ran correctly first try.

superxa2wav

And then I thought, why do this by half-measures. I charged into the source, cleaning it up some, and fixing it so that it detects changes in format and writes out a new WAV file for each one. And fixed the headers for all output WAV files so they loaded correctly. By now it was bearing less and less resemblance to the original, so I retitled it superxa2wav. I feel like I did a nice job: this single tool now does what no other tool could. You can run it against the Spyro STR files and produce exactly what you want as output.
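The format-splitting idea can be sketched in a few lines of Python. To be clear, this is not the superxa2wav source; the bit layout of the per-sector "coding information" byte is an assumption taken from general descriptions of CD-ROM XA (low two bits for mono/stereo, next two for sample rate), and both function names are made up for illustration.

```python
from itertools import groupby

def decode_coding_info(coding_byte):
    """Decode an XA audio 'coding information' byte (layout assumed)."""
    channels = 2 if (coding_byte & 0x03) == 1 else 1
    sample_rate_hz = 18_900 if ((coding_byte >> 2) & 0x03) == 1 else 37_800
    return sample_rate_hz, channels

def split_on_format_change(coding_bytes):
    """Group one channel's consecutive sectors into same-format runs.

    Returns a list of ((sample_rate_hz, channels), sector_count) tuples;
    each run would become its own WAV file.
    """
    formats = (decode_coding_info(b) for b in coding_bytes)
    return [(fmt, len(list(run))) for fmt, run in groupby(formats)]
```

Given the coding bytes for one channel's sectors in order, a stretch of stereo music followed by mono speech comes back as two runs, which is the same split that keeps the chipmunks out of the music files.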

You can download superxa2wav here. That zip file contains the source, the project files for Microsoft Visual Studio 6, a pre-built executable for Windows, and a short readme.

Channel maps

When you extract the WAV files from the STR file, they have funny automatically-generated names. The original poster who extracted the Spyro 1 music published a list of "what you should rename the files to". I've taken his list, touched it up a little based on feedback he got on his page, and published the result. I also made equivalent lists for Spyro 2 and Spyro 3, basing the names on the equivalent filenames from those MP3 soundtracks I downloaded a long time ago.

I called these listings "channel maps", because they tell you how to map the various channels to the final product. I also listed my preferred song order in each "channel map", in case you want to listen to the soundtrack as one big album.

You can view/download them below:

Spyro The Dragon channel map
Spyro 2: Ripto's Rage! channel map
Spyro 3: Year Of The Dragon channel map

FLAC Files

One side note before we get to the real meat. Since I don't want to go through all this rigamarole again, I decided I'd keep not only the final result but also most of the intermediary files. FLAC files are better than WAV files for this; they're smaller, and I can store some meta-data in them. You'll see me convert to FLAC early on in the conversion process listed below—now you know why.

I also wanted to store the end results as FLAC; that was the whole idea. But that has a downside: FLAC doesn't really like these funny sample rates. Officially it supports any rate 1Hz to 655350Hz in 1Hz increments. But that doesn't necessarily mean it likes doing it. FLAC has a specific list of sample rates it likes. (Here they all are, as expressed in KHz: 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, and 48.) Since 37.8KHz and 18.9KHz are not on the list, they are second-class citizens.

In what way? First, FLAC defines only those anointed sample rates as "streamable", meaning that it knows how to pick up in the middle of a stream. "Picking up in the middle" also applies to seeking in a stream—for example, jumping ahead to the middle of a song. Since 37.8KHz and 18.9KHz are not "streamable", you can't jump ahead in the middle of a song.

Also, I like using ReplayGain with my music. But FLAC only supports ReplayGain for sample rates in its little list. (I dunno why.)

For pure listening enjoyment, I wanted the final soundtrack files to use the standard 44.1KHz sample rate. That means heavy-duty audio processing, which can take a long time. But surely the final result is worth it...!
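As it happens, the conversion ratios come out to tidy fractions. A small Python sketch (function names are made up here; this only works out the ratio and the resulting length, it is not a resampler and has nothing to do with how Cool Edit does the job internally):

```python
from fractions import Fraction

def resample_ratio(source_hz, target_hz=44_100):
    """Exact ratio of output samples to input samples."""
    return Fraction(target_hz, source_hz)

def resampled_length(sample_count, source_hz, target_hz=44_100):
    """Approximate number of samples after conversion."""
    return round(sample_count * resample_ratio(source_hz, target_hz))

print(resample_ratio(37_800))  # prints: 7/6
print(resample_ratio(18_900))  # prints: 7/3
```

So every 6 samples of music become 7, and every 3 samples of speech become 7.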

Extraction

So, after a lot of experimentation and process refinement, here's my final process. Here's what I did, from beginning to end, to produce my high-quality Spyro soundtracks.

To extract a .STR file with PSMPlay:

When PSMPlay starts up, it asks you to tell it what CD drives map to what internal devices ("Configure CD-ROM Drive"). If you only have one CD-ROM drive in your system, this makes it easy, but I think you still need to tell it. The important part is that the drive you're going to use is correct.
Press the little upwards-pointing arrow on the main window, the "Eject button". That will open the "Open" dialog.
Navigate the "Open" dialog to the drive where your Playstation game is, and go inside. You should find the .STR files in the root of the CD.
Double-click one of the files to "open" it in PSMPlay.
In the "playlist" window, make sure the .STR file you want to play is selected, and current; it should have a gray background behind it, instead of the usual black background of the playlist. The main window should light up with audio information, as if it was ready to play. (Which it is.)
Right-click in the main PSMPlay window, and just under the middle of the list is "Save File". Save the .STR file to your hard disk.

For Spyro 2 and Spyro 3:

Extract SPEECH.STR straight onto your hard disk using PSMPlay.
Run superxa2wav to create all the WAV files: superxa2wav speech.str spyro2 for Spyro 2, and superxa2wav speech.str spyro3 for Spyro 3.
Look at the sizes of the files it produced. The big ones (bigger than 10MB) contain music; the small ones contain speech.
- Spyro 2 music files will be spyro2.00.00.wav through spyro2.08.00.wav; Spyro 2 speech files will be spyro2.00.01.wav through spyro2.08.01.wav, and spyro2.09.00.wav onwards.
- Spyro 3 music files will be spyro3.00.00.wav through spyro3.13.00.wav; Spyro 3 speech files will be spyro3.00.01.wav through spyro3.13.01.wav, and spyro3.14.00.wav onwards.
Convert these files en masse to FLAC. The Windows GUI tool doesn't like the nonstandard sample rate, so I used the command-line tool. My command line was flac.exe -f -8 --lax --output-prefix=flac/ * to encode; -f means "always overwrite", -8 means maximum compression, --lax allows me to use the non-standard sample rates, and --output-prefix meant it neatly sorted the FLAC files into a subdirectory called flac.
If you want to save your intermediate work, here's a good place. I called these the "original" files.
Open the music FLAC files directly in your favorite WAV file editor—as you might have guessed, mine's Cool Edit, but they don't sell that anymore. GoldWave is still available, and should work fine too. (And yes, you can automatically open FLAC files directly in Cool Edit, using the good FLAC plugin). Once it's open, highlight each of the songs in turn. It's easy to see where a song ends, as they always drop in volume at the end; you can see the waveform pinching in towards the center. (You can also look up the approximate times for each song in my "channel maps".) Always grab with some slop on either side, as you're gonna clean 'em up later. Use Save selection to save these to their own individual FLAC files. (I had to figure out the names of the songs myself; happily, you can use my notes.) Move them to their own directory.
Open each of the individual song FLAC files in your WAV editor. For each file, trim the slop on both ends (so you have just that song) and save it out again. If I recall correctly, there is always some absolute silence (all zeroes) between songs; I recommend you trim the songs so there's very little (or no) silence.
Here's another good place to save your intermediate work. I called these the "split" files.
If your media player is happy playing 37.8KHz FLAC files (like foobar2000 is), you can skip the next two steps.
Set up a Cool Edit "batch processing" script that uses Convert Sample Rate to convert the "current" file to 44.1KHz stereo 16-bit. Just to be safe, I cranked up the "quality" slider, to as high as it would go. Run this on all the "split" files, telling it to save the result to another directory, and to not pause at dialogs. This will take a long time; about a minute per 10MB on my Athlon 64 x2 4400+. I called these the "final" files. I'm sure you can do this in GoldWave, but I never use it so I don't know how. (If you email me instructions on how, I'll put 'em up on this web page.)
Set up another Cool Edit "batch processing" script, same as the above, but 44.1KHz mono. Run this on all the speech files.
Use your FLAC player to tag all the files with their proper metadata. Come up with some rational order for the songs (I like "the order you'd most likely hear them in the game") and save off a playlist.
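If you'd rather script the "look at the sizes" and "convert to FLAC" steps than eyeball them, here's a rough Python sketch. The function names are made up, and the 10MB cutoff is just the rule of thumb from the steps above (taken here as 10 x 1024 x 1024 bytes, which is an assumption). The second function only builds the flac command line; it doesn't run anything.

```python
MUSIC_THRESHOLD_BYTES = 10 * 1024 * 1024  # the "bigger than 10MB" rule of thumb

def classify_by_size(sizes):
    """Split {filename: size_in_bytes} into (music, speech) name lists."""
    music = sorted(n for n, s in sizes.items() if s > MUSIC_THRESHOLD_BYTES)
    speech = sorted(n for n, s in sizes.items() if s <= MUSIC_THRESHOLD_BYTES)
    return music, speech

def flac_command(filenames, output_prefix="flac/"):
    """Build (but do not run) the flac command line from the steps above."""
    return ["flac", "-f", "-8", "--lax",
            f"--output-prefix={output_prefix}", *filenames]
```

You could feed `classify_by_size` a dictionary built with `os.path.getsize`, then hand the result of `flac_command` to `subprocess.run` once you're happy with the file list.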

For Spyro 1:

Extract the PETEXA0.STR through PETEXA5.STR files straight onto your hard disk using PSMPlay.
Run superxa2wav on each one to extract the WAV files: superxa2wav petexa0.str spyro1.petexa0 for the first one, superxa2wav petexa1.str spyro1.petexa1 for the second one, and so on.
Convert these files en masse to FLAC, following step #4 above. The resulting files are pre-split, so I called these the "split" files.
Run the "batch processing" script from step #9 above (the script for stereo files) on all the files. I called these the "final" files.
Use your FLAC player to tag all the files with their proper metadata. Come up with some rational order for the songs (I like "the order you'd most likely hear them in the game") and save off a playlist.

And that's it! Phew!

Notes And Corrections

Here's where (for the sake of pure accuracy and edification) I fix the errors and omissions I deliberately committed (for the sake of simplicity) in the above text.

Before you charge off to buy a Spyro game, know this: the original three games were made by Insomniac Games. After the first three, they moved on to create the Ratchet & Clank series for the PS2, which are also uniformly excellent. But Universal, who owns the Spyro "license", wanted to make more games. With Insomniac uninterested in continuing, they've handed it off to various other teams, to make new games on various platforms. The resulting games have been average at best, and absolute crap at worst. (Yes, I'm looking at you, Spyro: Enter The Dragonfly.) So keep in mind: you don't want to play Spyro games so much as you want to play Insomniac games.
Spyro 3 came out near the end of 2000. Technically that is part of last decade—just so you know.
The actual correct names for the latter two games are Spyro (2): Ripto's Rage! and Spyro: Year Of The Dragon. But those parentheses are ugly, and the omission of the ordinal number is irksome.
The soundtrack for Spyro 3: Year Of The Dragon was actually credited to Stewart Copeland, Ryan Beveridge, and Kenneth Burgomaster. Let me point something out: the music for Lost Fleet is just the music for Sheila's Alp with some irritating reedy organ thrown on top. My theory is that Mr. Copeland wrote the vast majority of the original material, and those two other guys tweaked it or something. Certainly it all sounds very Copeland-y. But I don't know for a fact what is going on here.
For some Playstation games, it is terrifically easy to extract the music directly. Playstation games can use STR file resources, or they can just store conventional ("Redbook") CD audio. (Or presumably both.) The Wipeout series of games just uses normal CD audio, and can be played in any reasonably modern audio CD player, or ripped to MP3s or FLACs with a modern drive.
I said I extracted "all the extractable data". To be specific, some of the data simply wasn't extractable. If the game uses the STR facility, or stores its music as CD audio, you can get at it. But if they invent their own scheme for doing things... well, obviously you could still get at the data. I mean, it's right there on the CD, it's not beamed in from space. But you'd have to reverse-engineer their file formats and guess at their data formats. I figure that's a non-starter. This means that I can't get at the speech for Spyro 1, because they stored it somewhere besides the STR file. And I know of at least one song that I didn't find in the STR data of Spyro 2: the music played during the end credits. Where is that stuff stored? What format is it in? I don't expect I'll ever know.

If you're curious, the source tarball for CDXA contains a file called XA subheader information.txt which contains a short discussion on channel numbering. Here's the relevant section, so now you'll know as much as I do:

Byte 1, Channel Number
======================
0-15            ADPCM
0-31            Data/Video
ADPCM data can only be up to 16 channels, this would be reduced to 8 if the Play
Station didn't have double speed CD-ROM access

One advantage of storing a version of the 37.8KHz "split" songs is that you can recompress directly from this into OGG Vorbis files. The OGG Vorbis encoder doesn't really care what the original sample rate is, because it's going to change it all around anyway. And in terms of signal loss and distortion it's better to use the cleanest, least-changed signal you can. So you'll get infinitesimally-better OGG files if you encode from the 37.8KHz files than from the final 44.1KHz files.
So who's this Pete that PETEXA0.STR etc are named after?

Stewart Copeland's Release

Edit: 2006/08/10

I just discovered this: in January of this year, Stewart Copeland put "the soundtrack to Spyro 1" up on his web site, what he seemingly calls "Spyro Rank One". You can find it here, playable in a Flash music player:

http://www.stewartcopeland.net/multimedia/multimedia.htm#

He has thirteen tracks, listed under what are apparently his original names. All are in the game; twelve of them are extractable from the Spyro CD. The remaining track is track 7, "Orbit". It's been a bit since I played Spyro, but I'm pretty sure that's the opening screen music, the music you hear as you load your save game from the "main menu". And yes, as I just noticed myself, it's not extractable (or at least not yet).

Here's the list of tracks, in his order and with his names; I've put the levels they map to after each.

Breather (Tree Tops)
Rain (Confronting Jacques)
Frog (Wizard Peak)
Squid (Confronting Metalhead)
Stoat (Dream Weavers World)
Grant (Gnorc Cove)
Orbit (Main Menu)
Louis (Confronting Blowhard)
Scary_Flyer (Ice Cavern)
Five (Dark Hollow)
Potato (Lofty Castle)
Tiger (Icy Flight)
Avacado (Dark Passage)

And yes, that's how he spelled "avocado".