Tuesday, January 03, 2006

Hypersound

Once upon a time, I said I was going to have a podcast. It was to be called Knee of the Curve, and feature futurist discussion topics. I marvelled at myself in my buzzword-compliance. Truly, by leveraging a synergistic paradigm, I would be able to enhance client solutions. Or something.

While I was making the first installment of said podcast over Winter Break, I came to a realization: Podcasting isn't really for me. I don't really like the sound of my voice, and I prefer hypertext over spoken word, anyway. Right now, the richness of hypertext is something that we simply cannot find in a medium like audio. Here's what I mean.

I can link to a website, and you can go there. I can embed images. I could even make said images link to said website, but I dare not bombard you with too much at once. We are, after all, only human, and people tend to produce more than audiences wish to consume. (Examples: Mein Kampf, War and Peace, the Bob Dylan discography)

Now, let me assure you that podcasting does enable you to embed images at certain points (Apple calls them "chapters"), and websites can be included in your RSS/Atom feed. However, we humans don't treat audio like we treat video, so this still isn't as compelling as hypertext. Allow me to elaborate on why.

When was the last time you skimmed a news article? Now, when was the last time you skimmed a popular song? I'll bet you haven't done the latter as often as the former. See, the way we treat text allows us to do things concurrently. You likely viewed the above picture of Dusty the Dog at the same time that you read the accompanying text. But you don't listen to two or three people talking at the same time unless you're listening to the O'Reilly Factor, and that's hardly "information." Or entertainment. Or anything.

So now we know why audio is limited, but is the limitation in our medium or in our bodies? Could we engineer a World Wide Web whose primary medium is audio? I think so.

Imagine turning off your monitor and throwing on your (3D) headphones. Surrounded by a plethora of unique ambient sounds, you move your mouse forward, and hear the sounds move away, only to have new ones in the forefront.

As you get closer to a particular sound, it begins to take shape. Oh, wait, this is my Bob Dylan discography. Let's move away. This sounds like what I want; ah, yes, it's Juno Reactor. *click* Ah, sweet, sweet Juno Reactor.
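For the curious, here is a minimal sketch of how "closer means louder" might work. It assumes a flat 2D space and an inverse-distance falloff of the same shape as OpenAL's inverse-distance-clamped model; the function names, positions, and default values are all made up for illustration.

```python
import math

def inverse_distance_gain(distance, reference=1.0, rolloff=1.0, max_distance=50.0):
    """Gain in (0, 1] for a source `distance` units away from the listener."""
    # Clamp so that anything closer than `reference` plays at full volume
    # and anything beyond `max_distance` stops getting quieter.
    d = min(max(distance, reference), max_distance)
    return reference / (reference + rolloff * (d - reference))

def mix_levels(listener, sources):
    """Map each source name to its gain at the listener's (x, y) position."""
    return {
        name: inverse_distance_gain(math.dist(listener, position))
        for name, position in sources.items()
    }

# Hypothetical layout: two "sounds" parked at different spots in the space.
sources = {"Bob Dylan": (0.0, 5.0), "Juno Reactor": (8.0, 0.0)}
near_dylan = mix_levels((0.0, 4.0), sources)  # mouse hovering next to Dylan
near_juno = mix_levels((7.0, 0.0), sources)   # mouse moved over to Juno Reactor
```

Moving the mouse just changes the listener position, so whatever you fly toward comes to the front while everything else fades.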

Hmm, what's the faint click you hear in the upper left? It's the cursor. It's slowly going to the right as the song progresses. And, you can still move around and find other files, but now Juno Reactor will follow you. "God Is God," indeed.
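The drifting cursor click could be sketched as a simple pan driven by playback progress. This assumes plain stereo and a constant-power pan law (so the click doesn't seem to get quieter in the middle); it ignores the "upper" part of "upper left," which would need real 3D positioning. The function name and parameters are hypothetical.

```python
import math

def cursor_pan(elapsed_seconds, duration_seconds):
    """Return (left_gain, right_gain) for the playback-cursor click."""
    # Progress through the song, clamped to the range 0..1.
    progress = min(max(elapsed_seconds / duration_seconds, 0.0), 1.0)
    # Constant-power pan: 0 is hard left, pi/2 is hard right.
    angle = progress * math.pi / 2
    return math.cos(angle), math.sin(angle)

start = cursor_pan(0, 200)     # click entirely in the left ear
middle = cursor_pan(100, 200)  # click centered
end = cursor_pan(200, 200)     # click entirely in the right ear
```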

Now, the audiophiles will complain here. "But I want to hear my music unadulterated!" Well, the graphically inclined feel that way about video, and do you know what they do? They make their movie player go "fullscreen." Sure, they can't see their interface anymore, but they know they can hit escape at any time to get it back. So, we can add a "fullaudio" mode, where your aural interface no longer interferes with what you're listening to; just hit escape to get the interface back. Ah, just like WMP...er, take that back. Just like VLC.

The driving technology needed for this interface comes down to one simple question: how do you make a complicated sound into something less complex, yet still reminiscent of the original work? R. Luke DuBois recently engineered a song called "Billboard" using a technique called time-lapse phonography, where a pop song is "averaged," resulting in an ambient sound that represents all the sound values of a file, slammed together. This average can be made less and less coarse until you get to the actual sound itself.
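To make the averaging idea concrete, here is a rough sketch. It is not DuBois's actual method, just one plausible reading of it: chop the signal into frames, take each frame's magnitude spectrum, and average those spectra over a chosen number of segments. One segment gives a single blurry fingerprint of the whole file; more segments give a finer picture. Everything here (names, frame size, the toy two-tone signal) is assumed for illustration, and the naive DFT is only practical for tiny inputs.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_averages(samples, frame_size, segments):
    """Split the signal into `segments` parts and average each part's frame spectra."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    per_segment = len(frames) // segments  # leftover frames at the end are dropped
    if per_segment == 0:
        raise ValueError("not enough frames for that many segments")
    averages = []
    for s in range(segments):
        chunk = frames[s * per_segment:(s + 1) * per_segment]
        spectra = [magnitude_spectrum(frame) for frame in chunk]
        # Average bin by bin across every frame in this segment.
        averages.append([sum(column) / len(spectra) for column in zip(*spectra)])
    return averages

def tone(bin_index, frame_count, frame_size):
    """A sine wave that lands exactly on one DFT bin, for demonstration."""
    return [math.sin(2 * math.pi * bin_index * t / frame_size)
            for t in range(frame_count * frame_size)]

# Toy "song": a low tone for the first half, a higher tone for the second half.
signal = tone(2, 4, 16) + tone(5, 4, 16)
whole = spectral_averages(signal, 16, 1)[0]  # coarsest: both tones smeared together
halves = spectral_averages(signal, 16, 2)    # finer: one average per half
```

At the coarsest setting both tones show up at once in `whole`; at the finer setting each entry of `halves` is dominated by just one of them, which is the "less and less coarse" knob in miniature.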

Next, you just need audio snippets that can represent the discrete objects with which your interface will be dealing (like, say, songs), and you're set! The spectral averages become GUIDs that you can recognize with your ears. "Fly" to them with your mouse, and begin interacting. It's that simple.

OpenAL, Ogg Vorbis, and time-lapse phonography. Put them together, and you've got a music player that interacts with what a music player should: your ears.

Hey, this is a pretty good idea; anyone want to help me develop it?
