WebAL - Interactive audio for browsers

From Wikiid
Revision as of 10:28, 11 January 2011 by SteveBaker (Talk | contribs) (Why OpenAL and not something else?)


This document is a proposal to the standards-makers and browser manufacturers to produce a "WebAL" audio subsystem within web browsers - analogous to the WebGL graphics API standard.

At the time of writing, there is really only one mechanism supported within web browsers for producing sounds: the HTML5 <audio> tag. Failing that, one must resort to plugins - the most likely being Flash.

What's wrong with using the HTML5 <audio> tag/API?

If all you need to do is to replay a piece of music, streamed from your server - then nothing. The HTML5 audio markup and corresponding JavaScript API appear strongly oriented to streaming large audio files over the Internet, a task for which they are reasonably well suited.

However, when it comes to producing compelling games, simulations and other interactive content for the web, the demands placed on the audio system are vastly different and HTML5 audio becomes nearly useless. Pushing the limits of what <audio> can do exposes numerous fatal flaws:

  1. The markup/API in the HTML5 specifications is loosely described. There is much that goes unsaid (eg how many sounds can you play at once?).
  2. Not one of the mainstream browsers actually implements 100% of what the specification describes.
  3. Within the subset of the standard that these browsers do claim to support - there are many bugs which have gone unfixed for far too long (over a year in many cases).
  4. There appears to be no support forum of any kind where HTML5 audio experts can be found to answer questions. After two months of strenuous efforts and several seemingly-promising contacts, I have yet to have a single communication with anyone who knows anything about the standard or its implementation.
  5. Even if the specified API were more tightly described and implemented perfectly, it would still be inadequate for aggressively interactive applications such as games.

Together, this speaks of an unloved corner of HTML5 that has been implemented to the minimal extent needed to stream single music tracks - with zero ongoing support.

If there are to be games on the Internet without Flash - we need to take drastic action.

What do games and simulations need?

At the barest minimum, an interactive application will typically need the following features:

  • The ability to have some kind of background or "ambient" sound track (eg Music, chirping crickets, the sound of the ocean) looping without a break.
    • Firefox provides no viable mechanism for this other than to inform you that a track has finished playing in order that you can re-trigger it. But that process takes time - during which there will be an unacceptable break in the audio.
    • Chrome honors the "looping" command in the HTML5 spec - but it does so with a considerable break in the sound track.
    • An acceptable implementation would have to guarantee automatic looping such that the first sample of the sound is played immediately after the last with no delay whatever.
  • The ability to trigger a short sound with almost no latency. The HTML5 audio system has a complex set of commands and events that you can use to control how a sound is "preloaded" - however, by some bizarre logic, the one truly important option (to preload the entire sound into memory and keep it there) is missing!
    • Both Firefox and Chrome appear to stream data no matter what - so there is always a considerable delay between "pulling the trigger" and hearing the gun go "BANG!".
    • Given the tiny amount of memory that a short sound sample might occupy (compared to, say a photograph or a WebGL texture), this is an unforgivable omission.
  • The ability to reliably play some number of sounds simultaneously.
    • The <audio> specification says nothing about what happens if you try to play multiple sounds at once - much less provides a means to find what the maximum number actually is (it is surely not infinite!) - or how you control the use of available numerical precision and range during the mixing of multiple sounds.
    • Both Firefox and Chrome seem to be able to play multiple sounds - but it's sporadic and ill-specified.
    • Games can typically manage the number of sounds they are playing, prioritizing the important ones (eg, a game character telling you what your mission is, the loudest sounds, the nearest sounds, etc) - but they need to know how many sounds the underlying player can manage in order to exercise that control so that (for example) an ambient cricket chirp doesn't mask the sound of your own gun firing.
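
To make the voice-management and mixing points concrete, here is a minimal sketch of what a game does when it knows the voice limit. Everything in it is hypothetical: the function names, the `priority` and `gain` fields and the mono float sample format are assumptions for illustration, not part of HTML5 audio or OpenAL. For simplicity every voice is treated as looping, so the sample index simply wraps from the last sample to the first with no gap.

```javascript
// Hypothetical sketch: none of these names come from HTML5 audio or OpenAL.

// Pick the sounds that get a real voice when more are requested than the
// player can mix. Higher priority wins; ties go to the louder sound.
function selectVoices(requests, maxVoices) {
  return [...requests]
    .sort((a, b) => b.priority - a.priority || b.gain - a.gain)
    .slice(0, maxVoices);
}

// Mix the selected voices into one mono buffer of floats in [-1, 1].
// Each voice loops: the index wraps so the first sample follows the last
// immediately. The sum is clamped so an overload cannot wrap around.
function mixVoices(voices, frameCount, startFrame = 0) {
  const out = new Float32Array(frameCount);
  for (const { samples, gain } of voices) {
    for (let i = 0; i < frameCount; i++) {
      out[i] += gain * samples[(startFrame + i) % samples.length];
    }
  }
  for (let i = 0; i < frameCount; i++) {
    out[i] = Math.max(-1, Math.min(1, out[i]));
  }
  return out;
}

// Three sounds are requested but only two voices are available.
const requests = [
  { name: "crickets", priority: 1, gain: 0.3, samples: Float32Array.of(0.5, -0.5) },
  { name: "gunshot", priority: 9, gain: 1.0, samples: Float32Array.of(1, 0.8, 0.2) },
  { name: "briefing", priority: 10, gain: 0.8, samples: Float32Array.of(0.5, 0.5, 0.5, 0.5) },
];
const active = selectVoices(requests, 2); // the crickets are dropped
const mixed = mixVoices(active, 4);
console.log(active.map((v) => v.name), Array.from(mixed));
```

The sketch only works because `maxVoices` is known. That number is exactly what the <audio> specification fails to expose.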

Additionally, a high quality application would greatly benefit from:

  • The ability to dynamically control the frequency of replay and volume of sounds that are already playing in order to simulate (for example) Doppler shift and range attenuation of a moving sound source.
  • The ability to dynamically control the reverb/echo of sounds that are already playing in order to adjust the audio to the virtual space in which it's being played.
  • The ability to place monophonic sounds anywhere in the stereo or 5.1 surround-sound space.
  • MIDI file support (a much more bandwidth-efficient way to support game-music than waveform-based systems such as Ogg and MP3 - and one that also permits dynamically created music that can shift to suit the game state by fading individual tracks in and out and by switching instruments).
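
The first and third items above come down to a few per-sound numbers that must be updated while the sound plays: a pitch multiplier, a gain and a left/right balance. The following is a rough sketch using textbook formulas (classic Doppler ratio, inverse-distance attenuation, constant-power panning). The function names, the default speed of sound and the default reference distance are assumptions made for illustration, not taken from any browser API.

```javascript
// Sketch only: textbook formulas with hypothetical function names.
const SPEED_OF_SOUND = 343.3; // metres per second in air (assumed)

// Pitch multiplier for a moving source and listener. Both speeds are the
// components along the line joining them, positive when approaching.
function dopplerPitch(sourceApproach, listenerApproach, c = SPEED_OF_SOUND) {
  // Keep both speeds subsonic so the ratio stays finite and positive.
  const limit = 0.99 * c;
  const vs = Math.max(-limit, Math.min(limit, sourceApproach));
  const vl = Math.max(-limit, Math.min(limit, listenerApproach));
  return (c + vl) / (c - vs);
}

// Inverse-distance range attenuation, clamped so that sounds closer than
// the reference distance are not boosted above full volume.
function distanceGain(distance, referenceDistance = 1, rolloff = 1) {
  const d = Math.max(distance, referenceDistance);
  return referenceDistance / (referenceDistance + rolloff * (d - referenceDistance));
}

// Constant-power stereo pan: pan = -1 is hard left, 0 centre, +1 hard right.
function panGains(pan) {
  const p = Math.max(-1, Math.min(1, pan));
  const angle = ((p + 1) * Math.PI) / 4;
  return { left: Math.cos(angle), right: Math.sin(angle) };
}

// A mono engine sound 10 m away, approaching at 30 m/s, slightly to the right.
const enginePitch = dopplerPitch(30, 0);
const engineGain = distanceGain(10);
const enginePan = panGains(0.25);
console.log(
  enginePitch.toFixed(3),
  (engineGain * enginePan.left).toFixed(3),
  (engineGain * enginePan.right).toFixed(3)
);
```

With <audio> there is no reliable way to apply numbers like these to a sound that is already playing, which is the gap this wish list describes.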

What is proposed here

Clearly we need something entirely new to the browser world. The <audio> tag would be exceedingly difficult to repair and enhance to the degree that is required here. Developing an entirely new audio specification piecemeal would be a difficult and contentious task - with no guarantee that the result would be either implementable or sufficiently useful.

What is needed is the adoption of an existing, widely-accepted, IP-free sound API specification - preferably one with a cross-platform OpenSource implementation and existing hardware support. Taking this approach would avoid the need for the parties involved to make a million tiny decisions. They would hopefully be able to agree that the existing API is what we want - and the rest should fall into place relatively easily.

Development should be along the lines of the WebGL development. We start with our existing API, have the Khronos Group "own" the standard, and encourage various other interested parties (Mozilla, Apple, Google, etc) to pick it up and make test implementations. Then iterate on the (hopefully) small details of what has to change in order to make it appropriate for some of the special conditions present in a browser environment.

The standardization process would take this existing standard, "wrap" it with JavaScript bindings and tweak it for the web's needs for security, networkability and to avoid some of the issues relating to JavaScript itself. That model has proven extremely successful for WebGL - and we should emulate it here.

I call this hypothetical API "WebAL" (Web-based Audio Library) and I propose that it be based on the existing and widely used OpenAL library which fulfills all of the criteria we desire in such an API.

Why OpenAL and not something else?

There is an existing standard that Khronos manage called "OpenSL" (SL==Sound Library) - and it has a version called "OpenSL-ES" that is intended for the mobile marketplace (just as OpenGL-ES is for graphics). However OpenSL is not widely used - although OpenSL-ES is becoming popular for some cellphone applications. OpenSL also lacks many of the higher level features present in the "wish list" for games and simulation.

A much more popular (and practical) standard for the desktop is "OpenAL" (AL==Audio Library). This standard has been around since 1998 and has been stable in its present version since 2005. It is widely implemented and used in hundreds of commercial games and simulations across PCs, Macs and game consoles. OpenAL is probably the number one choice for audio in cross-platform applications - and it implements all of the important "Wish List" items. The specification and mailing lists are supported by CreativeLabs - but they have no IP in OpenAL (although they do have their own proprietary extensions).

There has been discussion over the years of creating a formal "OpenAL ARB" to manage the OpenAL spec - but so far, that has come to nothing.

The "OpenAL-Soft" implementation is OpenSourced under LGPL and runs on Windows, Linux, Mac, BSD Unix, PS3 and Xbox-360. It is claimed that it would be easily portable on top of an OpenSL-ES implementation, so a cellphone version of OpenAL-Soft would be an easy development. There are also hardware-accelerated implementations of OpenAL for several high-end sound cards under Windows and a native MacOS X version that's supported by Apple.

The OpenAL 1.1 specification is moderately well formalized and reads like the OpenGL 1.2 "RedBook". There is also a programmer's guide. However, it is not as formal as (say) the OpenGL or OpenSL specifications. It would be nice to have Khronos group experts turn it into a more formal spec.

Because OpenAL is so widely used in shipping games, we can be reasonably confident that its feature set covers what games need. OpenSL has probably never been used in a commercial game - and it would be a much harder sell to get game developers to use it. OpenAL is also a higher level specification than OpenSL. It is based around the idea that sound sources and 'listeners' are placed and moved in 3D space, and the OpenAL library handles the resulting spatialization, Doppler, range attenuation, etc.
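
To illustrate that model, here is a sketch of what application code might look like. WebAL does not exist, so the `al` object below is a recording stub rather than an implementation. Its method names are assumptions derived from the OpenAL C functions alGenSources, alSource3f, alListener3f and alSourcePlay by dropping the prefix, and the return value of `genSources` is a guess. Attaching audio data to the source is omitted.

```javascript
// Hypothetical: "WebAL" does not exist. This stub only records the calls an
// application would make; it produces no sound.
function createStubAL() {
  const calls = [];
  let nextSource = 1;
  return {
    calls,
    POSITION: "POSITION", // stand-ins for AL_POSITION / AL_VELOCITY
    VELOCITY: "VELOCITY",
    genSources(count) {
      calls.push(["genSources", count]);
      return Array.from({ length: count }, () => nextSource++);
    },
    source3f(source, param, x, y, z) {
      calls.push(["source3f", source, param, x, y, z]);
    },
    listener3f(param, x, y, z) {
      calls.push(["listener3f", param, x, y, z]);
    },
    sourcePlay(source) {
      calls.push(["sourcePlay", source]);
    },
  };
}

// The application only describes the scene. Panning, Doppler and range
// attenuation would be the library's job, not the application's.
function placeAndPlay(al, sourcePos, sourceVel, listenerPos) {
  const [source] = al.genSources(1);
  al.source3f(source, al.POSITION, ...sourcePos);
  al.source3f(source, al.VELOCITY, ...sourceVel);
  al.listener3f(al.POSITION, ...listenerPos);
  al.sourcePlay(source);
  return source;
}

const stub = createStubAL();
placeAndPlay(stub, [10, 0, -5], [-3, 0, 0], [0, 0, 0]);
console.log(stub.calls);
```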

What work is involved?

  1. Take the OpenAL specification, and apply Khronos Group's level of formality to it - without changing much of what it is or how it works.
  2. Provide JavaScript bindings for all of the OpenAL API (eg: alGetError() becomes al.getError(), AL_INVALID_NAME becomes al.INVALID_NAME). Because OpenAL is so closely modelled on OpenGL, this would be carried out entirely analogously to the WebGL bindings - it would be an extremely simple undertaking with almost zero discussion needed.
  3. Provide interfaces from the existing "typed array" mechanism in WebGL to enable blocks of raw audio to be efficiently accessed or created via JavaScript. This would be entirely analogous to the way textures are handled in WebGL.
  4. Provide loaders for at least Ogg/Vorbis and .wav audio formats. Hopefully, much of the code from the <audio> subsystem could be re-used here. Decisions previously made about MP3 format would be carried through from <audio> (ie not everyone will support it). Streaming would not be supported in WebAL - that's what <audio> is for.
  5. Tie down any security issues such as when audio is loaded from a site other than the one originating the web page - just as is already managed for WebGL textures.
  6. For PC-based applications, have the browser either find a "native" OpenAL driver (such as the one Creative sound cards provide) - or use the "OpenAL-Soft" library.
  7. For embedded systems such as cellphones, implement OpenAL drivers on top of OpenSL-ES.
  8. Build example programs and a test suite as we go.
  9. Do the initial phases quickly - and get early versions into daily builds of Firefox and WebKit so that developers can beat on it to find the holes.
  10. Consider how to port <audio> on top of WebAL.
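
Steps 2 and 3 are mechanical enough to sketch. The first function below applies the renaming rule from step 2 to any OpenAL C identifier. The second builds a block of raw mono 16-bit audio in a typed array, which is the kind of data step 3 would hand to the library. Both function names are hypothetical, and the 44100 Hz default sample rate is an assumption.

```javascript
// Map an OpenAL C identifier to the JavaScript spelling proposed in step 2.
function toWebALName(cName) {
  if (/^AL_[A-Z0-9_]+$/.test(cName)) {
    return "al." + cName.slice(3); // AL_INVALID_NAME -> al.INVALID_NAME
  }
  const match = /^al([A-Z])(\w*)$/.exec(cName);
  if (match) {
    return "al." + match[1].toLowerCase() + match[2]; // alGetError -> al.getError
  }
  throw new Error("Not an OpenAL identifier: " + cName);
}

// Generate a sine tone as mono signed 16-bit PCM in a typed array, ready to
// be passed to a (hypothetical) WebAL buffer-upload call.
function makeSineInt16(frequencyHz, durationSeconds, sampleRate = 44100) {
  const frameCount = Math.round(durationSeconds * sampleRate);
  const pcm = new Int16Array(frameCount);
  for (let i = 0; i < frameCount; i++) {
    const phase = (2 * Math.PI * frequencyHz * i) / sampleRate;
    pcm[i] = Math.round(32767 * Math.sin(phase));
  }
  return pcm;
}

console.log(toWebALName("alSourcePlay"), toWebALName("AL_INVALID_NAME"));
const beep = makeSineInt16(441, 0.01);
console.log(beep.length, beep.byteLength);
```

At this rate one second of mono audio occupies 88,200 bytes, which is small next to a typical texture and supports the earlier point about keeping short samples entirely in memory.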

In short, repeat - as closely as possible - the work done on WebGL.

What about the <audio> stuff?

The existing audio tag works reasonably well for playing long sounds - such as music tracks - that benefit from streaming. Unlike WebGL, which is conceptually a layer atop the existing <canvas> system, the situation with <audio> would be reversed. Browser writers would be well-advised to rework their audio tag support or layer it on top of the foundations provided by OpenAL/SL. The existing audio features should be considered a specialization of WebAL. This would permit things like the placement of streaming audio sources onto moving objects to position them into the stereo/surround-sound field.

Who does this? Who pays for it? When will it happen?

I have no idea.

I would hope that this would be a natural follow-on for the groups who have come together so successfully to build WebGL - and that the cooperative mechanisms that have achieved this feat could be extended or replicated to solve the audio problem.

We need this soon. If we are to have a future of high-end interactive applications on the web - as promised by WebGL on the graphics side - then audio support cannot be far behind. Fortunately, I believe that the OpenAL API is sufficiently similar to OpenGL that we could put together a draft standard and a rough implementation in short order. OpenAL was specifically designed to be as similar to OpenGL as possible. Many of the issues that have surrounded the philosophy of WebGL would carry perfectly over to WebAL. Both would use the same 4x4 matrix support - both would have loaders that work from a URL - both would use the same 'typed array' mechanisms. Issues of where we stand vis-a-vis extensions are already understood.

The OpenAL specification

Version 1.1 is the latest:

 http://connect.creativelabs.com/openal/Documentation/Forms/AllItems.aspx 

There are additional places where extensions to the spec reside.

Implementations of OpenAL

The main implementations are in MacOS X, OpenAL-Soft (OpenSourced - any platform), and the CreativeLabs driver for Windows.

OpenAL-Soft may be obtained here:

  http://kcat.strangesoft.net/openal.html

The closed-source CreativeLabs driver for Windows is here:

  http://connect.creativelabs.com/openal/Downloads/Forms/AllItems.aspx

The MacOS X driver is at the same URL.

Conclusion

I hope everyone who reads this can understand the need - and find this proposal as compelling as I do.

Interested Parties

  • Myself: steve@sjbaker.org - Professional graphics engineer working in games and simulation for 25 years. Also was an early proponent of OpenAL. Wrote the "ALUT" companion library.
  • The OpenAL developer list: openal-devel@opensource.creative.com
  • The WebGL developer list: public_webgl@khronos.org