WebAL - Interactive audio for browsers
This document is a proposal to the standards-makers and browser manufacturers to produce a "WebAL" audio subsystem within web browsers - analogous to the WebGL graphics API standard.
At the time of writing, there is really only one mechanism supported within web browsers for producing sounds: the HTML5 <audio> tag. Failing that, one must resort to plugins, the most likely being Flash.
What's wrong with using the HTML5 <audio> tag/API?
When it comes to producing compelling games, simulations and other interactive content for the web, the demands placed on the audio system are vastly different from those of simply playing a music track, and HTML5 audio becomes nearly useless. Pushing the limits of what <audio> can do exposes numerous fatal flaws:
- The markup/API described in the HTML5 specifications is loosely specified. Much goes unsaid (e.g. how many sounds can you play at once?).
- Not one of the mainstream browsers actually implements 100% of what the specification describes.
- Within the subset of the standard that these browsers do claim to support, there are many bugs that have gone unfixed for far too long (over a year in many cases).
- There appears to be no support forum of any kind where HTML5 audio experts can be found to answer questions. After two months of strenuous efforts and several seemingly-promising contacts, I have yet to have a single communication with anyone who knows anything about the standard or its implementation.
- Even if the specified API were more tightly described and implemented perfectly, it would still be inadequate for aggressively interactive applications such as games.
Taken together, this speaks of an unloved corner of HTML5 that has been implemented only to the minimal extent needed to stream single music tracks - with zero ongoing support.
If there are to be games on the Internet without Flash - we need to take drastic action.
What do games and simulations need?
At the barest minimum, an interactive application will typically need the following features:
- The ability to have some kind of background or "ambient" sound track (e.g. music, chirping crickets, the sound of the ocean) looping without a break.
- Firefox provides no viable mechanism for this other than to inform you that a track has finished playing in order that you can re-trigger it. But that process takes time - during which there will be an unacceptable break in the audio.
- Chrome honors the "loop" attribute in the HTML5 spec, but it does so with a considerable break in the sound track.
- An acceptable implementation would have to guarantee automatic looping such that the first sample of the sound is played immediately after the last with no delay whatever.
- The ability to trigger a short sound with almost no latency. The HTML5 audio system has a complex set of commands and events that you can use to control how a sound is "preloaded" - however, by some bizarre logic, the one truly important option (to completely preload the entire sound into memory and keep it there) is completely missing!
- Both Firefox and Chrome appear to stream data no matter what - so there is always a considerable delay between "pulling the trigger" and hearing the gun go "BANG!".
- Given the tiny amount of memory that a short sound sample might occupy (compared to, say, a photograph or a WebGL texture), this is an unforgivable omission.
- The ability to reliably play some number of sounds simultaneously.
- The <audio> specification makes no mention whatever of what happens if you try to play multiple sounds at once, much less provides a means to find out what the maximum number actually is (it is surely not infinite!), or to control how the available numerical precision and range are used when mixing multiple sounds.
- Both Firefox and Chrome seem able to play multiple sounds, but the behavior is sporadic and ill-specified.
- Games can typically manage the number of sounds they are playing, prioritizing the important ones (e.g. a game character telling you what your mission is, the loudest sounds, the nearest sounds), but they need to know how much the underlying player can manage in order to exercise that control, so that (for example) an ambient cricket chirp doesn't mask the sound of your own gun firing.
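To make these three minimum requirements concrete, here is a minimal software-mixer sketch. It assumes every sound has already been decoded completely into memory as mono float samples, so triggering one costs no more than adding it to a list; looping voices wrap from the last sample straight to the first; and the mixer has a fixed, queryable voice limit, replacing its least important voice when full. `SoundBuffer`, `Voice` and `Mixer` are hypothetical names for illustration only, not part of <audio>, OpenAL or any real browser API.

```typescript
// Hypothetical sketch: SoundBuffer, Voice and Mixer are illustrative names,
// not part of <audio>, OpenAL or any proposed WebAL API.

/** A short sound, fully decoded and kept in memory (mono, samples in [-1, 1]). */
interface SoundBuffer {
  samples: Float32Array;
}

/** One playing instance of a buffer. */
interface Voice {
  buffer: SoundBuffer;
  cursor: number; // index of the next sample to read
  loop: boolean; // wrap straight back to sample 0 after the last sample
  priority: number; // larger means more important
  gain: number;
}

class Mixer {
  private voices: Voice[] = [];
  private readonly maxVoices: number;

  constructor(maxVoices: number) {
    this.maxVoices = maxVoices;
  }

  /** How many voices are playing right now; the application can query this. */
  get activeVoices(): number {
    return this.voices.length;
  }

  /**
   * Start a sound from memory. If every voice is busy, the least important
   * voice is replaced, but only if the new sound outranks it.
   * Returns false when the sound was not started.
   */
  play(buffer: SoundBuffer, priority: number, loop = false, gain = 1): boolean {
    if (buffer.samples.length === 0) return false;
    const voice: Voice = { buffer, cursor: 0, loop, priority, gain };
    if (this.voices.length < this.maxVoices) {
      this.voices.push(voice);
      return true;
    }
    if (this.voices.length === 0) return false; // maxVoices is 0
    let victim = 0;
    for (let i = 1; i < this.voices.length; i++) {
      if (this.voices[i].priority < this.voices[victim].priority) victim = i;
    }
    if (this.voices[victim].priority >= priority) return false;
    this.voices[victim] = voice;
    return true;
  }

  /** Mix the next out.length samples of every active voice into out. */
  render(out: Float32Array): void {
    out.fill(0);
    for (const v of this.voices) {
      const src = v.buffer.samples;
      for (let i = 0; i < out.length; i++) {
        if (v.cursor >= src.length) {
          if (!v.loop) break;
          v.cursor = 0; // no gap: the first sample follows the last one directly
        }
        out[i] += src[v.cursor++] * v.gain;
      }
    }
    // Release one-shot voices that have reached their end.
    this.voices = this.voices.filter((v) => v.loop || v.cursor < v.buffer.samples.length);
  }
}
```

With a limit of two voices, a high-priority gunshot can displace a low-priority cricket chirp, while a further low-priority sound is simply refused rather than silencing something that matters.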
Additionally, a high quality application would greatly benefit from:
- The ability to dynamically control the playback frequency (pitch) and volume of sounds that are already playing in order to simulate (for example) Doppler shift and range attenuation of a moving sound source.
- The ability to dynamically control the reverb/echo of sounds that are already playing in order to adjust the audio to the virtual space in which it's being played.
- The ability to place monophonic sounds anywhere in the stereo or 5.1 surround-sound space.
- MIDI file support (a much more bandwidth-efficient way to support game music than waveform-based systems such as Ogg and MP3 - and one that also permits dynamically created music that can shift to suit the game state by fading individual tracks in and out and by switching instruments).
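Placing a mono sound in the stereo field, with its own volume, can be sketched with the common constant-power (sine/cosine) pan law. This is one standard choice rather than anything the wish list prescribes, it covers stereo only (not 5.1), and `panGains`/`panMono` are hypothetical helper names.

```typescript
/**
 * Constant-power pan law: pan = -1 (hard left), 0 (centre), +1 (hard right).
 * Returns [leftGain, rightGain]. leftGain^2 + rightGain^2 is always 1, so the
 * perceived loudness stays roughly constant as a sound sweeps across.
 */
function panGains(pan: number): [number, number] {
  const p = Math.min(1, Math.max(-1, pan));
  const angle = ((p + 1) * Math.PI) / 4; // maps -1..+1 onto 0..pi/2
  return [Math.cos(angle), Math.sin(angle)];
}

/** Place a mono sound in the stereo field, applying an overall volume as well. */
function panMono(mono: Float32Array, pan: number, volume = 1): [Float32Array, Float32Array] {
  const [gl, gr] = panGains(pan);
  const left = new Float32Array(mono.length);
  const right = new Float32Array(mono.length);
  for (let i = 0; i < mono.length; i++) {
    left[i] = mono[i] * volume * gl;
    right[i] = mono[i] * volume * gr;
  }
  return [left, right];
}
```

A simple linear pan (left = 1 - x, right = x) would dip in loudness at the centre; keeping the sum of squared gains at 1 avoids that.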
What is proposed here
Clearly we need something entirely new to the browser world. The <audio> tag would be exceedingly difficult to repair and enhance to the degree that is required here. Developing an entirely new audio specification piecemeal would be a difficult and contentious task - with no guarantee that the result would be either implementable or sufficiently useful.
What is needed is the adoption of an existing, widely-accepted, IP-free sound API specification - preferably one with a cross-platform OpenSource implementation and existing hardware support. Taking this approach would avoid the need for the parties involved to make a million tiny decisions. They would hopefully be able to agree that the existing API is what we want - and the rest should fall into place relatively easily.
Development should follow the lines of the WebGL development. We start with our existing API, have the Khronos Group "own" the standard, and encourage various other interested parties (Mozilla, Apple, Google, etc.) to pick it up and make test implementations. We then iterate on the (hopefully) small details that have to change in order to make it appropriate for the special conditions present in a browser environment.
I call this hypothetical API "WebAL" (Web-based Audio Library) and I propose that it be based on the existing and widely used OpenAL library which fulfills all of the criteria we desire in such an API.
Why OpenAL and not something else?
There is an existing standard that Khronos manage called "OpenSL" (SL==Sound Library) - and it has a version called "OpenSL-ES" that is intended for the mobile marketplace (just as OpenGL-ES is for graphics). However OpenSL is not widely used - although OpenSL-ES is becoming popular for some cellphone applications. OpenSL also lacks many of the higher level features present in the "wish list" for games and simulation.
A much more popular (and practical) standard for the desktop is "OpenAL" (AL==Audio Library). This standard has been around for a very long time and is widely implemented and used in hundreds of commercial games and simulations across PCs and game consoles. OpenAL is probably the number one choice for audio in cross-platform applications - and it implements all of the important "Wish List" items.
The "Soft OpenAL" implementation already runs on Windows, Linux, Mac, BSD Unix, PS3 and Xbox-360. It is claimed that it would be easily portable on top of an OpenSL-ES implementation, so a cellphone version of Soft OpenAL would be an easy development. There are also hardware-accelerated implementations of OpenAL for several high-end sound cards under Windows and a native MacOS X version that's supported by Apple.
In discussion with the OpenAL people, it seems that the OpenAL 1.1 specification is moderately well formalized - although not as well as (say) OpenGL or OpenSL. It would be nice to have Khronos group experts turn it into a more formal spec.
Because we know that OpenAL is so widely used, we also know that it is feature-complete. OpenSL has probably never been used in a commercial game, and it would be a much harder sell to get game developers to use it. OpenAL is also a higher-level specification than OpenSL. It is based around the idea that sound sources and 'listeners' are placed and moved in 3D space, and the OpenAL library handles the resulting spatialization, Doppler shift, range attenuation, etc.
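To show what that spatialization work involves, here is a sketch of two of the calculations: the inverse-distance-clamped gain (OpenAL's default distance model) and the Doppler shift. It is written from my reading of the formulas in the OpenAL 1.1 specification, using its default parameter values (reference distance 1, rolloff factor 1, speed of sound 343.3, Doppler factor 1), and should be checked against the specification before being relied on. The function names are hypothetical; an application using OpenAL never writes this code, since it only supplies positions and velocities.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];

/**
 * Inverse-distance-clamped gain. maxDistance = Infinity stands in for
 * OpenAL's default maximum distance (the largest float).
 */
function distanceGain(distance: number, refDistance = 1, maxDistance = Infinity, rolloff = 1): number {
  const d = Math.min(Math.max(distance, refDistance), maxDistance);
  return refDistance / (refDistance + rolloff * (d - refDistance));
}

/**
 * Doppler-shifted frequency. Velocities are projected onto the
 * source-to-listener vector and clamped to speedOfSound / dopplerFactor.
 * (A source moving exactly at that limit makes the denominator zero.)
 */
function dopplerFrequency(
  f: number,
  sourcePos: Vec3,
  sourceVel: Vec3,
  listenerPos: Vec3,
  listenerVel: Vec3,
  speedOfSound = 343.3,
  dopplerFactor = 1,
): number {
  const sl = sub(listenerPos, sourcePos); // source -> listener
  const len = Math.hypot(sl[0], sl[1], sl[2]);
  if (len === 0 || dopplerFactor === 0) return f; // no direction, or Doppler disabled
  const limit = speedOfSound / dopplerFactor;
  const vls = Math.min(dot(sl, listenerVel) / len, limit);
  const vss = Math.min(dot(sl, sourceVel) / len, limit);
  return (f * (speedOfSound - dopplerFactor * vls)) / (speedOfSound - dopplerFactor * vss);
}
```

For example, a 440 Hz source approaching a stationary listener at a tenth of the speed of sound comes out at 440 / 0.9, about 489 Hz, while the gain at twice the reference distance is 0.5.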
What work is involved?
- Take the OpenAL specification, and apply Khronos Group's level of formality to it - without changing much of what it is or how it works.
- Provide loaders for at least Ogg/Vorbis and "Raw" audio formats.
- Tie down any security issues such as when audio is loaded from a site other than the one originating the web page - just as is already managed for WebGL textures.
- For PC-based applications, have the browser either find a "native" OpenAL driver (such as the one Creative provides for its sound cards) - or use the "Soft OpenAL" library.
- For embedded systems such as cellphones, layer OpenAL on top of OpenSL-ES.
- Build example programs and a test suite as we go.
- Do the initial phases quickly - and get early versions into daily builds of Firefox and WebKit so that developers can beat on it to find the holes.
In short, repeat - as closely as possible - the work done on WebGL to produce a "WebAL".
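The "Raw" loader from the work list is the simplest piece. The text does not define the format, so this sketch assumes "raw" means headerless, interleaved, signed 16-bit little-endian PCM; `decodeRawPcm16` is a hypothetical name. It converts the bytes into one float array per channel, which is the form a mixer would want to keep in memory.

```typescript
/**
 * Hypothetical "raw" loader: assumes headerless, interleaved, signed 16-bit
 * little-endian PCM. Returns one Float32Array per channel, scaled to [-1, 1).
 */
function decodeRawPcm16(bytes: ArrayBuffer, channels: number): Float32Array[] {
  if (!Number.isInteger(channels) || channels < 1) {
    throw new RangeError("channels must be a positive integer");
  }
  const view = new DataView(bytes);
  const frameBytes = 2 * channels;
  const frames = Math.floor(bytes.byteLength / frameBytes); // drop a trailing partial frame
  const out: Float32Array[] = [];
  for (let c = 0; c < channels; c++) out.push(new Float32Array(frames));
  for (let f = 0; f < frames; f++) {
    for (let c = 0; c < channels; c++) {
      const s = view.getInt16(f * frameBytes + c * 2, true); // true = little-endian
      out[c][f] = s / 32768;
    }
  }
  return out;
}
```

Reading through a `DataView` with an explicit endianness flag keeps the result the same on any host, which would matter for a loader that has to behave identically in every browser.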
What about the <audio> stuff?
The existing audio tag works reasonably well for playing long sounds - such as music tracks - that benefit from streaming. Unlike WebGL, which is built as a layer atop the existing <canvas> system, the situation with <audio> would be reversed. Browser writers would be well advised to rework the half-finished/broken audio tag support so that it is built on top of the foundations provided by OpenAL/SL. The existing audio features should be considered a specialization of WebAL. This would permit things like placing streaming audio sources on moving objects, or positioning them in the stereo/surround-sound field.
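Desktop OpenAL already streams long sounds by queueing several small buffers on one source (`alSourceQueueBuffers`) and, as each one finishes, unqueueing it (`alSourceUnqueueBuffers`), refilling it with freshly decoded audio and queueing it again. That is the kind of mechanism an <audio>-style music stream could sit on. The simplified TypeScript below only imitates the pattern, dropping finished chunks and pulling fresh ones rather than recycling buffer objects; `StreamingSource` and the `decodeNext` callback (standing in for an Ogg/Vorbis decoder) are hypothetical.

```typescript
/** Returns the next chunk of decoded samples, or null at end of stream (hypothetical decoder hook). */
type DecodeNext = () => Float32Array | null;

class StreamingSource {
  private queue: Float32Array[] = [];
  private readPos = 0;
  private finished = false;
  private readonly decodeNext: DecodeNext;
  private readonly queueDepth: number;

  constructor(decodeNext: DecodeNext, queueDepth = 3) {
    this.decodeNext = decodeNext;
    this.queueDepth = queueDepth;
    this.refill();
  }

  /** Top the queue back up to queueDepth chunks (the "refill and requeue" step). */
  private refill(): void {
    while (!this.finished && this.queue.length < this.queueDepth) {
      const chunk = this.decodeNext();
      if (chunk === null) this.finished = true;
      else if (chunk.length > 0) this.queue.push(chunk);
    }
  }

  /** Copy the next out.length samples; returns how many were real audio (the rest is silence). */
  render(out: Float32Array): number {
    let written = 0;
    while (written < out.length && this.queue.length > 0) {
      const head = this.queue[0];
      const n = Math.min(out.length - written, head.length - this.readPos);
      out.set(head.subarray(this.readPos, this.readPos + n), written);
      written += n;
      this.readPos += n;
      if (this.readPos === head.length) {
        this.queue.shift(); // this chunk has been fully played
        this.readPos = 0;
        this.refill();
      }
    }
    out.fill(0, written);
    return written;
  }
}
```

Because only a few small chunks are ever held at once, memory use stays bounded however long the track is, and the same source could still be given a position and velocity like any other sound.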
Who does this? Who pays for it? When will it happen?
I have no idea.
I would hope that this would be a natural follow-on for the groups who have come together so successfully to build WebGL - and that the cooperative mechanisms that have achieved this feat could be extended or replicated to solve the audio problem.
We need this soon. If we are to have a future of high-end interactive applications on the web - as promised by WebGL on the graphics side, then audio support cannot be far behind. Fortunately, I believe that the OpenAL API is sufficiently similar to OpenGL that we could put together a draft standard and a rough implementation in short order. OpenAL was specifically designed to be as similar to OpenGL as possible. Many of the issues that have surrounded the philosophy of WebGL would carry perfectly over to WebAL. Both would use the same 4x4 matrix support - both would have loaders that work from a URL - both would use the same 'typed array' mechanisms. Issues of where we stand vis-a-vis extensions are already understood.
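As an example of that shared math and typed-array style, here is a sketch that turns a WebGL-style camera matrix into the listener position and orientation OpenAL expects (`AL_ORIENTATION` is six floats: an "at" vector followed by an "up" vector). It assumes a column-major 4x4 world matrix for the camera, the layout WebGL's `uniformMatrix4fv` takes, with the camera looking down its local -Z axis. `listenerFromCamera` is a hypothetical name, and how a WebAL call would accept the result is left open.

```typescript
/**
 * Derive OpenAL-style listener parameters from a camera's world matrix.
 * Assumes a WebGL-style column-major Float32Array(16) whose columns are the
 * camera's X, Y, Z axes and its position, with the camera looking down -Z.
 */
function listenerFromCamera(m: Float32Array): { position: Float32Array; orientation: Float32Array } {
  if (m.length !== 16) throw new RangeError("expected a 4x4 matrix (16 floats)");
  const position = new Float32Array([m[12], m[13], m[14]]);
  // AL_ORIENTATION layout: "at" vector followed by "up" vector.
  const orientation = new Float32Array([
    -m[8], -m[9], -m[10], // at = the camera's -Z axis
    m[4], m[5], m[6], // up = the camera's +Y axis
  ]);
  return { position, orientation };
}
```

In desktop OpenAL the two arrays would be passed to `alListenerfv` with `AL_POSITION` and `AL_ORIENTATION` respectively, so the matrix a WebGL program already keeps for its camera could drive the listener directly.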
The OpenAL specification
Version 1.1 is the latest:
There are additional places where extensions to the spec reside.
Implementations of OpenAL
The main implementations are in MacOS X, OpenAL-Soft (open source, available for any platform), and the Creative driver for Windows.
Conclusion
I hope everyone who reads this can understand the need - and find this proposal as compelling as I do.
Interested Parties
- Myself: email@example.com - Professional graphics engineer working in games and simulation for 25 years. Also was an early proponent of OpenAL. Wrote the "ALUT" companion library.
- The OpenAL developer list: firstname.lastname@example.org
- The WebGL developer list: email@example.com