WebAL - Interactive audio for browsers

From Wikiid

Revision as of 01:06, 11 January 2011

This document is a "White Paper" describing my proposal to produce a "WebAL" audio subsystem within web browsers - analogous to the WebGL graphics API.

At the time of writing, there is really only one mechanism supported within web browsers for producing sounds: the HTML5 <audio> tag. Failing that, one must resort to plugins - most likely Flash.

What's wrong with using the HTML5 <audio> tag/API?

If all you need to do is replay a piece of music streamed from your server, then nothing. The HTML5 audio markup - and especially the JavaScript API - appears strongly oriented towards streaming large audio files over the Internet, a task for which it is reasonably well suited.

However, when it comes to producing compelling games, simulations and other interactive content for the web, the demands placed on the audio system are vastly different and HTML5 audio becomes nearly useless. Pushing the limits of what it can do exposes numerous fatal flaws:

  1. The markup/API in the HTML5 specification is very loosely defined; far too much goes unsaid.
  2. Not one of the mainstream browsers actually implements 100% of what the specification describes.
  3. Even within the subset that these browsers claim to support, there are many bugs which have gone unfixed for far too long.
  4. There appears to be no support forum of any kind where HTML5 audio experts can be found to answer questions.
  5. Even if the specified API were more tightly described and implemented perfectly, it would still be inadequate for aggressively interactive applications such as games.

Together, this speaks of an unloved corner of HTML5 that has been implemented to the minimal extent needed to stream single music tracks - with zero ongoing support.

What do games and simulations need?

At the barest minimum, an interactive application will typically need the following features:

  • The ability to have some kind of background or "ambient" sound track (e.g. music, chirping crickets, the sound of the ocean) looping without a break. Firefox provides no viable mechanism for this other than to inform you that a track has finished playing so that you can re-trigger it. But that process takes time, during which there will be an unacceptable break in the audio. Chrome does honor the "loop" attribute in the HTML5 spec - but it does so with a considerable break in the sound track. An acceptable implementation would have to guarantee automatic looping such that the first sample of the sound is played immediately after the last, with no delay whatsoever.
  • The ability to trigger a short sound with almost no latency. The HTML5 audio system has a complex set of options for controlling how a sound is "preloaded" - however, by some bizarre logic, the one truly important option, to completely preload the sound into memory and keep it there, is missing! Both Firefox and Chrome appear to stream data no matter what, so there is always a considerable delay between "pulling the trigger" and hearing the gun go "BANG!". Given the tiny amount of memory that a short sound sample occupies (compared to, say, a photograph or a WebGL texture), this is an unforgivable omission.
  • The ability to reliably play some number of sounds simultaneously. The <audio> specification says nothing at all about what happens if you try to play multiple sounds at once - much less provides a means to find out what the maximum number actually is (it is surely not infinite!) or to control how the available numerical precision and range are used when mixing multiple sounds. Browsers do seem to be able to play multiple sounds, but the behavior is sporadic and unspecified. Games can manage the number of sounds they are playing, but they need control over that so that (for example) an ambient cricket chirp doesn't mask the sound of your gun going off when you pull the trigger.
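As an illustration of what these three requirements ask of an audio system, here is a minimal TypeScript sketch of a software mixer. The names (`VoicePool`, `play`, `mix`) are hypothetical and not part of any browser or OpenAL API. Each sound is held entirely in memory as a Float32Array, so triggering it costs only an index reset; a looping voice wraps its read position so that the first sample follows the last with no gap; and when every voice is busy, a new sound may only steal a voice of lower priority, so the gunshot wins over the crickets. Clamping the mixed output to [-1, 1] is one simple, assumed policy for the precision-and-range question.

```typescript
// Hypothetical software mixer; not a real browser or OpenAL API.
// Samples are assumed to be mono Float32 in the range [-1, 1].
interface Voice {
  samples: Float32Array; // fully resident in memory, never streamed
  position: number;      // index of the next sample to read
  loop: boolean;         // wrap to sample 0 with no gap
  priority: number;      // higher = more important
  gain: number;
  active: boolean;
}

class VoicePool {
  private voices: Voice[] = [];
  private maxVoices: number;

  constructor(maxVoices: number) {
    this.maxVoices = maxVoices;
  }

  // Start a sound. When all voices are busy, steal the lowest-priority one,
  // but only if the new sound outranks it. Returns false if the sound is dropped.
  play(samples: Float32Array, priority: number, loop = false, gain = 1): boolean {
    if (samples.length === 0) return false;
    const voice: Voice = { samples, position: 0, loop, priority, gain, active: true };
    const free = this.voices.findIndex(v => !v.active);
    if (free >= 0) {
      this.voices[free] = voice;
      return true;
    }
    if (this.voices.length < this.maxVoices) {
      this.voices.push(voice);
      return true;
    }
    if (this.voices.length === 0) return false; // a pool with zero voices
    let victim = 0;
    for (let i = 1; i < this.voices.length; i++) {
      if (this.voices[i].priority < this.voices[victim].priority) victim = i;
    }
    if (this.voices[victim].priority >= priority) return false;
    this.voices[victim] = voice;
    return true;
  }

  // Mix the next frameCount samples of every active voice, clamping the sum.
  mix(frameCount: number): Float32Array {
    const out = new Float32Array(frameCount);
    for (const v of this.voices) {
      for (let i = 0; i < frameCount && v.active; i++) {
        if (v.position >= v.samples.length) {
          if (v.loop) {
            v.position = 0;   // gapless: sample 0 immediately follows the last sample
          } else {
            v.active = false; // one-shot sound has finished
            break;
          }
        }
        out[i] += v.samples[v.position++] * v.gain;
      }
    }
    for (let i = 0; i < frameCount; i++) {
      out[i] = Math.max(-1, Math.min(1, out[i]));
    }
    return out;
  }

  activeCount(): number {
    return this.voices.filter(v => v.active).length;
  }
}
```

The stealing rule is deliberately strict (a tie does not steal), so an ambient loop can never be displaced by another sound of equal importance.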

Additionally, a "high end" application would greatly benefit from:

  • The ability to control the playback frequency (pitch) and volume of sounds dynamically in order to simulate (for example) Doppler shift.
  • The ability to control reverb/echo of sounds in order to adjust the audio to the virtual space in which it's being played.
  • The ability to place monophonic sounds anywhere in the stereo or 5.1 surround-sound space.
  • MIDI-file support (a much more bandwidth-efficient way to deliver game music that also permits dynamically created music that can shift to suit the game state).
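The first and third items reduce to small per-source calculations. The TypeScript sketch below (function names are hypothetical) computes a Doppler pitch multiplier following the formula given in the OpenAL 1.1 specification, where SL is the vector from source to listener and the default speed of sound is 343.3, plus a constant-power pan law for placing a mono sound in a stereo field. The pan law is a common textbook technique shown for illustration only; OpenAL itself pans sources from their 3D positions using implementation-specific methods.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Doppler pitch multiplier, following the OpenAL 1.1 formula:
//   f' = f * (SS - DF*vls) / (SS - DF*vss)
// where vls and vss are the listener and source velocities projected onto SL,
// the vector from source to listener, each clamped to at most SS/DF.
function dopplerPitch(
  sourcePos: Vec3, sourceVel: Vec3,
  listenerPos: Vec3, listenerVel: Vec3,
  speedOfSound = 343.3,  // AL_SPEED_OF_SOUND default
  dopplerFactor = 1.0,   // AL_DOPPLER_FACTOR default
): number {
  const sl: Vec3 = [
    listenerPos[0] - sourcePos[0],
    listenerPos[1] - sourcePos[1],
    listenerPos[2] - sourcePos[2],
  ];
  const mag = Math.sqrt(dot(sl, sl));
  if (mag === 0) return 1; // co-located: no direction, so no shift
  const limit = speedOfSound / dopplerFactor;
  const vls = Math.min(dot(sl, listenerVel) / mag, limit);
  const vss = Math.min(dot(sl, sourceVel) / mag, limit);
  return (speedOfSound - dopplerFactor * vls) / (speedOfSound - dopplerFactor * vss);
}

// Constant-power pan law: pan = -1 (hard left) .. +1 (hard right).
// Returns [leftGain, rightGain] with leftGain^2 + rightGain^2 = 1.
function constantPowerPan(pan: number): [number, number] {
  const p = Math.max(-1, Math.min(1, pan));
  const theta = (p + 1) * Math.PI / 4; // maps pan to 0 .. pi/2
  return [Math.cos(theta), Math.sin(theta)];
}
```

A source closing on the listener at one tenth of the speed of sound yields a pitch multiplier of 1/0.9 ≈ 1.11. As in the specification, velocities are clamped to SS/DF, so a source approaching at exactly that limit drives the denominator to zero; a real implementation would need to guard that case.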

What is proposed here

Clearly we need something entirely new to the browser world. The <audio> tag would be exceedingly difficult to repair and enhance at this point. Yet developing a new specification from scratch and getting everyone to agree on it would be a nightmarish task.

What is needed is the adoption of an existing, widely-accepted and "open sourced" sound specification - much along the lines that WebGL was developed from OpenGL-ES by the Khronos Group and various other interested parties (Mozilla, Apple, Google, etc). The parties involved wouldn't have to make a million tiny decisions - simply agreeing that the existing API is what we want would be sufficient to allow rapid progress.

Ideally, we would mirror the approach of taking an existing standard, "wrapping" it with JavaScript bindings and tweaking it for the web's needs for security and networkability. That model has proven extremely successful for WebGL - and we should emulate it here.

I call this hypothetical API "WebAL" (Web-based Audio Library). I propose that it be based on the existing and widely used OpenAL library.
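To show what "wrapping" OpenAL in JavaScript might look like, here is a purely hypothetical TypeScript shape for WebAL. None of these names exist in any browser; they are assumptions chosen to mirror OpenAL 1.1 entry points (alGenBuffers, alBufferData, alGenSources, alSourcei, alSourcef, alSourcePlay) in the same way that WebGL's createBuffer() mirrors glGenBuffers(). A small recording stand-in lets the sketch run outside a browser; its enum values are arbitrary placeholders, not the values from OpenAL's headers.

```typescript
// HYPOTHETICAL WebAL bindings: nothing here is a real browser API.
interface WebALBuffer { readonly id: number; }
interface WebALSource { readonly id: number; }

interface WebALContext {
  readonly BUFFER: number;  // cf. AL_BUFFER
  readonly LOOPING: number; // cf. AL_LOOPING
  readonly GAIN: number;    // cf. AL_GAIN
  createBuffer(): WebALBuffer;                                      // cf. alGenBuffers
  bufferData(b: WebALBuffer, pcm: Int16Array, rate: number): void;  // cf. alBufferData (mono 16-bit assumed)
  createSource(): WebALSource;                                      // cf. alGenSources
  sourcei(s: WebALSource, param: number, value: number): void;      // cf. alSourcei
  sourcef(s: WebALSource, param: number, value: number): void;      // cf. alSourcef
  sourcePlay(s: WebALSource): void;                                 // cf. alSourcePlay
}

// Typical game code: start a quiet, gapless ambient loop.
function startAmbientLoop(al: WebALContext, pcm: Int16Array, rate: number): WebALSource {
  const buffer = al.createBuffer();
  al.bufferData(buffer, pcm, rate);
  const source = al.createSource();
  al.sourcei(source, al.BUFFER, buffer.id); // attach the buffer by name, as in C
  al.sourcei(source, al.LOOPING, 1);        // 1 plays the role of AL_TRUE
  al.sourcef(source, al.GAIN, 0.3);
  al.sourcePlay(source);
  return source;
}

// Minimal in-memory stand-in so the sketch can run without a browser.
// The parameter values below are arbitrary placeholders.
class RecordingWebAL implements WebALContext {
  readonly BUFFER = 1;
  readonly LOOPING = 2;
  readonly GAIN = 3;
  readonly calls: string[] = [];
  private nextId = 1;

  createBuffer(): WebALBuffer {
    this.calls.push("createBuffer");
    return { id: this.nextId++ };
  }
  bufferData(b: WebALBuffer, pcm: Int16Array, rate: number): void {
    this.calls.push(`bufferData(${b.id}, ${pcm.length} samples, ${rate} Hz)`);
  }
  createSource(): WebALSource {
    this.calls.push("createSource");
    return { id: this.nextId++ };
  }
  sourcei(s: WebALSource, param: number, value: number): void {
    this.calls.push(`sourcei(${s.id}, ${param}, ${value})`);
  }
  sourcef(s: WebALSource, param: number, value: number): void {
    this.calls.push(`sourcef(${s.id}, ${param}, ${value})`);
  }
  sourcePlay(s: WebALSource): void {
    this.calls.push(`sourcePlay(${s.id})`);
  }
}
```

The game-side function reads almost line for line like the equivalent C code, which is the point of adopting an existing API rather than designing a new one.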

Why OpenAL and not something else?

There is an existing standard that Khronos manage called "OpenSL" (SL==Sound Library) - and it has a version called "OpenSL-ES" that is intended for the mobile marketplace (just as OpenGL-ES is for graphics). However, unlike OpenGL, OpenSL is not widely used - although OpenSL-ES is becoming popular for some cellphone applications. OpenSL also lacks many of the higher level features present in the "wish list" for games and simulation, above.

A much more popular (and practical) standard for the desktop is "OpenAL" (AL==Audio Library). This standard has been around for a very long time and is widely implemented and used in hundreds of commercial games and simulations across PCs and game consoles. OpenAL is probably the number one choice for these applications - and it implements nearly every one of the "Wish List" items above (reverb is provided through its EFX extension).

In discussion with the OpenAL people, it seems that the OpenAL 1.1 specification is moderately well formalized - although not as well as OpenGL or OpenSL - but it's good enough to make a superb starting point. Because we know that it is widely used, we also know that it is feature-complete - unlike <audio>. OpenSL has probably never been used in a commercial game, and it would be a much harder sell to get game developers to use it. OpenAL is also a higher level specification than OpenSL-ES. Features like Doppler shift and spatialized stereo can be built on top of OpenSL-ES, but they aren't a part of it. Doing those things in software in JavaScript would be impractical at best, so they need to be included in the API.
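The parameter maths behind these features is cheap; what is costly in JavaScript is applying it to every sample of every playing sound, which is why it belongs inside the API. For reference, this TypeScript sketch (the function name is hypothetical) computes the gain of OpenAL's default distance model, AL_INVERSE_DISTANCE_CLAMPED, using the OpenAL 1.1 defaults for AL_REFERENCE_DISTANCE, AL_ROLLOFF_FACTOR and AL_MAX_DISTANCE.

```typescript
// Gain under OpenAL's default distance model, AL_INVERSE_DISTANCE_CLAMPED (OpenAL 1.1):
//   distance = clamp(distance, AL_REFERENCE_DISTANCE, AL_MAX_DISTANCE)
//   gain     = ref / (ref + rolloff * (distance - ref))
function inverseDistanceClampedGain(
  distance: number,
  referenceDistance = 1,          // AL_REFERENCE_DISTANCE default
  rolloffFactor = 1,              // AL_ROLLOFF_FACTOR default
  maxDistance = Number.MAX_VALUE, // stands in for AL_MAX_DISTANCE's default (the largest float)
): number {
  const d = Math.min(Math.max(distance, referenceDistance), maxDistance);
  return referenceDistance / (referenceDistance + rolloffFactor * (d - referenceDistance));
}
```

With the defaults, gain halves each time the distance doubles beyond the reference distance (1 at distance 1, 0.5 at 2, 0.25 at 4), and sources closer than the reference distance are never louder than full gain.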

The "OpenAL Soft" implementation is claimed to be easily portable onto OpenSL or OpenSL-ES, as it already supports many different 'back end' interfaces. There are also hardware-accelerated implementations of OpenAL for several different sound cards.

So:

  1. Take the OpenAL specification and apply the Khronos Group's level of formality to it - without changing much of what it is or how it works.
  2. For PC-based applications, have the browser either find a "native" OpenAL driver (such as Creative sound cards provide) or use the "OpenAL Soft" library.
  3. For embedded systems such as cellphones, layer OpenAL on top of OpenSL-ES.
  4. Provide JavaScript bindings for all of the OpenAL API.
  5. Provide loaders for at least Ogg/Vorbis and "Raw" audio formats.
  6. Provide interfaces from the "typed array" mechanism in WebGL to enable blocks of raw audio to be efficiently accessed or created via JavaScript.
  7. Tie down any security issues such as when audio is loaded from a site other than the one originating the web page - just as is already managed for WebGL textures.
  8. Build example programs and a test suite as we go.
  9. Do the initial phases quickly - and get early versions into daily builds of Firefox and WebKit so that developers can beat on it to find the holes.
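Steps 5 and 6 imply that JavaScript can hand raw sample data straight to the audio system. A minimal TypeScript sketch, assuming 16-bit signed mono PCM (the layout OpenAL calls AL_FORMAT_MONO16) and a hypothetical `makeSineTone` helper, fills an Int16Array with a procedurally generated tone. It also puts a number on the memory argument made earlier: one second of such audio at 44.1 kHz is 44,100 samples × 2 bytes = 88,200 bytes, versus 1024 × 1024 × 4 = 4,194,304 bytes for a single 1024×1024 RGBA texture.

```typescript
// Hypothetical helper: fill a typed array with 16-bit signed mono PCM
// (the sample layout OpenAL calls AL_FORMAT_MONO16).
function makeSineTone(frequency: number, seconds: number, sampleRate: number, amplitude = 0.5): Int16Array {
  const count = Math.round(seconds * sampleRate);
  const pcm = new Int16Array(count);
  for (let i = 0; i < count; i++) {
    const s = amplitude * Math.sin((2 * Math.PI * frequency * i) / sampleRate);
    // clamp to [-1, 1], then scale to the signed 16-bit range
    pcm[i] = Math.round(Math.max(-1, Math.min(1, s)) * 32767);
  }
  return pcm;
}
```

Because the result is already in the byte layout an alBufferData-style entry point expects, it could be handed over without any further per-sample conversion in JavaScript.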

In short, repeat - as closely as possible - the work done on WebGL to produce a "WebAL".

What about the <audio> stuff?

The existing audio tag works reasonably well for playing long sounds - such as music tracks - that benefit from streaming. Unlike WebGL, which is built as a layer atop the existing <canvas> system, the situation with <audio> would be reversed. Browser writers would be well-advised to rework the half-finished/broken audio tag support - and instead build it on top of the foundations provided by OpenAL/SL. The existing audio features should be considered a specialization of WebAL. This would permit things like the placement of streaming audio sources on moving objects - or placing them out in the stereo/surround-sound field.
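Building <audio> on top of WebAL would mostly mean using OpenAL's existing streaming model: a source plays a queue of buffers, the application polls AL_BUFFERS_PROCESSED, removes finished buffers with alSourceUnqueueBuffers, refills them from the decoder and puts them back with alSourceQueueBuffers. The TypeScript below simulates that cycle in memory. `StreamingSource` and `pumpStream` are hypothetical names, and for brevity the sketch queues fresh chunks rather than refilling the recycled buffers as real OpenAL code would.

```typescript
// In-memory simulation of OpenAL-style buffer-queue streaming (hypothetical names).
class StreamingSource {
  private queue: Float32Array[] = [];
  private processed: Float32Array[] = [];
  private readPos = 0;

  queueBuffer(chunk: Float32Array): void { this.queue.push(chunk); }      // cf. alSourceQueueBuffers
  buffersProcessed(): number { return this.processed.length; }           // cf. AL_BUFFERS_PROCESSED
  unqueueBuffer(): Float32Array | undefined { return this.processed.shift(); } // cf. alSourceUnqueueBuffers

  // Render n samples; each fully played chunk moves to the processed list.
  render(n: number): Float32Array {
    const out = new Float32Array(n); // stays silent if the queue runs dry (underrun)
    let i = 0;
    while (i < n && this.queue.length > 0) {
      const head = this.queue[0];
      const take = Math.min(n - i, head.length - this.readPos);
      out.set(head.subarray(this.readPos, this.readPos + take), i);
      i += take;
      this.readPos += take;
      if (this.readPos === head.length) {
        this.processed.push(this.queue.shift()!);
        this.readPos = 0;
      }
    }
    return out;
  }
}

// One pass of a streaming loop: replace every finished buffer with the next
// decoded chunk. decodeNext returns null at the end of the track.
function pumpStream(src: StreamingSource, decodeNext: () => Float32Array | null): void {
  while (src.buffersProcessed() > 0) {
    src.unqueueBuffer();
    const chunk = decodeNext();
    if (chunk === null) return;
    src.queueBuffer(chunk);
  }
}
```

A streamed source would be an ordinary source in every other respect, so the same position, velocity and gain controls would apply to it (for mono data; OpenAL does not spatialize multi-channel buffers), which is what would let a music track be attached to a moving object.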

Who does this? Who pays for it? When will it happen?

I have no idea.

I would hope that this would be a natural follow-on for the groups who have come together so successfully to build WebGL - and that the cooperative mechanisms that have achieved this feat could be extended or replicated to solve the audio problem.

We need this soon. If we are to have a future of high-end interactive applications on the web, as promised by WebGL on the graphics side, then audio support cannot be far behind. Fortunately, I believe that the OpenAL API is sufficiently similar to OpenGL that we could put together a draft standard and a rough implementation in short order. OpenAL was specifically designed to be as similar to OpenGL as possible. Many of the issues that have surrounded the philosophy of WebGL would carry perfectly over to WebAL. Both would use the same 4x4 matrix support - both would have loaders that work from a URL - both would use the same 'typed array' mechanisms. Issues of where we stand vis-a-vis extensions are already understood.
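One caveat on the shared-matrix point: OpenAL describes the listener with an AL_POSITION vector and an AL_ORIENTATION pair of "at" and "up" vectors rather than a matrix, so in practice WebAL would reuse WebGL's 4x4 matrices by deriving those vectors from the camera. A minimal TypeScript sketch, assuming a column-major camera-to-world matrix (the layout WebGL uses) with no scaling, and the OpenGL convention that the camera looks down its local -Z axis with +Y up; `listenerFromCamera` is a hypothetical name.

```typescript
// Derive OpenAL-style listener parameters from a WebGL-style camera matrix.
// Assumes a column-major, unscaled camera-to-world matrix; camera looks along local -Z, +Y up.
interface ListenerParams {
  position: [number, number, number];                                  // for AL_POSITION
  orientation: [number, number, number, number, number, number];       // AL_ORIENTATION: "at" then "up"
}

function listenerFromCamera(m: Float32Array | number[]): ListenerParams {
  if (m.length !== 16) throw new Error("expected a 4x4 matrix");
  const up: [number, number, number] = [m[4], m[5], m[6]];     // column 1: local +Y
  const at: [number, number, number] = [-m[8], -m[9], -m[10]]; // column 2 negated: local -Z
  return {
    position: [m[12], m[13], m[14]],                            // column 3: translation
    orientation: [at[0], at[1], at[2], up[0], up[1], up[2]],
  };
}
```

If an application stores the view (world-to-camera) matrix instead, it would need to invert it first.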

The OpenAL specification

Version 1.1 is the latest:

 http://connect.creativelabs.com/openal/Documentation/Forms/AllItems.aspx 

There are additional places where extensions to the spec reside.

Implementations of OpenAL

The main implementations are in Mac OS X, OpenAL Soft (open source, any platform) and the Creative driver for Windows.

Conclusion

I hope everyone who reads this can understand the need - and find this proposal as compelling as I do.

Interested Parties

  • Myself: steve@sjbaker.org - Professional graphics engineer working in games and simulation for 25 years. Also was an early proponent of OpenAL. Wrote the "ALUT" companion library.
  • The OpenAL developer list: openal-devel@opensource.creative.com
  • The WebGL developer list: public_webgl@khronos.org