Saturday, June 02, 2007

Voice Browsing

There's a famous story about Microsoft developers perfecting voice recognition for Windows: it worked great until one of them visited a friend's website and played a video of someone shouting "Start - Run - Format C - OK".

It's a great tale, but it's also a fundamental security issue that could derail most attempts at voice control of the client. The microphone should only be available to the application with focus (and not available for OS commands), and for privacy reasons, applications should always make it clear when they're listening and what they're likely to do with the data.

These rules are pretty restrictive - but they fit very neatly onto the web!

The vocal web

The web was designed with accessibility in mind, so that people with visual impairments could still use it through voice browsers that read pages out loud. There are even rarely used CSS styles for controlling vocal pitch, volume and tone.

But the web isn't just for people to read data. What's missing is a standard way to write data using speech - especially filling out the standard HTML <input> and <textarea> elements.

If this were possible, developers could create the following:

  • Mobile phone search engines - just talk to Google!
  • Dictated web email, blogs, or private notes
  • Full use of most applications - e.g. Amazon or eBay - for the visually impaired

Vocal HTML

Following the security rules above, any website could be speech-enabled in three steps:
  • Users adjust their browser settings to allow speech input (this could be a default on mobile phones)
  • Developers prompt speech by styling input boxes with the CSS aural "cue-before" and "cue-after" properties (defined in CSS 2 and kept as an informative appendix in CSS 2.1)
  • Browsers vocally prompt form submission when they reach a submit element.

That's it!
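
To make the three steps concrete, here is a minimal sketch in Python that simulates how a voice-enabled browser might walk through a form. Everything in it is hypothetical: the `Field` class, the `fill_form_by_voice` function, and the `listen` and `speak` callbacks stand in for a real recognizer and speech synthesizer, and the cue strings play the role that "cue-before" and "cue-after" would play in a stylesheet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for an <input> or <textarea> element."""
    name: str
    cue_before: str = ""  # spoken before listening, like CSS "cue-before"
    cue_after: str = ""   # spoken after listening, like CSS "cue-after"


def fill_form_by_voice(
    fields: List[Field],
    listen: Callable[[], str],
    speak: Callable[[str], None],
    speech_allowed: bool,
) -> Optional[Dict[str, str]]:
    """Return the form data to submit, or None if nothing should be sent."""
    # Step 1: the user must have enabled speech input in the browser settings.
    if not speech_allowed:
        return None

    data: Dict[str, str] = {}
    for field in fields:
        # Step 2: cues make it clear when the page starts and stops listening.
        if field.cue_before:
            speak(field.cue_before)
        data[field.name] = listen().strip()
        if field.cue_after:
            speak(field.cue_after)

    # Step 3: on reaching the submit element, ask before sending anything.
    speak("Submit this form?")
    if listen().strip().lower() == "yes":
        return data
    return None
```

A search page would pass a single `Field("q", cue_before="Search for?")`. Note that the form is only returned for submission after an explicit "yes", which keeps the user in control of when data leaves the browser.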

There are two methods to do speech recognition:

  • Client-side: A browser plug-in converts speech to text, places the text in the relevant HTML element, then submits the form on request.
  • Server-side: For POSTed forms, browsers simply attach an MP3 or other audio file for transcription by the server.
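
The server-side option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the browser sends the form as multipart/form-data with the recording attached as an audio part, the function name `parse_voice_form` is made up for this example, and `transcribe` is a placeholder for whatever speech recognizer the server actually uses. A real site would more likely rely on its web framework's form parser than on the standard library's MIME parser used here.

```python
from email import message_from_bytes
from typing import Callable, Dict


def parse_voice_form(
    body: bytes,
    content_type: str,
    transcribe: Callable[[bytes], str],
) -> Dict[str, str]:
    """Turn a multipart/form-data POST body into plain text form fields."""
    # Prepend the Content-Type header so the MIME parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart/form-data body")

    fields: Dict[str, str] = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        payload = part.get_payload(decode=True)
        if part.get_content_maintype() == "audio":
            # Spoken field: hand the raw audio to the (hypothetical) recognizer.
            fields[name] = transcribe(payload)
        else:
            # Typed field: keep the text as it was submitted.
            fields[name] = payload.decode("utf-8")
    return fields
```

The rest of the application then sees an ordinary dictionary of text fields, so existing form handling would not need to know whether a value was typed or spoken.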

Despite the history, still lots of opportunity

Voice recognition has been talked about for ages, but it's still a niche - many people probably still think it's a distant dream.

But that will change, especially with the rise of the internet and mobile phones. And when it does, it won't only be the visually impaired who gain; it will be anyone accessing the internet without a good keyboard.
