View Full Version : Speech Recognition On Track


Andy Sjostrom
07-15-2003, 02:40 PM
<div class='os_post_top_link'><a href='http://www.computerworld.com.au/index.php?id=275368508&fp=16&fpid=0' target='_blank'>http://www.computerworld.com.au/ind...08&fp=16&fpid=0</a><br /><br /></div>We're slowly seeing more and more pieces of the speech recognition puzzle falling into place. Last year I wrote twice about Microsoft's SALT (Speech Application Language Tags) specification in the posts <a href="http://www.pocketpcthoughts.com/forums/viewtopic.php?t=1364">"Say Hi! to your Pocket PC"</a> and <a href="http://www.pocketpcthoughts.com/forums/viewtopic.php?t=2555">Speech enabled Pocket Internet Explorer next year?</a>. I recommend re-visiting them before reading on!<br /><br />The Computerworld article <a href="http://www.computerworld.com.au/index.php?id=275368508&fp=16&fpid=0">"Microsoft releases Speech Server beta"</a> says "Microsoft on Wednesday moved toward the integration of call centers and the Web with the release of the first public beta of its Microsoft Speech Server and a new beta version of its Speech Application Software Development Kit (SDK). ... Companies that need call centers can cut costs by automating them on the server, said Xuedong Huang, general manager for Microsoft speech technologies. Among other things, the server can interpret callers' requests and provide recorded or synthesized responses. Developers also can integrate the voice-based services with Web-based applications that can continue to run on a Web server as they do now. For example, a caller could ask for a stock quote verbally and have it displayed on a handheld device, he said."<br /><br />Speech recognition is a processor, memory and storage intensive task. It will take quite some time before mobile devices are up to the task with any precision. 
Meanwhile, as connectivity increases, we will see more mobile devices connect to speech recognition servers that can process sound and return information in the form of text or application commands.<br /><br />With regard to the alternative specification, VoiceXML, I wrote in one of my previous posts: "I generally feel Microsoft should play with the others in the W3C. Hopefully, they will merge over time, or at least define integration mechanisms". I am glad that the Computerworld article concludes: "Both Plakias and Microsoft's Huang look to the two specifications eventually merging under the W3C. Plakias said that could happen as soon as the end of 2004. Huang was less specific. "We want to find a way to converge with Voice XML, but how we're going to do that, I don't know," Huang said."<br /><br />Very cool!

egads
07-15-2003, 04:01 PM
Great, it's bad enough that we have to listen to people gab on their cell phones in restaurants and public places, now we have to listen to them talk to their PPCs :grumble:

axe
07-15-2003, 06:13 PM
I was cleaning out my basement this weekend and came across some old floppies (yeah that's 5 1/4 floppies!). Those go back to my XT machine. I think I could probably dig up floppies that go back to my C64 if I dug hard enough... Anyway, this weekend I found a "text to speech" disk. It struck me then that even text-to-speech has not really progressed very far, and certainly the speech recognition hasn't.
Every time I read about this, the common complaint is "this requires too much processing/RAM etc to be viable yet".
But how many tens-of-times faster is my current Athlon compared to my old 8088? And how much RAM do I have now? And they STILL can't get a better synthesis than the SAM-voice that dates back to my C64, even in the current PC MS-Reader text to speech add-in or IM client. I should be able to install my favourite action hero as the voice now and it should be crystal clear, not the old "Do you want to play a game?" :robot: voice. Even my PPC is 50x faster and has 60x the RAM my XT had. It should be able to read me text-based books or files or email when I'm on the road at the touch of a button or even by the coveted voice-commands. I am truly surprised there has not been more innovation here considering the thousands of road-warriors out there. There should be all kinds of killer apps.

Not being a programmer, there isn't much I can do to help fill this obvious niche market, but there HAS to be money to be made here. I'm just surprised there aren't a hundred BIG companies trying to take mine...

I think that the Processor/RAM problem is just a crutch, since my C64 could do the things I have seen done in the mainstream 15-20 years later.

My thoughts... :soapbox:
AXE

egads
07-15-2003, 06:57 PM
I agree. When OS/2 Warp came out it had speech recognition built in. I was running a 66 MHz 486 with 64 MB of memory. While it was not perfect speech recognition, it worked. It has been 7 years or so since Warp came out and I have yet to see something that works any better.

Like I've said in other posts, Microsoft always gets their boosts in speed by throwing faster hardware with more memory at a problem, not by writing faster, more clever software. :evil:

michael
07-15-2003, 08:47 PM
SR and TTS really have come a long way. A good TTS engine needs a good amount of data: I'm working with one that takes up around half a gigabyte of disk space, and a really good one takes up much more. The alternative for prompts that sound good is an engine that takes fragments (i.e. one or more words) of recorded speech and combines them into the phrase that needs to be spoken. The trick is getting it to sound natural, which is best achieved by recording the prompts carefully to account for whether the word is at the start, middle or end of the phrase and what the previous and next words are. As for SR, that's improved a bunch. Go back a few years and you needed dedicated hardware and hours of training to use a system that required a pause between each word. Now all you need is a decent microphone and a little training to be able to speak naturally for dictation. For command and control applications with a finite number of possible inputs, you don't even need training. Performance has gone up and requirements have gone down; it's now possible to get a great SR engine running on a Pocket PC.
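
To make the recorded-fragment idea concrete, here is a rough Python sketch of how an engine might pick which recording of each word to play based on where the word falls in the phrase. Everything in it is made up for illustration: the `RECORDINGS` inventory, the file names and the `plan_prompt` helper are assumptions, not any real engine's API. It also ignores the previous/next-word context mentioned above and only looks at start/middle/end position.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory of recorded fragments: (word, position) -> audio file.
# A real prompt set would be far larger and also keyed on neighbouring words.
RECORDINGS: Dict[Tuple[str, str], str] = {
    ("your", "start"): "your_start.wav",
    ("balance", "middle"): "balance_middle.wav",
    ("is", "middle"): "is_middle.wav",
    ("ten", "middle"): "ten_middle.wav",
    ("dollars", "end"): "dollars_end.wav",
    ("dollars", "middle"): "dollars_middle.wav",
}


def position_of(index: int, length: int) -> str:
    """Classify a word as the start, middle or end of the phrase."""
    if index == 0:
        return "start"
    if index == length - 1:
        return "end"
    return "middle"


def plan_prompt(phrase: str) -> List[str]:
    """Return the list of recordings to play, in order, for a phrase."""
    words = phrase.lower().split()
    plan: List[str] = []
    for i, word in enumerate(words):
        pos = position_of(i, len(words))
        clip = RECORDINGS.get((word, pos))
        if clip is None:
            # Fall back to any recording of the word; it will sound less
            # natural because the intonation was recorded for another position.
            clip = next((f for (w, _), f in RECORDINGS.items() if w == word), None)
        if clip is None:
            raise KeyError(f"no recording available for {word!r}")
        plan.append(clip)
    return plan
```

Calling `plan_prompt("Your balance is ten dollars")` picks the end-of-phrase recording of "dollars" rather than the mid-phrase one, which is the kind of choice that keeps the intonation from sounding stitched together.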

Alicatt
07-18-2003, 10:36 AM
Back then I had a Kurzweil / Xerox text to speech system. It was a SCSI flatbed scanner with a full length ISA card with a 68020 CPU and 4 MB RAM. It would scan in a document and then speak it out. It was big, clunky and expensive (£10,000 approx.) but it worked.